Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPIR-V to MSL conversion error: Argument buffer resource base type could not be determined #2271

Closed
patrick-han opened this issue Jul 10, 2024 · 22 comments · Fixed by #2409
Closed
Labels
Bug Completed Issue has been fixed, or enhancement implemented.

Comments

@patrick-han
Copy link

[mvk-error] SPIR-V to MSL conversion error: Argument buffer resource base type could not be determined. When padding argument buffer elements, all descriptor set resources must be supplied with a base type by the app.
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Fragment shader function could not be compiled into pipeline. See previous logged error.

validation layer: VK_ERROR_INVALID_SHADER_NV: Fragment shader function could not be compiled into pipeline. See previous logged error.

This seems to have cropped up at various points before, but unsure if it's related. I have the latest VulkanSDK as of yesterday (1.3.283.0) so in theory I have the fix for these issues:

#2216
#2016

After enabling MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS programmatically I also disabled shader validation at the advice of one of the recent Vulkanised 2024 talks.

The specific descriptor indexing features I enabled are:
descriptorBindingSampledImageUpdateAfterBind, descriptorBindingPartiallyBound, runtimeDescriptorArray with the appropriate pool/binding/layout flags:
VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT_EXT, VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT, VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT, VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT

Although in my fragment shader I'm still just declaring my texture normally:

layout (set = 0, binding = 0) uniform texture2D texture0;
layout (set = 0, binding = 1) uniform sampler linearSampler;
@billhollings
Copy link
Contributor

billhollings commented Jul 11, 2024

It's likely that PR #2260 and followups may have fixed this.

Please build from the latest MoltenVK, and try again. Or you can wait for the next SDK, which should be released in a couple of weeks.

With this updated code, MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS is enabled by default, so you no longer need to set it explicitly.

@patrick-han
Copy link
Author

I just tried building the latest now, building the macOS library. Unfortunately I'm running into another error now with a crash + artifacts:

Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
[mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

validation layer: VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
Screenshot 2024-07-11 at 8 44 56 AM

@patrick-han
Copy link
Author

patrick-han commented Jul 11, 2024

It seems that reducing my descriptor counts to something very small (and therby my maxSets) avoids the crash:

    constexpr uint32_t maxBindlessResourceCount = 100; // Stops crashing if I change this to something like 25
    constexpr uint32_t maxSamplerCount = 2;

    std::array<VkDescriptorPoolSize, 2> bindlessDescriptorPoolSizes {{
        { VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, maxBindlessResourceCount},
        { VK_DESCRIPTOR_TYPE_SAMPLER, maxSamplerCount}
    }};
    VkDescriptorPoolCreateInfo poolCreateInfo = {
        .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
        .pNext = nullptr,
        .flags = VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT_EXT,
        .maxSets = maxBindlessResourceCount * static_cast<uint32_t>(bindlessDescriptorPoolSizes.size()),
        .poolSizeCount = static_cast<uint32_t>(bindlessDescriptorPoolSizes.size()),
        .pPoolSizes = bindlessDescriptorPoolSizes.data()
    };

Although this seems too small?

@billhollings
Copy link
Contributor

This might be caused by the same problem as in issue #2246, which was fixed with PR #2273.

Please retest with the latest MoltenVK, and if that fixes it, close this issue?

@patrick-han
Copy link
Author

Just pulled and built @ edbdcf0

Still getting the same crash unfortunately

[mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
[VULKAN] Debug.h(18): validation layer: VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

Lowering maxBindlessResourceCount to something low again like 50 no longer triggers the crash, but the textures are incorrectly applied (even in the absence of validation errors):

Screenshot 2024-07-22 at 11 35 28 AM

I've verified everything renders correctly on a Windows system (Nvidia).

@billhollings
Copy link
Contributor

Thanks for testing.

Can you test again with Metal validation enabled, and report any validation errors that are logged, or that trip an assertion, please? You can do this using the following environment variable:

METAL_DEVICE_WRAPPER_TYPE=1

To avoid the assertion, and just log all Metal validation errors, you can also add the following environment variables:

METAL_ERROR_MODE=3
METAL_DEBUG_ERROR_MODE=3

@patrick-han
Copy link
Author

I am not super familiar with XCode, but since I am building+running through it, I added the env variables to my scheme like this (Hopefully this is correct):
Screenshot 2024-07-22 at 12 42 25 PM

And now my output looks like this when it breaks on vkQueuePresentKHR. The only additional info seems to be the second entry didCompleteWithStartTime:endTime:

FAULT: <NSRemoteView: 0x13b76d780 com.apple.TextInputUI.xpc.CursorUIViewService TUICursorUIViewService> determined it was necessary to configure <TUINSWindow: 0x13d0781e0> to support remote view vibrancy
CLIENT ERROR: TUINSRemoteViewController does not override -viewServiceDidTerminateWithError: and thus cannot react to catastrophic errors beyond logging them


-[_MTLCommandBuffer didCompleteWithStartTime:endTime:error:], line 1047: error 'Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)'

Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

[mvk-error] VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

[VULKAN] Debug.h(18): validation layer: VK_ERROR_OUT_OF_DEVICE_MEMORY: MTLCommandBuffer "vkQueueSubmit MTLCommandBuffer on Queue 0-0" execution failed (code 3): Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

Not sure if relevant but also wanted to add the ohter extensions I am using:
VK_KHR_dynamic_rendering,VK_KHR_buffer_device_address,VK_EXT_scalar_block_layout

@patrick-han
Copy link
Author

patrick-han commented Jul 23, 2024

Ah I forgot I am using VK_STRUCTURE_TYPE_DESCRIPTOR_SET_VARIABLE_DESCRIPTOR_COUNT_ALLOCATE_INFO_EXT as well, just wanted to add that. Another issue seems to be the same problem, and this seems to be the common thread: #2278

@billhollings
Copy link
Contributor

billhollings commented Sep 7, 2024

I'm wondering if there's something disconnecting within MoltenVK between your descriptor set allocations and the descriptor pool.

Can you run your app with the following env vars set, please?

MVK_CONFIG_DEBUG=1
MVK_CONFIG_LOG_LEVEL=4

You should seem some entries in the logs, similar to the following:

[mvk-debug] Created VkDescriptorSetLayout 0x600000e26700 with 1 bindings:
	0: VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE              with up to 1000000 elements at binding 0

[mvk-debug] Created VkDescriptorPool 0x126412200 with 2 descriptor sets, and pooled descriptors:
	VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE:            2112  (2112 remaining)

and then copy them and post them here.

Or you can just ZIP up a copy of the log file and post it here as a file.

And can you confirm that you have updated the descriptors with the full set of resources that are accessed statically or dynamically from the shader?

Finally, is there any way you can post a minimal app that demonstrates the problem, so we can evaluate it here?

@billhollings billhollings removed the Bug label Sep 7, 2024
@patrick-han
Copy link
Author

@billhollings The log file with those 2 env vars enabled is attached:
logfile.log

And can you confirm that you have updated the descriptors with the full set of resources that are accessed statically or dynamically from the shader?

If I understand your question correctly, yes. I do my vkUpdateDescriptorSets() for the bindless textures all at once before the rendering loop begins, and I don't load any additional resources once rendering starts.

Finally, is there any way you can post a minimal app that demonstrates the problem, so we can evaluate it here?

The repo for this app is public on my profile magic-red, but I can try to pair it down if it's too large.

@stuartcarnie
Copy link

I'm seeing this in Godot when setting MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS=1 running MoltenVK 1.3.290

[mvk-error] SPIR-V to MSL conversion error: Argument buffer resource base type could not be determined. When padding argument buffer elements, all descriptor set resources must be supplied with a base type by the app.
[mvk-error] VK_ERROR_INVALID_SHADER_NV: Compute shader function could not be compiled into pipeline. See previous logged error.

@billhollings
Copy link
Contributor

I'm seeing this in Godot when setting MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS=1 running MoltenVK 1.3.290

There have been a number of MoltenVK updates around this since SDK 1.3.290.

Can you try building the latest MoltenVK from this repo, and retesting? If not, a new SDK will be released at the end of September that will have the latest MoltenVK in it.

@stuartcarnie
Copy link

@billhollings thanks for the feedback – once I get further along with my work in Godot, that requires these improvements, I'll either compile from source or pull down the official release.

@billhollings
Copy link
Contributor

@patrick-han

The repo for this app is public on my profile magic-red, but I can try to pair it down if it's too large.

Thanks for this. It was very helpful in getting to the bottom of this.

The crash is being caused by a bug in the BitArray class. As a result, I've overhauled that class to fix a number of problems, and will be releasing it in the next few days. In the meantime, you can use PR #2349 from @etang-cw, who caught and fixed the same bug. Unfortunately, neither fix will make it into the upcoming SDK due to timing.

The issue of textures not being used correctly is due to your app treating the sampler descriptor binding as an array of length 2 (only one of which is used), but the SPIR-V shaders treating the descriptor binding as a single sampler (not an array).

If this is working correctly on other platforms, then we should patch SPIRV-Cross to deal with extra padding in the descriptors. In the meantime, in your app, if you change the value of maxSamplerCount to 1, it fixes your particular problem.

BTW...with MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS enabled (the default now), you can leave maxBindlessResourceCount set to 16536. However, if you disable MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS, you will indeed be limited to fewer textures, and the step-down to 120 is appropriate.

@patrick-han
Copy link
Author

@patrick-han

The repo for this app is public on my profile magic-red, but I can try to pair it down if it's too large.

Thanks for this. It was very helpful in getting to the bottom of this.

The crash is being caused by a bug in the BitArray class. As a result, I've overhauled that class to fix a number of problems, and will be releasing it in the next few days. In the meantime, you can use PR #2349 from @etang-cw, who caught and fixed the same bug. Unfortunately, neither fix will make it into the upcoming SDK due to timing.

The issue of textures not being used correctly is due to your app treating the sampler descriptor binding as an array of length 2 (only one of which is used), but the SPIR-V shaders treating the descriptor binding as a single sampler (not an array).

If this is working correctly on other platforms, then we should patch SPIRV-Cross to deal with extra padding in the descriptors. In the meantime, in your app, if you change the value of maxSamplerCount to 1, it fixes your particular problem.

BTW...with MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS enabled (the default now), you can leave maxBindlessResourceCount set to 16536. However, if you disable MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS, you will indeed be limited to fewer textures, and the step-down to 120 is appropriate.

Thank you for the information, and with the patch I'm able to work with the large texture count now :)

Just to be extra clear for my own understanding, my usage of only 1 sampler in the array despite it being of length 2 is not incorrect usage is it? I'm not too familiar with things once they get down to SPIR-V. It does work properly on my Windows PC.

@billhollings
Copy link
Contributor

billhollings commented Sep 27, 2024

The crash is being caused by a bug in the MVKBitArray class. As a result, I've overhauled that class to fix a number of problems, and will be releasing it in the next few days.

PR #2355 fixes the crash.

Just to be extra clear for my own understanding, my usage of only 1 sampler in the array despite it being of length 2 is not incorrect usage is it? I'm not too familiar with things once they get down to SPIR-V. It does work properly on my Windows PC.

I believe what you are doing is correct, particularly if it is working on other platforms.

GPU's have indexed descriptors. Unfortunately, despite the [[id(n)]] nomenclature, these are not actually descriptor indexes, and MSL does not allow gaps in the structure of a descriptor set. This is why you'll sometimes see MoltenVK/SPIRV-Cross add padding members to an MSL descriptor set structure.

In this case, we'll have to add padding in the MSL structure after a SPIR-V scalar that is actually the first element of a fixed-length array.

I'm going to leave this issue open, to track that fix.

@patrick-han
Copy link
Author

patrick-han commented Sep 28, 2024

The crash is being caused by a bug in the MVKBitArray class. As a result, I've overhauled that class to fix a number of problems, and will be releasing it in the next few days.

PR #2355 fixes the crash.

Just to be extra clear for my own understanding, my usage of only 1 sampler in the array despite it being of length 2 is not incorrect usage is it? I'm not too familiar with things once they get down to SPIR-V. It does work properly on my Windows PC.

I believe what you are doing is correct, particularly if it is working on other platforms.

GPU's have indexed descriptors. Unfortunately, despite the [[id(n)]] nomenclature, these are not actually descriptor indexes, and MSL does not allow gaps in the structure of a descriptor set. This is why you'll sometimes see MoltenVK/SPIRV-Cross add padding members to an MSL descriptor set structure.

In this case, we'll have to add padding in the MSL structure after a SPIR-V scalar that is actually the first element of a fixed-length array.

I'm going to leave this issue open, to track that fix.

I did end up filling that other "slot" with another sampler s.t. I now have 2 samplers in the array: patrick-han/magic-red@d3ea2d7

But I'm still getting incorrect rendering on my Mac, where it's working on my PC. From what I understand the issue with the padding should only occur if I try to "leave a gap" though, right? No validation errors to report either.

@billhollings
Copy link
Contributor

I did end up filling that other "slot" with another sampler s.t. I now have 2 samplers in the array: patrick-han/magic-red@d3ea2d7

But I'm still getting incorrect rendering on my Mac, where it's working on my PC. From what I understand the issue with the padding should only occur if I try to "leave a gap" though, right? No validation errors to report either.

When I test patrick-han/magic-red@d3ea2d7, I am still seeing the same mismatch between the descriptor set layouts declaring an array of 2 samplers, and the SPIR-V variable definitions declaring a single scalar sampler:

Created VkDescriptorSetLayout 0x600001e589c0 with 2 bindings:
	0: VK_DESCRIPTOR_TYPE_SAMPLER                    with 2 elements at binding 0
	2: VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE              with up to 16536 elements at binding 
struct spvDescriptorSetBuffer0
{
    sampler linearSampler [[id(0)]];
    array<texture2d<float>, 1> textures [[id(2)]];
};

@billhollings
Copy link
Contributor

PR #2409 fixes the remaining issue, where a shader reads the first element of an array as a scalar.

@billhollings billhollings added the Completed Issue has been fixed, or enhancement implemented. label Dec 10, 2024
@DUOLabs333
Copy link

This should also fix #2374, right?

@billhollings
Copy link
Contributor

This should also fix #2374, right?

Sadly, no.

The issue there is the use of a runtime array of VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, which currently results in two Metal runtime arrays, one for the textures and one for the samplers. It's not possible to have two runtime arrays in one descriptor set or Metal argument buffer.

Fixing this will require an intrusive update to SPIRV-Cross, as explained in this comment.

@DUOLabs333
Copy link

I see --- I just thought the two issues were related, as they had the same error messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Completed Issue has been fixed, or enhancement implemented.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants