Interleaved multi-queue #3096
Conversation
Added 'D3D12BufferCreator::InitialState(D3D12_RESOURCE_STATES)' to allow user code to avoid adding unnecessary transition barriers after resource creation. Added 'D3D12TextureCreator::ClearVal(const D3D12_CLEAR_VALUE &)' to avoid a PIX performance warning/tip about clearing resources that haven't had their clear value set on creation.
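As a rough illustration of the first addition (the creator's constructor arguments below are placeholders, only the InitialState call itself is from this change):

```cpp
// Hypothetical usage: create the buffer directly in the state it will first be
// used in, so no COMMON -> VERTEX_AND_CONSTANT_BUFFER transition barrier is
// needed immediately after creation.
ID3D12ResourcePtr vb =
    D3D12BufferCreator(/* device, size, initial data: placeholder args */)
        .InitialState(D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER);
```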
Thanks for this, yes I added in syncs between queue submission changes which should handle at least some cases of async queue usage. Regarding your second point, combined tests make sense if there is some interaction of features, but there's no relation between multiple-queue testing and SM6.6 testing so I wouldn't want that included in this test. There's also already a test for SM6.6 in D3D12_Descriptor_Indexing which is specifically designed for testing that kind of functionality.
Thanks. I see it now. I think I'll have searched for
The requested change was made, by the way. I stripped out the 'dynamic resource' / 'descriptor heap indexing' aspect and squashed it into the one commit.
I think this looks mostly fine, I've left a couple of minor nitpick style comments.
As an overall thing though I wanted to clarify - it's ambiguous to me what exactly you're trying to test here. The way the test is constructed, this looks like it will replay in RenderDoc; however it won't replay precisely as described, because RenderDoc does not correctly handle wait-before-signal. If, for example, you had the compute queues read some data filled out by the graphics queue in the 'clear RTs' step, they would not see it, because each submit is processed in CPU-submission order, without respect to waits and signals.
If you're only testing to make sure that overlap without wait-before-signal is handled then that's fine, and this test will do it, but with the way you've constructed it I'm not clear exactly what case you're trying to test.
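To make the wait-before-signal case concrete, a minimal sketch in plain D3D12 (queue, fence and list names are illustrative, not taken from the test): the compute queue's Wait reaches the driver before the graphics queue's Signal, which the GPU resolves correctly but which a strictly CPU-submission-order replay would not.

```cpp
// Assumed to already exist: gfxQueue/computeQueue (ID3D12CommandQueue*),
// fence (ID3D12Fence*), gfxList/computeList (ID3D12CommandList*).
const UINT64 frameFence = 1;

// CPU submission order: the compute queue is told to wait for the fence first...
computeQueue->Wait(fence, frameFence);
computeQueue->ExecuteCommandLists(1, &computeList);   // reads data produced by graphics

// ...and only afterwards does the graphics queue receive the work that
// produces that data and signals the fence ("wait-before-signal").
gfxQueue->ExecuteCommandLists(1, &gfxList);            // e.g. the 'clear RTs' step
gfxQueue->Signal(fence, frameFence);

// On the GPU the compute work stalls on the fence and sees the graphics results.
// A replay that processes submits purely in CPU order, ignoring Wait/Signal,
// would run the compute list first and read stale data.
```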
In addition to that, this demo is definitely useful but by itself it is missing the testing logic that would actually cause it to run in the nightly CI. That would involve writing a python script to run the test, grab a capture, and test whatever you're looking to test. In this case I think that would involve looking for the right colours to be produced.
{// DescriptorTable / D3D12_ROOT_DESCRIPTOR_TABLE1
 sizeof(tableDescriptorRanges) / sizeof(tableDescriptorRanges[0]),    // NumDescriptorRanges
 &tableDescriptorRanges[0]},
D3D12_SHADER_VISIBILITY_ALL};
Minor formatting nit - putting trailing commas in these arrays will make clang format lay them out a bit clearer. Not required but it would probably improve things.
Also just to check - is it not possible to formulate these bindings with the existing tableParam() helper? It would require two root parameters instead of one with two ranges, but does that affect the test at all? If not, it would be nice to remove the boilerplate here by using that helper, even if it's not optimal.
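For reference, the structural difference under discussion, in raw D3D12 terms (generic D3D12 structs, not the demo framework's tableParam helper, whose exact signature isn't shown here):

```cpp
// Option A: one root parameter whose descriptor table contains two ranges.
D3D12_DESCRIPTOR_RANGE1 ranges[2] = {};
ranges[0] = {D3D12_DESCRIPTOR_RANGE_TYPE_CBV, 1, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
             D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND};
ranges[1] = {D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0, 0, D3D12_DESCRIPTOR_RANGE_FLAG_NONE,
             D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND};

D3D12_ROOT_PARAMETER1 combined = {};
combined.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
combined.DescriptorTable = {2, ranges};
combined.ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

// Option B: two root parameters, each a table with a single range. Costs one
// extra root parameter but each table can be built independently, which is
// roughly what switching to two tableParam-style parameters amounts to.
D3D12_ROOT_PARAMETER1 split[2] = {};
split[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
split[0].DescriptorTable = {1, &ranges[0]};
split[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;
split[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
split[1].DescriptorTable = {1, &ranges[1]};
split[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;
```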
I've made the change from a single root descriptor table with 2 ranges to 2 root tables (using tableParam).

I can understand preferring multiple instances of tableParam, especially when root parameters aren't at a premium, which is the case here. I did it the other way initially just because that seemed the most usual way of doing it, and with the slightly unusual way tableParam builds up hidden internal table ranges I couldn't actually get the exact descriptor table I had in mind... but I'm fine with changing the structure a little to cut down on a few lines.
The autotest project primarily focusses on testing what we aim to test with the simplest/most basic code possible so that when reading the code you focus on what the test is stressing. It's definitely not optimal but unless the test is specifically about descriptor ranges it's the most straightforward way to express it yeh.
.DSV()
.NoSRV()
.InitialState(D3D12_RESOURCE_STATE_DEPTH_WRITE)
.ClearVal(clearVal);
FWIW it's totally fine to go to the lengths of setting the clear value here, but I typically disable/ignore that warning as it's extremely noisy and extremely low value.
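For context, the clearVal being chained in above is just the standard optimised-clear-value struct matched to the depth target; a sketch (the format here is an assumption, it should match whatever the DSV format actually is):

```cpp
// Optimised clear value matching the depth target's format and the value the
// test clears it to; passing this at creation is what silences the PIX tip.
D3D12_CLEAR_VALUE clearVal = {};
clearVal.Format = DXGI_FORMAT_D32_FLOAT;    // assumed format for illustration
clearVal.DepthStencil.Depth = 1.0f;
clearVal.DepthStencil.Stencil = 0;
```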
ID3D12GraphicsCommandListPtr cmd = GetCommandBuffer();
Reset(cmd);
ResourceBarrier(cmd, vb, D3D12_RESOURCE_STATE_COMMON,
                D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER);
Minor nit - there is an overload of this ResourceBarrier helper that doesn't take a command buffer and does the reset/submit dance for you.
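For reference, both forms boil down to recording a standard transition barrier; roughly this, in plain D3D12 (not the helper itself, which presumably wraps the reset/record/submit around it):

```cpp
// A single transition barrier on the vertex buffer, COMMON -> VB/CB state.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource = vb;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COMMON;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER;
cmd->ResourceBarrier(1, &barrier);
```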
Done. That's nicer. Thanks.
util/test/demos/demos.vcxproj
@@ -477,4 +478,7 @@
    <Visible>false</Visible>
  </Content>
</ItemGroup>
<ItemGroup>
  <None Include="..\..\..\.clang-format" />
</ItemGroup>
What's the reason for including the .clang-format file here?
Good question: I missed that. I don't know - I think it may have been me trying to get VS (2022) to adopt that formatting style (or investigating whether VS would auto-format differently).
I've now removed it.
// Transition resources back to their expected states at the start of the next frame
ResourceBarrier(cmd, buffer, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
                D3D12_RESOURCE_STATE_UNORDERED_ACCESS);
I think you want to clear the buffer here as well, since otherwise after the first loop the test doesn't actually demonstrate that the compute data was written properly before the graphics work ran - it's overwriting the same data over and over so it could produce the same output even if the replay is wrong.
after the first loop the test doesn't actually demonstrate that the compute data was written properly before the graphics work ran
The test now first resets the buffer to a pure blue from the gfx queue, then the compute queues read and add to the colours, adding brown and green, so that the gfx pixel shading colours the triangles purple and sky blue. That's ultimately what we'd like the replay to show but, for now at least, the replay draws both tris pure blue, as expected given the current behaviour you've described.
Thanks for the feedback.
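For illustration, one way the per-frame reset described above could be expressed (resource and descriptor handle names are assumptions, not the test's actual code, and this assumes the buffer has a float-typed UAV):

```cpp
// Reset the buffer to pure blue at the start of each frame, so that the later
// compute additions (brown/green) are genuinely re-derived every frame rather
// than surviving from the previous one.
const FLOAT pureBlue[4] = {0.0f, 0.0f, 1.0f, 1.0f};
cmd->ClearUnorderedAccessViewFloat(bufferGPUHandle, bufferCPUHandle, buffer,
                                   pureBlue, 0, NULL);
```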
I see. This was the kind of thing I wasn't sure about: whether/how RenderDoc would handle this submission style (the entire frame's sequence for each GPU queue at a time). I thought this test would either fail in some way and reveal the bits of RenderDoc code involved, show what we have available to more appropriately serialise and spoon-feed the GPU with captured queue sequences for replay, and show which options look best to handle this and replay correctly as it would in PIX (or as the GPU would consume the work).

Since this style of submission is normal/legitimate (and also quite clean and convenient from the gfx application's point of view), the real goal I have in mind is to aim towards RenderDoc having correct multi-queue dependency replay serialisation, which, as you point out, is not yet sufficiently tested by these changes. So I've added an initial gfx buffer write which is then read-modify-written by the subsequent compute queue work. In this case the GPU correctly renders a sky-blue and a purple triangle, but because of what you've explained RenderDoc does, the capture replay outputs two pure blue triangles.

So you can see the individual changes for each issue you mention, I've kept the changes separate for now and will probably squash them all into the first commit (if you agree), since they all fall under the 'add interleaved multi-queue test'.
Am I right in thinking that the current CI wouldn't run this kind of functional test? I'm not yet familiar with the pythonised pixel testing side of things, but is that something I could add as a separate PR, and is this kind of thing fairly straightforward to do through the 'util\test\tests\D3D12\D3D12_...py' test files?

Initially, because I wouldn't want to add a failing test, and because I would expect RenderDoc's replay, in its current state, not to produce the sky-blue and purple triangles, I'd aim to have a test that for now expects the pure blue 'reset buffer' colours.
The github CI is only unit tests because it doesn't run on a machine with a capable GPU, so it can't run any kind of functional tests. I run the functional tests nightly and upload the results as part of the nightly builds, on a machine with a GPU. Indeed the separate python scripts are basically little harnesses to look at a capture and do things like e.g. check that the triangles are rendered and in the correct colours. It would be simplest not to commit one of those yet, but you could also submit it disabled so it doesn't run yet (each test has a function to check for support, in case of e.g. hardware requirements, which could be set to return false for now).

It sounds like what you're specifically wanting to support and test then is wait-before-signal? Indeed that is not supported by RenderDoc, because it is awkward to handle and confusing to display to the user, since you no longer have a linear view of the commands in the frame. It would be best to rename the test & commit to explicitly state that you are testing wait-before-signal, since that is a specific case and not the same as generalised async compute/interleaved queue operations.

Other than that, yes, if you squash this all into one commit it can be merged now that I understand what you're wanting to test.
This tests submitting commands, signals, and waits to graphics and multiple compute queues, where the entire frame's submission for each queue depends on the other queues and the CPU submission order is not the order in which the work needs to be processed by the GPU (and by the replay). This is to ensure that capture and replay deserialise the work adequately.
Thanks. (I thought it useful to still keep 'multi' in the name, just so that people who aren't familiar with the tests and are only browsing them get a hint that there's multiple-queue submission work going on too; 'wait-before-signal' doesn't immediately convey 'interleaved, out-of-render-order, multiple queues' to me. Anyone else wanting to do similar multi-queue testing may then more quickly locate it and understand what's going on. For example, I was hunting around amongst the existing tests for the kind of boilerplate required to do this new multi-queue test before realising they don't have what I needed... And if that ever happens, I think that would be the time to factor out any of the shared utility from stuff like the little
Description
I remember privately sending RenderDoc feedback a few years ago after discovering that it couldn't handle submission to multiple queues (graphics plus multiple async compute) where the commands, waits, and signals for each queue are submitted as the entirety of that queue's frame work, one queue at a time, and where there were interdependencies between queues. I.e. work doesn't get carefully submitted to each queue in an order that avoids the GPU ever getting stuck on a wait.
I'm pleased to see that this appears to no longer be an issue with RenderDoc (at least with this simple new test, d3d12_interleaved_multi_queue), but since I can't see any other multi-queue D3D12 tests (I could only see single graphics or single compute queue tests), I thought it might be worth adding this to the set of D3D12 demos/tests.

I don't know how you feel about requiring clean separation of different test elements but, because it was relatively easy to add (and something that also doesn't appear to be covered in the existing D3D12 demos/tests), this new test includes optional support for using dynamic descriptor heap resources (suitably checking for hardware support at run-time). If this is a big deal, I can separate just that out into a different test, if that's something you'd like.
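For reference, the kind of run-time check this implies, in generic D3D12 terms (not necessarily how the test performs it): SM 6.6 dynamic resources need both shader model 6.6 and resource binding tier 3.

```cpp
// Assumed to already exist: dev (ID3D12Device*).
// Check whether SM6.6 "dynamic resources" (direct descriptor-heap indexing)
// can be used, falling back to the conventional binding path otherwise.
bool supportsDynamicResources = false;

D3D12_FEATURE_DATA_SHADER_MODEL sm = {D3D_SHADER_MODEL_6_6};
D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};

if(SUCCEEDED(dev->CheckFeatureSupport(D3D12_FEATURE_SHADER_MODEL, &sm, sizeof(sm))) &&
   SUCCEEDED(dev->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts))))
{
  supportsDynamicResources = sm.HighestShaderModel >= D3D_SHADER_MODEL_6_6 &&
                             opts.ResourceBindingTier >= D3D12_RESOURCE_BINDING_TIER_3;
}
```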