Experimental workgraph - Phase 4 #2288

HansKristian-Work · 2025-01-15T11:23:06Z

This is the first one with some real meat to it.

Adds a missing meta .md document I forgot to include in the first rebase.
Adds meta operations for workgraphs. The high-level details are explained in the .md, but the TL;DR is that we have four shaders:

Setup GPU input

Only used when processing a GPU_INPUT entry point. Just generates indirect dispatches.

distribute_workgroups

Counts the number of records produced by each node, prefix-sums the counts and allocates space for the expanded offsets.
Sets up an indirect command for each node. TODO: Potentially allocate an EXT_dgc entry for non-empty nodes.
Sets up one indirect command for the payload expander.
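As a rough illustration of the count / prefix-sum / allocate step, here is a sequential CPU sketch in C. It is not the shader from this PR: the function name `distribute_workgroups_sketch`, the `dispatch_indirect` struct (modelled on the three-`uint32_t` layout of Vulkan's `VkDispatchIndirectCommand`) and the "one workgroup per record" rule are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of VkDispatchIndirectCommand: three workgroup counts. */
struct dispatch_indirect
{
    uint32_t x, y, z;
};

/* Hypothetical CPU sketch of the distribute_workgroups idea (not vkd3d-proton code).
 * record_counts[i] is the number of records produced for node i.
 * On return, node_offsets[i] is where node i's expanded payload offsets begin,
 * dispatches[i] is its indirect dispatch, and the total allocation size is returned. */
static uint32_t distribute_workgroups_sketch(const uint32_t *record_counts, size_t node_count,
        uint32_t *node_offsets, struct dispatch_indirect *dispatches)
{
    uint32_t running = 0;
    size_t i;

    for (i = 0; i < node_count; i++)
    {
        /* The total must fit in the 32-bit offsets used for the expanded array. */
        assert(record_counts[i] <= UINT32_MAX - running);

        /* Exclusive prefix sum: each node's slice starts where the previous one ended. */
        node_offsets[i] = running;
        running += record_counts[i];

        /* Assumption: one workgroup per record, so empty nodes get a zero-sized dispatch. */
        dispatches[i].x = record_counts[i];
        dispatches[i].y = 1;
        dispatches[i].z = 1;
    }

    return running;
}
```

On the GPU the prefix sum would be done cooperatively (for example with subgroup scans) rather than in a serial loop; the serial form just makes the offsets easy to check.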

distribute_payload_offsets

Loops over { node index, payload offset, count } entries and expands them into a simple array of u32 payload offsets, which will be consumed by the next stage's nodes. The space for these offsets was allocated by distribute_workgroups.
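The expansion can be pictured with the following sequential C sketch. The entry struct, the name `distribute_payload_offsets_sketch` and the convention that record j of an entry sits at `payload_offset + j` are assumptions for illustration only; a GPU implementation would advance the per-node write cursor atomically instead of serially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout matching the { node index, payload offset, count } description. */
struct payload_entry
{
    uint32_t node_index;
    uint32_t payload_offset;
    uint32_t count;
};

/* Hypothetical sequential sketch: expands each entry into `count` consecutive payload
 * offsets (assumption: offsets are in record units, so record j is at payload_offset + j).
 * node_cursors[] starts out as the per-node base offsets from the prefix sum and is
 * advanced as each node's slice fills up. */
static void distribute_payload_offsets_sketch(const struct payload_entry *entries, size_t entry_count,
        uint32_t *node_cursors, uint32_t *expanded, size_t expanded_capacity)
{
    size_t i;
    uint32_t j;

    for (i = 0; i < entry_count; i++)
    {
        const struct payload_entry *e = &entries[i];
        uint32_t dst = node_cursors[e->node_index];

        /* Space was reserved up front, so the expansion must never overrun it. */
        assert((size_t)dst + e->count <= expanded_capacity);

        for (j = 0; j < e->count; j++)
            expanded[dst + j] = e->payload_offset + j;

        node_cursors[e->node_index] = dst + e->count;
    }
}
```

Because each node's slice was sized by the earlier prefix sum, the node's records end up contiguous regardless of the order in which entries are processed.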

complete_compaction

This was recently added. Adjusts the broadcast amplification rate based on the data we analyzed in payload_offsets, once per node.

docs/workgraphs.md

libs/vkd3d/shaders/cs_workgraph_distribute_workgroups.comp

doitsujin · 2025-01-15T12:31:51Z

libs/vkd3d/shaders/cs_workgraph_distribute_payload_offsets.comp

+	{
+		uint lane = subgroupBallotFindLSB(ballot);
+
+		// TODO: Is there a more elegant way that is just as fast and fully portable?


The only thing that would come to mind is rewriting the loop as something like:

    bool thread_mask = count != 0;
    while (subgroupAny(thread_mask))
    {
        uint lane = subgroupBallotFindLSB(subgroupBallot(thread_mask));
        thread_mask = thread_mask && gl_SubgroupInvocationID > lane;
        ...
    }

I think that comparison would require vector ops.
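To make the suggested loop shape concrete, here is a scalar C simulation of it for an assumed 32-wide subgroup, where one 32-bit word stands in for the ballot. The helper names `find_lsb32` and `iterate_active_lanes` are hypothetical. In GLSL each invocation evaluates `gl_SubgroupInvocationID > lane` on its own ID (the per-lane comparison mentioned above); in this simulation the same effect is expressed as a mask applied to the ballot word.

```c
#include <assert.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32u /* assumption: 32-wide subgroup, ballot fits in one uint32_t */

/* Portable stand-in for subgroupBallotFindLSB on a single 32-bit ballot word. */
static unsigned find_lsb32(uint32_t ballot)
{
    unsigned lane = 0;

    assert(ballot != 0);
    while (!(ballot & 1u))
    {
        ballot >>= 1;
        lane++;
    }
    return lane;
}

/* Scalar simulation of the suggested loop: lanes with count != 0 start active, each
 * iteration handles the lowest active lane, then every lane at or below it drops out.
 * Records the visit order in order[] and returns the number of iterations. */
static unsigned iterate_active_lanes(const uint32_t counts[SUBGROUP_SIZE], unsigned order[SUBGROUP_SIZE])
{
    unsigned lane_id, iterations = 0;
    uint32_t ballot = 0;

    /* bool thread_mask = count != 0; followed by subgroupBallot(thread_mask). */
    for (lane_id = 0; lane_id < SUBGROUP_SIZE; lane_id++)
        if (counts[lane_id] != 0)
            ballot |= 1u << lane_id;

    /* while (subgroupAny(thread_mask)) */
    while (ballot != 0)
    {
        unsigned lane = find_lsb32(ballot);
        order[iterations++] = lane;

        /* thread_mask = thread_mask && gl_SubgroupInvocationID > lane.
         * Lane 31 is special-cased because shifting a 32-bit value by 32 is undefined. */
        ballot &= lane == 31 ? 0u : ~0u << (lane + 1);
    }
    return iterations;
}
```

Since the lowest active lane is always the one found, dropping "all lanes at or below it" only ever clears that single lane, so the loop runs exactly once per active lane, in ascending lane order.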

libs/vkd3d/meta.c


doitsujin

Should be good now.

HansKristian-Work requested a review from doitsujin January 15, 2025 11:23

HansKristian-Work force-pushed the workgraph-rebase-phase-4 branch from 8c4f4ff to 55c14f6 January 15, 2025 11:23

Josh015 reviewed Jan 15, 2025


docs/workgraphs.md

doitsujin reviewed Jan 15, 2025


libs/vkd3d/meta.c (outdated)

HansKristian-Work added 3 commits January 21, 2025 14:24

meta: Add more numbers for NV donut workgraph sample.

863445c

Signed-off-by: Hans-Kristian Arntzen <[email protected]>

meta: Add way of using explicit subgroup size for meta shaders.

5a0578e

Signed-off-by: Hans-Kristian Arntzen <[email protected]>

meta: Add meta-ops for workgraph emulation.

97a35a4

Details of the implementation strategy are explained in docs/workgraphs.md.
Signed-off-by: Hans-Kristian Arntzen <[email protected]>

HansKristian-Work force-pushed the workgraph-rebase-phase-4 branch from 55c14f6 to 97a35a4 January 21, 2025 13:28

doitsujin approved these changes Jan 21, 2025


HansKristian-Work merged commit a1d6937 into master Jan 21, 2025
6 checks passed

HansKristian-Work deleted the workgraph-rebase-phase-4 branch January 21, 2025 13:57
