Experimental workgraph - Phase 4 #2288

Merged 3 commits on Jan 21, 2025

HansKristian-Work (Owner) commented Jan 15, 2025

This is the first one with some real meat to it.

  • Adds a missing meta .md document I forgot to include in the first rebase.
  • Adds meta operations for workgraphs. The high-level details are explained in the .md, but the TLDR is that we have four shaders:

Setup GPU input

Only used when processing a GPU_INPUT entry point. Just generates indirect dispatches.

distribute_workgroups

  • Counts the number of records produced by each node, prefix-sums the counts, and allocates space for the expanded offsets.
  • Sets up an indirect command for each node. TODO: Potentially allocate an EXT_dgc entry for non-empty nodes.
  • Sets up one indirect command for the payload expander.
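
To make the counting and allocation step concrete, here is a minimal CPU-side sketch in C of an exclusive prefix sum over per-node record counts. It is only an illustration under stated assumptions: `struct node_range`, its fields, and `allocate_offset_ranges` are hypothetical names, and the actual shader does this work on the GPU, so its details will differ.

```c
#include <stdint.h>

/* Hypothetical per-node bookkeeping; not the layout used by the shader. */
struct node_range
{
    uint32_t record_count; /* records produced for this node */
    uint32_t first_offset; /* start of this node's slice in the expanded offset array */
};

/* Exclusive prefix sum over the record counts.
 * Each node gets a contiguous slice starting at first_offset.
 * Returns the total number of offset slots that must be allocated. */
static uint32_t allocate_offset_ranges(struct node_range *nodes, uint32_t node_count)
{
    uint32_t running_total = 0;
    uint32_t i;

    for (i = 0; i < node_count; i++)
    {
        nodes[i].first_offset = running_total;
        running_total += nodes[i].record_count;
    }

    return running_total;
}
```

With counts of 3, 0, 5 and 2, the nodes would start at slots 0, 3, 3 and 8, and 10 slots would be needed in total. An empty node gets a zero-length slice, which is presumably the case the EXT_dgc TODO is concerned with.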

distribute_payload_offsets

Loops over { node index, payload offset, count } entries and expands them into a simple array of u32 payload offsets which will be consumed by the next stage's nodes. The space for these offsets was allocated by distribute_workgroups.
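
The expansion can be sketched on the CPU in C as follows. This is a hedged illustration, not the shader's code: `struct payload_entry`, `expand_payload_offsets`, the `payload_stride` parameter and the `node_cursor` array are all assumptions made for the example. In particular, it assumes that the payloads of one entry are laid out back to back with a fixed stride, and that `node_cursor[n]` starts at the first output slot reserved for node `n`.

```c
#include <stdint.h>

/* Hypothetical entry layout mirroring the { node index, payload offset, count } tuple. */
struct payload_entry
{
    uint32_t node_index;
    uint32_t payload_offset;
    uint32_t count;
};

/* Expands each entry into `count` individual payload offsets.
 * node_cursor[n] is the next free output slot for node n and is advanced in place.
 * payload_stride (assumed) is the distance between consecutive payloads of one entry. */
static void expand_payload_offsets(const struct payload_entry *entries, uint32_t entry_count,
        uint32_t payload_stride, uint32_t *node_cursor, uint32_t *out_offsets)
{
    uint32_t i, j;

    for (i = 0; i < entry_count; i++)
    {
        const struct payload_entry *e = &entries[i];
        uint32_t base = node_cursor[e->node_index];

        for (j = 0; j < e->count; j++)
            out_offsets[base + j] = e->payload_offset + j * payload_stride;

        node_cursor[e->node_index] = base + e->count;
    }
}
```

After the expansion, the consuming node only needs to index a flat u32 array with its record index, instead of searching the compact entries.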

complete_compaction

This was recently added. Adjusts the broadcast amplification rate based on the data we analyzed in payload_offsets, once per node.

libs/vkd3d/shaders/cs_workgraph_distribute_workgroups.comp
{
uint lane = subgroupBallotFindLSB(ballot);

// TODO: Is there a more elegant way that is just as fast and fully portable?
Collaborator

only thing that would come to mind is rewriting the loop as something like

bool thread_mask = count != 0;

while (subgroupAny(thread_mask)) {
  uint lane = subgroupBallotFindLSB(subgroupBallot(thread_mask));
  thread_mask = thread_mask && gl_SubgroupInvocationID > lane;
  ...
}

Owner Author

I think that comparison would require vector ops.
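
For readers following the thread, the suggested loop can be emulated on the CPU to check that it visits every lane with a non-zero count exactly once, in ascending order. The C sketch below is an assumption-laden illustration: it models a single 32-lane subgroup with a plain array, the helper names are hypothetical, and the sentinel returned when no lane is active is a convenience of the emulation rather than GLSL behavior. The inner per-lane comparison is the step the reply above expects to need vector ops on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32u

/* Emulates subgroupBallotFindLSB(subgroupBallot(mask)) for one subgroup.
 * Returns SUBGROUP_SIZE when no lane is active (emulation-only sentinel). */
static uint32_t emulate_ballot_find_lsb(const bool *mask)
{
    uint32_t id;

    for (id = 0; id < SUBGROUP_SIZE; id++)
        if (mask[id])
            return id;

    return SUBGROUP_SIZE;
}

/* Runs the suggested loop over per-lane counts and records the lanes it visits.
 * Returns the number of lanes visited. */
static uint32_t visit_active_lanes(const uint32_t *count, uint32_t *visited)
{
    bool thread_mask[SUBGROUP_SIZE];
    uint32_t visited_count = 0;
    uint32_t id;

    for (id = 0; id < SUBGROUP_SIZE; id++)
        thread_mask[id] = count[id] != 0;

    for (;;)
    {
        uint32_t lane = emulate_ballot_find_lsb(thread_mask);

        /* Equivalent to subgroupAny(thread_mask) being false. */
        if (lane == SUBGROUP_SIZE)
            break;

        visited[visited_count++] = lane;

        /* Every lane compares its own ID against the selected lane. */
        for (id = 0; id < SUBGROUP_SIZE; id++)
            thread_mask[id] = thread_mask[id] && id > lane;
    }

    return visited_count;
}
```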

libs/vkd3d/meta.c
Details of the implementation strategy are explained in docs/workgraphs.md.

Signed-off-by: Hans-Kristian Arntzen <[email protected]>
doitsujin (Collaborator) left a comment

Should be good now.

@HansKristian-Work HansKristian-Work merged commit a1d6937 into master Jan 21, 2025
6 checks passed
@HansKristian-Work HansKristian-Work deleted the workgraph-rebase-phase-4 branch January 21, 2025 13:57