Experimental workgraph - Phase 4 #2288
{
    uint lane = subgroupBallotFindLSB(ballot);

    // TODO: Is there a more elegant way that is just as fast and fully portable?
The only thing that comes to mind is rewriting the loop as something like:
    bool thread_mask = count != 0;
    while (subgroupAny(thread_mask)) {
        uint lane = subgroupBallotFindLSB(subgroupBallot(thread_mask));
        thread_mask = thread_mask && gl_SubgroupInvocationID > lane;
        ...
    }
I think that comparison would require vector ops.
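To make the trade-off in this thread concrete, here is a minimal CPU-side sketch in C that emulates the suggested loop. It is not the shader code from the PR: the fixed subgroup size of 32, the helper `find_lsb`, and the function `visit_active_lanes` are assumptions made purely for illustration. The inner per-lane loops stand in for what would be per-invocation (vector) work on a GPU, which is the cost the reply above points at.

```c
#include <assert.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32u /* assumed subgroup size for this emulation */

/* Index of the lowest set bit; stands in for subgroupBallotFindLSB(). */
static uint32_t find_lsb(uint32_t mask)
{
    uint32_t bit = 0;
    assert(mask != 0);
    while (!(mask & 1u)) {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/* Emulates the suggested loop: every lane keeps a boolean predicate, the
 * lowest active lane is selected each iteration, and all lanes at or below
 * it drop out. Visited lane indices are written to out_lanes in order.
 * Returns the number of lanes visited. */
static uint32_t visit_active_lanes(const uint32_t counts[SUBGROUP_SIZE],
                                   uint32_t out_lanes[SUBGROUP_SIZE])
{
    int thread_mask[SUBGROUP_SIZE];
    uint32_t visited = 0;
    uint32_t i;

    for (i = 0; i < SUBGROUP_SIZE; i++)
        thread_mask[i] = counts[i] != 0;

    for (;;) {
        /* Emulates subgroupBallot(thread_mask). */
        uint32_t ballot = 0;
        for (i = 0; i < SUBGROUP_SIZE; i++)
            if (thread_mask[i])
                ballot |= 1u << i;

        /* Emulates subgroupAny(thread_mask) being false. */
        if (ballot == 0)
            break;

        uint32_t lane = find_lsb(ballot);
        out_lanes[visited++] = lane;

        /* Per-lane comparison: gl_SubgroupInvocationID > lane. */
        for (i = 0; i < SUBGROUP_SIZE; i++)
            thread_mask[i] = thread_mask[i] && i > lane;
    }
    return visited;
}
```

In this emulation, clearing the selected bit directly in a scalar ballot mask would visit the same lanes in the same order; the predicate form differs only in that the update is expressed per lane.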
Signed-off-by: Hans-Kristian Arntzen <[email protected]>
Details of the implementation strategy are explained in docs/workgraphs.md.
Signed-off-by: Hans-Kristian Arntzen <[email protected]>
Should be good now.
This is the first one with some real meat to it.
Setup GPU input
Only used when processing a GPU_INPUT entry point. Just generates indirect dispatches.
distribute_workgroups
distribute_payload_offsets
Loops over { node index, payload offset, count } entries and expands them into a simple array of u32 payload offsets, which will be consumed by the next stage's nodes. The space for these offsets was allocated by distribute_workgroups.
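The expansion described above can be sketched on the CPU as follows. This is only an illustration, not the shader from the PR: the struct `payload_entry`, the per-node write cursor `node_cursor` (standing in for the space allocated by distribute_workgroups), and the per-node `payload_stride` are all hypothetical, and the assumption that an entry's `count` payloads sit at a fixed stride from `payload_offset` is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout; field names are illustrative only. */
struct payload_entry {
    uint32_t node_index;
    uint32_t payload_offset;
    uint32_t count;
};

/* Expands { node index, payload offset, count } entries into a flat array
 * of u32 payload offsets.
 *
 * node_cursor[n]    - next free output slot for node n; assumed to point
 *                     into a region reserved for that node beforehand.
 * payload_stride[n] - assumed distance between consecutive payloads of
 *                     node n within one entry. */
static void expand_payload_offsets(const struct payload_entry *entries,
                                   size_t entry_count,
                                   uint32_t *node_cursor,
                                   const uint32_t *payload_stride,
                                   uint32_t *out_offsets)
{
    assert(entries != NULL || entry_count == 0);

    for (size_t i = 0; i < entry_count; i++) {
        const struct payload_entry *e = &entries[i];
        uint32_t stride = payload_stride[e->node_index];

        /* Emit one offset per payload and advance the node's cursor. */
        for (uint32_t j = 0; j < e->count; j++) {
            uint32_t slot = node_cursor[e->node_index]++;
            out_offsets[slot] = e->payload_offset + j * stride;
        }
    }
}
```

On a GPU the cursor increments would presumably be atomic or derived from a prefix sum; the serial loop here only shows the data transformation.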
complete_compaction
This was recently added. It runs once per node and adjusts the broadcast amplification rate based on the data analyzed in payload_offsets.