-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][LLPC] Always move non-uniform descriptor loads inside the waterfall loop #2859
Conversation
Test summary for commit fe0df0bCTS tests (Failed: 0/138378)
Ubuntu navi3x, SrdcvkUbuntu navi2x, Srdcvk |
fe0df0b
to
aaa28a5
Compare
aaa28a5 Jenkins build error. |
aaa28a5
to
f1f1070
Compare
Test summary for commit f1f1070CTS tests (Failed: 0/138378)
Ubuntu navi3x, SrdcvkUbuntu navi2x, Srdcvk |
The first patch (#2759) has an initial implementation for the scalarization of descriptor loads. This patch enables the scalarization of the non-uniform descriptor loads even if one of the other operand is uniform. |
ping |
f1f1070
to
bd31f59
Compare
Test summary for commit bd31f59CTS tests (Failed: 0/138378)
Ubuntu navi3x, SrdcvkUbuntu navi2x, Srdcvk |
ping |
// @param nonUniformIndex : the non-uniform index of the non-uniform operand of the image call | ||
// @return : the 32-bit value of the nonUniformIndex | ||
Value *get32BitNonUniformIndex(Value *nonUniformIndex) { | ||
if (nonUniformIndex->getType()->isIntegerTy(64)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only invocation of get32BitNonUniformIndex
is guarded with the same check nonUniformIndex->getType()->isIntegerTy(64)
. Either remove it there or keep it and add an assert here.
lgc/builder/BuilderImpl.cpp
Outdated
DenseMap<Value *, Value *> nonUniformIndex32BitVal; | ||
for (auto nonUniformIndex : nonUniformIndices) { | ||
// Start the waterfall loop using the waterfall index. | ||
if (nonUniformIndex->getType()->isIntegerTy(64)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can nonUniformIndex
be a 16-bit value? If so, then it needs to be handled too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgc/builder/BuilderImpl.cpp
Outdated
// @param nonUniformInst : image call | ||
// @return : the 32-bit value of the nonUniformIndex | ||
Value *getSharedIndex(ArrayRef<Value *> nonUniformIndices, DenseMap<Value *, Value *> &nonUniformIndex32BitVal, | ||
TraceNonUniformIndex &traceNonUniformIndex, Instruction *nonUniformInst) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused nonUniformInst
?
(However the function declares a local variable nonUniformInstr).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for spotting it. Now, I renamed this function to calculateSharedIndex() and it is a method of TraceNonUniformIndex. As a result, I do not need to pass all these arguments.
lgc/builder/BuilderImpl.cpp
Outdated
@@ -688,109 +741,95 @@ Instruction *BuilderImpl::createWaterfallLoop(Instruction *nonUniformInst, Array | |||
SmallVector<Value *, 2> nonUniformIndices; | |||
// Maps the nonUniformIndex that is returned by traceNonUniformIndex() to the nonUniformInst. | |||
DenseMap<Value *, std::pair<Value *, unsigned>> nonUniformIndexImageCallOperand; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is nonUniformIndexImageCallOperand
map being actually used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Good catch :)
lgc/builder/BuilderImpl.cpp
Outdated
@@ -688,109 +741,95 @@ Instruction *BuilderImpl::createWaterfallLoop(Instruction *nonUniformInst, Array | |||
SmallVector<Value *, 2> nonUniformIndices; | |||
// Maps the nonUniformIndex that is returned by traceNonUniformIndex() to the nonUniformInst. | |||
DenseMap<Value *, std::pair<Value *, unsigned>> nonUniformIndexImageCallOperand; | |||
TraceNonUniformIndex traceNonUniformIndex(nonUniformInst, scalarizeDescriptorLoads, 64); | |||
TraceNonUniformIndex traceNonUniformIndex(nonUniformInst, scalarizeDescriptorLoads); | |||
DenseMap<unsigned, Value *> operandIdxnonUniformIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think the naming convention is operandIdxNonUniformIndex, although I dislike having both Idx
and Index
at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
|
||
if (origNonUniformVal == nonUniformImageCallOperand) | ||
continue; | ||
for (unsigned operandIdx : operandIdxs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to loop through operandIdxs
rather than nonUniformIndices
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amdgcn.waterfall.begin intrinsic has the nonUniformIndex as a parameter. The amdgcn.waterfall.readfirstlane has two parameters:
- the first argument is the amdgcn.waterfall.begin intrinsic
- in case of scalarization, the second argument is the nonUniformIndex. But, if there is not a nonUniformIndex or if the scalrarization is disabled, then the second argument is the operand of the non-uniform instruction. The later is calculated with the help operandIdxs.
attributes #2 = { nounwind memory(none) } | ||
attributes #3 = { nounwind memory(write) } | ||
|
||
!lgc.client = !{!0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can you remove the metadata? I think most of them are irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove them, then the code does not run through llpc pipeline and the scalarization is not enabled.
; Function Attrs: nounwind | ||
define dllexport spir_func void @lgc.shader.VS.main() local_unnamed_addr #0 !spirv.ExecutionModel !14 !lgc.shaderstage !15 { | ||
.entry: | ||
%0 = call <4 x i32> (...) @lgc.create.read.generic.input.v4i32(i32 2, i32 0, i32 0, i32 0, i32 0, i32 poison) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our tests normally have names for the instructions rather than numbers (that limits the potential diffs in the test in the future). You can do the automatic conversion by doing "opt -instnamer" on a test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
@@ -0,0 +1,113 @@ | |||
; NOTE: Assertions have been autogenerated by tool/update_llpc_test_checks.py UTC_ARGS: --tool lgc | |||
; RUN: lgc -mcpu=gfx1010 -print-after=lgc-builder-replayer -o - %s 2>&1 | FileCheck --check-prefixes=CHECK %s | |||
; ModuleID = 'lgcPipeline' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be really useful if you could add a comment that says what each test does, for example: "Make sure that there is a waterfall loop where..". Alternatively, use a more descriptive test name (file name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments at the top of the tests.
bd31f59
to
6ffd47f
Compare
Test summary for commit 6ffd47fCTS tests (Failed: 568/138443)
Ubuntu navi3x, SrdcvkUbuntu navi2x, Srdcvk |
6ffd47f
to
ef07012
Compare
ef07012 Jenkins build error.
|
Currently, we bail-out scalarization if one of the operands of the image call is uniform. In this patch, we enable the scalarization only for the non-uniform operands. To do this, I refactored the createWaterfallLoop() .