[LLPC] Scalarize non-uniform loads inside the waterfall loop #2759
Conversation
Can you add a description, please? Is this PR identical to #2645, or have you fixed something?
Yes, this patch is an update of #2645.
Test summary for commit 7033caf: CTS tests (Failed: 31/139021)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
test this please
Technically this is a revert of #2705 as well? At a high level, the GFX 10.3.2 codegen just modifies the image descriptor after it is loaded, as a workaround.
As we discussed, this was my intention. Any suggestions are welcome :)
Test summary for commit 7033caf: CTS tests (Failed: 31/139021)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
The use of matching feels sort of backwards. If we know the code sequences that should go into the waterfall loops, shouldn't we create them like that directly? I'll try to see if I can expand on this, but I think the gist of it is that we should turn the waterfall loop creation upside-down: instead of creating the image instruction and then calling createWaterfallLoop, we should call createWaterfallLoop first, and then insert instructions into the loop.
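Roughly, the inversion would look like this (a sketch of the idea only; apart from createWaterfallLoop, which the comment above names, the helpers are hypothetical and not existing LGC API):

// Today (simplified): the image op is created first and wrapped afterwards,
// which forces the waterfall-loop code to pattern-match the descriptor loads.
Instruction *imageOp = createImageOp(descLoads, args);              // hypothetical helper
Value *result = createWaterfallLoop(imageOp, nonUniformOperandIdxs);

// Suggested: open the loop first, then create the descriptor loads and the
// image op directly inside its body, so nothing needs to be matched later.
WaterfallLoop loop = beginWaterfallLoop(nonUniformIndices);          // hypothetical helper
Value *desc = createDescriptorLoad(loop.uniformIndex());             // hypothetical helper
Value *result2 = endWaterfallLoop(loop, createImageOp(desc, args));  // hypothetical helper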
lgc/builder/BuilderImpl.cpp
Outdated
Value *OrigLoad = nullptr;
Value *OrigAnd = nullptr;
Value *OrigGep = nullptr;
Value *OrigExtract = nullptr;
Coding style in LLPC is to use lowerCamelCase.
Done :)
I remember now what the difficulty was with just creating waterfall loops directly. The descriptor load instructions are created separately from the actual image access instructions. I wonder whether we shouldn't change this.
I suppose that with some fixes this PR can at least be made correct.
llvm::Value *emitWaterfallBegin(llvm::Instruction *nonUniformInst, llvm::ArrayRef<unsigned> operandIdxs,
                                llvm::ArrayRef<llvm::Value *> nonUniformIndices, bool useVgprForOperands = false,
                                const llvm::Twine &instName = "");

llvm::Value *emitWaterfallBeginForScalarizedLoops(llvm::Instruction *nonUniformInst,
These two methods should be private.
Done :)
lgc/builder/BuilderImpl.cpp
Outdated
if (nonUniformIndices.empty())
  return nonUniformInst;

if (nonUniformInst->getType()->isVoidTy())
  scalarizeDescriptorLoads = false;
Why this?
Sorry, I missed this. This is for the case where we have the following intrinsic: @llvm.amdgcn.struct.buffer.store.format.v4f32()
Can you please comment on that?
Done
This still doesn't make sense though. The "scalarization" is a function of the descriptor inputs to the intrinsic, not a function of its output. This should be removed.
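A minimal sketch of that rule (isOperandNonUniform is an assumed helper, not code from this PR): the decision is driven purely by the descriptor operands, so a void-returning intrinsic such as @llvm.amdgcn.struct.buffer.store.format.v4f32 is still a scalarization candidate.

bool shouldScalarizeDescriptorLoads(llvm::Instruction *nonUniformInst,
                                    llvm::ArrayRef<unsigned> operandIdxs) {
  for (unsigned idx : operandIdxs)
    if (isOperandNonUniform(nonUniformInst->getOperand(idx))) // assumed helper
      return true; // holds even when nonUniformInst->getType()->isVoidTy()
  return false;
}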
I removed it and added two tests: scalarizationOfDescriptorLoadTest1.ll and scalarizationOfDescriptorLoadTest2.ll.
lgc/builder/BuilderImpl.cpp
Outdated
newLoad->insertBefore(nonUniformInst);
nonUniformInst->setOperand(operandIdx, newLoad);
newGep->insertBefore(newLoad);
newGep->setOperand(1, readFirstLane);
Need to check that operand 1 is actually the firstIndexInst.
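The check could be as small as a guard before the rewrite, for example (sketch only, using the names from the quoted diff and the existing bail-out flag):

if (newGep->getOperand(1) != firstIndexInst) {
  // Operand 1 is not the index we traced; skip scalarizing this access.
  scalarizeDescriptorLoads = false;
}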
I think we should not do it in the new version of the code.
lgc/builder/BuilderImpl.cpp
Outdated
auto *loadOp = dyn_cast<LoadInst>(OrigLoad);
auto *gepOp = dyn_cast<GetElementPtrInst>(OrigGep);
auto *AndOp = dyn_cast<Instruction>(OrigAnd);
auto *extractOp = dyn_cast<ExtractElementInst>(OrigExtract);
These should all just be cast. No reason to use dyn_cast after previously doing all the match checks.
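To illustrate the point (a minimal sketch, not the PR's actual code): after a successful PatternMatch check, cast<> is appropriate because it asserts on failure, whereas dyn_cast<> would wrongly suggest the cast can still fail at that point.

#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"

static llvm::LoadInst *getMatchedLoad(llvm::Value *origLoad) {
  using namespace llvm;
  using namespace llvm::PatternMatch;
  Value *ptr = nullptr;
  if (!match(origLoad, m_Load(m_Value(ptr))))
    return nullptr;                 // failure is only possible at the match
  return cast<LoadInst>(origLoad);  // guaranteed to succeed after the match
}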
Thank you for pointing it out :) It is not needed any more.
Yet another follow-up to this... wouldn't it be possible to collect the relevant instructions in traceNonUniformIndex?
I agree with capturing the visited instruction chain from traceNonUniformIndex. Longer term, I have been suggesting that we move this lowering to a separate pass later in the compiler and simply annotate the output of non-uniform instructions as ...
@nhaehnle : Thank you for the suggestion. I will try to capture the instruction chain from traceNonUniformIndex.
lgc/builder/BuilderImpl.cpp
Outdated
@@ -35,9 +35,11 @@
#include "lgc/state/TargetInfo.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/PatternMatch.h"
Can you use the namespace directly inside the method emitWaterfallBeginForScalarizedLoops?
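In other words, the suggestion is to keep the using-directive local to the one function that needs PatternMatch. A minimal sketch (the helper itself is made up for illustration; as noted below, the pattern-match code was later dropped from the PR anyway):

#include "llvm/IR/PatternMatch.h"

// Scope the PatternMatch names to the single function that uses them
// instead of importing them for the whole file.
static bool isAndOfTwoValues(llvm::Value *v) {
  using namespace llvm::PatternMatch; // visible only inside this function
  llvm::Value *lhs = nullptr, *rhs = nullptr;
  return match(v, m_And(m_Value(lhs), m_Value(rhs)));
}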
Force-pushed from 7033caf to 23cd027.
@tsymalla-AMD : I removed all the pattern match code :)
Test summary for commit 23cd027: CTS tests (Failed: 32/138994)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
Looks like some CTS tests are failing, but I did not check whether they are related.
Force-pushed from 23cd027 to d180bcf.
Test summary for commit d180bcf: CTS tests (Failed: 32/138994)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
Test summary for commit 43a7a13: CTS tests (Failed: 0/138443)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
lgc/builder/BuilderImpl.cpp
Outdated
// Initialization of instrToIndex and indexToInstr.
if (scalarizeDescriptorLoads) {
  unsigned cnt = 0;
  for (Instruction *I = nonUniformInst->getPrevNode(); I != nullptr && cnt < 64; I = I->getPrevNode(), ++cnt) {
Is the maximum number of steps something we should parametrize (or at least provide as an easily parametrizable value)?
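One way to expose the limit, for illustration (the constant and function names are assumptions; the default of 64 matches the quoted code):

static constexpr unsigned DefaultMaxBackwardScanSteps = 64;

void initInstrMaps(llvm::Instruction *nonUniformInst,
                   unsigned maxSteps = DefaultMaxBackwardScanSteps) {
  unsigned cnt = 0;
  for (llvm::Instruction *i = nonUniformInst->getPrevNode();
       i != nullptr && cnt < maxSteps; i = i->getPrevNode(), ++cnt) {
    // ... populate instrToIndex and indexToInstr with i ...
  }
}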
Done.
lgc/builder/BuilderImpl.cpp
Outdated
  return;
}
assert(instrDeps.contains(currentVisitedInsn) && "The current visited instruction should have been in the map.");
auto it1 = instrDeps.try_emplace(newValue).first;
Can you use a structured binding so this is a bit cleaner?
I am not sure it is needed, because try_emplace() returns a pair of an iterator and a bool. The latter is true if newValue was inserted into the map and false if it already existed.
Yes, but you can check for the boolean value directly. However, it is just personal taste.
I am not using the boolean. That's why I did not add it.
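For reference, the structured-binding form the reviewer had in mind would replace the try_emplace call quoted above (whether the returned bool ends up being used is a style call):

auto [it1, inserted] = instrDeps.try_emplace(newValue);
auto &setOfInsns = it1->second; // freshly default-constructed set when inserted == true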
auto &setOfInsns = it1->second;
auto it2 = instrDeps.find(currentVisitedInsn);
const auto &set = it2->second;
for (auto it3 = set.begin(indexToInstr), ite = set.end(indexToInstr); it3 != ite; ++it3) {
Can you use a range based for loop instead?
I did not do it because begin() and end() of TinyInstructionSet take an argument. Please let me know if there is a way to do it.
I see, you originally stored indexToInstr as a member reference, right? With that approach, it would have worked.
Exactly :)
How about not storing the reference to indexToInstr and instead adding a separate member function that returns an iterator_range on which you can iterate?
iterator_range<IteratorT> getIteratorRange(IndexToInstrMap &indexToInstr) {
  return iterator_range(set.begin(indexToInstr), set.end(indexToInstr));
}
That should allow you to use a range-based for loop without increasing the storage size of TinyInstructionSet?
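With such a helper on TinyInstructionSet, the explicit-iterator loop quoted above could then become a range-based loop (sketch; getIteratorRange is the proposed helper, and this assumes its iterator dereferences to an Instruction *):

for (llvm::Instruction *instr : set.getIteratorRange(indexToInstr)) {
  // ... same body as the explicit-iterator loop ...
}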
Force-pushed from 43a7a13 to bbe0a57.
Test summary for commit bbe0a57: CTS tests (Failed: 0/138443)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
Force-pushed from 504d64b to 8179bc4.
Test summary for commit 8179bc4: CTS tests (Failed: 0/138378)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
Test summary for commit 504d64b: CTS tests (Failed: 0/138443)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
void TraceNonUniformIndex::insertNewValueInInstrDeps(Value *newValue, Instruction *currentVisitedInstr) {
  if (!instrToIndex.contains(currentVisitedInstr)) {
    // The instruction is either outside of 64 limit or in a different basic block. So, we bail-out scalarization.
    scalarizeDescriptorLoads = false;
It seems setting TraceNonUniformIndex.scalarizeDescriptorLoads to false here means that the next time TraceNonUniformIndex::run is invoked it will not try to scalarize. That means that if there are two non-uniform operands of, say, image.sample, and the first one could not be scalarized, then the second one would not be either.
This is how the existing code (before my changes) works. I am working to change it.
Fair enough. Can you extend the comment to make it clear that this will also affect scalarization of other operands?
Implementation-wise I guess it is fine to address it in a follow-up commit to keep the scope of the changes smaller.
Done in #2859
void TraceNonUniformIndex::insertNewValueInInstrDeps(Value *newValue, Instruction *currentVisitedInstr) {
  if (!instrToIndex.contains(currentVisitedInstr)) {
    // The instruction is either outside of 64 limit or in a different basic block. So, we bail-out scalarization.
"or in a different basic block"
I know this is part of the design of this patch, but do we really have to give up scalarization if an instruction is in a different block?
A descriptor load in a different basic block than the image op is quite common, and with the changes here we would stop scalarizing it.
It is not part of the design of this patch. This is how the existing code (before my changes) works, because there is no backend support for it.
Yes, sorry, I misspoke - this is not really related to your patch.
Done in #2877
I think the code changes are good. I think Piotr's points are fair but let's address them in separate PRs that you can work on while we're working out the last remaining kinks here.
There's still a thing I noticed inline, and the LGC tests ought to be simplified. As-is, they're touching a bunch of unrelated things like reading VS inputs. Please take a look at other tests, especially in lgc/test/Transforms/{Combine,Lower}CooperativeMatrix for examples of tests that are focused on shorter instruction sequences.
Force-pushed from 8179bc4 to 627712f.
@nhaehnle : I am working on refactoring the tests.
Test summary for commit 627712f: CTS tests (Failed: 0/138443)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
Force-pushed from 627712f to 195b936.
Done :)
Test summary for commit 195b936: CTS tests (Failed: 0/138443)
Ubuntu navi3x, Srdcvk; Ubuntu navi2x, Srdcvk
I refactored the tests as much as I could.
ping
ping
This patch refactors the code that pushes the loads inside the waterfall loop: