llpc: GFX11 flat/custom parameter loads need to be strict WQM #2817

perlfu · 2023-11-14T09:14:42Z

These already use WQM annotation to ensure fetch lanes are active, but this does not work if there are discards or divergence as the WQM exec mask will not truly WQM (some helper invocations killed).

Hence:

Generate strict.wqm call for flat/custom interpolation loads
Add test coverage for this code path

Ideally this could be resolved by using the FI bit on the mov_dpp, but there is no backend support for this currently.

These already use WQM annotation to ensure fetch lanes are active, but this does not work if there are discards or divergence as the WQM exec mask will not truly WQM (some helper invocations killed). Hence: - Generate strict.wqm call for flat/custom interpolation loads - Add test coverage for this code path Ideally this could be resolved by using the FI bit on the mov_dpp, but there is no backend support for this currently.

perlfu · 2023-11-14T09:25:11Z

As noted in the description this could also be solved by adding support for FI bit on the mov_dpp to the backend.
But that will take much longer and not necessarily yield any better results.

The risk with this solution is that it effectively asks for another strict WQM register to be allocated -- potentially wasting a physical VGPR.
However in practice the VGPR allocated for the lds_param_load will only be live until the mov_dpp, so this will always reuse an existing strict WQM register allocation.
This solution will only add register pressure if:

the result of the mov_dpp (the parameter) lives for a long time,
another lds_param_load or strict WQM could be using reuse the reserved physical register.

amdvlk-admin · 2023-11-14T10:17:05Z

Test summary for commit `bcfbd1c`

CTS tests (Failed: 0/138362)

Built with version 1.3.5.2

Ubuntu navi3x, Srdcvk

Passed: 35162/69147 (50.9%)
Failed: 0/69147 (0.0%)
Not Supported: 33985/69147 (49.1%)
Warnings: 0/69147 (0.0%)

Ubuntu navi2x, Srdcvk

Passed: 35242/69215 (50.9%)
Failed: 0/69215 (0.0%)
Not Supported: 33973/69215 (49.1%)
Warnings: 0/69215 (0.0%)

nhaehnle

Yeah, the problem with supporting FI is that it's hard to model in IR what that even means...

The assessment of register re-use depends on how instructions are scheduled when there are multiple param loads. But I can't think of a better correctness check either. Ultimately we may have to figure out how to do better register allocation for WWM/WQM registers.

perlfu · 2023-11-16T02:45:12Z

Thanks!

Ultimately we may have to figure out how to do better register allocation for WWM/WQM registers.

Yes, I am actively working on incremental improvements on this specifically to deal with param loads.

perlfu requested a review from a team as a code owner November 14, 2023 09:14

nhaehnle approved these changes Nov 15, 2023

View reviewed changes

perlfu merged commit c78b273 into GPUOpen-Drivers:dev Nov 16, 2023
9 checks passed

perlfu deleted the flat-param-dpp-wqm branch November 16, 2023 02:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llpc: GFX11 flat/custom parameter loads need to be strict WQM #2817

llpc: GFX11 flat/custom parameter loads need to be strict WQM #2817

perlfu commented Nov 14, 2023

perlfu commented Nov 14, 2023

amdvlk-admin commented Nov 14, 2023

nhaehnle left a comment

perlfu commented Nov 16, 2023

llpc: GFX11 flat/custom parameter loads need to be strict WQM #2817

llpc: GFX11 flat/custom parameter loads need to be strict WQM #2817

Conversation

perlfu commented Nov 14, 2023

perlfu commented Nov 14, 2023

amdvlk-admin commented Nov 14, 2023

Test summary for commit bcfbd1c

nhaehnle left a comment

Choose a reason for hiding this comment

perlfu commented Nov 16, 2023

Test summary for commit `bcfbd1c`