Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llpc: GFX11 flat/custom parameter loads need to be strict WQM #2817

Merged
merged 1 commit into from
Nov 16, 2023

Conversation

perlfu
Copy link
Contributor

@perlfu perlfu commented Nov 14, 2023

These already use WQM annotation to ensure fetch lanes are active, but this does not work if there are discards or divergence as the WQM exec mask will not truly WQM (some helper invocations killed).

Hence:

  • Generate strict.wqm call for flat/custom interpolation loads
  • Add test coverage for this code path

Ideally this could be resolved by using the FI bit on the mov_dpp, but there is no backend support for this currently.

These already use WQM annotation to ensure fetch lanes are active,
but this does not work if there are discards or divergence as the
WQM exec mask will not truly WQM (some helper invocations killed).

Hence:
- Generate strict.wqm call for flat/custom interpolation loads
- Add test coverage for this code path

Ideally this could be resolved by using the FI bit on the mov_dpp,
but there is no backend support for this currently.
@perlfu perlfu requested a review from a team as a code owner November 14, 2023 09:14
@perlfu
Copy link
Contributor Author

perlfu commented Nov 14, 2023

As noted in the description this could also be solved by adding support for FI bit on the mov_dpp to the backend.
But that will take much longer and not necessarily yield any better results.

The risk with this solution is that it effectively asks for another strict WQM register to be allocated -- potentially wasting a physical VGPR.
However in practice the VGPR allocated for the lds_param_load will only be live until the mov_dpp, so this will always reuse an existing strict WQM register allocation.
This solution will only add register pressure if:

  1. the result of the mov_dpp (the parameter) lives for a long time,
  2. another lds_param_load or strict WQM could be using reuse the reserved physical register.

@amdvlk-admin
Copy link

Test summary for commit bcfbd1c

CTS tests (Failed: 0/138362)
  • Built with version 1.3.5.2
  • Ubuntu navi3x, Srdcvk
    • Passed: 35162/69147 (50.9%)
    • Failed: 0/69147 (0.0%)
    • Not Supported: 33985/69147 (49.1%)
    • Warnings: 0/69147 (0.0%)
    Ubuntu navi2x, Srdcvk
    • Passed: 35242/69215 (50.9%)
    • Failed: 0/69215 (0.0%)
    • Not Supported: 33973/69215 (49.1%)
    • Warnings: 0/69215 (0.0%)

Copy link
Member

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the problem with supporting FI is that it's hard to model in IR what that even means...

The assessment of register re-use depends on how instructions are scheduled when there are multiple param loads. But I can't think of a better correctness check either. Ultimately we may have to figure out how to do better register allocation for WWM/WQM registers.

@perlfu
Copy link
Contributor Author

perlfu commented Nov 16, 2023

Thanks!

Ultimately we may have to figure out how to do better register allocation for WWM/WQM registers.

Yes, I am actively working on incremental improvements on this specifically to deal with param loads.

@perlfu perlfu merged commit c78b273 into GPUOpen-Drivers:dev Nov 16, 2023
9 checks passed
@perlfu perlfu deleted the flat-param-dpp-wqm branch November 16, 2023 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants