Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Correct scheduling information for flag manipulation instructions in Neoverse-V2 #122124

Open
Rin18 opened this issue Jan 8, 2025 · 1 comment

Comments

@Rin18
Copy link

Rin18 commented Jan 8, 2025

Some instructions have incorrect scheduling information when compared to the Neoverse-V2 Software optimisation Guide(link to V2 SWOG: https://developer.arm.com/documentation/109898/latest/) :

Instruction Group AArch64 Instructions Exec Latency Exec Throughput Utilised Pipelines
Flag manipulation instructions SETF8, SETF16,RMIF, CFINV 1 1 F

For example:

rmif
cfinv
setf8 w1
setf16 w1

Running llvm-mca -mtriple=aarch64 -mcpu=neoverse-v2 -instruction-tables on the above instructions gives the following output:

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.17                  U     rmif  #0, #0, #0
 1      1     0.06                  U     cfinv
 1      1     0.17                  U     setf8 w1
 1      1     0.17                  U     setf16        w1


Resources:
[0.0] - V2UnitB
[0.1] - V2UnitB
[1.0] - V2UnitD
[1.1] - V2UnitD
[2]   - V2UnitL2
[3.0] - V2UnitL01
[3.1] - V2UnitL01
[4]   - V2UnitM0
[5]   - V2UnitM1
[6]   - V2UnitS0
[7]   - V2UnitS1
[8]   - V2UnitS2
[9]   - V2UnitS3
[10]  - V2UnitV0
[11]  - V2UnitV1
[12]  - V2UnitV2
[13]  - V2UnitV3


Resource pressure per iteration:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   
 -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -      -      -      -     

Resource pressure by instruction:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     rmif     #0, #0, #0
 -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     cfinv
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf8    w1
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf16   w1

The output shows that every instruction has latency 1, throughput 6 and uses pipeline I. This is incorrect and should be fixed in the Neoverse-V2 scheduling model to match the SWOG:

// Flag manipulation instructions
def : WriteRes<WriteSys, []> { let Latency = 1; }

@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Ash Dobrescu (Rin18)

Some instructions have incorrect scheduling information when compared to the Neoverse-V2 Software optimisation Guide(link to V2 SWOG: https://developer.arm.com/documentation/109898/latest/) :
Instruction Group AArch64 Instructions Exec Latency Exec Throughput Utilised Pipelines
Flag manipulation instructions SETF8, SETF16,RMIF, CFINV 1 1 F

For example:

rmif
cfinv
setf8 w1
setf16 w1

Running llvm-mca -mtriple=aarch64 -mcpu=neoverse-v2 -instruction-tables on the above instructions gives the following output:

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.17                  U     rmif  #<!-- -->0, #<!-- -->0, #<!-- -->0
 1      1     0.06                  U     cfinv
 1      1     0.17                  U     setf8 w1
 1      1     0.17                  U     setf16        w1


Resources:
[0.0] - V2UnitB
[0.1] - V2UnitB
[1.0] - V2UnitD
[1.1] - V2UnitD
[2]   - V2UnitL2
[3.0] - V2UnitL01
[3.1] - V2UnitL01
[4]   - V2UnitM0
[5]   - V2UnitM1
[6]   - V2UnitS0
[7]   - V2UnitS1
[8]   - V2UnitS2
[9]   - V2UnitS3
[10]  - V2UnitV0
[11]  - V2UnitV1
[12]  - V2UnitV2
[13]  - V2UnitV3


Resource pressure per iteration:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   
 -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -      -      -      -     

Resource pressure by instruction:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     rmif     #<!-- -->0, #<!-- -->0, #<!-- -->0
 -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     cfinv
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf8    w1
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf16   w1

The output shows that every instruction has latency 1, throughput 6 and uses pipeline I. This is incorrect and should be fixed in the Neoverse-V2 scheduling model to match the SWOG:

// Flag manipulation instructions
def : WriteRes<WriteSys, []> { let Latency = 1; }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants