[AArch64] Correct scheduling information for flag manipulation instructions in Neoverse-V2 #122124

Rin18 · 2025-01-08T15:21:59Z

Some instructions have scheduling information that does not match the Neoverse-V2 Software Optimisation Guide (SWOG, https://developer.arm.com/documentation/109898/latest/), which specifies:

Instruction Group	AArch64 Instructions	Exec Latency	Exec Throughput	Utilised Pipelines
Flag manipulation instructions	SETF8, SETF16, RMIF, CFINV	1	1	F

For example:

rmif
cfinv
setf8 w1
setf16 w1

Running llvm-mca -mtriple=aarch64 -mcpu=neoverse-v2 -instruction-tables on the above instructions gives the following output:

Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)

[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      1     0.17                  U     rmif  #0, #0, #0
 1      1     0.06                  U     cfinv
 1      1     0.17                  U     setf8 w1
 1      1     0.17                  U     setf16        w1


Resources:
[0.0] - V2UnitB
[0.1] - V2UnitB
[1.0] - V2UnitD
[1.1] - V2UnitD
[2]   - V2UnitL2
[3.0] - V2UnitL01
[3.1] - V2UnitL01
[4]   - V2UnitM0
[5]   - V2UnitM1
[6]   - V2UnitS0
[7]   - V2UnitS1
[8]   - V2UnitS2
[9]   - V2UnitS3
[10]  - V2UnitV0
[11]  - V2UnitV1
[12]  - V2UnitV2
[13]  - V2UnitV3


Resource pressure per iteration:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   
 -      -      -      -      -      -      -     0.50   0.50   0.50   0.50   0.50   0.50    -      -      -      -     

Resource pressure by instruction:
[0.0]  [0.1]  [1.0]  [1.1]  [2]    [3.0]  [3.1]  [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]   [12]   [13]   Instructions:
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     rmif     #0, #0, #0
 -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -      -     cfinv
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf8    w1
 -      -      -      -      -      -      -     0.17   0.17   0.17   0.17   0.17   0.17    -      -      -      -     setf16   w1
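The per-iteration pressure of 0.50 can be reproduced from the per-instruction rows. The sketch below assumes that llvm-mca spreads one micro-op evenly over every unit an instruction may issue to, which is consistent with the 0.17 (about 1/6) figures above; the `issue_units` mapping is read off the table by hand and is not something llvm-mca exposes under that name.

```python
from fractions import Fraction

# Units that show non-zero pressure in the llvm-mca output above.
UNITS = ["V2UnitM0", "V2UnitM1", "V2UnitS0", "V2UnitS1", "V2UnitS2", "V2UnitS3"]

# Units each instruction is spread over, as read from the
# "Resource pressure by instruction" table (cfinv shows no pressure at all).
issue_units = {
    "rmif": UNITS,
    "cfinv": [],
    "setf8": UNITS,
    "setf16": UNITS,
}


def per_unit_pressure(units):
    """Pressure one single-uop instruction puts on each unit it can use."""
    return Fraction(1, len(units)) if units else Fraction(0)


per_instruction = {name: per_unit_pressure(u) for name, u in issue_units.items()}

# Pressure on any one of the six units, summed over the four instructions.
per_iteration = sum(per_instruction.values())

for name, pressure in per_instruction.items():
    print(f"{name:7s} {float(pressure):.2f}")
print(f"total   {float(per_iteration):.2f}")
```

Three instructions at 1/6 each give 1/2 per unit, and cfinv adds nothing because the model assigns it no resources.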

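The output shows that rmif, setf8 and setf16 have latency 1, a throughput of 6 (RThroughput 0.17) and use pipeline I (V2UnitM0, V2UnitM1 and V2UnitS0 to V2UnitS3), while cfinv has latency 1 but is not assigned to any pipeline. This does not match the SWOG, which lists throughput 1 and pipeline F for all four instructions, so the Neoverse-V2 scheduling model should be fixed: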
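To make the mismatch concrete, the following sketch compares the figures reported by llvm-mca with the SWOG row. It assumes that the SWOG "Exec Throughput" is instructions per cycle, so the matching llvm-mca RThroughput is its reciprocal; the helper name `mismatches` and the tolerance are illustrative only.

```python
# SWOG row for flag manipulation instructions (latency and throughput).
SWOG = {"latency": 1, "throughput": 1}

# (latency, RThroughput) as printed by llvm-mca for each instruction.
observed = {
    "rmif": (1, 0.17),
    "cfinv": (1, 0.06),
    "setf8": (1, 0.17),
    "setf16": (1, 0.17),
}


def mismatches(latency, rthroughput, tol=0.005):
    """Return a list of differences between llvm-mca and the SWOG row."""
    expected_rt = 1 / SWOG["throughput"]  # reciprocal throughput
    problems = []
    if latency != SWOG["latency"]:
        problems.append(f"latency {latency}, expected {SWOG['latency']}")
    if abs(rthroughput - expected_rt) > tol:
        problems.append(
            f"RThroughput {rthroughput:.2f}, expected {expected_rt:.2f}"
        )
    return problems


report = {name: mismatches(*values) for name, values in observed.items()}
for name, problems in report.items():
    print(f"{name}: {'; '.join(problems) or 'ok'}")
```

Under these assumptions the latency is already correct for all four instructions, and only the throughput (and the pipeline assignment, which this check does not cover) needs to change.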

llvm-project/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td

Lines 1139 to 1140 in f37bee1

    
    // Flag manipulation instructions
    def : WriteRes<WriteSys, []> { let Latency = 1; }


llvmbot · 2025-01-08T15:22:19Z

@llvm/issue-subscribers-backend-aarch64

Author: Ash Dobrescu (Rin18)


Rin18 added the backend:AArch64 label Jan 8, 2025
