Arm basics, minor changes in riscv (#160)
* Initial push

Major CodeGen cleanup; most static globals move into the CodeGen object.

* Minor progress bug with mutual recursion

* Add test from pr#148

* First cut x86 "port"

Only "return 0;", no registers, no encoding, but yah gotta start somewhere

* Update README.md

* Update README.md

* Update ListScheduler.java

* command line launcher for simple

Parser bug fref in args
Some new test cases.

* Add basic RegMask for more than 64 bits of register mask.

Add calling convention basics to X86; con+ret RegMasks.
GCM computes CFG for all users.
Drop unused MultiUse.
Pick up reg-pressure aware ListScheduler.
Re-layout CodeGen file to get code & data chunks nearer each other.
minor Ary extension

* ASM Printer

InstSel handles folding 2+ ideal ops into 1 machine op

* handle instSel for some control flow

Bool,If,CProj,Region,Phi
Handle 2-op expansions
Shuffle instSel graph walk back again.

* Basic if-block inst selects and asm prints

* Add mul-by-constant to shift

* left shift codegen basic

* exclude RSP from *write* mask

* 2-arg add (not immediate)

mul-by-small constant opt.  Some utilities.

* Handle first loop

Change inst sel walk again to pre-order, to set an early visit bit to stop cycles.
CFGNode copies dom/loop info.
Cmp not-immediate form.
Ret/Fun lazy updates

* inst select for new struct allocation

* Float, bitwise and arithmetic operands codegen x86-64 (#150)

* left shift codegen basic

* sar codegen basic

* merge

* some basic ops

* addf

* divf, subf, mulf

* rsp allowed in bitwise input, not output

* formatting

* left shift codegen basic

* sar codegen basic

* merge

* some basic ops

* addf

* divf, subf, mulf

* rsp allowed in bitwise input, not output

* formatting

* Remove non-existing FP+imm ops

* Handle first loop

Change inst sel walk again to pre-order, to set an early visit bit to stop cycles.
CFGNode copies dom/loop info.
Cmp not-immediate form.
Ret/Fun lazy updates

* float ops without imm values

* merge

* merge

* inst select for new struct allocation

* left shift codegen basic

* sar codegen basic

* merge

* some basic ops

* addf

* divf, subf, mulf

* rsp allowed in bitwise input, not output

* formatting

* Remove non-existing FP+imm ops

* left shift codegen basic

* sar codegen basic

* merge

* some basic ops

* addf

* divf, subf, mulf

* rsp allowed in bitwise input, not output

* formatting

* float ops without imm values

* merge

* merge

* Rebased on ch 19

* Update tests

---------

Co-authored-by: Cliff Click <[email protected]>

* cleanup after merge

alpha-sort helper fcns
implicit test vs zero/null

* Add LEA op

asm print works on ideal nodes for all tests

* Minor cleanup lea

* merge2

* merge3

* GCM for float ops fixed, other cleanup - small extensions

* x86 addressing modes during inst select

Drop New taking inits; just follow with initializing stores.  Simplifies inst selection, which otherwise needs to undo this optimization and emit following init stores.
Basic load/store for now, op-to-mem comes later
add same becomes Shl by 1.
Drop DivF-immediate

* call setup

* extended ch13, 14,15,16 (#153)

* merge2

* merge3

* Fix a few minor fuzzer bugs

* Force array layout

ld/st get size (but not signed/unsigned)
Add unsigned LT for later range checks.

* Fix a bunch of float issues

was deleting required inputs

* simpler invariant

defensive copy inputs for simpler invariant in complex sharing patterns
inc/dec form (just the opcode now).

* Call & CallR X86 instructions

* merge2

* Add-from-memory op

Remove FP immediate forms.

* Remove name from TFP

Never shoulda been there (but was convenient for awhile).
Moved into FunNode, found by checking the linker table with a TFP.

* remove debug prints

add check for bad call convention
fix 3 tests (bad merge?)

* cleanup call convention abi

Needs more love at some point...

* one more cleanup ABI

* New (#154)

* merge2

* merge3

* GCM for float ops fixed, other cleanup - small extensions

* call setup

* merge2

* remove debug prints

add check for bad call convention
fix 3 tests (bad merge?)

* cleanup call convention abi

Needs more love at some point...

* one more cleanup ABI

---------

Co-authored-by: Cliff Click <[email protected]>

* Add CmpMem form

Common addressing mode print
LEA can skip a base

* Add a AddFMemX86

A bunch of mem op patterns are missing, hopefully they are just cut-n-paste from the existing patterns.

* Docs

* XMM reg mask fix

Minor README updates

* Allow inverted cmp/mem

narrowing stores can bypass an AndMask
Array length loads do not need control
Some missing print info

* Support "*ptr op= val"

at least for MemAdd

* missed golden rule update

* Array len is u32

load-after-store zero/sign-extends if the store is truncating

* One more missed case

* riscv init

* added not imm form bitwise shifts and ops plus test cases

* addressing modes(high chance it'll get deleted)

* fix minor bug

* delete non existing matches

* replace RegMask.Empty with null

* riscv handle store and load, fltRisch uses risc reg now

* load float into GPR and then later in RA do hard split.

* minor changes

* turn off ch20 tests

* Update Chapter20Test.java

* add riscv port to readme

* arm init

* work in progress

* arm2 bs

* work in progress

* cleanup arm

* fixed main pr problems

* regmask extra comments

---------

Co-authored-by: Cliff Click <[email protected]>
Hels15 and cliffclick authored Feb 16, 2025
1 parent bb4bc9a commit c175e76
Showing 61 changed files with 2,267 additions and 26 deletions.
97 changes: 96 additions & 1 deletion chapter19/README.md
@@ -90,4 +90,99 @@ the `Cmp/SetXX` opcode sequence. The X86 `Jxx` takes in `FLAGS` and not a
`0/1`, so the `Jxx` will skip the `SetXX` and directly use the `Cmp`. Using
the ideal `Bool` directly will require the `SetXX`. i.e. `return a==3` will
match to a `CmpX86 RAX,#3; SetEQ RAX; ret;` whereas a `if( a==3 )` will match
to a `CmpX86 RAX,#3; jeq;`

## RISC-V port

The `node/cpus/riscv` directory contains `riscv.java`, a port to RISC-V.
This port supports 32 64-bit GPRs and 32 floating-point registers.
A pattern matcher selects RISC-V ops from the idealized Simple
Sea-of-Nodes graph. Currently, we are targeting `RVA23U64`.

### Normal instruction selection

Works the same way as **X86_64_V2**. Since RISC-V doesn't have *complex* addressing modes, greedy
pattern matching doesn't apply in most cases. Ops without any special behavior simply do the lookup.

### Restricted ISA
RISC-V is a restricted ISA with simple addressing modes and few overloaded instruction forms.
E.g. `lea` is not present.


### Branching
In `x86_64_v2`, branching requires a comparison instruction to set flags,
followed by a jump instruction that uses those flags.
In contrast, RISC-V includes all necessary operands within the
branch instruction itself, eliminating the need for a separate
comparison step.

E.g.
```
beq x10, x11, some_label
```

`CBranchRISC` implements this behavior.
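
As a rough illustration of why no separate compare is needed, the sketch below packs a `beq` into its 32-bit B-type form, with both source registers and the branch offset in one word. The class and method names are hypothetical and not part of the Simple code base; the bit layout follows the standard RISC-V B-type format.

```java
// Hypothetical helper (an assumption for illustration, not Simple's encoder):
// packs a RISC-V `beq rs1, rs2, offset` into one 32-bit B-type instruction.
final class BeqEncodingSketch {
    static final int OP_BRANCH  = 0x63; // opcode 1100011
    static final int FUNCT3_BEQ = 0;    // funct3 000 selects beq

    // rs1/rs2 are x-register numbers (0..31); offset is an even byte
    // offset relative to the branch, in the range [-4096, 4094].
    static int beq(int rs1, int rs2, int offset) {
        if (rs1 < 0 || rs1 > 31 || rs2 < 0 || rs2 > 31)
            throw new IllegalArgumentException("bad register");
        if ((offset & 1) != 0 || offset < -4096 || offset > 4094)
            throw new IllegalArgumentException("bad offset: " + offset);
        int imm12   = (offset >> 12) & 0x1;  // sign bit
        int imm11   = (offset >> 11) & 0x1;
        int imm10_5 = (offset >>  5) & 0x3F;
        int imm4_1  = (offset >>  1) & 0xF;
        return (imm12 << 31) | (imm10_5 << 25) | (rs2 << 20) | (rs1 << 15)
             | (FUNCT3_BEQ << 12) | (imm4_1 << 8) | (imm11 << 7) | OP_BRANCH;
    }

    public static void main(String[] args) {
        // beq x10, x11, +8 bytes
        System.out.println(Integer.toHexString(beq(10, 11, 8)));
    }
}
```

Both compared registers sit in the `rs1`/`rs2` fields of the branch itself, which is the property the text above relies on.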
### No R-Flags
RISC-V has no flags register like x86's `RFLAGS`. To obtain similar functionality,
we explicitly set a register from the result of the comparison:
```
xor a0, a0, a1
seqz a0, a0
```
After the `xor`, `a0` is zero exactly when the original `a0` and `a1` were equal.
The `seqz` instruction then sets `a0` to 1 if `a0` is zero, and to 0 otherwise.
This matches `SetRISC` in the `node/cpus/riscv` port.
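
The two-instruction sequence can be modeled in plain Java to check the reasoning. This is only a behavioral sketch with hypothetical names; it says nothing about how `SetRISC` is implemented.

```java
// Hypothetical model of the `xor` + `seqz` pair; not code from the port.
final class SetEqSketch {
    // Returns 1 if a == b, else 0, using the same two steps as the assembly.
    static long setEq(long a, long b) {
        long t = a ^ b;        // xor  t, a, b : t is zero iff a == b
        return t == 0 ? 1 : 0; // seqz t, t    : 1 if t is zero, else 0
    }

    public static void main(String[] args) {
        System.out.println(setEq(3, 3));
        System.out.println(setEq(3, 4));
    }
}
```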

### Fixed instruction length
RISC-V instructions have a fixed length of 32 bits (in most cases). Loads and stores encode a
12-bit signed displacement directly in the instruction, which we can use on the encoding side.

E.g.
```
flw fa5, 0(a0)
flw fa4, 12(a0)
fadd.s fa0, fa5, fa4
```
(No separate address computation is needed before each `flw`; the offset is folded into the instruction.)
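
A minimal sketch of how such a displacement lands in the instruction word, assuming the standard I-type layout for `flw`. The helper names are hypothetical; the port's own `encoding` methods are still `TODO` in this commit.

```java
// Hypothetical helper (an assumption, not the port's encoder): packs
// `flw frd, disp(rs1)` using the RISC-V I-type format.
final class FlwEncodingSketch {
    static final int OP_LOAD_FP = 0x07; // opcode 0000111
    static final int FUNCT3_W   = 0b010; // 32-bit float load

    // The displacement must fit in a signed 12-bit immediate.
    static boolean fitsImm12(long disp) { return -2048 <= disp && disp <= 2047; }

    // frd is an f-register number, rs1 an x-register number (both 0..31).
    static int flw(int frd, int rs1, int disp) {
        if (frd < 0 || frd > 31 || rs1 < 0 || rs1 > 31)
            throw new IllegalArgumentException("bad register");
        if (!fitsImm12(disp))
            throw new IllegalArgumentException("displacement out of range: " + disp);
        return ((disp & 0xFFF) << 20) | (rs1 << 15) | (FUNCT3_W << 12)
             | (frd << 7) | OP_LOAD_FP;
    }

    public static void main(String[] args) {
        // flw fa4, 12(a0): fa4 is f14, a0 is x10
        System.out.println(Integer.toHexString(flw(14, 10, 12)));
    }
}
```

A displacement outside the 12-bit range would need extra instructions to form the address, which is why a range check like `fitsImm12` is a natural guard in a pattern matcher.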

### ABI names (registers)
In RISC-V, general-purpose registers (GPRs) range from `x0` to `x31`, and floating-point registers range
from `f0` to `f31`.

To enhance readability and align with calling conventions, RISC-V defines ABI names for registers, providing more intuitive aliases
instead of raw register numbers.



| Register | Alias | Usage | Saved By |
|----------|-------|------------------------------|----------|
| x10 | a0 | Function arguments/return values | Caller |
| x11 | a1 | Function arguments/return values | Caller |
| x12 | a2 | Function arguments | Caller |
| x13 | a3 | Function arguments | Caller |
| x14 | a4 | Function arguments | Caller |
| x15 | a5 | Function arguments | Caller |
| x16 | a6 | Function arguments | Caller |
| x17 | a7 | Function arguments | Caller |

When passing arguments to a function, the first 8 integer arguments from the caller are passed in registers `a0` to `a7`.
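
The mapping in the table can be written down as a small lookup. This is a sketch with hypothetical names covering only the integer argument registers listed above; it is not the port's calling-convention code.

```java
// Hypothetical lookup (an assumption for illustration): maps an integer
// argument index to its x-register, and an x-register to its ABI alias.
final class ArgRegSketch {
    static final int FIRST_ARG_REG = 10; // x10 is a0
    static final int NUM_ARG_REGS  = 8;  // a0..a7

    // x-register number holding integer argument idx, or -1 if that
    // argument is not passed in a register.
    static int argReg(int idx) {
        return (0 <= idx && idx < NUM_ARG_REGS) ? FIRST_ARG_REG + idx : -1;
    }

    // ABI alias for the argument registers; other registers keep raw names here.
    static String abiName(int xreg) {
        return (FIRST_ARG_REG <= xreg && xreg < FIRST_ARG_REG + NUM_ARG_REGS)
            ? "a" + (xreg - FIRST_ARG_REG)
            : "x" + xreg;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 9; i++)
            System.out.println("arg " + i + " -> " + argReg(i));
    }
}
```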

### Load floats
Currently, floating-point constants are first loaded into an integer GPR and then moved into a floating-point register.
The RA stage rewrites this as:


````
lui a0, 262144
fmv.w.x fa0, a0
````
For non-constant values, we use a broader register mask that
supports both GPRs and FPRs. This implementation can be found in
`node/cpus/riscv/LoadRISC`.
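
To see where an immediate like `262144` comes from, the sketch below derives the `lui` operand from a float's raw bits. The helper names are hypothetical, and reading the example constant as `2.0f` is an inference from the immediate, not something the source states.

```java
// Hypothetical helper (an assumption for illustration): computes the `lui`
// immediate that materializes a float's bit pattern in a GPR.
final class FloatConstSketch {
    // A single `lui` only sets the upper 20 bits, so the low 12 bits
    // of the float's raw encoding must be zero.
    static boolean fitsLui(float f) {
        return (Float.floatToRawIntBits(f) & 0xFFF) == 0;
    }

    // Upper 20 bits of the float's raw encoding, as the `lui` immediate.
    static int luiImm(float f) {
        return Float.floatToRawIntBits(f) >>> 12;
    }

    public static void main(String[] args) {
        float f = 2.0f;
        System.out.println("lui a0, " + luiImm(f) + "   (single lui: " + fitsLui(f) + ")");
    }
}
```

Constants whose low 12 bits are non-zero would need a further instruction after the `lui` before the `fmv.w.x`.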






@@ -40,9 +40,9 @@ short firstColor() {

boolean test( int reg ) { return ((_bits >> reg)&1) != 0; }

// checks if the 2 masks have at least 1 bit in common
boolean overlap( RegMask mask ) { return (_bits & mask._bits)!=0; }


// Defensive writable copy
public RegMaskRW copy() { return new RegMaskRW( _bits ); }

@@ -64,8 +64,10 @@ public SB toString(SB sb) {
}
}

// Mutable regmask(writable)
class RegMaskRW extends RegMask {
public RegMaskRW(long x) { super(x); }
// clears bit at position r. Returns true if the mask is still not empty.
public boolean clr(int r) { _bits &= ~(1L<<r); return _bits!=0; }
@Override RegMaskRW and( RegMask mask ) {
if( mask==null ) return this;
@@ -9,13 +9,19 @@
// Upcast (join) the input to a t. Used after guard test to lift an input.
// Can also be used to make a type-assertion if ctrl is null.
public class CastNode extends Node {
private final Type _t;
public Type _t;
public CastNode(Type t, Node ctrl, Node in) {
super(ctrl, in);
_t = t;
setType(compute());
}

public CastNode(CastNode c) {
super(c.in(0), c.in(1)); // Call parent constructor
this._t = c._t; // Copy the Type field
setType(compute()); // Ensure type is recomputed
}

@Override public String label() { return "("+_t.str()+")"; }

@Override
@@ -0,0 +1,36 @@
package com.seaofnodes.simple.node.cpus.arm;

import com.seaofnodes.simple.*;
import com.seaofnodes.simple.codegen.CodeGen;
import com.seaofnodes.simple.codegen.RegMask;
import com.seaofnodes.simple.node.*;
import com.seaofnodes.simple.type.TypeInteger;
import java.io.ByteArrayOutputStream;

public class AddARM extends MachConcreteNode implements MachNode {

    AddARM( Node add ) { super(add); }

    // Register mask allowed on input i.
    @Override public RegMask regmap(int i) {
        assert i== 1 || i == 2 || i == 3;
        return arm.RMASK;
    }
    // Register mask allowed as a result.  0 for no register.
    @Override public RegMask outregmap() { return arm.RMASK; }
    // Three-address op: the output is not tied to an input register
    @Override public int twoAddress() { return 0; }

    // Encoding is appended into the byte array; size is returned
    @Override public int encoding(ByteArrayOutputStream bytes) {
        throw Utils.TODO();
    }

    // General form: "rd = rs1 + rs2"
    @Override public void asm(CodeGen code, SB sb) {
        sb.p(code.reg(this)).p(" = ").p(code.reg(in(1))).p(" + ").p(code.reg(in(2)));
    }

    @Override public String op() { return "add"; }
}
@@ -0,0 +1,33 @@
package com.seaofnodes.simple.node.cpus.arm;


import com.seaofnodes.simple.*;
import com.seaofnodes.simple.codegen.CodeGen;
import com.seaofnodes.simple.codegen.RegMask;
import com.seaofnodes.simple.node.*;
import com.seaofnodes.simple.type.TypeInteger;
import java.io.ByteArrayOutputStream;

public class AddFARM extends MachConcreteNode implements MachNode {
    AddFARM(Node addf) { super(addf); }

    // Register mask allowed on input i.
    @Override public RegMask regmap(int i) { assert i==1 || i==2; return arm.DMASK; }
    // Register mask allowed as a result.  0 for no register.
    @Override public RegMask outregmap() { return arm.DMASK; }
    // Output is same register as input#1
    @Override public int twoAddress() { return 1; }

    // Encoding is appended into the byte array; size is returned
    @Override public int encoding(ByteArrayOutputStream bytes) {
        throw Utils.TODO();
    }

    // Default on double precision for now (64 bits)
    // General form: "vadd.32 rd = src1 + src2"
    @Override public void asm(CodeGen code, SB sb) {
        sb.p(code.reg(this)).p(" = ").p(code.reg(in(1))).p(" + ").p(code.reg(in(2)));
    }

    @Override public String op() { return "vadd"; }
}
@@ -0,0 +1,57 @@
package com.seaofnodes.simple.node.cpus.arm;

import com.seaofnodes.simple.*;
import com.seaofnodes.simple.codegen.CodeGen;
import com.seaofnodes.simple.codegen.RegMask;
import com.seaofnodes.simple.node.*;
import com.seaofnodes.simple.type.TypeInteger;
import java.io.ByteArrayOutputStream;

public class AddIARM extends MachConcreteNode implements MachNode {
    final TypeInteger _ti;

    AddIARM(Node add, TypeInteger ti) {
        super(add);
        _inputs.pop();          // Drop the constant input; it lives in _ti
        _ti = ti;
    }

    // Register mask allowed on input i.
    @Override
    public RegMask regmap(int i) {
        // assert i== i;
        return arm.RMASK;
    }

    // Register mask allowed as a result.  0 for no register.
    @Override
    public RegMask outregmap() {
        return arm.RMASK;
    }

    // Three-address op: the output is not tied to an input register
    @Override
    public int twoAddress() {
        return 0;
    }

    // Encoding is appended into the byte array; size is returned
    @Override
    public int encoding(ByteArrayOutputStream bytes) {
        throw Utils.TODO();
    }

    // General form: "addi rd = rs1 + imm"
    @Override
    public void asm(CodeGen code, SB sb) {
        sb.p(code.reg(this)).p(" = ").p(code.reg(in(1))).p(" + #");
        _ti.print(sb);
    }

    @Override
    public String op() {
        return _ti.value() == 1 ? "inc" : (_ti.value() == -1 ? "dec" : "addi");
    }

}
@@ -0,0 +1,41 @@
package com.seaofnodes.simple.node.cpus.arm;

import com.seaofnodes.simple.SB;
import com.seaofnodes.simple.Utils;
import com.seaofnodes.simple.codegen.CodeGen;
import com.seaofnodes.simple.codegen.RegMask;
import com.seaofnodes.simple.node.MachConcreteNode;
import com.seaofnodes.simple.node.MachNode;
import com.seaofnodes.simple.node.Node;

import java.io.ByteArrayOutputStream;

public class AndARM extends MachConcreteNode implements MachNode {
    AndARM(Node and) {
        super(and);
    }

    // Register mask allowed on input i.
    // This is the normal calling convention
    @Override public RegMask regmap(int i) {
        assert i==1 || i==2;
        return arm.RMASK;
    }

    // Register mask allowed as a result.  0 for no register.
    @Override public RegMask outregmap() { return arm.RMASK; }

    // Three-address op: the output is not tied to an input register
    @Override public int twoAddress() { return 0; }

    // Encoding is appended into the byte array; size is returned
    @Override public int encoding(ByteArrayOutputStream bytes) {
        throw Utils.TODO();
    }

    // General form: "rd = rs1 & rs2"
    @Override public void asm(CodeGen code, SB sb) {
        sb.p(code.reg(this)).p(" = ").p(code.reg(in(1))).p(" & ").p(code.reg(in(2)));
    }
}
@@ -0,0 +1,62 @@
package com.seaofnodes.simple.node.cpus.arm;


import com.seaofnodes.simple.*;
import com.seaofnodes.simple.codegen.CodeGen;
import com.seaofnodes.simple.codegen.RegMask;
import com.seaofnodes.simple.node.*;
import com.seaofnodes.simple.type.Type;
import com.seaofnodes.simple.type.TypeInteger;
import java.io.ByteArrayOutputStream;
import java.util.BitSet;
import java.lang.StringBuilder;

public class AndIARM extends MachConcreteNode implements MachNode {
    final TypeInteger _ti;

    AndIARM(AndNode and, TypeInteger ti) {
        super(and);
        _inputs.pop();          // Drop the constant input; it lives in _ti
        _ti = ti;
    }

    // Register mask allowed on input i.
    // This is the normal calling convention
    @Override
    public RegMask regmap(int i) {
        assert i == 1 || i == 2;
        return arm.RMASK;
    }

    // Register mask allowed as a result.  0 for no register.
    @Override
    public RegMask outregmap() {
        return arm.RMASK;
    }

    // Three-address op: the output is not tied to an input register
    @Override
    public int twoAddress() {
        return 0;
    }

    // Encoding is appended into the byte array; size is returned
    @Override
    public int encoding(ByteArrayOutputStream bytes) {
        throw Utils.TODO();
    }

    // General form: "andi rd = rs1 & imm"
    @Override
    public void asm(CodeGen code, SB sb) {
        sb.p(code.reg(this)).p(" = ").p(code.reg(in(1))).p(" & #");
        _ti.print(sb);
    }

    @Override
    public String op() {
        return "andi";
    }

}