Ch20 reg #166

Open
wants to merge 59 commits into
base: main

cliffclick
Collaborator

Register Allocation, with ports to x86, arm64, risc5
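
For readers new to the topic, the commits below mention building an interference graph (IFG) and coloring it. The following is a minimal, hypothetical sketch of simplify/select graph coloring, assuming a `BitSet` adjacency representation; the class and method names (`ColorSketch`, `graph`, `color`) are illustrative and are not taken from this PR, which also handles splitting, biasing and register masks that this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Deque;

/** Hypothetical sketch of simplify/select graph coloring; not the code in this PR. */
final class ColorSketch {
    /** Build a symmetric interference graph over n live ranges from an edge list. */
    static BitSet[] graph(int n, int[][] edges) {
        BitSet[] ifg = new BitSet[n];
        for (int i = 0; i < n; i++) ifg[i] = new BitSet(n);
        for (int[] e : edges) { ifg[e[0]].set(e[1]); ifg[e[1]].set(e[0]); }
        return ifg;
    }

    /** Returns color[i] in [0,k), or -1 where live range i found no free register. */
    static int[] color(BitSet[] ifg, int k) {
        int n = ifg.length;
        int[] degree = new int[n];
        boolean[] removed = new boolean[n];
        for (int i = 0; i < n; i++) degree[i] = ifg[i].cardinality();

        // Simplify: remove one live range per round, preferring degree < k
        // (trivially colorable); otherwise optimistically take the highest degree.
        Deque<Integer> stack = new ArrayDeque<>();
        for (int round = 0; round < n; round++) {
            int pick = -1;
            for (int i = 0; i < n; i++) {
                if (removed[i]) continue;
                if (degree[i] < k) { pick = i; break; }
                if (pick == -1 || degree[i] > degree[pick]) pick = i;
            }
            removed[pick] = true;
            stack.push(pick);
            for (int j = ifg[pick].nextSetBit(0); j >= 0; j = ifg[pick].nextSetBit(j + 1))
                degree[j]--;
        }

        // Select: pop in reverse order and take the lowest color no neighbor uses.
        int[] color = new int[n];
        Arrays.fill(color, -1);
        while (!stack.isEmpty()) {
            int i = stack.pop();
            BitSet used = new BitSet(k);
            for (int j = ifg[i].nextSetBit(0); j >= 0; j = ifg[i].nextSetBit(j + 1))
                if (color[j] >= 0) used.set(color[j]);
            int c = used.nextClearBit(0);
            if (c < k) color[i] = c; // otherwise stays -1: a spill candidate
        }
        return color;
    }

    public static void main(String[] args) {
        BitSet[] triangle = graph(3, new int[][]{{0, 1}, {1, 2}, {0, 2}});
        System.out.println(Arrays.toString(color(triangle, 3)));
        System.out.println(Arrays.toString(color(triangle, 2)));
    }
}
```

With three mutually interfering live ranges and only two registers, one live range is left uncolored; a real allocator would then split or spill it and retry.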

cliffclick and others added 30 commits February 7, 2025 10:02
RegMask.EMPTY -> null
BuildLRG for Phi
Add CmpFX86
CallEndX86 needed to tell fp return from gpr ret
Call has no output regs.
Building IFG.
Coloring.
Some nested spilling
Optimize divf/con -> mul times inverse
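
The last item above rewrites a float divide by a constant as a multiply by its inverse. The rewrite is bit-exact when the reciprocal is exactly representable, which holds for powers of two whose reciprocal is still a normal number. The sketch below checks that condition; `RecipSketch` and `hasExactReciprocal` are hypothetical names, and the PR does not state whether it restricts the rewrite this way.

```java
/** Hypothetical sketch: when is x / c bit-identical to x * (1 / c)? */
final class RecipSketch {
    /** True if c is a normal power of two (either sign) whose reciprocal is also normal. */
    static boolean hasExactReciprocal(double c) {
        long bits = Double.doubleToRawLongBits(c);
        long mantissa = bits & 0x000FFFFFFFFFFFFFL;   // fraction bits must all be zero
        int exp = Math.getExponent(c);                // unbiased exponent
        // Zero/subnormal give MIN_EXPONENT-1 and NaN/Infinity give MAX_EXPONENT+1,
        // so both fall outside the range below. The upper bound keeps 1/c normal.
        return mantissa == 0
            && exp >= Double.MIN_EXPONENT
            && exp <= -Double.MIN_EXPONENT;
    }

    public static void main(String[] args) {
        double x = 7.3;
        System.out.println(hasExactReciprocal(4.0) && x / 4.0 == x * 0.25);
        System.out.println(hasExactReciprocal(3.0));
    }
}
```

For constants such as 3.0 the reciprocal is rounded, so the multiply form can differ from the divide in the last bit; a compiler has to decide whether that is acceptable.
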
* Initial push

Major CodeGen cleanup; most static globals move into the CodeGen object.

* Minor progress bug with mutual recursion

* Add test from pr#148

* First cut x86 "port"

Only "return 0;", no registers, no encoding, but yah gotta start somewhere

* Update README.md

* Update README.md

* Update ListScheduler.java

* command line launcher for simple

Parser bug fref in args
Some new test cases.

* Add basic RegMask for more than 64 bits of register mask.

Add calling convention basics to X86; con+ret RegMasks.
GCM computes CFG for all users.
Drop unused MultiUse.
Pick up reg-pressure aware ListScheduler.
Re-layout CodeGen file to get code & data chunks nearer each other.
minor Ary extension
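
As an illustration of a register mask wider than 64 bits, here is a minimal sketch that uses two `long` words to cover 128 registers. The class `WideMaskSketch` and its methods are hypothetical and do not reflect the actual `RegMask` API in this PR.

```java
/** Hypothetical immutable 128-bit register mask built from two longs. */
final class WideMaskSketch {
    private final long lo; // registers 0..63
    private final long hi; // registers 64..127

    WideMaskSketch(long lo, long hi) { this.lo = lo; this.hi = hi; }

    /** Mask with exactly one register set; reg must be in 0..127. */
    static WideMaskSketch of(int reg) {
        return reg < 64 ? new WideMaskSketch(1L << reg, 0)
                        : new WideMaskSketch(0, 1L << (reg - 64));
    }

    boolean test(int reg) {
        return reg < 64 ? (lo & (1L << reg)) != 0
                        : (hi & (1L << (reg - 64))) != 0;
    }

    WideMaskSketch and(WideMaskSketch o) { return new WideMaskSketch(lo & o.lo, hi & o.hi); }
    WideMaskSketch or (WideMaskSketch o) { return new WideMaskSketch(lo | o.lo, hi | o.hi); }

    boolean isEmpty() { return lo == 0 && hi == 0; }

    /** Lowest-numbered register in the mask, or -1 if the mask is empty. */
    int firstReg() {
        if (lo != 0) return Long.numberOfTrailingZeros(lo);
        if (hi != 0) return 64 + Long.numberOfTrailingZeros(hi);
        return -1;
    }

    public static void main(String[] args) {
        WideMaskSketch m = of(3).or(of(70));
        System.out.println(m.test(70) + " " + m.test(6) + " " + m.firstReg());
    }
}
```

Keeping the high word separate avoids the aliasing that a single `long` would have, where register 70 and register 6 would share a bit.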

* ASM Printer

InstSel handles folding 2+ ideal ops into 1 machine op

* handle instSel for some control flow

Bool,If,CProj,Region,Phi
Handle 2-op expansions
Shuffle instSel graph walk back again.

* Basic if-block inst selects and asm prints

* Add mul-by-constant to shift

* left shift codegen basic

* exclude RSP from *write* mask

* 2-arg add (not immediate)

mul-by-small constant opt.  Some utilities.
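
The multiply-by-constant commits above can be pictured with a small strength-reduction sketch. It assumes powers of two become a shift and that 3, 5 and 9 become a shift plus an add (the shape an x86 LEA can encode); which small constants the PR actually handles is not stated, and `MulConSketch` is a hypothetical name.

```java
/** Hypothetical sketch of multiply-by-constant strength reduction. */
final class MulConSketch {
    /** Shift amount k if c == 2^k for some k >= 0, else -1. */
    static int shiftFor(long c) {
        return (c > 0 && (c & (c - 1)) == 0) ? Long.numberOfTrailingZeros(c) : -1;
    }

    /** Computes x * c, using a shift or shift+add where the constant allows it. */
    static long mul(long x, long c) {
        int k = shiftFor(c);
        if (k >= 0) return x << k;                    // c = 1, 2, 4, 8, ...
        int k1 = shiftFor(c - 1);
        if (k1 >= 1 && k1 <= 3) return (x << k1) + x; // c = 3, 5, 9
        return x * c;                                 // general case: real multiply
    }

    public static void main(String[] args) {
        System.out.println(mul(7, 8) + " " + mul(7, 5) + " " + mul(7, 6));
    }
}
```

Because Java `long` arithmetic wraps in two's complement, the shifted forms agree with the plain multiply even on overflow.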

* Handle first loop

Change inst sel walk again to pre-order, to set an early visit bit to stop cycles.
CFGNode copies dom/loop info.
Cmp not-immediate form.
Ret/Fun lazy updates

* inst select for new struct allocation

* Float, bitwise and arithmetic operands codegen x86-64 (#150)

* left shift codegen basic

* sar codegen basic

* merge

* some basic ops

* addf

* divf, subf, mulf

* rsp allowed in bitwise input, not output

* formatting

* Remove non-existing FP+imm ops

* float ops without imm values

* Rebased on ch 19

* Update tests

---------

Co-authored-by: Cliff Click <[email protected]>

* cleanup after merge

alpha-sort helper fcns
implicit test vs zero/null

* Add LEA op

asm print works on ideal nodes for all tests

* Minor cleanup lea

* merge2

* merge3

* GCM for float ops fixed, other cleanup - small extensions

* x86 addressing modes during inst select

Drop New taking inits; just follow with initializing stores.  Simplifies inst selection which otherwise needs to undo this optimization and emit following init stores.
Basic load/store for now, op-to-mem comes later
add same becomes Shl by 1.
Drop DivF-immediate

* call setup

* extended ch13, 14,15,16 (#153)

* merge2

* merge3

* Fix a few minor fuzzer bugs

* Force array layout

ld/st get size (but not signed/unsigned)
Add unsigned LT for later range checks.

* Fix a bunch of float issues

was deleting required inputs

* simpler invariant

defensive copy inputs for simpler invariant in complex sharing patterns
inc/dec form (just the opcode now).

* Call & CallR X86 instructions

* merge2

* Add-from-memory op

Remove FP immediate forms.

* Remove name from TFP

Never shoulda been there (but was convenient for awhile).
Moved into FunNode, found by checking the linker table with a TFP.

* remove debug prints

add check for bad call convention
fix 3 tests (bad merge?)

* cleanup call convention abi

Needs more love at some point...

* one more cleanup ABI

* New (#154)

* merge2

* merge3

* GCM for float ops fixed, other cleanup - small extensions

* call setup

* merge2

* remove debug prints

add check for bad call convention
fix 3 tests (bad merge?)

* cleanup call convention abi

Needs more love at some point...

* one more cleanup ABI

---------

Co-authored-by: Cliff Click <[email protected]>

* Add CmpMem form

Common addressing mode print
LEA can skip a base

* Add an AddFMemX86

A bunch of mem op patterns are missing, hopefully they are just cut-n-paste from the existing patterns.

* Docs

* XMM reg mask fix

Minor README updates

* Allow inverted cmp/mem

narrowing stores can bypass an AndMask
Array length loads do not need control
Some missing print info
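
The note that narrowing stores can bypass an AndMask rests on a simple fact: a store of N bytes only writes the low 8*N bits, so a preceding AND that keeps all of those bits is redundant. A minimal sketch of that check, with the hypothetical name `NarrowStoreSketch.maskIsRedundant`:

```java
/** Hypothetical sketch: is `store(x & mask)` the same as `store(x)` for a narrow store? */
final class NarrowStoreSketch {
    /** True if mask preserves every bit that a store of storeBytes bytes writes. */
    static boolean maskIsRedundant(long mask, int storeBytes) {
        long low = storeBytes >= 8 ? -1L : (1L << (storeBytes * 8)) - 1;
        return (mask & low) == low;
    }

    public static void main(String[] args) {
        int x = 0x1234;
        // A byte store keeps only the low 8 bits, so the 0xFF mask changes nothing.
        System.out.println(maskIsRedundant(0xFF, 1) && (byte) (x & 0xFF) == (byte) x);
        // A 2-byte store would see the difference, so the mask must stay.
        System.out.println(maskIsRedundant(0xFF, 2));
    }
}
```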

* Support "*ptr op= val"

at least for MemAdd

* missed golden rule update

* Array len is u32

load-after-store zero/sign-extends if the store is truncating
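
To make the load-after-store rule concrete: when a load is replaced by the value of an earlier truncating store, the forwarded value has to be narrowed to the stored width and then zero- or sign-extended, as the real load would have done. A minimal sketch, assuming 64-bit values and the hypothetical name `StoreToLoadSketch.forward`:

```java
/** Hypothetical sketch of forwarding a truncating store's value to a later load. */
final class StoreToLoadSketch {
    /**
     * Value a load of `bytes` bytes would observe after `stored` was written
     * with a store of the same width, extended according to the load's signedness.
     */
    static long forward(long stored, int bytes, boolean signed) {
        int shift = 64 - 8 * bytes;
        if (shift <= 0) return stored;               // full-width store: nothing to do
        return signed ? (stored << shift) >> shift   // sign-extend
                      : (stored << shift) >>> shift; // zero-extend
    }

    public static void main(String[] args) {
        System.out.println(forward(0x1FF, 1, false)); // low byte 0xFF, zero-extended
        System.out.println(forward(0x1FF, 1, true));  // low byte 0xFF, sign-extended
    }
}
```

Under this sketch an unsigned 4-byte quantity such as an array length would use the zero-extending path.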

* One more missed case

* riscv init

* added not imm form bitwise shifts and ops plus test cases

* addressing modes(high chance it'll get deleted)

* fix minor bug

* delete non existing matches

* replace RegMask.Empty with null

* riscv handle store and load, fltRisch uses risc reg now

* load float into GPR and then later in RA do hard split.

* minor changes

* turn off ch20 tests

* Update Chapter20Test.java

---------

Co-authored-by: Cliff Click <[email protected]>
RISC5 changes.
Some missing x86 ops.
Still needs some calling convention love.
Biased coloring; tracking splits around a LRG.
Separate split for self-conflicts.
postColor to remove junk splits.
Some minor riscv fixes.
If SplitEmptyMask does not apply, fall back to normal split.
Missing CmpFRISC.
Renamed riscv flags.
RetRISC handles FPR
remove extra RegMask constructors
Handle multi-node projections.
A little smarter about 2-addr scheduling
RegAlloc must-have single registers removes from other masks instead of interfering; had the mask removal backwards in a few cases.
LRG/RegMask COW expand masks before mutating.
Clone small constants instead of spilling.
Alloc & ProjX86 get output reg masks.
Full print large reg masks with sp+offset
avoid writing to RSP
Call unlinks before reg alloc.
CallEnd tracks RPC
CallX86 requires TFP for correct ABI
Bias Color tracks 2-addr.
RPC gets a mask/stack-slot
Clean up call/fun ABI
Fix mask RET, RET_F
Fix print stack slot
Remove isMultiHead, isMultiTail
Better all-asm print
New fcn print upgrades several golden rule tests.
Move the unlink-all from the walk::err call (which IS used during IterPeeps) to GCM.
CallEnd upgrades return type based on constant call.
Fix unlink to also unlink CallEnd
Constant TFP print checks constant
Tested an untested Load
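
Several items in the list above mention biased coloring. The idea can be sketched as follows: when choosing a register for a live range, prefer a register already chosen for a related live range (for example the other side of a split or a 2-address input) so the connecting copy becomes a no-op. This is a simplified, hypothetical sketch over a single 64-bit mask; `BiasPickSketch.pick` is not the PR's API.

```java
/** Hypothetical sketch of a biased register pick from a 64-bit availability mask. */
final class BiasPickSketch {
    /**
     * @param avail bit i set means register i is still legal for this live range
     * @param bias  preferred register, or -1 for no preference
     * @return the chosen register, or -1 if none is available
     */
    static int pick(long avail, int bias) {
        if (avail == 0) return -1;
        if (bias >= 0 && bias < 64 && (avail & (1L << bias)) != 0)
            return bias;                           // bias wins if it is legal
        return Long.numberOfTrailingZeros(avail);  // else lowest available register
    }

    public static void main(String[] args) {
        System.out.println(pick(0b1010L, 3) + " " + pick(0b1010L, 2));
    }
}
```
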

* add riscv port to readme

* arm init

* work in progress

* arm2 bs

* work in progress

* cleanup arm

* fixed main pr problems

* regmask extra comments

---------

Co-authored-by: Cliff Click <[email protected]>
since no range checks... test cannot fail (pass)
Error adding 'null' and '5'
Fix bug missing Find.
Fix bug setDef does unordered removal from basic blocks.
Use cloned constants for spilling.
SplitSelfConflict MUST split before uses and after defs.
Split global constants across functions
Missing find in union;
Inserting split after new inserted wrong place
Pre-remove null LRGs before color.
Bias color hunts through Phis
Shorten InstructionSelection.
Rename !unified() to leader()
AddMemX86 is 2-addr
Add CastXXX to preserve register across cast.
Bugfix scheduler.
Bugfix spiller - self-conflict order depended on hashtable iteration order, which depended on System.hashCode which would vary run to run.
Improvements to coloring
Improvements to spill quality testing
Fix bug LRG union stats
Adjust self-conflict splits to be slightly less aggressive
In every use-side split, don't split-after-split in the same block.
NewNode is a multi, does not define a register
Attempt to commute to keep reg-masks compatible.
Clones with kills (x86 XOR killing flags to zero) fail before the flag-def.
Bugfix failed to color last lrg.
More smarts on picking trivial LRGs during coloring
Drop isSplit, add SplitNode
Pass regalloc round to all splits for debugging
Self-conflict no split on loop backedge.
Handle a split that is made and then goes dead
No-reg-mask spilling handles case with many def/uses.
Add DivIX86
BrainFuck, MergeSort now allocate as testing
Finish Split bypass check
ARM: rsp is also a fixed zero, in R31.  R0 is a valid register.
CallArm prints args, passes TFP for arg selection.  Does not return a register.  Add CallEndARM which DOES return a register.  Parms missing tfp for float args.  ProjARM after a New missing regs.  RetARM missing RPC.
Remove hopeful DivIX86
small constants not cloning
One More Go-Round with Better Biased Colors
Fix bug picking deepest spills to bias.
ARM: remove default 2-address.  Fix regmask overflow.  Rename some ops to be canonical.
Port does not have to be in a Simple named directory.
Call convention is just a string, no enum
Missed some tests running RISC, ARM.
Expanding public API slightly so alternative (non-Simple repo) ports can happen.
RegMask supports single-bit-set and a long bitmask.
Bugfix GCM splitting multiple global constants.
Bugfix LCM grabbing a projection vs multi
Bugfix RegAlloc when avoiding splitting a clonable
Source code in release
More timing in CodeGen
Everybody uses the same names, same code shapes
...not callee save yet
Collect callee and caller save masks (inverts of each other, except for spills).
Reorg layout to be closer to each other; remove dead mask definitions; add final keyword
Callee-save registers and masks and inserting LRGs for same and spilling and picking good spills.  Insert CalleeSaveNodes and edges to RetXXXs and RegMasks for those edges.
Iterators for regmask.
Better printing post-allocation
All regs available arm, risc.
MergeSort compiles & runs
No flags on risc5.
Branch takes 2 inputs (one can be imm12).
Float compare only sets a GPR to 0/1.
Bool has an isFloat
Common risc5 imm-form handling
Arm split op can spill flags
bugfix RegAlloc 1st real >=64bit reg num
which are broken for some time now
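
One fix in the list above traces a spiller bug to hashtable iteration order that depended on the object hash and so varied from run to run. A common way to avoid that class of bug is to derive the visit order from a stable id instead of from the hash container. The sketch below shows the idea; `StableOrderSketch`, `Lrg` and `visitOrder` are hypothetical stand-ins, and the PR does not say which remedy it used.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: make iteration order independent of identity hash codes. */
final class StableOrderSketch {
    /** Stand-in for a live range: no hashCode override, so hashing uses the identity hash. */
    static final class Lrg {
        final int id;
        Lrg(int id) { this.id = id; }
    }

    /** Visit order derived from the stable id, not from the hash set's iteration order. */
    static List<Integer> visitOrder(Collection<Lrg> lrgs) {
        List<Lrg> sorted = new ArrayList<>(lrgs);
        sorted.sort(Comparator.comparingInt((Lrg l) -> l.id));
        List<Integer> ids = new ArrayList<>();
        for (Lrg l : sorted) ids.add(l.id);
        return ids;
    }

    public static void main(String[] args) {
        Set<Lrg> conflicts = new HashSet<>();
        for (int id : new int[]{7, 3, 5}) conflicts.add(new Lrg(id));
        System.out.println(visitOrder(conflicts)); // prints [3, 5, 7] on every run
    }
}
```

Iterating the `HashSet` directly could yield a different order on each run, which makes allocation results and golden-rule tests hard to reproduce.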
2 participants