Ch20 reg #166
Open
cliffclick wants to merge 59 commits into main from ch20_reg
+9,451
−679
RegMask.EMPTY -> null
BuildLRG for Phi. Add CmpFX86.
CallEndX86 is needed to tell an FP return from a GPR return. Call has no output regs.
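The CallEndX86 change hinges on choosing the result register from the call's return type. A minimal sketch, assuming the System V AMD64 convention (integer and pointer results in RAX, floating-point results in XMM0); the `ReturnReg` class, `RetKind` enum and `returnRegister` helper are hypothetical names, not Simple's code.

```java
/** Hypothetical sketch: pick the x86-64 return register from the result type. */
final class ReturnReg {
    // Hypothetical classification of a call's result type.
    enum RetKind { VOID, INT, PTR, FLOAT }

    // System V AMD64: integer and pointer results come back in RAX,
    // floating-point results in XMM0; a void call defines no register.
    static String returnRegister(RetKind kind) {
        switch (kind) {
            case INT:
            case PTR:   return "rax";
            case FLOAT: return "xmm0";
            default:    return null; // no output register
        }
    }

    public static void main(String[] args) {
        for (RetKind k : RetKind.values())
            System.out.println(k + " -> " + returnRegister(k));
    }
}
```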
Building IFG. Coloring. Some nested spilling. Optimize divf/con -> mul times inverse.
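Replacing `x / c` with `x * (1/c)` gives bit-identical results only when `1/c` is exact. This sketch assumes the fold should fire only in that case and checks for a normal power-of-two constant, whose reciprocal is always exactly representable; the commit message does not say which guard Simple actually uses, and `DivToMul` and `multiplierFor` are hypothetical names.

```java
import java.util.OptionalDouble;

/** Hypothetical sketch: fold x / c into x * (1/c) only when that is exact. */
final class DivToMul {
    private static final long EXP_MASK  = 0x7FF0_0000_0000_0000L;
    private static final long FRAC_MASK = 0x000F_FFFF_FFFF_FFFFL;

    // True when c is a normal power of two (either sign). Its reciprocal is then
    // exactly representable too, so x / c and x * (1/c) round the same real value
    // and agree for every x. Zero, subnormals, infinities and NaN are rejected.
    static boolean hasExactReciprocal(double c) {
        long bits = Double.doubleToRawLongBits(c);
        long exp = bits & EXP_MASK;
        return (bits & FRAC_MASK) == 0 && exp != 0 && exp != EXP_MASK;
    }

    // The constant to multiply by, or empty when the divide must stay a divide.
    static OptionalDouble multiplierFor(double c) {
        return hasExactReciprocal(c) ? OptionalDouble.of(1.0 / c) : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        for (double c : new double[] {4.0, -0.5, 3.0}) {
            OptionalDouble m = multiplierFor(c);
            System.out.println(c + ": " + (m.isPresent() ? "multiply by " + m.getAsDouble() : "keep the divide"));
        }
    }
}
```

Without such a guard, a constant like 3.0 would be folded into a multiply by a rounded 1/3, which can change the last bit of the result.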
* Initial push. Major CodeGen cleanup; most static globals move into the CodeGen object.
* Minor progress bug with mutual recursion
* Add test from pr#148
* First cut x86 "port". Only "return 0;", no registers, no encoding, but yah gotta start somewhere
* Update README.md
* Update README.md
* Update ListScheduler.java
* command line launcher for simple. Parser bug fref in args. Some new test cases.
* Add basic RegMask for more than 64 bits of register mask. Add calling convention basics to X86; con+ret RegMasks. GCM computes CFG for all users. Drop unused MultiUse. Pick up reg-pressure aware ListScheduler. Re-layout CodeGen file to get code & data chunks nearer each other. Minor Ary extension
* ASM Printer. InstSel handles folding 2+ ideal ops into 1 machine op
* handle instSel for some control flow: Bool, If, CProj, Region, Phi. Handle 2-op expansions. Shuffle instSel graph walk back again.
* Basic if-block inst selects and asm prints
* Add mul-by-constant to shift
* left shift codegen basic
* exclude RSP from *write* mask
* 2-arg add (not immediate). mul-by-small constant opt. Some utilities.
* Handle first loop. Change inst sel walk again to pre-order, to set an early visit bit to stop cycles. CFGNode copies dom/loop info. Cmp not-immediate form. Ret/Fun lazy updates
* inst select for new struct allocation
* Float, bitwise and arithmetic operands codegen x86-64 (#150)
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* Remove non-existing FP+imm ops
* Handle first loop. Change inst sel walk again to pre-order, to set an early visit bit to stop cycles. CFGNode copies dom/loop info. Cmp not-immediate form. Ret/Fun lazy updates
* float ops without imm values
* merge
* merge
* inst select for new struct allocation
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* Remove non-existing FP+imm ops
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* float ops without imm values
* merge
* merge
* Rebased on ch 19
* Update tests

---------

Co-authored-by: Cliff Click <[email protected]>

* cleanup after merge. alpha-sort helper fcns. implicit test vs zero/null
* Add LEA op. asm print works on ideal nodes for all tests
* Minor cleanup lea
* merge2
* merge3
* GCM for float ops fixed, other cleanup - small extensions
* x86 addressing modes during inst select. Drop New taking inits; just follow with initializing stores. Simplifies inst selection, which otherwise needs to undo this optimization and emit following init stores. Basic load/store for now, op-to-mem comes later. add same becomes Shl by 1. Drop DivF-immediate
* call setup
* extended ch13, 14,15,16 (#153)
* merge2
* merge3
* Fix a few minor fuzzer bugs
* Force array layout. ld/st get size (but not signed/unsigned). Add unsigned LT for later range checks.
* Fix a bunch of float issues; was deleting required inputs
* simpler invariant. defensive copy inputs for simpler invariant in complex sharing patterns. inc/dec form (just the opcode now).
* Call & CallR X86 instructions
* merge2
* Add-from-memory op. Remove FP immediate forms.
* Remove name from TFP. Never shoulda been there (but was convenient for awhile). Moved into FunNode, found by checking the linker table with a TFP.
* remove debug prints. add check for bad call convention. fix 3 tests (bad merge?)
* cleanup call convention abi. Needs more love at some point...
* one more cleanup ABI
* New (#154)
* merge2
* merge3
* GCM for float ops fixed, other cleanup - small extensions
* call setup
* merge2
* remove debug prints. add check for bad call convention. fix 3 tests (bad merge?)
* cleanup call convention abi. Needs more love at some point...
* one more cleanup ABI

---------

Co-authored-by: Cliff Click <[email protected]>

* Add CmpMem form. Common addressing mode print. LEA can skip a base
* Add an AddFMemX86. A bunch of mem op patterns are missing; hopefully they are just cut-n-paste from the existing patterns.
* Docs
* XMM reg mask fix. Minor README updates
* Allow inverted cmp/mem. Narrowing stores can bypass an AndMask. Array length loads do not need control. Some missing print info
* Support "*ptr op= val" at least for MemAdd
* missed golden rule update
* Array len is u32. load-after-store zero/sign-extends if the store is truncating
* One more missed case
* riscv init
* added not imm form bitwise shifts and ops plus test cases
* addressing modes (high chance it'll get deleted)
* fix minor bug
* delete non existing matches
* replace RegMask.Empty with null
* riscv handle store and load, fltRisch uses risc reg now
* load float into GPR and then later in RA do hard split.
* minor changes
* turn off ch20 tests
* Update Chapter20Test.java

---------

Co-authored-by: Cliff Click <[email protected]>
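Two of the rewrites listed above, mul-by-constant to shift and "add same becomes Shl by 1", rest on one identity: under 64-bit wraparound, multiplying by 2^k and shifting left by k produce the same bits. A sketch of the check with hypothetical names; Simple's actual matcher is not shown on this page.

```java
/** Hypothetical sketch of two strength reductions named in the log above. */
final class StrengthReduce {
    // If c is a positive power of two, return k with c == 1L << k; else -1.
    static int shiftFor(long c) {
        return (c > 0 && (c & (c - 1)) == 0) ? Long.numberOfTrailingZeros(c) : -1;
    }

    // x * c, using a shift when c is a power of two. With 64-bit wraparound,
    // x * (1L << k) and x << k produce identical bits for every x.
    static long mulByConst(long x, long c) {
        int k = shiftFor(c);
        return k >= 0 ? x << k : x * c;
    }

    // "add same becomes Shl by 1": x + x == x << 1, also under wraparound.
    static long addSelf(long x) { return x << 1; }

    public static void main(String[] args) {
        System.out.println(shiftFor(8));       // 3
        System.out.println(mulByConst(-5, 8)); // -40
        System.out.println(shiftFor(12));      // -1: not a power of two
    }
}
```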
RISC5 changes. Some missing x86 ops. Still needs some calling convention love.
Biased coloring; tracking splits around an LRG. Separate split for self-conflicts. postColor to remove junk splits. Some minor riscv fixes.
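Biased coloring tries to give a live range the same register as the LRGs it is copied to or from (for example across a split), so the copy becomes a no-op that a cleanup pass can drop. A minimal sketch of just the color choice, assuming at most 64 registers held in a `long` mask; `pickColor` and its parameters are hypothetical, not Simple's allocator.

```java
import java.util.List;

/** Hypothetical sketch of biased color selection for one live range. */
final class BiasedColor {
    // allowed:  registers this LRG may still use (bit r = register r), already
    //           minus registers taken by interfering, already-colored LRGs.
    // partners: registers already given to copy/split-related LRGs (-1 = uncolored).
    // Prefer a partner's register so the copy between them disappears; otherwise
    // fall back to the lowest allowed register. Returns -1 if nothing is allowed.
    static int pickColor(long allowed, List<Integer> partners) {
        for (int reg : partners)
            if (reg >= 0 && reg < 64 && (allowed & (1L << reg)) != 0)
                return reg;
        return allowed == 0 ? -1 : Long.numberOfTrailingZeros(allowed);
    }

    public static void main(String[] args) {
        long allowed = 0b1011_0000L;                            // r4, r5, r7 free
        System.out.println(pickColor(allowed, List.of(-1, 5))); // 5: biased to partner
        System.out.println(pickColor(allowed, List.of(2)));     // 4: r2 not allowed
    }
}
```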
If SplitEmptyMask does not apply, fall back to normal split. Missing CmpFRISC. Renamed riscv flags. RetRISC handles FPR. Remove extra RegMask constructors.
Handle multi-node projections. A little smarter about 2-addr scheduling. RegAlloc: must-have single registers are removed from other masks instead of interfering; had the mask removal backwards in a few cases. LRG/RegMask COW: expand masks before mutating. Clone small constants instead of spilling. Alloc & ProjX86 get output reg masks. Full print of large reg masks with sp+offset.
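The RegAlloc change above handles a must-have single register by removing it from the masks of conflicting live ranges rather than adding interference. A sketch of that idea over a boolean interference matrix and `long` masks; all names are hypothetical, and this is a single pass rather than Simple's implementation.

```java
/** Hypothetical sketch: a single-register LRG trims its neighbors' masks. */
final class SingleRegMask {
    static boolean isSingle(long mask) { return Long.bitCount(mask) == 1; }

    // masks[i]: registers LRG i may use (bit r = register r).
    // ifg[i][j]: LRGs i and j interfere (symmetric).
    // One pass: each LRG pinned to a single register removes that register
    // from every interfering LRG's mask. Returns false if some mask became
    // empty, i.e. that LRG now needs a split or spill.
    static boolean trimNeighbors(long[] masks, boolean[][] ifg) {
        boolean ok = true;
        for (int i = 0; i < masks.length; i++) {
            if (!isSingle(masks[i])) continue;
            for (int j = 0; j < masks.length; j++) {
                if (j == i || !ifg[i][j]) continue;
                masks[j] &= ~masks[i];
                if (masks[j] == 0) ok = false;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        long[] masks = {0b001L, 0b011L, 0b110L}; // LRG 0 must use r0
        boolean[][] ifg = new boolean[3][3];
        ifg[0][1] = ifg[1][0] = true;            // only LRGs 0 and 1 interfere
        System.out.println(trimNeighbors(masks, ifg));
        System.out.println(Long.toBinaryString(masks[1])); // 10: r0 removed
    }
}
```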
avoid writing to RSP
Call unlinks before reg alloc. CallEnd tracks RPC. CallX86 requires TFP for correct ABI. Bias Color tracks 2-addr. RPC gets a mask/stack-slot. Clean up call/fun ABI. Fix mask RET, RET_F. Fix print stack slot. Remove isMultiHead, isMultiTail. Better all-asm print.
New fcn print upgrades several golden rule tests. Move the unlink-all from the walk::err call (which IS used during IterPeeps) to GCM. CallEnd upgrades return type based on constant call. Fix unlink to also unlink CallEnd. Constant TFP print checks constant. Tested an untested Load.
* Initial push. Major CodeGen cleanup; most static globals move into the CodeGen object.
* Minor progress bug with mutual recursion
* Add test from pr#148
* First cut x86 "port". Only "return 0;", no registers, no encoding, but yah gotta start somewhere
* Update README.md
* Update README.md
* Update ListScheduler.java
* command line launcher for simple. Parser bug fref in args. Some new test cases.
* Add basic RegMask for more than 64 bits of register mask. Add calling convention basics to X86; con+ret RegMasks. GCM computes CFG for all users. Drop unused MultiUse. Pick up reg-pressure aware ListScheduler. Re-layout CodeGen file to get code & data chunks nearer each other. Minor Ary extension
* ASM Printer. InstSel handles folding 2+ ideal ops into 1 machine op
* handle instSel for some control flow: Bool, If, CProj, Region, Phi. Handle 2-op expansions. Shuffle instSel graph walk back again.
* Basic if-block inst selects and asm prints
* Add mul-by-constant to shift
* left shift codegen basic
* exclude RSP from *write* mask
* 2-arg add (not immediate). mul-by-small constant opt. Some utilities.
* Handle first loop. Change inst sel walk again to pre-order, to set an early visit bit to stop cycles. CFGNode copies dom/loop info. Cmp not-immediate form. Ret/Fun lazy updates
* inst select for new struct allocation
* Float, bitwise and arithmetic operands codegen x86-64 (#150)
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* Remove non-existing FP+imm ops
* Handle first loop. Change inst sel walk again to pre-order, to set an early visit bit to stop cycles. CFGNode copies dom/loop info. Cmp not-immediate form. Ret/Fun lazy updates
* float ops without imm values
* merge
* merge
* inst select for new struct allocation
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* Remove non-existing FP+imm ops
* left shift codegen basic
* sar codegen basic
* merge
* some basic ops
* addf
* divf, subf, mulf
* rsp allowed in bitwise input, not output
* formatting
* float ops without imm values
* merge
* merge
* Rebased on ch 19
* Update tests

---------

Co-authored-by: Cliff Click <[email protected]>

* cleanup after merge. alpha-sort helper fcns. implicit test vs zero/null
* Add LEA op. asm print works on ideal nodes for all tests
* Minor cleanup lea
* merge2
* merge3
* GCM for float ops fixed, other cleanup - small extensions
* x86 addressing modes during inst select. Drop New taking inits; just follow with initializing stores. Simplifies inst selection, which otherwise needs to undo this optimization and emit following init stores. Basic load/store for now, op-to-mem comes later. add same becomes Shl by 1. Drop DivF-immediate
* call setup
* extended ch13, 14,15,16 (#153)
* merge2
* merge3
* Fix a few minor fuzzer bugs
* Force array layout. ld/st get size (but not signed/unsigned). Add unsigned LT for later range checks.
* Fix a bunch of float issues; was deleting required inputs
* simpler invariant. defensive copy inputs for simpler invariant in complex sharing patterns. inc/dec form (just the opcode now).
* Call & CallR X86 instructions
* merge2
* Add-from-memory op. Remove FP immediate forms.
* Remove name from TFP. Never shoulda been there (but was convenient for awhile). Moved into FunNode, found by checking the linker table with a TFP.
* remove debug prints. add check for bad call convention. fix 3 tests (bad merge?)
* cleanup call convention abi. Needs more love at some point...
* one more cleanup ABI
* New (#154)
* merge2
* merge3
* GCM for float ops fixed, other cleanup - small extensions
* call setup
* merge2
* remove debug prints. add check for bad call convention. fix 3 tests (bad merge?)
* cleanup call convention abi. Needs more love at some point...
* one more cleanup ABI

---------

Co-authored-by: Cliff Click <[email protected]>

* Add CmpMem form. Common addressing mode print. LEA can skip a base
* Add an AddFMemX86. A bunch of mem op patterns are missing; hopefully they are just cut-n-paste from the existing patterns.
* Docs
* XMM reg mask fix. Minor README updates
* Allow inverted cmp/mem. Narrowing stores can bypass an AndMask. Array length loads do not need control. Some missing print info
* Support "*ptr op= val" at least for MemAdd
* missed golden rule update
* Array len is u32. load-after-store zero/sign-extends if the store is truncating
* One more missed case
* riscv init
* added not imm form bitwise shifts and ops plus test cases
* addressing modes (high chance it'll get deleted)
* fix minor bug
* delete non existing matches
* replace RegMask.Empty with null
* riscv handle store and load, fltRisch uses risc reg now
* load float into GPR and then later in RA do hard split.
* minor changes
* turn off ch20 tests
* Update Chapter20Test.java
* add riscv port to readme
* arm init
* work in progress
* arm2 bs
* work in progress
* cleanup arm
* fixed main pr problems
* regmask extra comments

---------

Co-authored-by: Cliff Click <[email protected]>
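"load-after-store zero/sign-extends if the store is truncating" means a load folded onto an earlier, narrower store must not simply reuse the stored 64-bit value. A sketch, assuming values are held as Java `long`s and the store width is 8, 16, 32 or 64 bits; `forward` is a hypothetical helper, not Simple's peephole.

```java
/** Hypothetical sketch: forwarding a stored value to a later load of the same slot. */
final class StoreToLoad {
    // A store of `bits` width keeps only the low bits of `stored`. A load that
    // reads the slot back must see that truncated value, re-extended to 64 bits:
    // sign-extended for signed types, zero-extended for unsigned ones (e.g. a u32
    // array length).
    static long forward(long stored, int bits, boolean signed) {
        if (bits == 64) return stored;
        int drop = 64 - bits;
        return signed ? (stored << drop) >> drop    // arithmetic shift sign-extends
                      : (stored << drop) >>> drop;  // logical shift zero-extends
    }

    public static void main(String[] args) {
        System.out.println(forward(0x1FFL, 8, true));  // -1: low byte 0xFF, signed
        System.out.println(forward(0x1FFL, 8, false)); // 255: low byte 0xFF, unsigned
        System.out.println(forward(-1L, 32, false));   // 4294967295
    }
}
```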
since no range checks... test cannot fail (pass)
Fix bug missing Find. Fix bug setDef does unordered removal from basic blocks. Use cloned constants for spilling.
SplitSelfConflict MUST split before uses and after defs. Split global constants across functions. Missing find in union. Split inserted after a New went in the wrong place. Pre-remove null LRGs before color. Bias color hunts through Phis. Shorten InstructionSelection. Rename !unified() to leader(). AddMemX86 is 2-addr. Add CastXXX to preserve register across cast.
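Several fixes here ("Missing find in union", renaming `!unified()` to `leader()`) and BuildLRG joining a Phi with its inputs concern merging live ranges. A standard union-find sketch of that bookkeeping with hypothetical names; the point it illustrates is that `union` must call `find` on both sides before linking.

```java
/** Hypothetical sketch of union-find over live ranges (LRGs). */
final class LrgUnion {
    private final int[] parent;

    LrgUnion(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // every LRG starts as its own set
    }

    // Follow parent links to the representative, halving the path as we go.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // True when x has not been merged into another LRG.
    boolean leader(int x) { return parent[x] == x; }

    // Merge the LRGs of a and b. Linking a or b directly, without find(),
    // would detach whatever set it already belonged to.
    int union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra != rb) parent[rb] = ra;
        return ra;
    }

    public static void main(String[] args) {
        LrgUnion u = new LrgUnion(3); // LRG 0 = a Phi, LRGs 1 and 2 = its inputs
        u.union(0, 1);
        u.union(0, 2);
        System.out.println(u.find(1) == u.find(2)); // true: one LRG for the Phi web
    }
}
```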
Bugfix scheduler. Bugfix spiller: self-conflict order depended on hashtable iteration order, which depended on System.hashCode, which would vary run to run. Improvements to coloring. Improvements to spill quality testing.
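The spiller bug above came from iteration order tied to identity hash codes, which are not stable between runs. One common fix, sketched here with a hypothetical `Node` carrying a compiler-assigned id, is to sort by that id before processing; the commit does not say whether Simple fixed it this way.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: process nodes in a run-independent order. */
final class StableOrder {
    // Hypothetical IR node. It does not override hashCode(), so a HashSet of
    // Nodes is ordered by identity hash codes, which can change between runs.
    static final class Node {
        final int nid;     // compiler-assigned id, identical on every run
        final String name;
        Node(int nid, String name) { this.nid = nid; this.name = name; }
    }

    // Copy out of the hash-ordered collection and sort by nid, so any decision
    // that depends on visit order (e.g. which conflict to split first) repeats.
    static List<Node> stableOrder(Collection<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt(n -> n.nid));
        return sorted;
    }

    public static void main(String[] args) {
        Set<Node> set = new HashSet<>();
        set.add(new Node(3, "c"));
        set.add(new Node(1, "a"));
        set.add(new Node(2, "b"));
        for (Node n : stableOrder(set)) System.out.println(n.nid + " " + n.name);
    }
}
```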
Fix bug in LRG union stats. Adjust self-conflict splits to be slightly less aggressive. In every use-side split, don't split-after-split in the same block. NewNode is a multi; it does not define a register.
Attempt to commute to keep reg-masks compatible. Clones with kills (x86 XOR killing flags to zero) fail before the flag-def. Bugfix: failed to color last LRG. More smarts on picking trivial LRGs during coloring. Drop isSplit, add SplitNode. Pass regalloc round to all splits for debugging. Self-conflict: no split on loop backedge. Handle a split that goes dead after being made. No-reg-mask spilling handles case with many def/uses. Add DivIX86. BrainFuck, MergeSort now allocate as testing.
Finish Split bypass check. ARM: rsp is also a fixed zero, in R31. R0 is a valid register. CallArm prints args, passes TFP for arg selection, and does not return a register. Add CallEndARM, which DOES return a register. Parms missing tfp for float args. ProjARM after a New missing regs. RetARM missing RPC. Remove hopeful DivIX86.
small constants not cloning
One More Go-Round with Better Biased Colors. Fix bug picking deepest spills to bias. ARM: remove default 2-address. Fix regmask overflow. Rename some ops to be canonical.
Port does not have to be in a Simple-named directory. Call convention is just a string, no enum. Missed some tests running RISC, ARM.
Expanding public API slightly so alternative (non-Simple repo) ports can happen. RegMask supports single-bit-set and a long bitmask. Bugfix GCM splitting multiple global constants. Bugfix LCM grabbing a projection vs multi. Bugfix RegAlloc when avoiding splitting a clonable. Source code in release. More timing in CodeGen.
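A register mask that has to name more than 64 registers (for instance when stack slots share the numbering with real registers) needs more than one `long`. The commit describes Simple's RegMask as supporting a single-bit-set form and a long bitmask; this sketch swaps in a simpler `long[]`-of-words representation to show the indexing, and all names are hypothetical. It also shows one easy way such masks overflow in Java: shift distances are masked, so `1L << 70` silently equals `1L << 6`.

```java
import java.util.Arrays;

/** Hypothetical sketch: a register mask that can hold more than 64 registers. */
final class WideRegMask {
    private final long[] words;

    WideRegMask(int nregs) { words = new long[(nregs + 63) >>> 6]; }
    private WideRegMask(long[] w) { words = w; }

    // Register r lives in word r/64, bit r%64. Java masks long shift distances
    // to 6 bits, so shifting by r itself would wrap for r >= 64.
    WideRegMask set(int r) { words[r >>> 6] |= 1L << (r & 63); return this; }
    boolean test(int r) { return (words[r >>> 6] & (1L << (r & 63))) != 0; }

    // Registers allowed by both masks, as a fresh copy; neither input is mutated.
    WideRegMask and(WideRegMask o) {
        long[] w = Arrays.copyOf(words, Math.min(words.length, o.words.length));
        for (int i = 0; i < w.length; i++) w[i] &= o.words[i];
        return new WideRegMask(w);
    }

    boolean overlaps(WideRegMask o) {
        for (int i = 0; i < Math.min(words.length, o.words.length); i++)
            if ((words[i] & o.words[i]) != 0) return true;
        return false;
    }

    int size() { int n = 0; for (long w : words) n += Long.bitCount(w); return n; }
    boolean isSingle() { return size() == 1; }

    public static void main(String[] args) {
        WideRegMask a = new WideRegMask(96).set(3).set(70);
        WideRegMask b = new WideRegMask(96).set(70);
        System.out.println(a.test(70) + " " + a.test(6)); // true false
        System.out.println(a.and(b).isSingle());          // true
    }
}
```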
Everybody uses the same names, same code shapes
...not callee save yet
Collect callee- and caller-save masks (inverses of each other, except for spills). Reorg layout to be closer to each other; remove dead mask definitions; add final keyword.
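If the callee-save and caller-save masks are complements within the allocatable registers, one can be derived from the other. A sketch for x86-64 GPRs only, assuming hardware encoding order for register numbers and the System V AMD64 callee-save set (RBX, RBP, R12 to R15); the stack-slot part that the commit exempts ("except for spills") is not modeled, and the names are hypothetical.

```java
/** Hypothetical sketch: caller-save = allocatable minus callee-save. */
final class SaveMasks {
    // Assumed numbering: x86-64 GPRs in hardware encoding order (rax=0 ... r15=15).
    static final int RBX = 3, RSP = 4, RBP = 5, R12 = 12, R13 = 13, R14 = 14, R15 = 15;

    static long bit(int reg) { return 1L << reg; }

    // All 16 GPRs except RSP, which holds the stack pointer and is never allocated.
    static final long ALLOCATABLE = 0xFFFFL & ~bit(RSP);

    // System V AMD64 callee-save GPRs (RSP is also preserved but not allocatable).
    static final long CALLEE_SAVE = bit(RBX) | bit(RBP) | bit(R12) | bit(R13) | bit(R14) | bit(R15);

    // Whatever remains may be clobbered by a call; the two masks partition ALLOCATABLE.
    static final long CALLER_SAVE = ALLOCATABLE & ~CALLEE_SAVE;

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(CALLER_SAVE));
        System.out.println(Long.bitCount(CALLER_SAVE)); // 9: rax, rcx, rdx, rsi, rdi, r8 to r11
    }
}
```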
Callee-save registers and masks, inserting LRGs for same, spilling, and picking good spills. Insert CalleeSaveNodes and edges to RetXXXs, and RegMasks for those edges. Iterators for regmask. Better printing post-allocation.
All regs available arm, risc. MergeSort compiles & runs
No flags on risc5. Branch takes 2 inputs (one can be imm12). Float compare only sets a GPR to 0/1. Bool has an isFloat. Common risc5 imm-form handling.
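Common imm-form handling on risc5 comes down to one range check: RISC-V I-type instructions such as `addi` encode a 12-bit two's-complement immediate that the hardware sign-extends, so a constant can use the immediate form only if it lies in -2048..2047. Two equivalent formulations, with hypothetical names; how Simple structures its matcher is not shown here.

```java
/** Hypothetical sketch of a shared "fits the imm12 form?" check for RISC-V. */
final class Imm12 {
    // I-type immediates are 12-bit two's complement, sign-extended by the
    // hardware, so the encodable range is -2048 .. 2047.
    static boolean fits(long c) { return c >= -2048 && c <= 2047; }

    // Same test phrased as "survives truncation to 12 bits plus sign extension":
    // move the low 12 bits to the top, then arithmetic-shift them back down.
    static boolean fitsBySignExtend(long c) { return ((c << 52) >> 52) == c; }

    public static void main(String[] args) {
        for (long c : new long[] {0, 2047, 2048, -2048, -2049})
            System.out.println(c + " fits imm12: " + fits(c));
    }
}
```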
ARM split op can spill flags. Bugfix RegAlloc: first real register number >= 64.
which have been broken for some time now
Register Allocation, with ports to x86, arm64, risc5