Progress reg alloc
Handle multi-node projections.
Be a little smarter about 2-address scheduling.
RegAlloc: a must-have single register is now removed from the other live ranges' masks instead of adding an interference; the mask removal was backwards in a few cases.
LRG/RegMask are copy-on-write: masks are copied before mutating.
Clone small constants instead of spilling.
Alloc & ProjX86 get output reg masks.
Print large reg masks in full, with sp+offset.
cliffclick committed Feb 13, 2025
1 parent b1756ef commit 9951333
Showing 18 changed files with 156 additions and 75 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -29,32 +29,38 @@ the backend starts with levering Java: the Evaluator (first appears in Chapter

The following is a rough plan, subject to change.

Each chapter will be self-sufficient and complete; in the sense that each chapter will fully implement
a subset of the Simple language, and include everything that was created in the previous chapter.
Each chapter will also include a detailed commentary on relevant aspects of the
Sea Of Nodes intermediate representation.
Each chapter will be self-sufficient and complete; in the sense that each
chapter will fully implement a subset of the Simple language, and include
everything that was created in the previous chapter. Each chapter will also
include a detailed commentary on relevant aspects of the Sea Of Nodes
intermediate representation.

The Simple language will be styled after a subset of C or Java
The Simple language is styled after a subset of C or Java.

* [Chapter 1](chapter01/README.md): Script that returns an integer literal, i.e., an empty function that takes no arguments and returns a single integer value. The `return` statement.
* [Chapter 2](chapter02/README.md): Simple binary arithmetic such as addition, subtraction, multiplication, division
with constants. Peephole optimization / simple constant folding.
* [Chapter 3](chapter03/README.md): Local variables, and assignment statements. Read on RHS, SSA, more peephole optimization if local is a
constant.
* [Chapter 4](chapter04/README.md): A non-constant external variable input named `arg`. Binary and Comparison operators involving constants and `arg`. Non-zero values will be truthy. Peephole optimizations involving algebraic simplifications.
* [Chapter 4](chapter04/README.md): A non-constant external variable input
named `arg`. Binary and Comparison operators involving constants and `arg`.
Non-zero values will be truthy. Peephole optimizations involving algebraic
simplifications.
* [Chapter 5](chapter05/README.md): `if` statement. CFG construction.
* [Chapter 6](chapter06/README.md): Peephole optimization around dead control flow.
* [Chapter 7](chapter07/README.md): `while` statement. Looping construct - eager phi approach.
* [Chapter 8](chapter08/README.md): Looping construct continued, lazy phi creation, `break` and `continue` statements.
* [Chapter 7](chapter07/README.md): `while` statement; looping constructs - eager phi approach.
* [Chapter 8](chapter08/README.md): Looping constructs continued, lazy phi creation, `break` and `continue` statements.
* [Chapter 9](chapter09/README.md): Global Value Numbering. Iterative peepholes to fixpoint. Worklists.
* [Chapter 10](chapter10/README.md): User defined Struct types. Memory effects: general memory edges in SSA. Equivalence class aliasing. Null pointer analysis. Peephole optimization around load-after-store/store-after-store.
* [Chapter 10](chapter10/README.md): User defined Struct types. Memory effects:
general memory edges in SSA. Equivalence class aliasing. Null pointer
analysis. Peephole optimization around load-after-store/store-after-store.
* [Chapter 11](chapter11/README.md): Global Code Motion - Scheduling.
* [Chapter 12](chapter12/README.md): Float type.
* [Chapter 13](chapter13/README.md): Nested references in Structs.
* [Chapter 14](chapter14/README.md): Narrow primitive types (e.g. bytes)
* [Chapter 15](chapter15/README.md): One dimensional static length array type, with array loads and stores.
* [Chapter 16](chapter16/README.md): Constructors
* [Chapter 17](chapter17/README.md): Mutability & Syntax Sugar: `var`, `val`, `x+=y`, `for(init;test;next)body`
* [Chapter 17](chapter17/README.md): Mutability & Syntax Sugar: `var`, `val`, `x+=y`, `for(init; test; next) body`
* [Chapter 18](chapter18/README.md): Functions and calls.
* [Chapter 19](chapter19/README.md): Instruction selection and portable compilation
* [Chapter 20](chapter20/README.md): Graph Coloring Register Allocation
35 changes: 24 additions & 11 deletions chapter20/src/main/java/com/seaofnodes/simple/codegen/BuildLRG.java
@@ -1,5 +1,6 @@
package com.seaofnodes.simple.codegen;

import com.seaofnodes.simple.Utils;
import com.seaofnodes.simple.node.*;

abstract public class BuildLRG {
@@ -16,16 +17,8 @@ public static boolean run(int round, RegAlloc alloc) {
for( Node bb : alloc._code._cfg )
for( Node n : bb.outs() ) {
if( n instanceof MachNode mach ) {
RegMask def_mask = mach.outregmap();
if( def_mask!=null ) {
LRG lrg = mach.twoAddress() == 0
? alloc.newLRG(n) // Define a new LRG for N
: alloc.lrg2(n,mach.twoAddress()); // Use the matching 2-adr input
// Record mask and mach
// def_mask.size()->1 : single register mask
if( !lrg.machDef(mach,def_mask.size1()).and(def_mask) )
alloc.fail(lrg); // Empty register mask, must split
}
// Define live range
defLRG(alloc,n);

// Now, look in the opposite direction. How are incoming
// LRGs affected by this node: For all uses, make live lrgs
@@ -56,11 +49,31 @@ public static boolean run(int round, RegAlloc alloc) {
if( lrg._mask.isEmpty() )
alloc.fail(lrg);
}

// MultiNodes have projections which set/kill registers
if( n instanceof MultiNode )
for( Node proj : n.outs() )
if( proj instanceof MachNode )
defLRG(alloc,proj);

}

// Collect live ranges
alloc.unify();

return alloc.success();
}
}

private static void defLRG( RegAlloc alloc, Node n ) {
MachNode mach = (MachNode)n;
RegMask def_mask = mach.outregmap();
if( def_mask == null ) return;
LRG lrg = mach.twoAddress() == 0
? alloc.newLRG(n) // Define a new LRG for N
: alloc.lrg2(n,mach.twoAddress()); // Use the matching 2-adr input
// Record mask and mach
if( !lrg.machDef(mach,def_mask.size1()).and(def_mask) )
alloc.fail(lrg); // Empty register mask, must split
}

}
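
The `defLRG` helper above is called for a node and, for a `MultiNode`, for each of its machine projections. A minimal sketch of that shape follows, assuming register masks can be modeled as `long` bit sets; `DefLrgSketch`, `define`, and `defineWithProjections` are hypothetical names for illustration and are not part of the project's `LRG`/`RegMask` API.

```java
// Sketch only: register masks are modeled as long bit sets (bit r = register r).
// DefLrgSketch and its methods are hypothetical stand-ins for LRG/RegMask.
final class DefLrgSketch {
    static final long ALL = -1L; // a fresh live range may use any register

    // Narrow a live range's mask by a def's output mask.
    // A result of 0 means no register is left, so the allocator must split.
    static long define(long lrgMask, long defMask) {
        return lrgMask & defMask;
    }

    // Define a live range for a node and one for each of its projections.
    // Slot 0 is the node itself; slot i+1 is projection i.
    static long[] defineWithProjections(long nodeDef, long[] projDefs) {
        long[] masks = new long[projDefs.length + 1];
        masks[0] = define(ALL, nodeDef);
        for (int i = 0; i < projDefs.length; i++)
            masks[i + 1] = define(ALL, projDefs[i]);
        return masks;
    }
}
```

In this model a 2-address def corresponds to calling `define` with the tied input's existing mask instead of `ALL`, so the def and its input end up sharing one narrowed mask.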
21 changes: 11 additions & 10 deletions chapter20/src/main/java/com/seaofnodes/simple/codegen/IFG.java
@@ -146,7 +146,7 @@ private static void do_node(RegAlloc alloc, Node n) {
// Then tlrg and lrg interfere.
// If lrg *must* get its register, make tlrg skip this register.
if( mustDef ) {
if( tlrg.and(mustMask) )
if( !tlrg.clr(mustMask.firstColor()) )
alloc.fail(tlrg);
} else addIFG(lrg,tlrg); // Add interference
}
@@ -168,19 +168,20 @@ private static void do_node(RegAlloc alloc, Node n) {

// Look for a must-use single register conflicting with some other must-def.
if( n instanceof MachNode m ) {
// Record splits for later
//if( m.isSplit() ) throw Utils.TODO();
if( m.regmap(i).size1() ) { // Must-use single register
RegMask ni_mask = m.regmap(i);
if( ni_mask.size1() ) { // Must-use single register
// Search all current live
for( LRG tlrg : TMP.keySet() ) {
assert !tlrg.unified();
Node live = TMP.get(tlrg);
if( live != def && live instanceof MachNode lmach ) {
// Look at live value and see if it must-def same register
if( lmach.outregmap().size1() && lmach.outregmap().overlap(m.regmap(i)) )
// Then direct reg-reg conflict between use here (at n.in(i)) and def (of tlrg) there
// alloc.failed( tlrg );
throw Utils.TODO();
if( live != def && live instanceof MachNode lmach && lmach.outregmap().overlap(ni_mask) ) {
// Look at live value and see if it must-def same register.
if( lmach.outregmap().size1() ||
// Deny the register, since it absolutely must be used here
!tlrg.clr(ni_mask.firstColor()) )
// Then direct reg-reg conflict between use here (at n.in(i)) and def (of tlrg) there.
// Fail the older live range, it must move its register.
alloc.fail( tlrg );
}
}
}
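
The change in this file replaces an intersection (`and`) with a removal (`clr`) of the must-have register from every other live range. Intersecting keeps only the contested register, which is the opposite of what is wanted. A rough sketch of the removal, assuming `long` bit-set masks; `MustDefSketch` and `denyRegister` are hypothetical names, and `Long.numberOfTrailingZeros` plays the role that `firstColor()` plays in the diff.

```java
// Sketch only: deny a must-have register to every other live mask.
// MustDefSketch is a hypothetical stand-in, not the project's IFG code.
final class MustDefSketch {
    // Clears the single register in mustMask from each live mask, in place.
    // Returns the index of the first live range left with no registers
    // (the one the allocator would have to fail and split), or -1 if none.
    static int denyRegister(long[] liveMasks, long mustMask) {
        if (Long.bitCount(mustMask) != 1)
            throw new IllegalArgumentException("mustMask must name exactly one register");
        int reg = Long.numberOfTrailingZeros(mustMask); // lowest set bit
        int firstFailed = -1;
        for (int i = 0; i < liveMasks.length; i++) {
            liveMasks[i] &= ~(1L << reg);               // remove, do not intersect
            if (liveMasks[i] == 0 && firstFailed < 0)
                firstFailed = i;
        }
        return firstFailed;
    }
}
```

A live range that still has other registers simply loses one choice and needs no interference edge; only a live range whose mask becomes empty has to be failed.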
Original file line number Diff line number Diff line change
@@ -126,6 +126,13 @@ boolean and( RegMask mask ) {
_mask = mask.and(_mask);
return !_mask.isEmpty();
}
// Remove this singular register
// True if still has registers
boolean clr( int reg ) {
if( _mask.clr(reg) ) return true;
_mask = _mask.copy(); // Need a mutable copy
return _mask.clr(reg);
}

@Override public String toString() { return toString(new SB()).toString(); }

Original file line number Diff line number Diff line change
@@ -2,6 +2,7 @@

import com.seaofnodes.simple.Ary;
import com.seaofnodes.simple.IterPeeps.WorkList;
import com.seaofnodes.simple.Utils;
import com.seaofnodes.simple.node.*;
import com.seaofnodes.simple.type.*;
import java.util.*;
@@ -11,6 +12,14 @@ public abstract class ListScheduler {
// eXtra per-node stuff for scheduling.
private static final IdentityHashMap<Node,XSched> XS = new IdentityHashMap<>();
private static class XSched {

Node _n; // Node this extra is for
int _bcnt; // Not-ready not-scheduled inputs
int _rcnt; // Ready not-scheduled inputs
boolean _ruse; // Node IS a "remote use", uses a other-block value
boolean _rdef; // Node IS a "remote def" of a value used in other block
boolean _single; // Defines a single register, likely to conflict

static final Ary<XSched> FREE = new Ary<>(XSched.class);
static void alloc(CFGNode bb, Node n) {
XSched x = FREE.pop();
@@ -42,7 +51,14 @@ private XSched init(CFGNode bb, Node n) {
}

private void computeSingleRDef(CFGNode bb, Node n) {
RegMaskRW rmask = n instanceof MachNode mach && mach.outregmap() != null ? mach.outregmap().copy() : null;
// Also see if this is 2-input, and that input is single-def
if( n instanceof MachNode mach && mach.twoAddress() != 0 ) {
XSched xs = XS.get(n.in(mach.twoAddress()));
if( xs != null )
_single = xs._single;
}
// Visit all outputs
RegMaskRW rmask = !_single && n instanceof MachNode mach && mach.outregmap() != null ? mach.outregmap().copy() : null;
for( Node use : n._outputs ) {
// Remote use, so this is a remote def
if( use!=null && use.cfg0()!=bb ) _rdef = true;
@@ -53,7 +69,7 @@ private void computeSingleRDef(CFGNode bb, Node n) {
rmask.and(mach.regmap(i));
}
// Defines in a single register
_single = rmask!=null && rmask.size1();
_single |= rmask!=null && rmask.size1();
}

// If _bcnt==0, declare ready; move user bcnts into rcnts.
@@ -83,13 +99,6 @@ boolean decIsReady() {
_rcnt--;
return isReady();
}

Node _n; // Node this extra is for
int _bcnt; // Not-ready not-scheduled inputs
int _rcnt; // Ready not-scheduled inputs
boolean _ruse; // Node IS a "remote use", uses a other-block value
boolean _rdef; // Node IS a "remote def" of a value used in other block
boolean _single; // Defines a single register, likely to conflict
}


@@ -181,17 +190,19 @@ static int score( Node n ) {
// Register pressure local scheduling: avoid overlapping specialized
// registers usages.

// Subtract 10 if this op forces a live range to exist that cannot be
// resolved immediately; i.e., live count goes up. Scale to 20 if
// multi-def.
// Subtract 10 (delay) if this op forces a live range to exist that
// cannot be resolved immediately; i.e., live count goes up. Scale to
// 20 if multi-def.

// Subtract 100 if this op forces a single-def live range to exist
// which might conflict on the same register with other live ranges.
// Defines a single register based on def & uses, and the output is not
// ready. Scale to 200 for multi-def.
CNT[1]=CNT[2]=0;
XSched xn = XSched.get(n);
if( xn._rdef ) score = 200; // If defining a remote value, just generically stall alot
// If defining a remote value, just generically stall a lot. The value is
// used in a later block; can we delay until the end of this block?
if( xn._rdef ) score = 200;
if( n instanceof MultiNode ) {
for( Node use : n._outputs )
singleUseNotReady( use, xn._single );
@@ -211,9 +222,7 @@ static int score( Node n ) {
score += 10 * Math.min( CNT[1], 2 );
score += 100 * Math.min( CNT[2], 2 );

//// Nothing special
//if( n instanceof Pe ) return 500; // RISC want to go after PE
//if( n instanceof Risc ) return 400; // RISC want to go after PE
// Nothing special
assert 10 <= score && score <= 990;
return score;
}
@@ -224,7 +233,7 @@ static int score( Node n ) {
private static void singleUseNotReady( Node n, boolean single ) {
if( n.nOuts() != 1 ) return;
XSched xu = XSched.get(n.out(0));
if( xu !=null && xu._bcnt==0 )
if( xu !=null && xu._bcnt==0 && xu._rcnt <= 1 )
return;
CNT[single ? 2 : 1]++;
}
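
The `computeSingleRDef` change lets a 2-address node inherit the "single register" flag from its tied input, and otherwise derives the flag by intersecting the def mask with every use's input mask. A small sketch of that decision, assuming `long` bit-set masks; `SingleDefSketch` and `isSingle` are hypothetical names, not the scheduler's API.

```java
// Sketch only: decide whether a def is pinned to exactly one register.
// SingleDefSketch is a hypothetical stand-in for the XSched bookkeeping.
final class SingleDefSketch {
    // tiedInputSingle: the 2-address input (if any) is already single-register.
    // defMask: registers the def may write; useMasks: registers each use accepts.
    static boolean isSingle(boolean tiedInputSingle, long defMask, long[] useMasks) {
        if (tiedInputSingle) return true;    // inherit from the tied input
        long mask = defMask;
        for (long use : useMasks)
            mask &= use;                     // narrow by every use
        return Long.bitCount(mask) == 1;     // exactly one register remains
    }
}
```

A def that could use several registers on its own can still be single once its uses are considered, which is why the intersection runs over all uses before the bit count is checked.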
Original file line number Diff line number Diff line change
@@ -207,7 +207,7 @@ boolean splitEmptyMask( LRG lrg ) {
boolean splitSelfConflict( LRG lrg ) {
for( Node def : lrg._selfConflicts.keySet() ) {
_code._mach.split().insertAfter(def);
if( def instanceof PhiNode phi )
if( def instanceof PhiNode phi && !(def instanceof ParmNode) )
_code._mach.split().insertBefore(phi,1);
}
return true;
@@ -245,12 +245,18 @@ boolean splitByLoop( LRG lrg ) {
for( Node n : _ns ) {
if( n instanceof MachNode mach && mach.isSplit() ) continue; // Ignoring splits; since spilling need to split in a deeper loop
if( lrg(n)==lrg && // This is a LRG def
(min==max || n.cfg0().loopDepth() <= min) )
// Split after def in min loop nest
_code._mach.split().insertAfter(n);
// At loop boundary, or splitting in inner loop
(min==max || n.cfg0().loopDepth() <= min) ) {
// Clonable constants will be cloned at uses, so delete the def
if( n instanceof MachNode mach && mach.isClone() )
n.remove();
else
// Split after def in min loop nest
_code._mach.split().insertAfter(n);
}

// PhiNodes check all CFG inputs
if( n instanceof PhiNode phi ) {
if( n instanceof PhiNode phi && !(n instanceof ParmNode)) {
for( int i=1; i<n.nIns(); i++ )
// No split in front of a split
if( !(n.in(i) instanceof MachNode mach && mach.isSplit()) &&
@@ -259,16 +265,18 @@ boolean splitByLoop( LRG lrg ) {
// and not around the backedge of a loop (bad place to force a split, hard to remove)
!(phi.region() instanceof LoopNode && i==2) )
// Split before phi-use in prior block
_code._mach.split().insertBefore(phi, i);
insertBefore(phi,i);

} else {
// Others check uses
for( int i=1; i<n.nIns(); i++ ) {
if( lrg(n.in(i))==lrg && // This is a LRG use
// Not a split already
!(n.in(i) instanceof MachNode mach && mach.isSplit()) &&
// splitting in inner loop or at loop border
(min==max || n.cfg0().loopDepth() <= min) )
// Split before in this block
_code._mach.split().insertBefore(n, i);
insertBefore(n,i);
}
}
}
@@ -306,6 +314,13 @@ void findAllLRG( LRG lrg ) {
}
}

void insertBefore(Node n, int i) {
Node split = (n.in(i) instanceof MachNode mach && mach.isClone()
? mach.copy()
: _code._mach.split());
split.insertBefore(n, i);
}
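
The new `insertBefore` helper picks between copying a cheap instruction and inserting a generic split. The choice can be sketched with a tiny stand-in instruction model; `CloneOrSplitSketch`, `Op`, and `beforeUse` are hypothetical names for illustration, and the real nodes carry far more state.

```java
// Sketch only: clone cheap instructions at the use instead of spilling them.
// Op is a hypothetical minimal instruction model, not the project's MachNode.
final class CloneOrSplitSketch {
    static class Op {
        final String name;
        final boolean cheap;  // cheaper to recreate than to spill
        Op(String name, boolean cheap) { this.name = name; this.cheap = cheap; }
        boolean isClone() { return cheap; }
        // Fresh copy for cheap ops; null otherwise, like the interface default.
        Op copy() { return cheap ? new Op(name, true) : null; }
    }

    // Choose the node to place in front of a use of `input`.
    static Op beforeUse(Op input) {
        return input.isClone()
            ? input.copy()             // rematerialize, e.g. a small constant
            : new Op("split", false);  // otherwise insert a generic split
    }
}
```

Since `copy()` defaults to returning null in the interface, a node reporting `isClone()` presumably has to override `copy()` as well; the sketch keeps the two consistent by deriving both from the same flag.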



// -----------------------
Original file line number Diff line number Diff line change
@@ -26,6 +26,12 @@ RegMask and( RegMask mask ) {
throw Utils.TODO();
}

// Fails if bit is set, because this is immutable
public boolean clr( int reg ) {
return ((_bits >> reg)&1)==0;
}


short firstColor() {
return (short)Long.numberOfTrailingZeros(_bits);
}
@@ -60,5 +66,5 @@ public SB toString(SB sb) {

class RegMaskRW extends RegMask {
public RegMaskRW(long x) { super(x); }
public void clr(int r) { _bits &= ~(1L<<r); }
public boolean clr(int r) { _bits &= ~(1L<<r); return _bits!=0; }
}
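
Taken together, `RegMask.clr`, `RegMaskRW.clr`, and `LRG.clr` form a copy-on-write scheme: the immutable mask only reports whether the bit is already clear, and the live range swaps in a mutable copy when it actually has to change something. A self-contained sketch of that pattern; `MaskSketch`, `MaskRWSketch`, and `LrgSketch` are hypothetical names mirroring the roles in the diff, not the project's classes.

```java
// Sketch only: copy-on-write register masks, modeled on the clr() methods above.
class MaskSketch {
    long bits;
    MaskSketch(long bits) { this.bits = bits; }
    // Immutable flavor: "succeeds" only if the bit is already clear.
    boolean clr(int reg) { return ((bits >> reg) & 1) == 0; }
    MaskRWSketch copy() { return new MaskRWSketch(bits); }
}

final class MaskRWSketch extends MaskSketch {
    MaskRWSketch(long bits) { super(bits); }
    // Mutable flavor: clear the bit; true if any register remains.
    @Override boolean clr(int reg) { bits &= ~(1L << reg); return bits != 0; }
}

final class LrgSketch {
    MaskSketch mask;
    LrgSketch(MaskSketch mask) { this.mask = mask; }
    // Remove one register; true if the live range still has registers.
    boolean clr(int reg) {
        if (mask.clr(reg)) return true; // nothing to change, or mutable and non-empty
        mask = mask.copy();             // need a private mutable copy
        return mask.clr(reg);
    }
}
```

The point of the copy is that a shared mask, for example one handed out by an instruction's register map, is never modified on behalf of a single live range.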
Original file line number Diff line number Diff line change
@@ -36,10 +36,12 @@ default void postSelect() { }
// this is their Multi's updated input.
default int twoAddress( ) { return 0; }

// Instructions cheaper to recreate than to spill, such as loading small constants
default boolean isClone() { return false; }
// Known to be a split node
default boolean isSplit() { return false; }
// Instructions cheaper to recreate than to spill, such as loading small constants
default boolean isClone() { return false; }
// Make a clone of a cheap instruction
default Node copy() { return null; }

// Encoding is appended into the byte array; size is returned
int encoding(ByteArrayOutputStream bytes);