
Proposal: Configurable Tree Width #31

Closed

hannahhoward opened this issue Aug 29, 2019 · 1 comment

@hannahhoward (Contributor)

What

A proposal for completing at least part of #14, specifically the ability to configure the tree width. I've been tasked with implementing this to support go-filecoin, so I want to outline my approach early, before I go and do it.

Why

Go-filecoin would like to experiment with changing the width of the HAMT used for the overall StateTree, as well as of the individual HAMTs in certain types of actors, to examine the effects on performance in that context.

How

My main goals are:

  • Avoid breaking changes to the public API if possible
  • Avoid significantly affecting performance in the default case

Components:

Add a field to Node specifying the width of each tree level in bits:

type Node struct {
	Bitfield *big.Int   `refmt:"bf"`
	Pointers []*Pointer `refmt:"p"`

	// for fetching and storing children
	store *CborIpldStore
	treeWidthInBits uint8 // default value = 8, probably shouldn't go higher; not written to CBOR itself for now
}

Add a variadic options pattern (https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis) to NewNode & LoadNode, and create a first option to specify the tree width in bits. (We could allow specifying the width as a plain number instead, but that means writing code to enforce powers of two, which introduces potential errors and... meh, not worth it to me :)
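
Roughly like this, as a sketch only (Option and UseTreeBitWidth are placeholder names I'm making up here, and this assumes the current NewNode(cs *CborIpldStore) *Node signature):

// Option configures a Node at construction time.
type Option func(*Node)

// UseTreeBitWidth sets a non-default bit width for each tree level.
func UseTreeBitWidth(bitWidth uint8) Option {
	return func(n *Node) {
		n.treeWidthInBits = bitWidth
	}
}

// NewNode takes variadic options, so existing callers are unaffected.
func NewNode(cs *CborIpldStore, options ...Option) *Node {
	nd := &Node{
		Bitfield:        big.NewInt(0),
		Pointers:        make([]*Pointer, 0),
		store:           cs,
		treeWidthInBits: 8, // default bit width
	}
	for _, opt := range options {
		opt(nd)
	}
	return nd
}

So e.g. NewNode(cs, UseTreeBitWidth(5)) would give a 32-wide tree.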

Modify the internal hash function to take a bit width (default 8) -- something like this (I have no idea if this even compiles; I have yet to test it for real):

func splitByBitSize(h1 uint64, h2 uint64, bitWidth uint8) []byte {
	// algorithm to split the 128-bit hash into arbitrary bit-size chunks here
	// could be optimized by pre-computing common values
}

var hash = func(k string, bitSize uint8) []byte {
	h := murmur3.New128()
	h.Write([]byte(k))
	h1, h2 := h.Sum128()
	return splitByBitSize(h1, h2, bitSize)
}
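
One way splitByBitSize might be filled in, just as a sketch (uses encoding/binary, assumes bitWidth <= 8, drops any trailing bits that don't fill a whole chunk, and the chunk ordering would need to match however the rest of the code consumes hash bits):

func splitByBitSize(h1 uint64, h2 uint64, bitWidth uint8) []byte {
	// pack the two 64-bit halves into a 16-byte big-endian buffer
	buf := make([]byte, 16)
	binary.BigEndian.PutUint64(buf[:8], h1)
	binary.BigEndian.PutUint64(buf[8:], h2)

	// walk the 128 bits from the high end, emitting one chunk per bitWidth bits
	const totalBits = 128
	out := make([]byte, 0, totalBits/int(bitWidth))
	for start := uint32(0); start+uint32(bitWidth) <= totalBits; start += uint32(bitWidth) {
		var chunk byte
		for i := uint32(0); i < uint32(bitWidth); i++ {
			bit := start + i
			if buf[bit/8]&(1<<(7-bit%8)) != 0 {
				chunk |= 1 << (uint32(bitWidth) - 1 - i)
			}
		}
		out = append(out, chunk) // bitWidth <= 8, so each chunk fits in a byte
	}
	return out
}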

Based on the way the rest of the code works, I believe these would be the main changes necessary.

@whyrusleeping @acruikshank @anorth

@rvagg (Member) commented Aug 29, 2019

It would be really nice if this could be pulled somewhat toward the IPLD HashMap spec. It'd be ideal to not have two implementations but I fear that we're not heading in the same direction.

"bitWidth" being the term used there for controlling branching factor.

I wrote a library for bit slicing for this purpose, https://github.com/rvagg/bit-sequence; it's got JS and Go implementations in the same repo, and they are tested against the same data, so they stay consistent. I'm already using it in my JS HashMap implementation and have been using it in my experimental Go + ipld-prime version too.

BitSequence(bytes []byte, bitStart uint32, bitLength uint32) uint32 is the API it currently has, but I'd love to hear feedback if any of this doesn't feel idiomatic or doesn't fit the use here.
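
For reference, pulling the index for one level of the tree out of the hash digest with that signature could look something like this (assuming the Go BitSequence function is in scope; indexAtDepth is just an illustrative name, not part of the library):

// indexAtDepth slices bitWidth bits out of the hash digest for a given tree
// depth and returns them as the pointer index for that level (illustrative only).
func indexAtDepth(digest []byte, depth uint32, bitWidth uint32) uint32 {
	// each level of the tree consumes the next bitWidth bits of the digest
	return BitSequence(digest, depth*bitWidth, bitWidth)
}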
