
Proposal: Configurable Tree Width #31

Closed

hannahhoward opened this issue Aug 29, 2019 · 1 comment

@hannahhoward (Contributor)

What

A proposal for completing at least part of #14, specifically the ability to configure the tree width. I've been tasked with implementing this to support go-filecoin, so I want to outline my approach early, before I go and do it.

Why

Go-filecoin would like to experiment with changing the width of the HAMT used for the overall StateTree, as well as of the individual HAMTs in certain types of actors, to examine the effects on performance in that context.

How

My main goals are:

  • Avoid breaking changes to the public API if possible
  • Avoid significantly affecting performance in the default case

Components:

Add a field to Node specifying the width of each tree level in bits:

type Node struct {
	Bitfield *big.Int   `refmt:"bf"`
	Pointers []*Pointer `refmt:"p"`

	// for fetching and storing children
	store *CborIpldStore
	treeWidthInBits uint8 // default value = 8, probably shouldn't go higher; not written to CBOR itself for now
}

Add a variadic options pattern (https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis) to NewNode & LoadNode, and create a first option to specify the tree width in bits. (We could allow specifying the width as a plain number instead, but that means writing code to enforce powers of two, which introduces potential errors and... meh, not worth it to me :)
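
Roughly like this, as a sketch only (Option and UseTreeBitWidth are placeholder names I'm making up here, and this assumes the current NewNode(cs *CborIpldStore) *Node signature):

// Option configures a Node at construction time.
type Option func(*Node)

// UseTreeBitWidth sets a non-default bit width for each tree level.
func UseTreeBitWidth(bitWidth uint8) Option {
	return func(n *Node) {
		n.treeWidthInBits = bitWidth
	}
}

// NewNode takes variadic options, so existing callers are unaffected.
func NewNode(cs *CborIpldStore, options ...Option) *Node {
	nd := &Node{
		Bitfield:        big.NewInt(0),
		Pointers:        make([]*Pointer, 0),
		store:           cs,
		treeWidthInBits: 8, // default bit width
	}
	for _, opt := range options {
		opt(nd)
	}
	return nd
}

So e.g. NewNode(cs, UseTreeBitWidth(5)) would give a 32-wide tree.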

Modify the internal hash function to take a bit width (default 8) -- something like this (I have no idea if this even compiles; I have yet to test it for real):

func splitByBitSize(h1 uint64, h2 uint64, bitWidth uint8) []byte {
	// algorithm to split the 128-bit hash into arbitrary bit-size chunks here
	// could be optimized by pre-computing common values
}

var hash = func(k string, bitSize uint8) []byte {
	h := murmur3.New128()
	h.Write([]byte(k))
	h1, h2 := h.Sum128()
	return splitByBitSize(h1, h2, bitSize)
}
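
One way splitByBitSize might be filled in, just as a sketch (uses encoding/binary, assumes bitWidth <= 8, drops any trailing bits that don't fill a whole chunk, and the chunk ordering would need to match however the rest of the code consumes hash bits):

func splitByBitSize(h1 uint64, h2 uint64, bitWidth uint8) []byte {
	// pack the two 64-bit halves into a 16-byte big-endian buffer
	buf := make([]byte, 16)
	binary.BigEndian.PutUint64(buf[:8], h1)
	binary.BigEndian.PutUint64(buf[8:], h2)

	// walk the 128 bits from the high end, emitting one chunk per bitWidth bits
	const totalBits = 128
	out := make([]byte, 0, totalBits/int(bitWidth))
	for start := uint32(0); start+uint32(bitWidth) <= totalBits; start += uint32(bitWidth) {
		var chunk byte
		for i := uint32(0); i < uint32(bitWidth); i++ {
			bit := start + i
			if buf[bit/8]&(1<<(7-bit%8)) != 0 {
				chunk |= 1 << (uint32(bitWidth) - 1 - i)
			}
		}
		out = append(out, chunk) // bitWidth <= 8, so each chunk fits in a byte
	}
	return out
}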

Based on the way the rest of the code works, I believe these would be the main changes necessary.

@whyrusleeping @acruikshank @anorth

@rvagg (Member) commented Aug 29, 2019

It would be really nice if this could be pulled somewhat toward the IPLD HashMap spec. It'd be ideal to not have two implementations but I fear that we're not heading in the same direction.

"bitWidth" being the term used there for controlling branching factor.

I wrote a library for bit slicing for this purpose, https://github.com/rvagg/bit-sequence; it's got JS and Go implementations in the same repo, and they are tested against the same data, so they stay consistent. I'm already using it in my JS HashMap implementation and have been using it in my experimental Go + ipld-prime version too.

BitSequence(bytes []byte, bitStart uint32, bitLength uint32) uint32 is the API it currently has, but I'd love to hear feedback if any of this doesn't feel idiomatic or doesn't fit the use here.
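
For reference, pulling the index for one level of the tree out of the hash digest with that signature could look something like this (assuming the Go BitSequence function is in scope; indexAtDepth is just an illustrative name, not part of the library):

// indexAtDepth slices bitWidth bits out of the hash digest for a given tree
// depth and returns them as the pointer index for that level (illustrative only).
func indexAtDepth(digest []byte, depth uint32, bitWidth uint32) uint32 {
	// each level of the tree consumes the next bitWidth bits of the digest
	return BitSequence(digest, depth*bitWidth, bitWidth)
}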
