-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation #52
Documentation #52
Conversation
A dedicated section in the IPLD HashMap spec that draws out the specific variations here. That doc has the correct algorithm so is useful as a general reference (minus some doubts I have about the canonical form child node collapse algorithm I've already documented in here but they should be treated as a bug if I'm correct). It now includes a section at the bottom that talks about implicit parameters and the block layout. |
a1053ce
to
e134aaa
Compare
Added a doc.go which copies some intro text from the IPLD HashMap spec and made the README slightly more useful. There's some TODOs in there, one particularly concerning one for This change couples with ipld/specs#282 to add specifics of how this impl and the Filecoin specific use of it varies from the IPLD HashMap, but that's just in block layout and implicit parameters. I'll also go into filecoin-project/specs and add a little bit more meat to the HAMT section instead of just pointing to the IPLD HashMap spec as it does now. This PR is good to go now I think. @warpfork & @anorth reviews appreciated (anyone else from the Filecoin team you want to pull in for reviews?). |
Could you please add @Stebalien for review too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful, thank you so much.
// will set them. In the case of loadChild() we're not expecting a mutation so | ||
// a save is likely going to mean we incur unnecessary serialization when | ||
// we've simply inspected the tree. Copy() will only set a cached form if | ||
// it already exists on the source. It's unclear exactly what Flush() is good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @whyrusleeping this, and a similar but (as I understand it) worse issue in the AMT may contribute a significant amount of unnecessary write load when we change only a single entry in a HAMT but re-encode and write back unchanged nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm, this is a really good catch. We should look into addressing this. Do we know if filecoin usecases actually trigger this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'm far from certain but it seems to me that in chain/validation/state.go StateWrapper holds on to a state tree HAMT and a flush is done whenever the vm is flushed, which seems to be triggered in a few places. So I think that this will mean that as the tree is read, it's building up these cache
entries just from loading the nodes, then these nodes are all serialized and an attempted save happens but presumably most (or all) won't actually save because their CIDs would already be in storage because new nodes are saved when they are created and mutated anyway. It'd be interesting to see what happens if you noop this Flush()
and run all the tests.
// If we perform this depth-first, then it's possible to see the collapse | ||
// cascade upward such that we end up with some parent node with a bucket with | ||
// only bucketSize elements. The canonical form of the HAMT requires that | ||
// any node that could be collapsed into a parent bucket is collapsed and. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @whyrusleeping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't really consider myself enough of an owner to be a deciding approver for this, but fwiw, would really like to merge this. As I also poke around this code, I keep being tempted to write documentation for some of these methods and values, and then keep going "D'oh! Rod already did it in PR 52!". 😅
@Stebalien or @whyrusleeping please merge |
The dual aim here is to properly document the public API and document the internal workings so future maintainers and alternative language implementers can follow how it all works. I'll also do a parallel addition to https://github.com/ipld/specs/blob/master/data-structures/hashmap.md to document the exact ways that this implementation differs from that spec. It's very close but the serialization format is different. That document can already serve as a more general overview of the algorithm here so we'll make sure they work together.
I need to work through the other files and I'm finding some areas of concern that need a deeper look and some additional areas that probably need more test coverage. Already I'm pretty confident that the collapse algorithm in
cleanChild()
isn't doing what it should, leading to non-canonical forms of HAMTs during deletion operations. I've noted that in here./cc @warpfork