Allow checkpointing and restoration of hash state #94

Merged
merged 1 commit into from
Jan 10, 2025

Conversation

nickbabcock
Owner

If one is hashing a stream of data over a long period of time, it is useful to be able to checkpoint the hash state, so that the state can be recovered later without rehashing the data that has already been processed.

It also allows great flexibility on how one wants to hash data.

I'm not tied to this exact API. Hard-coding the number of bytes is potentially brittle, but it does remove any chance of a fallible write or read during serialization / deserialization. I don't see the format changing for some time.

Closes #88

@nickbabcock nickbabcock force-pushed the checkpoint branch 6 times, most recently from 165a679 to a2912e4 Compare January 10, 2025 13:02
@nickbabcock nickbabcock merged commit b7e0505 into master Jan 10, 2025
15 checks passed
@sticnarf

Sorry I didn't share my thoughts on the API design earlier; I wasn't confident about what it should look like, either.

I really appreciate you implementing the checkpoint feature.

Hard-coding the number of bytes doesn't seem like a problem to me, considering how stable the format is.

In addition to the form of a const-length array, I think Vec<u8> is useful as well. When the checkpoint is serialized or sent through the network, we usually have the length of the checkpoint bytes (e.g. in protobuf bytes). In these cases, we can calculate the real length of the buffer, making the length in the checkpoint bytes redundant. This means the Vec<u8> only needs 128..=160 bytes.

Both representations may be useful. Maybe the checkpoint can be an opaque struct while its implementation is still a 168-byte array. And it can provide functions to convert it into and from [u8; 168] or Vec<u8>.
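
A minimal sketch of that opaque-struct idea, to make the 128..=160 arithmetic concrete. Everything here is hypothetical: the `Checkpoint` type, its method names, and above all the byte layout (128 bytes of state, a 32-byte pending buffer, then an 8-byte little-endian count of pending bytes) are assumptions for illustration and are not taken from the crate. Under that assumed layout, the compact form keeps only the state plus the pending bytes actually in use, and the count is recovered from the slice length.

```rust
/// Hypothetical opaque checkpoint; NOT the crate's actual type or layout.
/// Assumed layout: [0..128) state, [128..160) pending buffer,
/// [160..168) little-endian u64 count of pending bytes.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Checkpoint([u8; FULL_LEN]);

const STATE_LEN: usize = 128;
const BUF_CAP: usize = 32;
const FULL_LEN: usize = STATE_LEN + BUF_CAP + 8; // 168

impl Checkpoint {
    /// Wrap a fixed-size array, rejecting an impossible pending count.
    pub fn from_array(bytes: [u8; FULL_LEN]) -> Option<Self> {
        let mut len = [0u8; 8];
        len.copy_from_slice(&bytes[STATE_LEN + BUF_CAP..]);
        if u64::from_le_bytes(len) <= BUF_CAP as u64 {
            Some(Checkpoint(bytes))
        } else {
            None
        }
    }

    /// The fixed-size form, as in the merged API.
    pub fn to_array(&self) -> [u8; FULL_LEN] {
        self.0
    }

    /// Number of pending bytes; validated on construction to be <= 32.
    fn pending_len(&self) -> usize {
        let mut len = [0u8; 8];
        len.copy_from_slice(&self.0[STATE_LEN + BUF_CAP..]);
        u64::from_le_bytes(len) as usize
    }

    /// Compact form: state plus only the pending bytes in use (128..=160 bytes).
    /// Unused buffer bytes are dropped, so they come back as zeros.
    pub fn to_vec(&self) -> Vec<u8> {
        self.0[..STATE_LEN + self.pending_len()].to_vec()
    }

    /// Rebuild from the compact form; the pending count is implied by the length.
    pub fn from_slice(bytes: &[u8]) -> Option<Self> {
        let pending = bytes.len().checked_sub(STATE_LEN)?;
        if pending > BUF_CAP {
            return None;
        }
        let mut full = [0u8; FULL_LEN];
        full[..bytes.len()].copy_from_slice(bytes);
        full[STATE_LEN + BUF_CAP..].copy_from_slice(&(pending as u64).to_le_bytes());
        Some(Checkpoint(full))
    }
}
```

With this shape, a caller that already knows the payload length (for example a protobuf `bytes` field) can ship `to_vec()` and restore with `from_slice`, while callers that want infallible fixed-size handling keep using the array form.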

Development

Successfully merging this pull request may close these issues.

Support for hasher state dump and recovery
2 participants