[C++] F2 KV store #922
Open
kkanellis wants to merge 179 commits into microsoft:main from kkanellis:cc-lmhc-v2
* [C++] Add force option to record user Delete request
  * If force is set to true, then a tombstone will be appended to the log, irrespective of whether the hash index contains the record itself.
* [C++] Support for defining a Guid for a session externally
* [C++] Replace checkpoint inline callback definition ... with predefined types.
* [C++] Rmw can be configured to not create record ... if one does not exist inside the log.
* [C++] Implement method for conditionally copying to log tail
* [C++] Use minimum number of mutable pages if value is 0
* [C++] Initial implementation of FASTER hot-cold design
  * Currently supports reads, upserts, deletes and RMWs.
* [C++] Fix compilation error
* [C++] Initial tests for hot-cold design
* [C++] Lookup-based hybrid log compaction (microsoft#487)
* [C++] Log scan can now return record address, along with record
* [C++] Add implementation of Address + operator
* [C++] Add method for finding if a record exists in the hybrid log
  * Note that if a tombstone record exists, it will return true.
* [C++] Initial implementation of a better log compaction algorithm
  * It leverages the hash index to identify live records, and copies them to the tail of the log.
  * Ensures that if a user performs a concurrent upsert, compaction won't overwrite their operation.
  * Avoids an expensive scan of the entire log; only the relevant log section is read.
* [C++] Remove unnecessary template typenames
* [C++] Fix several issues in compaction code
* [C++] Several bugfixes in log scan iterator
  * Now correctly switches to read from the next page if the record didn't entirely fit in the previous one.
  * Fix bug where record address was wrong
  * Fix bug where in-disk page wasn't read due to >0 offset in passed address.
* [C++] Minor bugfixes in lookup-based compaction
* [C++] Add tests for lookup-based compaction algorithm
* [C++] Fix bug in Address + operator
* [C++] Add a medium-sized value type for tests
* [C++] Compaction context/entry now stores record address
* [C++] Bugfixes in compaction code
* [C++] Update log compaction tests
  * Add tests where other threads perform concurrent insertions & deletions
  * Test actual log truncation correctness (using `ShiftBeginAddress` method).
* [C++] Fix test compilation error
* [C++] Refactor log compaction code
* [C++] Minor changes
* [C++] Better status handling in RecordExists method
* [C++] Log compaction with multiple threads
* [C++] Unoptimized concurrent page-granularity compaction
* [C++] Fix bug in tests
* [C++] Concurrent compaction w/ non-blocking waiting for threads
* [C++] Introduce page- and record-granularity log iterators
  * The page-granularity iterator is used with the new lookup-based compaction method, while the (older) record-granularity one is used by the (old) compaction algorithm.
  * The page log iterator can still be optimized further (e.g. avoid locking, add prefetching).
* [C++] Avoid key/value copying on compaction contexts
* [C++] Improvements on the log compaction method
  * + bugfix on sessions start/stop when using multiple threads.
* [C++] Make obsolete write key calls on Read/Exists contexts
* [C++] Add variable-length key tests for log compaction
* [C++] Add delete ops to varlen keys tests
* [C++] Concurrent lock-free log iterator with prefetching
* [C++] Add variable-length value tests for lookup compaction
* [C++] Bugfixes in tests

Co-authored-by: Badrish Chandramouli <[email protected]>
Co-authored-by: Kirk Olynyk <[email protected]>

* Better design of FASTER's copy-to-tail method
* Implement hot-cold & cold-cold compaction
* Include RMW in compact lookup tests
* Bugfixes in core FASTER
* Bugfixes & preliminary work for retrying RMW ops in hot-cold
* Minor changes in compact lookup tests
* Rework hot-cold implementation to support retries
* Bugfixes in FASTER RMW & log compaction
* Update hot-cold design tests
* Minor change in RMW
* Minor cleanup & better handling of complete pending requests
* Proper handling of deleted records in hot-cold

  The `Read` method returns `NOT_FOUND` either if no record was found or if a tombstone record was found. There is no point in separating the two cases with a single log, but in the hot-cold design it is important to know which one occurred. The main use case is the hot-cold `Read` method: if a tombstone is found in the hot log, there is no need to search the cold log. In other words, `Read` goes through the cold log only if no record (normal or tombstone) was found in the hot log. Thus, the FASTER `Read` method can now be configured to return a different status (i.e. `ABORTED`) instead of `NOT_FOUND` if it finds a tombstone. This is supported through an additional optional flag, `abort_if_tombstone`, in the Read function prototype. By default it is set to `false`; only the hot-cold design sets this flag, when a Read is issued on the hot log.
* Update hot-cold tests
* Bugfix to guarantee progress in both stores

Co-authored-by: Badrish Chandramouli <[email protected]>
Co-authored-by: Kirk Olynyk <[email protected]>
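The hot-cold `Read` behaviour described in the commit message above can be illustrated with a minimal sketch. This is not the FASTER/F2 code: `Status`, `ToyLog` and `ToyHotColdStore` are hypothetical stand-ins that use an in-memory map, and only the `abort_if_tombstone` flag and the `NOT_FOUND`/`ABORTED` status names are taken from the commit message.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical status codes; names mirror those in the commit message.
enum class Status { OK, NOT_FOUND, ABORTED };

// Toy single-log store. An empty optional stands in for a tombstone record.
struct ToyLog {
  std::unordered_map<std::string, std::optional<std::string>> records;

  Status Read(const std::string& key, std::string& out,
              bool abort_if_tombstone = false) const {
    auto it = records.find(key);
    if (it == records.end()) return Status::NOT_FOUND;  // no record at all
    if (!it->second.has_value()) {
      // Tombstone: optionally report it as a distinct status.
      return abort_if_tombstone ? Status::ABORTED : Status::NOT_FOUND;
    }
    out = *it->second;
    return Status::OK;
  }
};

// Toy hot-cold store: the hot log is consulted first, the cold log second.
struct ToyHotColdStore {
  ToyLog hot;
  ToyLog cold;

  Status Read(const std::string& key, std::string& out) const {
    Status s = hot.Read(key, out, /*abort_if_tombstone=*/true);
    if (s == Status::OK) return s;
    // A tombstone in the hot log means the key was deleted: skip the cold log.
    if (s == Status::ABORTED) return Status::NOT_FOUND;
    // Nothing (record or tombstone) in the hot log: fall back to the cold log.
    return cold.Read(key, out);
  }
};
```

In this sketch a key deleted in the hot log is reported as `NOT_FOUND` to the caller even if an older value still sits in the cold log, which is the case the flag exists to distinguish.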
During log compaction (w/ lookup), live records are copied to the tail of the log. Once all live records have been copied, the part of the log that was just compacted is truncated. However, there is a slim chance that during the log truncation a pending Read operation will return NOT_FOUND, even though a record for this key exists. Specifically, this can happen if a live record is being copied to the tail of the log, but the Read operation has already checked the log tail and has issued one (or more) I/O requests to read disk-resident records. In this case, if we truncate the log before this Read operation reaches the live record, the Read will return NOT_FOUND.

To handle this undesired behavior, we keep a global count of the log truncations performed after log compaction. Each Read operation keeps a local copy of this number in its context. If the Read operation has reached the end of the log (i.e. its begin address) and has not found a live record, we check whether a log truncation occurred due to a log compaction. If so, the Read op will retry, in order to check the newly introduced log part. This last part is now supported using the `min_start_address` argument that can be defined in the Read context. In this case, the Read operation will not go through the entire log.
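The retry logic above can be sketched on a toy in-memory log. Everything here is an assumption made for illustration: `CompactableLog`, `ReadContext`, `ReadStatus` and the `race_hook` parameter (which simulates a compaction racing with the read) are hypothetical, and only the `min_start_address` name and the idea of a global truncation counter come from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class ReadStatus { OK, NOT_FOUND };

struct LogRecord { std::string key; std::string value; };

// Toy append-only log: a record's address is its index in `records`.
struct CompactableLog {
  std::vector<LogRecord> records;
  std::size_t begin_address = 0;      // addresses below this are truncated
  std::uint64_t num_truncations = 0;  // global count of compaction truncations

  void Append(std::string key, std::string value) {
    records.push_back({std::move(key), std::move(value)});
  }

  // Copy live records in [begin_address, until) to the tail, then truncate.
  void CompactUntil(std::size_t until) {
    assert(until <= records.size());
    for (std::size_t a = begin_address; a < until; ++a) {
      bool live = true;  // live == no newer record exists for the same key
      for (std::size_t b = a + 1; b < records.size(); ++b) {
        if (records[b].key == records[a].key) { live = false; break; }
      }
      if (live) {
        LogRecord copy = records[a];  // copy first: push_back may reallocate
        records.push_back(std::move(copy));
      }
    }
    begin_address = until;
    ++num_truncations;
  }
};

// Per-operation state, kept local to each Read.
struct ReadContext {
  std::uint64_t truncations_seen = 0;
  std::size_t min_start_address = 0;
};

// `race_hook` (hypothetical) runs once, after the tail has been sampled,
// to imitate a compaction that completes while the read is in flight.
ReadStatus Read(CompactableLog& log, const std::string& key, std::string& out,
                const std::function<void()>& race_hook = {}) {
  ReadContext ctx;
  bool hook_fired = false;
  for (;;) {
    ctx.truncations_seen = log.num_truncations;
    const std::size_t tail = log.records.size();
    if (race_hook && !hook_fired) { hook_fired = true; race_hook(); }
    const std::size_t lower = std::max(log.begin_address, ctx.min_start_address);
    for (std::size_t a = tail; a-- > lower;) {  // newest to oldest
      if (log.records[a].key == key) {
        out = log.records[a].value;
        return ReadStatus::OK;
      }
    }
    // Reached the begin address without a hit: retry only if a truncation raced.
    if (log.num_truncations == ctx.truncations_seen) return ReadStatus::NOT_FOUND;
    ctx.min_start_address = tail;  // only the newly appended part needs a scan
  }
}
```

On a retry the scan starts at the current tail and stops at `min_start_address`, so the operation does not walk the entire log a second time.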
This fixes some spurious error messages, including the following: ``Assertion `idx < size_' failed.``
Fixes a bug that was due to improper calculation of how many bytes to read from disk.
For write-intensive workloads, it is possible that the maximum hlog budget is reached even during compaction. For example, this can occur when the rate of ingesting requests into the hot log is higher than the rate of compacting records to the cold log. To fix this, we now allow user threads to participate in the compaction process, but only if we reach the (hard) hlog size limit. Note that background compaction threads perform only compaction work anyway. Once the compaction completes, user threads can resume serving user requests, as before.
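A rough sketch of this policy follows, with the hot log reduced to a page counter. The `HotLogBudget` type, its page-based accounting and the `compaction_target_pages` threshold are all assumptions made for illustration; the commit message only states that user threads help with compaction once the hard hlog size limit is reached.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model of the hot-log budget, counted in pages.
struct HotLogBudget {
  const std::size_t hard_limit_pages;         // hard hlog size limit
  const std::size_t compaction_target_pages;  // compaction stops at this size
  std::atomic<std::size_t> used_pages{0};
  std::atomic<std::size_t> pages_compacted_by_users{0};

  HotLogBudget(std::size_t hard_limit, std::size_t target)
      : hard_limit_pages(hard_limit), compaction_target_pages(target) {
    assert(target < hard_limit);
  }

  // Moves one page to the cold log; returns false once the target is reached.
  // Background compaction threads would call only this, in a loop.
  bool CompactOnePage() {
    std::size_t cur = used_pages.load();
    while (cur > compaction_target_pages) {
      // On failure `cur` is refreshed with the latest value and we re-check.
      if (used_pages.compare_exchange_weak(cur, cur - 1)) return true;
    }
    return false;
  }

  // User-facing write that consumes one page of the hot log.
  void UpsertPage() {
    // Only at the hard limit does a user thread join the compaction work.
    if (used_pages.load() >= hard_limit_pages) {
      while (CompactOnePage()) pages_compacted_by_users.fetch_add(1);
    }
    // Compaction is done (or was never needed): serve the request as before.
    used_pages.fetch_add(1);
  }
};
```

With several user threads the limit check and the final increment are not one atomic step, so this sketch can briefly overshoot the limit; a real implementation would need stricter coordination.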
We would like to keep the typedefs, even if unused, for clarity purposes.
This PR introduces F2, an evolution of the FASTER key-value store. More info can be found here.