-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Implement MMap for In Memory IO #972
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sensible, @PThorpe92 did you have the opportunity to review this already?
} | ||
} | ||
|
||
#[cfg(not(target_os = "linux"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about this, the more I think maybe we should keep the previous implementation for non-linux systems where resizing isn't available?
It could all remain in this file, but just a couple struct fields and functions would just be feature flagged. Wouldn't be too difficult either. but I'd like to hear @penberg's opinion, perhaps @LtdJorge may have thoughts on this as well as he has dug into the IO quite a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use rustix::mm, which we already import in Unix family, btw. You can the gate the remap behind the cfg.
On this or the previous impl, I think both are fine. What drawbacks do you think the new has for platforms without remap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking about edge kind of cases, where it might not be ideal to copy everything as the default resize method. I guess it's probably fine, just wanted to hear others thoughts on it. I honestly didn't know that only linux supported this till now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a problem for certain write heavy applications that do not use disk at all and only rely on the in-memory option. However, I think for this to have a significant impact, someone would need to have a lot of data in memory. Also, the INITIAL_MEM_PAGES
flag would possibly alleviate this use case allowing the user to reduce the amount of resizes needed.
Hey @pedrocarlo for Linux we are importing rustix in io, which has mmap, munmap, mremap, etc. In case you want to use that instead of importing a new dependency. The main difference is that rustix doesn't give you a clean safe API, it exposes unsafe functions, closer to the syscalls/libc but returning Result and such. If you want help, feel free to ping me. |
Hey @LtdJorge! I agree. For the project, it would be better to use an already existing dependency instead of introducing a new one like |
Hey @pedrocarlo! I think dropping the new dependency is a pre-requisite for merging this, but otherwise looks sensible to me. |
wip: checkpoint testing for performance mass insert test fix: resizing too much incorrectly. Changed resize condition comment on mmap persistance to disk use remap for linux fix add cfg for RemapOptions cleanup comments add mass-insert-delete test bigger memory tests correct memory test amount of inserts change back to 100 inserts so that CI can run properly changed from tcl to python to test for memory adjusted amount of tests generated to speed up testing create ci flag to run in tests so that limbo is built with less pages. This makes the program resize more pages so that this behavior can be tested changed ci flag to be contained in OnceCell cleanup remove unnecessary whitespace remove ci flag in workflow and modify memory test to account for that rename env var and add some documentation remove log from file removed memmap2 dependency in favor of rustix Fix comment: new -> old Comment about tx visibility when deleting a row refactor: is_version_visible() -> RowVersion::is_visible_to() thank you clippy, actually a nice suggestion cleanup and correct remap call adding rustix to be a dependency for everything reverting to memmap2 for windows and wasi builds as rustix::mm does not support these plataforms adjust builds for wasm trying to add errno crate to compile wasm change to use crate result error wip wip wip wip wip adjusting len of map after remap
84fb931
to
e9f00ba
Compare
Hey @penberg! Unfortunately, I just noticed that |
@pedrocarlo I think it might make more sense to feature flag mmap for unix, then the others can use the original impl and we add no new dependencies |
@PThorpe92 that seems the best path. Sad that persisting to database will only be available to unix for now. I will fix it either today or tomorrow. |
This PR implements MMap to support, in the future, persistence of an in-memory database to a file, as is tracked by this issue: #859. I wanted to thank @PThorpe92 a lot for letting me try implementing this feature.
Changes/Additions
BTreeMap<usize, MemPage>
forMMapMut
INITIAL_PAGES
, that defines the initial number of pages that the MMap will be initialized with. The default is 16, that amounts to 16 * 4096 bytes = 64kb of memory allocated on startup.CI
enviroment value has been added in the testing github workflow to set theINITIAL_PAGES
env var to 5, so that resizing happens more often in testing.TODO's and Improvements
PRAGMA
statement that takes a file path or DB name to persistence the file on disk. This value would then be saved in theMemoryFile
struct. This could happen by extending theFile
trait with a method e.gfn persist(db_path:&str)
.MemoryFile
is dropped.DELETE
statements.I would appreciate some feedback on my implementation and any other point I may have overlooked.