Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Implement MMap for In Memory IO #972

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

pedrocarlo
Copy link
Contributor

@pedrocarlo pedrocarlo commented Feb 10, 2025

This PR implements MMap to support, in the future, persistence of an in-memory database to a file, as is tracked by this issue: #859. I wanted to thank @PThorpe92 a lot for letting me try implementing this feature.

Changes/Additions

  • Swap BTreeMap<usize, MemPage> for MMapMut
  • For Linux, utilize remap to attempt to resize the map in place, else move the mmap.
  • Added a test suite to test for insertions, selects with ":memory" option selected.
  • Added an environment variable, INITIAL_PAGES, that defines the initial number of pages that the MMap will be initialized with. The default is 16, that amounts to 16 * 4096 bytes = 64kb of memory allocated on startup.
  • The resizing factor at the moment is set at 2. I picked this value as it is somewhat common place in hashmaps and other datastructures.
  • Also, a CI enviroment value has been added in the testing github workflow to set the INITIAL_PAGES env var to 5, so that resizing happens more often in testing.

TODO's and Improvements

  • As in mentioned in Support persisting in-memory database to disk #859, to support writing the in-memory database to disk Limbo would need to have a modification in its syntax and/or opcode.
  • One suggestion for that would be to implement a PRAGMA statement that takes a file path or DB name to persistence the file on disk. This value would then be saved in the MemoryFile struct. This could happen by extending the File trait with a method e.g fn persist(db_path:&str) .
  • Then, to improve the user experience, we could flush the contents of the MMap in WAL checkpoints and when the MemoryFile is dropped.
  • TODO: implement tests that incorporate DELETE statements.

I would appreciate some feedback on my implementation and any other point I may have overlooked.

Copy link
Collaborator

@penberg penberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible, @PThorpe92 did you have the opportunity to review this already?

.github/workflows/rust.yml Outdated Show resolved Hide resolved
core/storage/database.rs Outdated Show resolved Hide resolved
core/storage/pager.rs Outdated Show resolved Hide resolved
core/storage/pager.rs Outdated Show resolved Hide resolved
core/io/memory.rs Outdated Show resolved Hide resolved
}
}

#[cfg(not(target_os = "linux"))]
Copy link
Contributor

@PThorpe92 PThorpe92 Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this, the more I think maybe we should keep the previous implementation for non-linux systems where resizing isn't available?

It could all remain in this file, but just a couple struct fields and functions would just be feature flagged. Wouldn't be too difficult either. but I'd like to hear @penberg's opinion, perhaps @LtdJorge may have thoughts on this as well as he has dug into the IO quite a bit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use rustix::mm, which we already import in Unix family, btw. You can the gate the remap behind the cfg.

On this or the previous impl, I think both are fine. What drawbacks do you think the new has for platforms without remap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking about edge kind of cases, where it might not be ideal to copy everything as the default resize method. I guess it's probably fine, just wanted to hear others thoughts on it. I honestly didn't know that only linux supported this till now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a problem for certain write heavy applications that do not use disk at all and only rely on the in-memory option. However, I think for this to have a significant impact, someone would need to have a lot of data in memory. Also, the INITIAL_MEM_PAGES flag would possibly alleviate this use case allowing the user to reduce the amount of resizes needed.

@LtdJorge
Copy link
Contributor

Hey @pedrocarlo for Linux we are importing rustix in io, which has mmap, munmap, mremap, etc. In case you want to use that instead of importing a new dependency.

The main difference is that rustix doesn't give you a clean safe API, it exposes unsafe functions, closer to the syscalls/libc but returning Result and such. If you want help, feel free to ping me.

@pedrocarlo
Copy link
Contributor Author

pedrocarlo commented Feb 12, 2025

Hey @LtdJorge! I agree. For the project, it would be better to use an already existing dependency instead of introducing a new one like memmap2. I started with memmap2 for the reasons you mentioned, but now that we have some functioning decent piece of code to test against, it would be a great idea to use rustix to do the same functionality. I think it would make sense to first merge this and then try to eliminate the dependency in favor of rustix. Also, would love to talk with you about this! I'm still inexperienced with some low-level systems and I really want to learn more about. Please tell me where I should ping you (discord, github, x, etc...).

@penberg
Copy link
Collaborator

penberg commented Feb 15, 2025

Hey @pedrocarlo! I think dropping the new dependency is a pre-requisite for merging this, but otherwise looks sensible to me.

wip: checkpoint testing for performance

mass insert test

fix: resizing too much incorrectly. Changed resize condition

comment on mmap persistance to disk

use remap for linux

fix add cfg for RemapOptions

cleanup comments

add mass-insert-delete test

bigger memory tests

correct memory test amount of inserts

change back to 100 inserts so that CI can run properly

changed from tcl to python to test for memory

adjusted amount of tests generated to speed up testing

create ci flag to run in tests so that limbo is built with less pages. This makes the program resize more pages so that this behavior can be tested

changed ci flag to be contained in OnceCell

cleanup

remove unnecessary whitespace

remove ci flag in workflow and modify memory test to account for that

rename env var and add some documentation

remove log from file

removed memmap2 dependency in favor of rustix

Fix comment: new -> old

Comment about tx visibility when deleting a row

refactor: is_version_visible() -> RowVersion::is_visible_to()

thank you clippy, actually a nice suggestion

cleanup and correct remap call

adding rustix to be a dependency for everything

reverting to memmap2 for windows and wasi builds as rustix::mm does not support these plataforms

adjust builds for wasm

trying to add errno crate to compile wasm

change to use crate result error

wip

wip

wip

wip

wip

adjusting len of map after remap
@pedrocarlo
Copy link
Contributor Author

Hey @penberg! Unfortunately, I just noticed that rustix::mm module does not support windows or wasi. In my view, it would make sense to use rustix for unix and memmap2 for the rest. What do you think?

@PThorpe92
Copy link
Contributor

@pedrocarlo I think it might make more sense to feature flag mmap for unix, then the others can use the original impl and we add no new dependencies

@pedrocarlo
Copy link
Contributor Author

@PThorpe92 that seems the best path. Sad that persisting to database will only be available to unix for now. I will fix it either today or tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants