Switch all IO to fsspec #49
Conflicts fixed.
Should be ready now. Replaced most mmaps with simple seeks on streams (there is quite a bit of jumping around, mostly in the shuffling part of tokenization). I tried to optimize
Nice! Regarding the `mmap` issue, I think we should be able to address it by using the `LocalFileSystem`'s underlying file descriptor in the `mmap` functions.
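To illustrate the suggestion above, here is a minimal sketch of mapping a locally opened file through its descriptor. It uses only the standard library; the helper name `mmap_readonly` is hypothetical, and the sketch assumes that the file object returned by fsspec's `LocalFileSystem.open` exposes `fileno()` like a regular Python file object. It would not apply to remote sources, which have no OS-level descriptor.

```python
import mmap
import os
import tempfile


def mmap_readonly(fileobj):
    """Map a locally opened file into memory via its file descriptor.

    Hypothetical helper: works for any file object backed by a real
    OS descriptor (assumed to include fsspec's local file objects).
    """
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    return mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)


# Demo with the built-in open() on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name

with open(path, "rb") as f:
    with mmap_readonly(f) as mm:
        # Random access by slicing, with no explicit seek() calls.
        chunk = mm[4:8]

os.remove(path)
```

Slicing the mapping replaces the seek-and-read pattern for local files, which is where the jumping around during shuffling would benefit most.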
Co-authored-by: Mario Šaško <[email protected]>
I will add a local "working_directory" for this part then, I suppose.
Two more comments
WIP: a better solution still needs to be found for tokenization and other memmap-dependent blocks when working with sources like S3.
Paths can be passed as simple strings, as (str, fs options dictionary) tuples, or as a DataFolder object directly.
Addresses #41.
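The three accepted path forms described above could be normalized along these lines. This is only a sketch: `FolderSpec` and `resolve_folder` are hypothetical names standing in for the PR's `DataFolder` and its resolution logic, whose actual API is not shown here, and the example paths and option keys are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Union


@dataclass
class FolderSpec:
    """Hypothetical stand-in for DataFolder; not the PR's real class."""
    path: str
    fs_options: Dict[str, Any] = field(default_factory=dict)


PathLike = Union[str, Tuple[str, Dict[str, Any]], FolderSpec]


def resolve_folder(data: PathLike) -> FolderSpec:
    """Normalize a string, a (path, options) tuple, or a folder object."""
    if isinstance(data, FolderSpec):
        # Already a folder object: pass it through unchanged.
        return data
    if isinstance(data, str):
        # Plain string: no filesystem options.
        return FolderSpec(data)
    if isinstance(data, tuple) and len(data) == 2:
        path, options = data
        # Copy the options so later mutation does not affect the caller.
        return FolderSpec(path, dict(options))
    raise TypeError(f"Unsupported path specification: {data!r}")


plain = resolve_folder("data/output")
with_options = resolve_folder(("s3://bucket/data", {"anon": True}))
passthrough = resolve_folder(with_options)
```

Normalizing once at the boundary means the rest of the pipeline only ever deals with a single folder type.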