Support for private remote object storage #441
Comments
I refactored to use UPath, which solves many issues I had with remote support. So I would recommend UPath over Path, str, ZarrLocation... It works with my own object storage:
I also added tests for the remote datasets and mocked remote tests. There are still some remaining issues:
It will likely be a while until I can work on this some more. |
This is still an open issue. There is an old fork at https://github.com/berombau/spatialdata (branch main, closed PR #442) that does support remote reading using UPath, but merging it would be difficult, especially because the spatialdata code base has since undergone a lot of changes. The current solutions without this support:
Future work plan:
It's still very doable, but it is currently not a priority and will be easier once many of the open PRs have been merged. |
Here is a use case where the full-resolution image is remote and read-only, while the rest is local. |
I've already re-added the dependencies and the moto tests in the PR. Changing every Path to UPath in the _io module should be straightforward, as Path is a subtype of UPath. Using the right interfaces and API design is a bit more complicated, as it would be nice to also take the upcoming Zarr v3 APIs into account. The design should be something like: filepath in SpatialData.read / .write = str | UPath. Right now the element interface is not really consistent and does not easily allow reading or writing each element separately. All elements focus on just zarr.Group for writing, but for reading:
The user should convert str | UPath | zarr.BaseStore | ZarrLocation | ... to zarr.Group outside of the functions or use the SpatialData interface. Since these are all private functions and the public functions only change Path -> UPath, the impact is hopefully minimal. |
Ah ok. The public interface should be added on top then maybe. So try to make all private read/write functions have a zarr.Group-only API and make all the public read/write functions use the various possibilities of getting to a group:
There are various convenience methods, like zarr v3 with StorePath and make_store_path, ZarrLocation... but it's probably easier to just have our own utility function. The remote mock tests currently only test reading/writing a SpatialData object with one element of each type. I'll try to extend them so each element is also tested on its own. |
Thanks, gonna isolate the action point in the item below for myself:
|
Right now, there is support for local .zarr stores and remote stores publicly accessible via HTTP or S3.
Private remote stores are more difficult, as they need certain options or credentials that are not representable by simply a string or Path. One option is to use a zarr.storage.FSStore, which can have storage_options or any fsspec.spec.AbstractFileSystem.
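To make the FSStore option concrete, here is a minimal sketch, assuming zarr-python 2.x (where `zarr.storage.FSStore` exists) and an S3-compatible store accessed through s3fs. The helper names, the bucket and the endpoint URL are hypothetical; `key`, `secret` and `client_kwargs` are the usual s3fs options.

```python
from typing import Any, Dict, Optional


def s3_storage_options(
    key: str, secret: str, endpoint_url: Optional[str] = None
) -> Dict[str, Any]:
    """Bundle credentials into an s3fs-style storage_options dict."""
    options: Dict[str, Any] = {"key": key, "secret": secret}
    if endpoint_url is not None:
        # Object stores outside AWS (for example MinIO) need an explicit endpoint.
        options["client_kwargs"] = {"endpoint_url": endpoint_url}
    return options


def open_private_store(url: str, storage_options: Dict[str, Any]):
    """Open a read-only FSStore; assumes zarr-python 2.x and s3fs are installed."""
    # Imported lazily so the options helper above works without zarr.
    from zarr.storage import FSStore

    return FSStore(url, mode="r", **storage_options)


# Hypothetical placeholder credentials and endpoint.
options = s3_storage_options("my-key", "my-secret", "https://objects.example.org")
print(options)
```

Calling `open_private_store("s3://my-bucket/data.zarr", options)` would then yield a store that carries its own credentials, which is exactly the information a bare string or Path cannot hold.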
Two pull requests enable this:
Testing is difficult, but this is what I used: