Support nested collections of datasets (datatree) #92
Comments
For responsive front-end applications, it's probably better to pre-compute all pyramid levels before serving them. So it would be best if This is something that xpublish could support, but then we would probably need something to distinguish between API routers which have special support for The alternative approach using a custom xarray "index" has the advantage that we could still use xpublish as usual with
A 3rd approach that may work now without much effort is to encapsulate the
ds = xr.open_dataset(...)
# create the data pyramids using ndpyramid
# and store the resulting datatree in an internal attribute
# of the `pyramids` dataset accessor
ds.pyramids.build(...)
# property that returns the datatree
ds.pyramids.datatree
Then, you could write a custom xpublish API router with path functions in which you can access the datatree via the
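A minimal sketch of the accessor idea from this comment, assuming only xarray and NumPy are installed. The `PyramidsAccessor` class, its `levels` property, and the plain dict of coarsened datasets are illustrative stand-ins: the comment stores a datatree built with ndpyramid, which is swapped here for block-averaging with `Dataset.coarsen` so the example stays self-contained.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("pyramids")
class PyramidsAccessor:
    """Hypothetical accessor that keeps pyramid levels next to the dataset."""

    def __init__(self, ds):
        self._ds = ds
        self._levels = None  # filled in by build()

    def build(self, factors=(1, 2, 4), dims=("x", "y")):
        # Stand-in for ndpyramid: each level block-averages the dataset
        # by one coarsening factor along the given dimensions.
        self._levels = {
            str(i): (
                self._ds
                if f == 1
                else self._ds.coarsen({d: f for d in dims}, boundary="trim").mean()
            )
            for i, f in enumerate(factors)
        }
        return self._levels

    @property
    def levels(self):
        # Assumption: a dict keyed by level name replaces the datatree here.
        if self._levels is None:
            raise RuntimeError("call ds.pyramids.build() first")
        return self._levels


# Small synthetic dataset; the variable name "t" is made up for the example.
ds = xr.Dataset({"t": (("y", "x"), np.arange(64.0).reshape(8, 8))})
ds.pyramids.build()
```

xarray caches the accessor instance on the dataset object, so a router path function that receives the same `ds` can read `ds.pyramids.levels` without rebuilding the pyramid.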
cc @TomNicholas
We have a group of folks taking a look at this at the IOOS Code Sprint this week. We'd love to be able to bring a dynamic CarbonPlan/maps type experience to our various regions forecast data. From our discussions so far, I think the things we need to focus on to make this happen are:
Full disclosure: I've only gushed over carbonplan/maps, datatree, and ndpyramid rather than having used them in anger, though others in our group have.
If the reprojecting/tiling step is lazy, this should be possible. We've experimented with a few ways to do this and, so far, the xESMF method is the most promising (see
This also seems to be easier with xESMF but, if you end up using rasterio to reproject/tile, I think you'll need to bring in some custom logic to generate each chunk.
I am working with @abkfenris on this. To expand, our specific reason for wanting to do this is that our model data is not static in the time dimension and our datasets are often not global. Precomputing is therefore inefficient for us, though we understand this may be a niche use case. So far I am working with rasterio to tile, and am just now starting to think about how this applies to zarr chunking.
If I understand the problem correctly, I think it should be possible to create a custom API endpoint where, e.g., an input bounding box (and/or an input time value or slice) is used to first index the dataset (using xarray Those operations may take time, though, probably too much for interactive visualization applications, but if the input bounding boxes are on a static grid (tiles) it may be worth caching the intermediate results (using xpublish's Not sure to what extent it is possible to "just" reuse here the logic currently implemented in xpublish that is used to serve the dataset chunks through its zarr API endpoints.
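The caching idea for bounding boxes on a static grid can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard library's `functools.lru_cache` rather than any xpublish cache, and the 10-degree grid, `tile_index`, `load_tile`, and the `computed` log are hypothetical names. The body of `load_tile` is a placeholder for the expensive index, reproject, and resample step.

```python
from functools import lru_cache

TILE_SIZE_DEG = 10.0  # hypothetical static grid of 10-degree tiles

computed = []  # records every cache miss, for illustration only


def tile_index(lon, lat, size=TILE_SIZE_DEG):
    """Snap a point to the (column, row) of the static tile containing it."""
    return int((lon + 180.0) // size), int((lat + 90.0) // size)


@lru_cache(maxsize=256)
def load_tile(col, row, time_key):
    """Placeholder for the slow step; returns the tile's bounding box."""
    computed.append((col, row, time_key))
    west = col * TILE_SIZE_DEG - 180.0
    south = row * TILE_SIZE_DEG - 90.0
    return (west, south, west + TILE_SIZE_DEG, south + TILE_SIZE_DEG)
```

Because requests are keyed by tile index and time rather than by arbitrary bounding boxes, repeated requests for the same tile hit the cache; for example, `load_tile(*tile_index(-70.2, 43.6), "2022-05-01")` only runs the slow step on its first call.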
Thanks for the input that folks had on this. While we didn't solve it during our code sprint, we did make some headway. Right now folks are traveling back from the event, but we are going to try to compile what we found and get it written down (rather than just the mid-event muttering back and forth over Zoom). Hopefully we can have that ready to share in the next few days.
With the new plugin system, this should now be possible without changes to Xpublish itself. It would involve writing an app router plugin to serve datatrees under a new path. The app router can then mount existing dataset routers to its prefix, and provide them a modified To make it nicely adaptable, plugins can also define new hooks that other plugins can implement to further extend Xpublish, so a datatree router could include
Hi there,
We want to use Datatree (a new package for working with hierarchies of xarray Datasets) together with Xpublish. A single datatree.DataTree can be written to a zarr dataset where subgroups typically contain an xarray.Dataset and optional subgroups.
Our specific application is looking to serve data from a multi-dimensional data pyramid (see ndpyramid for more details) that looks something like:
We could serve each subgroup independently but that is less desirable since the top level group metadata (stored in .zattrs and in the consolidated .zmetadata) is needed to describe the relationship among groups.
Proposed feature addition
My assumption is that to serve a dataset like the one I described above, we need to build a custom router for DataTrees. This new router, we’ll call it the ZarrDataTreeRouter, would be able to reuse many of the existing zarr endpoints, but would support a more nested data model.
In carbonplan/maps#15, @benbovy suggested that this sort of support would make sense here so, perhaps we can simply ask for some pointers on how to architect the ZarrDataTreeRouter?
One specific question we have is how an implementation of this should interface with #88 and #89, both of which seem to be reshaping how complex, custom routers are developed.
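One piece of the nested data model can be sketched without touching Xpublish at all: deciding which group a requested zarr key belongs to, so that the existing per-dataset endpoints could handle the remainder. The helper below is a hypothetical illustration, not part of Xpublish or Datatree; the function name `split_zarr_path`, the group names, and the variable name `tavg` are all made up for the example.

```python
def split_zarr_path(path, groups):
    """Split a request path into (group, key) using the longest known group.

    `groups` is a set of group paths, with "" standing for the root group.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    # Try the longest candidate prefix first, always leaving at least
    # one path component for the key itself.
    for n in range(len(parts) - 1, -1, -1):
        candidate = "/".join(parts[:n])
        if candidate in groups:
            return candidate, "/".join(parts[n:])
    raise KeyError(path)


# Hypothetical pyramid with a root group and two levels.
example_groups = {"", "0", "1"}
```

With these groups, `split_zarr_path("1/tavg/.zarray", example_groups)` returns `("1", "tavg/.zarray")`, while a top level key such as `.zmetadata` resolves to the root group, which is where the metadata describing the relationship among groups lives.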
cc @jhamman