Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of extract_points vs rasterio sample #81

Open
martinfleis opened this issue Aug 29, 2024 · 2 comments
Open

Performance of extract_points vs rasterio sample #81

martinfleis opened this issue Aug 29, 2024 · 2 comments

Comments

@martinfleis
Copy link
Member

One of the questions in the recent Earthmover's webinar on xvec was about the performance of extract_points compared to rasterio's sample method. I have never tested this before so wanted to give it a go and for a large lazy-loaded raster (digital terrain model), our extract_points is waaaay slower. See https://notebooksharing.space/view/4459f651d27b2f214f8590c30aba0782f7515d4c16b00dcd3283492f19f8e694#displayOptions=

The DTM is from https://geoportalpraha.cz/en/data-and-services/97d2c9c11aa9478cb21b469b8a4f820e in case you'd like to test the same but any raster should do the trick I assume.

Under the hood, extract_points is simply passing the coordinates to .sel with method='nearest', which should be doing exactly the same as rasterio's sample.

xvec/xvec/accessor.py

Lines 1261 to 1263 in 66b541b

subset = self._obj.sel(
{x_coords: x_, y_coords: y_}, method="nearest", tolerance=tolerance
)

This is not optimal.

We could possibly use sample via rioxarray if the raster is loaded via xarray as it is available through dtm_da.rio._manager.acquire().sample(list(zip(x, y))) but that is relying on a private API of rasterio. I'll open an issue there if there's an appetite to expose sample on the rio accessor.

Outside of relying on rasterio, is there a way of speeding it up using some xarray magic?

@martinfleis
Copy link
Member Author

Further observations:

This massive difference happens only when we need to load practically whole raster into memory, as is the case in the notebook sampling points across the whole extent. If I sample points only from a small spatial subset (a 2x2km square in the corner), xvec is actually faster than rasterio, which does not seem to care about the spatial distribution of points.

Also, if the whole array is loaded in memory, we are faster.

It seems that no matter what, rasterio is able to load only the pixels needed.

We should at least document this behaviour and point to sample if the use case requires heavy sampling over the whole area.

@scottyhq
Copy link

scottyhq commented Jan 6, 2025

Outside of relying on rasterio, is there a way of speeding it up using some xarray magic?

I made some comments over in the rioxarray issue explaining where the slowness is coming from, but currently it's at least the same order of magnitude to simply loop over the points :) I could create a pull request to modify extract_points if it seems like a good idea:

# Wall time: 399 ms compared to 6s for me running the example notebook above
pts_da = np.zeros_like(x)
for i,p in enumerate(points):
    pts_da[i] = dtm_da.sel(x=p.x, y=p.y, method="nearest").squeeze().to_numpy()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants