Cache Dataset.xindexes #10074
Funny enough you made things "worse" than #10066 in terms of raw performance. Not sure about the other axes you are looking at (didn't look at the change request).
While not exactly the same dataset I was using, here is the I verified that it gives similar results. Now, my benchmarks were done on a Threadripper 2950X (worst decision I ever made to buy it). Single-core performance is pretty wimpy. I don't doubt that you get a 2x improvement on your laptop. I hope this helps |
Could reproduce similar results but I see, I only cached |
great!
Please only take my benchmark as a "single use case", but a real one from someone half smart trying to read through all the xarray docs. The data is from a high-speed video, where we include some limited metadata on a per-frame basis. We often have calling functions "slice" the data that we want lower-level functions to analyze. It may be that the loss of performance here is acceptable. That is for you all to decide |
Looks good now with the last commit:
|
Yes that's why I think it is worth caching the |
edit: I just can't read large numbers... |
MacBook M4 |
Oh, I mean "5.9k" instead of "59k"... that makes much more sense. Your machine is running at "59k" vs "19k" on my Threadripper. |
I was a bit confused and was wondering if your comment was ironic :-). Yeah I recently upgraded my machine and feel the difference. |
This PR is ready for review and is an improved solution over #10066. Hopefully I didn't miss any place where the |
Forgive me for dropping in without that much context and with something that could be interpreted with negativity, but: if we want to add caching which requires xarray to do invalidation — i.e. requires remembering to cal |
I agree with you @max-sixty and I think this PR needs careful review. About prioritizing correctness over performance I agree in general, but in this case I find @hmaarrfk's benchmark and comment #10074 (comment) relevant, where performance is more than nice to have. So caching might be worth it despite the risks. To reduce the risk of breaking something in the future, we should definitely ensure that |
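For readers following the trade-off in the two comments above, here is a minimal sketch of a cached derived view with manual invalidation. It is not xarray's implementation: `FrameStore`, `set_variable`, `_invalidate`, and `check_cache_consistency` are hypothetical names made up for illustration. It only shows why every mutating method has to remember to drop the cache, and how a consistency check can guard against a forgotten invalidation.

```python
from __future__ import annotations


class FrameStore:
    """Hypothetical container (not an xarray class) with a cached derived view."""

    def __init__(self, variables: dict[str, list[float]], indexed: set[str]) -> None:
        self._variables = {name: list(vals) for name, vals in variables.items()}
        self._indexed = set(indexed)
        self._index_cache: dict[str, dict[float, int]] | None = None
        self.rebuilds = 0  # counts how often the derived view is recomputed

    def _build_indexes(self) -> dict[str, dict[float, int]]:
        # Derive a value -> position lookup for every indexed variable.
        return {
            name: {value: pos for pos, value in enumerate(self._variables[name])}
            for name in self._indexed
        }

    @property
    def indexes(self) -> dict[str, dict[float, int]]:
        # Build lazily, then reuse the result until something invalidates it.
        if self._index_cache is None:
            self.rebuilds += 1
            self._index_cache = self._build_indexes()
        return self._index_cache

    def _invalidate(self) -> None:
        # Every mutating method must call this; forgetting to is the risk.
        self._index_cache = None

    def set_variable(self, name: str, values: list[float], indexed: bool = False) -> None:
        self._variables[name] = list(values)
        if indexed:
            self._indexed.add(name)
        else:
            self._indexed.discard(name)
        self._invalidate()


def check_cache_consistency(store: FrameStore) -> bool:
    """Return True if the cached view equals a freshly computed one."""
    return store.indexes == store._build_indexes()


store = FrameStore({"time": [0.0, 0.1, 0.2]}, indexed={"time"})
store.indexes
store.indexes  # second access reuses the cache
first_rebuilds = store.rebuilds

store.set_variable("time", [1.0, 1.1], indexed=True)  # mutation drops the cache
position = store.indexes["time"][1.1]
```

A check like `check_cache_consistency` could run in a test suite after each mutating operation, which is one way to reduce the chance that a future change forgets to invalidate.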
👍! |
Just for clarity, at the time, I was hunting 2x and 3x performance improvements. The little decrease I saw recently (1-5%) seems ok. Looking back at those merge requests, I think it was due to a timestamp conversion, and the rest of the performance improvements were from refactors I found in my hunt for the underlying bug. I think the little performance decrease is definitely ok if the goal is "simplicity". I believe that caching can be a large barrier to contributors and not warranted if the gains are small. I would be honored if that |
Alternative to #10066, implementing #10066 (comment).
DataArray.xindexes
@hmaarrfk could you try your benchmark #10066 (comment) with this branch, please?