ZipOuterJoin should have a fast path for single chunks #1040
Comments
I'll pick this up if that's okay.
Ah, yes! 🙂
Hi @ritchie46, I attempted this by using .get_unchecked() directly on the chunked array when there is only one chunk. This didn't improve performance noticeably. I tested it on an outer join of two 10-million-row tables. Have I misunderstood the request?
The idea was to unpack to the underlying We can get the
I tried it in this commit on my fork: marcvanheerden@edcf5cd. I'm still not getting a measurable run-time difference on the same two 10-million-row tables. I used hyperfine for the measurement, and the change is slower by 10ms on around 7s for the join. I think this is just noise and not an actual regression. Let me know if I'm doing something wrong.
@marcvanheerden I've left some comments that should help the performance.
Thanks for your help again, I've made the change here. Unfortunately I'm not seeing a performance improvement. If I'm reading these flamegraphs correctly it looks like:
Here are the before and after flamegraphs. Let me know if I'm making a mistake or missing something. Thanks.
An outer join also does quite a lot of other work. Maybe we should isolate the functionality we're benchmarking?
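One way to isolate the gather step from the rest of the join is to time it on its own. The following is only a rough sketch using the standard library: best_of and gather are hypothetical helpers made up for illustration, and gather is a stand-in for the real take logic, not Polars code.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the fastest wall-clock time observed.
/// Returns `Duration::MAX` if `runs` is zero.
fn best_of<R>(runs: usize, mut f: impl FnMut() -> R) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the unused result.
        black_box(f());
        best = best.min(start.elapsed());
    }
    best
}

/// Hypothetical stand-in for the gather step of an outer join:
/// `None` in `idx` means the row has no match on this side.
fn gather(values: &[i64], idx: &[Option<usize>]) -> Vec<Option<i64>> {
    idx.iter()
        .map(|&i| i.and_then(|i| values.get(i).copied()))
        .collect()
}
```

Calling something like best_of(20, || gather(black_box(&values), black_box(&idx))) for the old and new code paths compares only the gather itself, so a small difference is less likely to be hidden by hashing and the rest of the join.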
Currently the TakeRandom traits are used, but this is expensive for the most common case, single chunks.
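To make the request concrete, here is a minimal sketch of what a single-chunk fast path could look like. It uses a toy model rather than the Polars types: Chunked, Taker and take_opt are hypothetical names invented for this example, and the real TakeRandom machinery is more involved.

```rust
/// Toy stand-in for a chunked column: values are split across one or more chunks.
/// (Hypothetical type for illustration, not the Polars ChunkedArray.)
struct Chunked {
    chunks: Vec<Vec<i64>>,
}

/// Random-access strategy, chosen once before the hot loop.
enum Taker<'a> {
    /// Fast path: exactly one chunk, so a global index is a plain slice index.
    Single(&'a [i64]),
    /// General path: every lookup must first locate the chunk holding the index.
    Multi(&'a [Vec<i64>]),
}

impl Chunked {
    fn taker(&self) -> Taker<'_> {
        match self.chunks.as_slice() {
            [only] => Taker::Single(only.as_slice()),
            many => Taker::Multi(many),
        }
    }
}

impl<'a> Taker<'a> {
    fn get(&self, mut idx: usize) -> Option<i64> {
        match self {
            Taker::Single(values) => values.get(idx).copied(),
            Taker::Multi(chunks) => {
                // Walk the chunks, subtracting lengths until the index fits.
                for chunk in chunks.iter() {
                    if idx < chunk.len() {
                        return Some(chunk[idx]);
                    }
                    idx -= chunk.len();
                }
                None
            }
        }
    }
}

/// Gather values for one side of an outer join:
/// `None` in `idx` means the row has no match on this side.
fn take_opt(col: &Chunked, idx: &[Option<usize>]) -> Vec<Option<i64>> {
    let taker = col.taker();
    idx.iter().map(|&i| i.and_then(|i| taker.get(i))).collect()
}
```

The point of the design is that the chunk-count check happens once, outside the loop, so the single-chunk case pays only for a bounds-checked slice read per row. Whether that is measurable inside a full join is a separate question.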