`ZipOuterJoin` should have a fast path for single chunks #1040

ritchie46 · 2021-07-25T08:36:55Z

Currently the `TakeRandom` traits are used, but this is expensive for the most common case: a single chunk.
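The cost described above can be illustrated with a minimal sketch. This is a model, not polars code: `ChunkedVec` is a hypothetical stand-in for a chunked column (not polars' `ChunkedArray`), and the generic random-access path is assumed to scan chunk lengths for every index. The single-chunk check is made once, outside the loop, after which each lookup is a plain slice index.

```rust
/// Hypothetical stand-in for a chunked column: values split across
/// several contiguous buffers (chunks). Not the polars `ChunkedArray` API.
struct ChunkedVec<T> {
    chunks: Vec<Vec<T>>,
}

impl<T: Copy> ChunkedVec<T> {
    /// Generic random access: every lookup first has to find which chunk
    /// holds the global index, by walking the chunk lengths.
    fn get_generic(&self, mut idx: usize) -> T {
        for chunk in &self.chunks {
            if idx < chunk.len() {
                return chunk[idx];
            }
            idx -= chunk.len();
        }
        panic!("index out of bounds");
    }

    /// Fast-path check: if there is exactly one chunk, expose it as a slice.
    fn as_single_slice(&self) -> Option<&[T]> {
        match self.chunks.as_slice() {
            [only] => Some(only.as_slice()),
            _ => None,
        }
    }

    /// Gather the values at `indices`, choosing the path once, not per element.
    fn take(&self, indices: &[usize]) -> Vec<T> {
        match self.as_single_slice() {
            // Single chunk: direct slice indexing, no chunk search.
            Some(values) => indices.iter().map(|&i| values[i]).collect(),
            // Several chunks: fall back to the per-element chunk search.
            None => indices.iter().map(|&i| self.get_generic(i)).collect(),
        }
    }
}
```

Both paths return the same values; only the per-lookup work differs.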


marcvanheerden · 2021-07-31T12:14:19Z

I'll pick this up if that's okay.

ritchie46 · 2021-08-01T09:01:50Z

Ah, yes! 🙂

marcvanheerden · 2021-10-18T18:51:27Z

Hi @ritchie46, I attempted this by calling `.get_unchecked()` directly on the chunked array when there is only one chunk.

marcvanheerden@71a1b34

This didn't noticeably improve performance. I tested it on an outer join of two 10-million-row tables. Have I misunderstood the request?

ritchie46 · 2021-10-19T10:08:25Z

The idea was to unpack to the underlying `PrimitiveArray` instead of going through the `take_rand` proxy.

We can get the `PrimitiveArray` like this: `self.downcast_iter().next().unwrap()`
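A minimal sketch of what a single-chunk zip step could look like once the arrays are unpacked, under stated assumptions: the zip is modeled as walking join tuples `(Option<usize>, Option<usize>)` and taking the left value when the left index is present, otherwise the right value. The `&[i64]` slices stand in for the value buffers of the downcast single-chunk arrays, nulls are ignored, and `zip_outer_join_single_chunk` is a hypothetical name, not polars' API.

```rust
/// Hypothetical single-chunk fast path for the outer-join "zip" step.
/// `left` and `right` stand in for the value buffers obtained after
/// downcasting single-chunk columns; null handling is omitted.
fn zip_outer_join_single_chunk(
    left: &[i64],
    right: &[i64],
    join_tuples: &[(Option<usize>, Option<usize>)],
) -> Vec<i64> {
    join_tuples
        .iter()
        .map(|&(opt_left, opt_right)| match (opt_left, opt_right) {
            // The key exists on the left side: take the left value.
            (Some(l), _) => left[l],
            // Only the right side matched: take the right value.
            (None, Some(r)) => right[r],
            // An outer-join tuple is assumed to have at least one side.
            (None, None) => unreachable!("join tuple with no index"),
        })
        .collect()
}
```

This version keeps bounds checks on the slice indexing; the approach discussed in the thread uses `get_unchecked`, which drops them at the cost of an `unsafe` block.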

marcvanheerden · 2021-10-22T10:32:28Z

I tried it in this commit on my fork: marcvanheerden@edcf5cd

I'm still not seeing a measurable runtime difference on the same two 10-million-row tables. Measured with hyperfine, the change is about 10 ms slower on a join that takes around 7 s. I think this is just noise rather than an actual regression.
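For scale, the reported difference is a fraction of a percent of the total join time, which is consistent with reading it as noise:

```latex
\frac{10\ \text{ms}}{7\ \text{s}} = \frac{10}{7000} \approx 1.4 \times 10^{-3} \approx 0.14\,\%
```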

Let me know if I'm doing something wrong.

ritchie46 · 2021-10-22T11:19:44Z

@marcvanheerden I've left some comments that should help the performance.

marcvanheerden · 2021-10-23T12:38:24Z

Thanks again for your help. I've made the change here. Unfortunately, I'm not seeing a performance improvement.

If I'm reading these flamegraphs correctly, it looks like:

- the changes don't make a big difference, and
- the piece we're working on is a fairly small portion of the overall runtime, so there isn't much time to be gained.
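The second point is Amdahl's law: if the zip step takes a fraction `p` of the join's runtime and the fast path makes it `s` times faster, the whole join speeds up by only `1 / ((1 - p) + p / s)`. A small helper makes this concrete; the inputs used below are illustrative assumptions, not numbers read from the flamegraphs.

```rust
/// Overall speedup predicted by Amdahl's law when a fraction `p`
/// (between 0.0 and 1.0) of the runtime is made `s` times faster.
fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}
```

For example, if the zip step were 5% of the join (`p = 0.05`) and became twice as fast (`s = 2.0`), the join as a whole would be only about 2.6% faster (1 / 0.975 ≈ 1.026), a gap that run-to-run noise can easily hide.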

Here are the before and after flamegraphs.
Archive.zip

Let me know if I'm making a mistake or missing something.

Thanks

ritchie46 · 2021-10-25T07:36:38Z

An outer join also does quite a lot of other work. Maybe we should isolate the functionality we're benchmarking?
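One way to isolate it is a microbenchmark that times only the zip step on synthetic join tuples, leaving hashing and tuple construction out of the measurement. The sketch below is std-only and hypothetical throughout: `zip_generic` models per-element access through a chunk search, `zip_single_chunk` indexes plain slices, and the input shape (every third row exists only on the right) is an arbitrary assumption.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical generic path: resolve each global index to a chunk and
/// offset by scanning chunk lengths, as a per-element proxy might.
fn get_from_chunks(chunks: &[Vec<i64>], mut idx: usize) -> i64 {
    for chunk in chunks {
        if idx < chunk.len() {
            return chunk[idx];
        }
        idx -= chunk.len();
    }
    panic!("index out of bounds");
}

/// Zip step through the generic per-element lookup.
fn zip_generic(
    left: &[Vec<i64>],
    right: &[Vec<i64>],
    tuples: &[(Option<usize>, Option<usize>)],
) -> Vec<i64> {
    tuples
        .iter()
        .map(|&(l, r)| match l {
            Some(l) => get_from_chunks(left, l),
            None => get_from_chunks(right, r.expect("tuple has no index")),
        })
        .collect()
}

/// Zip step on single-chunk inputs: index the slices directly.
fn zip_single_chunk(
    left: &[i64],
    right: &[i64],
    tuples: &[(Option<usize>, Option<usize>)],
) -> Vec<i64> {
    tuples
        .iter()
        .map(|&(l, r)| match l {
            Some(l) => left[l],
            None => right[r.expect("tuple has no index")],
        })
        .collect()
}

/// Time only `f`, repeated `iters` times; `black_box` keeps the
/// optimizer from discarding the results.
fn time_it<F: FnMut() -> Vec<i64>>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

/// Synthetic inputs: `n` values per side; every third row exists only
/// on the right, the rest match on both sides.
fn synthetic_input(n: usize) -> (Vec<i64>, Vec<i64>, Vec<(Option<usize>, Option<usize>)>) {
    let left: Vec<i64> = (0..n as i64).collect();
    let right: Vec<i64> = (0..n as i64).map(|v| v * 2).collect();
    let tuples = (0..n)
        .map(|i| if i % 3 == 0 { (None, Some(i)) } else { (Some(i), Some(i)) })
        .collect();
    (left, right, tuples)
}
```

Run it with `cargo run --release`, since debug builds distort the comparison. A harness such as the criterion crate would give more reliable statistics than a single `Instant` measurement; the std-only version just keeps the sketch self-contained.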
