Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When the range of the join keys is small and the keys are integers, we can replace the hash table with an array/Vec whose first slot corresponds to the minimum value in the build-side key range.
Describe the solution you'd like
Filter on:
The join constraint should be on a single integer key (boolean/byte keys are also possible, as long as the domain is limited).
No duplicates (?): not sure whether this is required.
If there are no duplicates, we can check how many unique values there are. This should stay below a maximum limit, e.g. fewer than 100K values.
Get statistics on the min/max values. The build side should be small enough (e.g. at most 100K key values).
If all of this holds, we can copy the offsets into something like a Vec<Option<u64>> (or Arrow equivalent), with the slot for the minimum build-side key at index 0. Each element is indexed by x - MIN(x).
Instead of hashing the right-side values and probing, we can compute y - MIN(x) for the right side and index directly into the array. Checking hash collisions is not necessary for this approach.
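A minimal sketch of the idea above, to make the indexing concrete. The names `DirectIndex`, `try_build`, `probe` and `MAX_SLOTS` are hypothetical and not part of any existing API. The sketch assumes `i64` keys, takes the 100K limit from the filter list, and rejects duplicate build-side keys, which is an assumption since it is not yet clear whether that restriction is required.

```rust
/// Hypothetical direct-address index for a single integer join key.
struct DirectIndex {
    /// MIN(x) over the build-side keys.
    min: i64,
    /// slots[x - MIN(x)] holds the build-side row offset for key x, if present.
    slots: Vec<Option<u64>>,
}

/// Assumed upper bound on the key range (the "100K" figure from the filter list).
const MAX_SLOTS: u64 = 100_000;

impl DirectIndex {
    /// Builds the index, or returns None when the fast path does not apply:
    /// empty input, key range too wide, or duplicate keys (assumed unsupported).
    fn try_build(keys: &[i64]) -> Option<Self> {
        let min = *keys.iter().min()?;
        let max = *keys.iter().max()?;
        // abs_diff yields a u64, so extreme i64 ranges cannot overflow.
        let span = max.abs_diff(min);
        if span >= MAX_SLOTS {
            return None;
        }
        let mut slots = vec![None; span as usize + 1];
        for (row, &key) in keys.iter().enumerate() {
            let slot = &mut slots[key.abs_diff(min) as usize];
            if slot.is_some() {
                return None; // duplicate build-side key
            }
            *slot = Some(row as u64);
        }
        Some(Self { min, slots })
    }

    /// Probes with a right-side key y by computing y - MIN(x).
    /// Keys outside the build-side range simply have no match.
    fn probe(&self, key: i64) -> Option<u64> {
        let idx = usize::try_from(key.checked_sub(self.min)?).ok()?;
        self.slots.get(idx).copied().flatten()
    }
}
```

For build-side keys `[12, 10, 15]`, `MIN(x)` is 10 and the array has six slots, so probing with 12 reads slot 2 and returns row offset 0, while probing with 11 or 9 finds nothing. No hashing or collision check is involved; the only extra work is the bounds check on the probe key.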
Describe alternatives you've considered
Additional context
Idea from: duckdb/duckdb#1959