Never fallback to cartesian product for join estimation when we know the min/max values for columns #3813

isidentical · 2022-10-12T19:21:28Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
distinct_count is usually expensive to compute, so some systems that write Parquet files omit it from the file metadata. We should be able to estimate the join cardinality without it before falling back to the cartesian product.

Describe the solution you'd like
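Since we already require min/max values to be present, we should be able to compute min(num_left_rows - (num_nulls or 0), scalar_range(left_stats.min, left_stats.max)) as an alternative distinct count. A column cannot hold more distinct values than it has non-null rows, nor (for discrete types) more than the number of values between its min and max, so the smaller of the two is a safe upper bound.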
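
To make the proposal concrete, here is a minimal sketch in Python rather than DataFusion's actual Rust code. The function names and signatures are hypothetical, and it assumes an integer column where scalar_range counts both endpoints (max - min + 1); the issue does not define scalar_range, so treat that as an assumption.

```python
from typing import Optional


def scalar_range(min_value: int, max_value: int) -> int:
    """Count the integers in the closed interval [min_value, max_value].

    Assumption: the range is inclusive on both ends. Returns 0 if the
    bounds are inverted.
    """
    return max(max_value - min_value + 1, 0)


def estimate_distinct_count(
    num_rows: int,
    num_nulls: Optional[int],
    min_value: int,
    max_value: int,
) -> int:
    """Upper bound on the distinct count when distinct_count is missing.

    Hypothetical helper: takes the smaller of the non-null row count and
    the size of the [min, max] range. A missing null count is treated as 0.
    """
    non_null_rows = max(num_rows - (num_nulls or 0), 0)
    return min(non_null_rows, scalar_range(min_value, max_value))


# 1000 rows with values between 1 and 10: the range is the tighter bound.
print(estimate_distinct_count(1000, None, 1, 10))  # 10

# 5 rows, 2 of them null, values between 0 and 100: the row count wins.
print(estimate_distinct_count(5, 2, 0, 100))  # 3
```

The result is only an upper bound, not an exact distinct count, but it is usually far tighter than the bound implied by a cartesian product.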

Describe alternatives you've considered
None.

Additional context
The original discussion was in #3787 (comment).

Dandandan · 2022-10-12T19:29:54Z

There is also this presentation about optimizing join order when no statistics are available (an approach that also seems to work fine for DuckDB). We could see whether we can reuse some of these ideas:

https://www.youtube.com/watch?v=aNRoR0Z3SzU

isidentical added the "enhancement" (New feature or request) label Oct 12, 2022

isidentical mentioned this issue Oct 12, 2022

Join cardinality computation for cost-based nested join optimizations #3787

Merged

isidentical mentioned this issue Oct 15, 2022

Infer the count of maximum distinct values from min/max #3837

Merged

isidentical mentioned this issue Oct 22, 2022

[EPIC] Improving cost calculations and cost based optimizations #3929

Open

5 tasks
