Is your feature request related to a problem or challenge? Please describe what you are trying to do.
`distinct_count` is usually expensive to compute, so some platforms that write parquet files omit it from the metadata section. We should be able to estimate the join cardinality without it before falling back to a cartesian product.
Describe the solution you'd like
Since we already require min/max values to be present, we should be able to compute `min(num_left_rows - num_nulls, scalar_range(left_stats.min, left_stats.max))` (treating an unknown `num_nulls` as 0) to derive an alternative distinct count.
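A minimal sketch of this fallback, with simplified stand-in types: `ColumnStats`, `scalar_range`, and `estimate_distinct_count` are hypothetical names rather than DataFusion's actual statistics API, and min/max are narrowed to integers for illustration:

```rust
// Hypothetical stand-in for per-column statistics; the real statistics
// struct carries ScalarValue min/max and optional counts.
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<usize>,
    distinct_count: Option<usize>,
}

/// Number of representable values in [min, max]; only meaningful for
/// integer-like columns.
fn scalar_range(min: i64, max: i64) -> Option<usize> {
    (max >= min).then(|| (max - min) as usize + 1)
}

/// Use the exact distinct count when present; otherwise fall back to
/// min(non-null rows, width of the [min, max] range).
fn estimate_distinct_count(stats: &ColumnStats, num_rows: usize) -> Option<usize> {
    if let Some(dc) = stats.distinct_count {
        return Some(dc);
    }
    // An unknown null count is treated as 0, per the proposal above.
    let non_null = num_rows.saturating_sub(stats.null_count.unwrap_or(0));
    let range = scalar_range(stats.min?, stats.max?)?;
    Some(non_null.min(range))
}

fn main() {
    let stats = ColumnStats {
        min: Some(1),
        max: Some(100),
        null_count: Some(10),
        distinct_count: None,
    };
    // 1000 rows with values in [1, 100]: at most 100 distinct values.
    assert_eq!(estimate_distinct_count(&stats, 1000), Some(100));
}
```

Note that the `scalar_range` bound is only meaningful for discrete, integer-like types; for floats or strings the range of representable values is effectively unbounded, so only the non-null row count would constrain the estimate.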
Describe alternatives you've considered
None.
Additional context
Original discussion was here #3787 (comment)
There is also this presentation about optimizing the order of joins when no statistics are available (an approach that also seems to work well for DuckDB). We could see whether some of those ideas can be reused: