-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate table constraints through physical plans to optimize sort operations #14111
Propagate table constraints through physical plans to optimize sort operations #14111
Conversation
…anner-functional-dependence
I left my reviews here: synnada-ai#53 (review) |
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are very close to the finish line. Let's iterate over my comments
datafusion/core/src/datasource/physical_plan/file_scan_config.rs
Outdated
Show resolved
Hide resolved
match constraint { | ||
Constraint::PrimaryKey(indices) => { | ||
let new_indices = | ||
update_elements_with_matching_indices(indices, proj_indices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you refactor the update_elements_with_matching_indices
function to take two impl Iterator
's (you probably need to replace the looping order to do that), this function can also accept proj_indices
as an impl Iterator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The update_elements_with_matching_indices
function uses .position()
on proj_indices
, which makes it necessary to clone it if we take it as an impl Iterator
. I think this defeats the whole purpose of this refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I had in mind was to swap the loop order (iterate on proj_indices
on the outer loop). That may enable us to use an impl Iterator
for proj_indices
. We probably will need to keep the type of entries
as a slice because it does not have an ordering (though we can enforce that in a future PR). Had entries
was ordered, I think we could have also taken it in as an impl Iterator
-- but let's leave the latter for a future PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We looked into this with @berkaysynnada and it seems to have some intricacies. Let's leave this to another PR
.map(|col| col.index()) | ||
.collect::<Vec<_>>(); | ||
debug_assert_eq!(mapping.map.len(), indices.len()); | ||
self.constraints.project(&indices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will not need to collect and materialize indices
if you refactor project
to accept an iterator. See my comment in functional_dependencies.rs
.
…ation object" This reverts commit bbe35d4.
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
a6c83a7
to
726737b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this carefully (twice) and it LGTM
I'll merge this PR once the main branch is all green |
Great efforts! Thank you @gokselk |
Which issue does this PR close?
Closes #14110.
Rationale for this change
This PR extends the physical planner to propagate table constraints (PRIMARY KEY and UNIQUE) through the query plan. This allows us to optimize sort operations by recognizing when ordering requirements are already satisfied by existing constraints.
What changes are included in this PR?
Constraints
propagation through physical plans including:EquivalenceProperties
to consider constraints when evaluating sort requirementsAre these changes tested?
Yes, the changes include:
Are there any user-facing changes?
The changes are mostly internal optimizations, but users will see: