-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
some dependency removals and setup for refactor of FileScanConfig
#14543
Conversation
cc @alamb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @logan-keede -- this looks great. Thank you so much for making this small PRs.
I had a small suggestion but I think we could do it as a follow on PR too
/// Check if repartition is supported | ||
fn supports_repartition(&self, config: &FileScanConfig) -> bool { | ||
!(config.file_compression_type.is_compressed() || config.new_lines_in_values) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better we did not provide a default implementation for supports_repartition
Assuming that a file source can repartition based on its compression and newlines seems very specific to CSV.
/// Check if repartition is supported | |
fn supports_repartition(&self, config: &FileScanConfig) -> bool { | |
!(config.file_compression_type.is_compressed() || config.new_lines_in_values) | |
} | |
/// Check if repartition is supported | |
fn supports_repartition(&self, config: &FileScanConfig) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to include it in my next PR, which should be tomorrow hopefully.
just for clarity do you want to implement something like this for other supported file format
fn supports_repartition(&self, config: &FileScanConfig) -> bool {
!(config.file_compression_type.is_compressed())
}
and original for csv. I do not have much context on this function, So i can not comment on whether newlines is useful only in csv or not. I will try to look into it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! Thanks -- I pushed a change to this branch -- hopefully that was ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(let's just hope I haven't broken the CI checks. 😆 😭 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it moving!
Thanks again @logan-keede |
Which issue does this PR close?
datafusion-catalog-listing
out ofcore/src/datasource
#14462datafusion
crate (datafusion/core
) #14444.SessionState
withSession
and progress towards movingFileFormatFactory
out ofdatasource
#14517Rationale for this change
SessionState
needs to be replaced withSession
indatasource
to split it out ofcore
.get_file_format_factory
needs to be implemented forSession
datafusion/datafusion/core/src/datasource/file_format/mod.rs
Line 64 in 304488d
FileScanConfig
todatafusion-catalog-listing
(or some other place, but out of core)What changes are included in this PR?
Refactoring,
SessionState
->dyn Session
+ some downcast to be removed later.AvroSource
infile_format/mod.rs
(thanks to @mertak-synnada for helping out)major: major movecoupling due to new merges seems to have made this a bit difficult than before, I am hoping we can go with this while I sort other parts out.FileScanConfig
todatafusion-catalog-listing
(upcoming, we can probably do it in a separate PR)Are these changes tested?
Are there any user-facing changes?
There should not be