refactor: Move various parts of datasource out of core #14616
cc @alamb
Thank you @logan-keede -- I think this looks great. I have a few small questions, but nothing major that can't be done in a follow-on PR.
I love this incremental approach @logan-keede -- very very nice
/// as other operators.
///
/// [`FileStream`]: <https://github.com/apache/datafusion/blob/main/datafusion/core/src/datasource/physical_plan/file_stream.rs>
pub struct FileStreamMetrics {
I was wondering whether there is a better name for this module than `file_stream_part.rs`, but I couldn't come up with one.
Eventually, when all of `file_stream` is moved, we can change the name. Or perhaps we should just keep it `file_stream`, as I did not make a `file_scan_config_part` but `file_scan_config` (it was an arbitrary decision).
I prefer `file_stream`
use crate::datasource::physical_plan::FileMeta;
use crate::datasource::physical_plan::{FileOpenFuture, FileOpener};
Do you think it is ok to re-export `FileOpenFuture` and `FileOpener` in `physical_plan`? Or should they still be in a `file_stream` submodule?
They are already in `physical_plan`. Earlier, they could be imported from both `file_stream` and `physical_plan`:
pub use file_stream::{FileOpenFuture, FileOpener, FileStream, OnError};
Or did I misunderstand something?
Why not move to the
original plan was to move just
If A and B are tightly coupled, you need to pull the shared structure out into C and import C from both A and B, rather than moving A and B together.
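A minimal sketch of that suggestion, with modules in a single file standing in for crates. The names `a`, `b`, `c`, and `Shared` are hypothetical and do not correspond to actual DataFusion items:

```rust
// Hypothetical layout: `c` holds the piece that both `a` and `b` depend on,
// so `a` and `b` no longer need to live in the same crate as each other.
mod c {
    /// Shared structure extracted from the tightly coupled code (hypothetical).
    #[derive(Debug, Clone, PartialEq)]
    pub struct Shared {
        pub path: String,
        pub size: u64,
    }
}

mod a {
    use crate::c::Shared;

    /// `a` depends only on `c`, not on `b`.
    pub fn describe(meta: &Shared) -> String {
        format!("{} ({} bytes)", meta.path, meta.size)
    }
}

mod b {
    use crate::c::Shared;

    /// `b` also depends only on `c`, not on `a`.
    pub fn is_empty(meta: &Shared) -> bool {
        meta.size == 0
    }
}

fn main() {
    let meta = c::Shared { path: "data.parquet".to_string(), size: 42 };
    println!("{}", a::describe(&meta));
    println!("empty: {}", b::is_empty(&meta));
}
```

With this shape, `a` and `b` can be moved to separate crates one at a time, as long as `c` is moved first.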
The possible plan I proposed on #14444 has a structure like this
This was before the refactor with @jayzhan211 are you proposing we add another crate in there, something like?
(FWIW I think we can merge this PR and then keep moving code around in follow-on PRs.)
Yes, and not only the 3 I mentioned, but also FileFormat, FileFormatFactory, etc. I think
use crate::physical_plan::RecordBatchStream;

use arrow::datatypes::SchemaRef;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;
use datafusion_common::instant::Instant;
pub use datafusion_catalog_listing::file_stream_part::*;
It's better to avoid importing everything at once so that the imported modules are explicitly controlled and managed.
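As a rough illustration of the difference, here is a small sketch with hypothetical stand-in modules (`file_stream_part` and `file_stream` here are local modules, and `open_file` and `internal_helper` are made-up names, not the real DataFusion items):

```rust
// Hypothetical stand-in for the module whose items are being re-exported.
mod file_stream_part {
    pub fn open_file() -> &'static str {
        "opened"
    }

    // Public in this module, but not meant to be part of the re-exported API.
    #[allow(dead_code)]
    pub fn internal_helper() -> &'static str {
        "helper"
    }
}

mod file_stream {
    // Explicit re-export: only the listed names become part of this module's API.
    pub use crate::file_stream_part::open_file;

    // A glob re-export would instead expose every public item, including
    // `internal_helper`, and any item added to `file_stream_part` later:
    // pub use crate::file_stream_part::*;
}

fn main() {
    // `file_stream::open_file` resolves; `file_stream::internal_helper` would not compile.
    println!("{}", file_stream::open_file());
}
```

The explicit list has to be updated when new items should be exposed, which is the trade-off for keeping the public surface deliberate.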
Also, I am not suggesting that we keep the `file_format` and `physical_plan` folders in the new solution. PS: @jayzhan211, can/should I tag you in related PRs?
Ok, I am going to merge this one in and we can keep working on this in follow-on PRs. @logan-keede, can you update the plan on #14444 and keep on hacking? Thanks again!
Which issue does this PR close?
`datafusion` crate (`datafusion/core`) #14444.
Rationale for this change
What changes are included in this PR?
Move the parts of `datasource` that do not have much coupling out of core.
Are these changes tested?
Yes, by GitHub CI.
Are there any user-facing changes?
No, there should not be.