-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce TableChangesScan::execute
and ScanFileReader
#555
Conversation
This reverts commit 9702ef0.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #555 +/- ##
==========================================
- Coverage 80.61% 80.05% -0.57%
==========================================
Files 67 71 +4
Lines 14278 15499 +1221
Branches 14278 15499 +1221
==========================================
+ Hits 11510 12407 +897
- Misses 2188 2478 +290
- Partials 580 614 +34 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few initial comments
struct Cli { | ||
/// Path to the table to inspect | ||
path: String, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra spaces
use bytes::Bytes; | ||
use roaring::RoaringTreemap; | ||
use std::io::{Cursor, Read}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need these?
@@ -54,19 +55,26 @@ impl DvInfo { | |||
self.deletion_vector.is_some() | |||
} | |||
|
|||
pub fn get_selection_vector( | |||
pub fn get_treemap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate)
?
} | ||
} | ||
|
||
pub(crate) fn get_generated_columns(&self) -> DeltaResult<HashMap<String, Expression>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't need to be pub(crate)
} | ||
} | ||
|
||
pub fn scan_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't make this pub
yet
let generated_columns: HashMap<String, Expression> = cols | ||
.iter() | ||
.map(ToString::to_string) | ||
.zip(expressions) | ||
.collect(); | ||
Ok(generated_columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to make these Strings
:
let generated_columns: HashMap<String, Expression> = cols | |
.iter() | |
.map(ToString::to_string) | |
.zip(expressions) | |
.collect(); | |
Ok(generated_columns) | |
Ok(cols.into_iter().zip(expressions).collect()) |
Also requires:
- Change the return of this to
DeltaResult<HashMap<&str, Expression>>
- do:
.remove(field_name.as_str())
on 106 below
mut self, | ||
engine: &dyn Engine, | ||
) -> DeltaResult<impl Iterator<Item = DeltaResult<ScanResult>>> { | ||
let table_root = Url::parse(&self.global_scan_state.table_root)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this down to just above when you join to get location
})) | ||
} | ||
} | ||
pub(crate) fn resolve_scan_file_dv( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc comment please
Closing this to break it up into smaller pieces. I'll address the comments in the new PRs. |
What changes are proposed in this pull request?
This PR introduces
TableChangesScan::execute
, which reads the change data feed for aTableChangesScan
. We do so by introducing aScanFileReader
which performs all the data file reading, batch iteration, and physical to logical transformation.I introduce some helper methods for resolving the deletion vectors in
ScanFile
and the remove_dv.Depends on #546
How was this change tested?
TODO: add tests