-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the iceberg_scan
internals to make use of the MultiFileReader API
#101
base: main
Are you sure you want to change the base?
Conversation
…eList and ParseOption, so the copy we made of the IcebergOptions in the MultiFileList constructor is wrong now, making it a const reference instead
…o account for this
…anifest has been depleted of data files
3214a96
to
59e234c
Compare
…g_tables_with_deletes
@@ -42,22 +41,22 @@ SELECT count(*) FROM ICEBERG_SCAN('data/persistent/iceberg/lineitem_iceberg', ve | |||
# 1 = 2023-02-15 15:07:54.504 | |||
# 2 = 2023-02-15 15:08:14.73 | |||
query I | |||
SELECT count(*) FROM ICEBERG_SCAN('data/persistent/iceberg/lineitem_iceberg', '2023-02-15 15:07:54.504'::TIMESTAMP, ALLOW_MOVED_PATHS=TRUE); | |||
SELECT count(*) FROM ICEBERG_SCAN('data/persistent/iceberg/lineitem_iceberg', snapshot_from_timestamp='2023-02-15 15:07:54.504'::TIMESTAMP, ALLOW_MOVED_PATHS=TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could potentially break things for people, we should think about whether thats ok
return make_uniq<NodeStatistics>(0, 0); | ||
} | ||
|
||
// FIXME: visit metadata to get a cardinality count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can check if we can get this one in asap, good cardinality estimates do make a big difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the added_rows_count
and existing_rows_count
contain this information per manifest, which we can get by just scanning the manifest list.
Optional in v1 but required by >=v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to do that after #110 lands and we integrate those changes in to this PR.
I don't really want to figure out how to extend the avro-cpp reading logic to read this information, especially if we're going to replace it anyways
…e amount of lookups in the unordered_map
These changes should enable future optimizations that would not be possible (or very difficult/limited) with the old bind_replace based method.
The
duckdb
submodule has been updated to point tov1.2-histrionicus
One change is made that fixes an issue on
main
(git hash 6571dca3c05e7cb4b047fc81175580fff392c86a) related toreader_data.column_indexes
That is also relevant for the
duckdb_delta
extension.Thanks to @Tmonster for working on the testing improvements, wouldn't have found a lot of the bugs that crept in with this rework without that work 🚀