Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: extract & insert sidecar batches in
replay
's action iterator #679base: main
Are you sure you want to change the base?
feat: extract & insert sidecar batches in
replay
's action iterator #679Changes from all commits
a7b96a0
d55768e
c880d1f
68a5139
ce2ee83
ec44a01
029b2f2
3e4ea17
89e02fc
2ff3eb1
91a187e
0b57452
433a4bb
2122f59
98cba07
51a34a8
fe22868
1e8ed59
f6370ef
1133914
b60ba43
eb0f1bb
df874bb
5ea49d7
306e4ea
9a67e06
59936cf
fc180a4
ce422f7
21893d6
6e64916
626f7b4
ff67fcf
fbe1d87
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc, we want to avoid passing handlers around. Only reference to the engine. I think it's because we want to make it clear that the handler is tied to the engine and not to encourage holding an Arc ref to the handler.
cc @zachschuermann to double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall running into lifetime issues when passing the entire engine. I believe we would have to explicitly tie the iterator's lifetime to that of the engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also pass an Arc.
Basically the iterator needs to hold a reference for the entire duration it's lazily evaluating. So you want to give it a reference it can hold for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this change extends all the way to changing the
scan_data
function signature to explicitly tie the engines lifetime to the iterator.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, so the basic issue is that we have delayed reading of parquet files, so at some point we want an item off the iterator, and to produce it, we need to read some parquet, so we need a handler. Previously we could do all the read calls up front and then just map off that iterator, so we didn't need an engine ref plumbed through.
I think if this is all internal, i.e., we don't want to expose any of these function signatures to engines (especially in the FFI), then cloning the
Arc
s is fine (it's very cheap. as a suggestion we usually put// cheap arc clone
at those clone sites to make it clear).If we do want to ever expose this, we'll need to think more, but afaict, we don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the look @nicklan, just to confirm we will go ahead and clone the parquet handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if sidecar_to_file_meta could be a closure. We only use this once.
And then map sidecar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give it a shot and see how it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's a good idea to leave it as a separate function for unit testing purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say either keep the separate function (if needed for testing) or embed the logic directly in the
map
call? What purpose does a separately named closure serve?(aside: not sure if cargo fmt will like my indentation choice above -- depends on whether the
(
or{
is more important)