feat: implement support for querying deeply nested remote table names (e.g. SELECT * FROM one.two.three.four.five)
#35
🗣 Description
This PR lets the federation crate submit queries to remote sources whose tables can be deeply nested. This is possible in systems like Dremio.
I've added two new types, `MultiPartTableReference` and `MultiTableReference`.
A `MultiPartTableReference` is logically a superset of a `TableReference`, equivalent in every way when the number of parts is <= 3.
During the initial pass to rewrite table names, we now keep track of the rewrites as `known_rewrites: &mut HashMap<TableReference, MultiPartTableReference>`. If the `MultiPartTableReference` is a regular DataFusion `TableReference`, the pass proceeds with its original behavior of rewriting the table scans and column expressions. If not, it skips the rewrite and the table keeps its name as defined in Spice.
Once the unparser runs and generates the `sqlparser::ast`, we do another pass over the AST if there are any multi-level table references. This new code modifies the AST in place to perform the final rewrites.
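To make the first-pass decision concrete, here is a minimal, std-only sketch. It is not the code from this PR: the `TableReference` enum is a simplified stand-in for DataFusion's type, the `Multi(Vec<String>)` variant is an assumed shape for the 4+ part case, and `from_parts` and `plan_rewrite` are hypothetical helper names used only for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for DataFusion's `TableReference` (at most three parts).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum TableReference {
    Bare { table: String },
    Partial { schema: String, table: String },
    Full { catalog: String, schema: String, table: String },
}

/// Assumed shape: either a regular reference or a name with four or more parts.
#[derive(Debug, Clone, PartialEq, Eq)]
enum MultiPartTableReference {
    TableReference(TableReference),
    Multi(Vec<String>),
}

impl MultiPartTableReference {
    /// Hypothetical constructor: names with <= 3 parts stay regular references.
    fn from_parts(parts: &[&str]) -> Option<Self> {
        match parts {
            [] => None,
            [t] => Some(Self::TableReference(TableReference::Bare {
                table: t.to_string(),
            })),
            [s, t] => Some(Self::TableReference(TableReference::Partial {
                schema: s.to_string(),
                table: t.to_string(),
            })),
            [c, s, t] => Some(Self::TableReference(TableReference::Full {
                catalog: c.to_string(),
                schema: s.to_string(),
                table: t.to_string(),
            })),
            many => Some(Self::Multi(many.iter().map(|p| p.to_string()).collect())),
        }
    }
}

/// Hypothetical helper: returns the remote name to use in the logical-plan
/// rewrite, or `None` when the rewrite is left to the later AST pass (or the
/// table has no known rewrite at all).
fn plan_rewrite<'a>(
    known_rewrites: &'a HashMap<TableReference, MultiPartTableReference>,
    local: &TableReference,
) -> Option<&'a TableReference> {
    match known_rewrites.get(local)? {
        MultiPartTableReference::TableReference(remote) => Some(remote),
        MultiPartTableReference::Multi(_) => None,
    }
}
```

In this sketch a three-part remote name is rewritten during the logical plan pass as before, while a five-part name such as `one.two.three.four.five` yields `None` and keeps its local name until the AST pass.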
I also did some restructuring and moved the rewrite logic (both the existing logical plan based one, and the new sqlparser::ast one) into separate modules to better structure it.
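The second pass over the generated AST can be pictured with a toy example. This is a sketch under loud assumptions: `ObjectName` and `Query` below are tiny stand-ins, not the real `sqlparser::ast` types, and `rewrite_multi_part` and `to_sql` are hypothetical names. It only shows the idea of replacing the Spice-local name, in place, everywhere it appears (the table in `FROM` and any column qualifiers).

```rust
use std::collections::HashMap;

/// Toy stand-in for a parsed object name: one identifier per element.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ObjectName(Vec<String>);

/// Toy stand-in for the parts of a query that carry table names.
#[derive(Debug, PartialEq)]
struct Query {
    from: ObjectName,
    /// Each column is (optional table qualifier, column name).
    columns: Vec<(Option<ObjectName>, String)>,
}

/// Hypothetical post-pass: swap local names for their multi-part remote
/// names, mutating the toy AST in place.
fn rewrite_multi_part(query: &mut Query, rewrites: &HashMap<ObjectName, ObjectName>) {
    if let Some(remote) = rewrites.get(&query.from) {
        query.from = remote.clone();
    }
    for (qualifier, _) in query.columns.iter_mut() {
        if let Some(q) = qualifier {
            if let Some(remote) = rewrites.get(&*q) {
                *q = remote.clone();
            }
        }
    }
}

/// Renders the toy query as SQL text (no identifier quoting, for brevity).
fn to_sql(query: &Query) -> String {
    let cols: Vec<String> = query
        .columns
        .iter()
        .map(|(q, c)| match q {
            Some(name) => format!("{}.{}", name.0.join("."), c),
            None => c.clone(),
        })
        .collect();
    format!("SELECT {} FROM {}", cols.join(", "), query.from.0.join("."))
}
```

Qualified columns have to be rewritten together with the `FROM` clause; otherwise the emitted SQL would reference a table name the remote source does not know.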
🔨 Related Issues
🤔 Concerns
I considered implementing this entirely in the DataFusion unparser, by leveraging the newly added behavior to unparse a custom node: apache/datafusion#13880. That would work well for the TableScans, but many expressions have a `TableReference` (for things like fully qualified column names; `SELECT table.column FROM table`). Unless we rewrote the entire unparser, it wouldn't be feasible to do this using the `MultiPartTableReference` type directly.