feat: implement support for querying deeply nested remote table names (i.e. SELECT * FROM one.two.three.four.five) #35

Merged
merged 9 commits into from
Jan 3, 2025


phillipleblanc

🗣 Description

This PR adds support in the federation crate for submitting queries to remote sources whose table names are deeply nested (more than three parts). This is possible in systems like Dremio.

I've added two new types, MultiPartTableReference and MultiTableReference:

pub enum MultiPartTableReference {
    TableReference(TableReference),
    Multi(MultiTableReference),
}

pub struct MultiTableReference {
    pub parts: Vec<Arc<str>>,
}

A MultiPartTableReference is logically a superset of a TableReference: the two are equivalent in every way when the number of parts is <= 3.
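To illustrate the superset relationship, here is a minimal sketch of how a dotted name could be classified by its number of parts. The TableReference enum below is a simplified stand-in for DataFusion's type, and from_dotted is a hypothetical helper that is not part of this PR; it splits naively on '.' and ignores quoted identifiers.

```rust
use std::sync::Arc;

// Simplified stand-in for DataFusion's `TableReference` (at most three parts).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum TableReference {
    Bare { table: Arc<str> },
    Partial { schema: Arc<str>, table: Arc<str> },
    Full { catalog: Arc<str>, schema: Arc<str>, table: Arc<str> },
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct MultiTableReference {
    pub parts: Vec<Arc<str>>,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum MultiPartTableReference {
    TableReference(TableReference),
    Multi(MultiTableReference),
}

// Hypothetical helper (an assumption for illustration): one to three parts map
// onto a regular TableReference, anything longer becomes a Multi reference.
pub fn from_dotted(name: &str) -> MultiPartTableReference {
    let parts: Vec<Arc<str>> = name.split('.').map(|p| Arc::<str>::from(p)).collect();
    match parts.len() {
        1 => MultiPartTableReference::TableReference(TableReference::Bare {
            table: parts[0].clone(),
        }),
        2 => MultiPartTableReference::TableReference(TableReference::Partial {
            schema: parts[0].clone(),
            table: parts[1].clone(),
        }),
        3 => MultiPartTableReference::TableReference(TableReference::Full {
            catalog: parts[0].clone(),
            schema: parts[1].clone(),
            table: parts[2].clone(),
        }),
        // Four or more parts cannot be expressed as a TableReference.
        _ => MultiPartTableReference::Multi(MultiTableReference { parts }),
    }
}
```

With this sketch, from_dotted("one.two.three") yields a Full table reference, while from_dotted("one.two.three.four.five") yields the Multi variant with five parts.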

During the initial pass that rewrites table names, we now track the rewrites as known_rewrites: &mut HashMap<TableReference, MultiPartTableReference>. If the MultiPartTableReference is a regular DataFusion TableReference, the pass keeps its original behavior and rewrites the table scans and column expressions. If not, the pass skips the rewrite and the table keeps the name it has in Spice.
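A rough sketch of that decision, assuming the mapping is recorded for both kinds of reference so that a later pass can look up the multi-part ones. The types are simplified stand-ins (TableReference only models a bare name here), and plan_pass_name is a hypothetical helper name, not the function used in this PR.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for DataFusion's `TableReference`; only a bare name is modelled.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TableReference(pub Arc<str>);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MultiTableReference {
    pub parts: Vec<Arc<str>>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MultiPartTableReference {
    TableReference(TableReference),
    Multi(MultiTableReference),
}

// Hypothetical helper: returns the name the logical plan should use for a scan.
pub fn plan_pass_name(
    local: &TableReference,
    remote: MultiPartTableReference,
    known_rewrites: &mut HashMap<TableReference, MultiPartTableReference>,
) -> TableReference {
    // Assumption: every mapping is recorded, so the AST pass can find it later.
    known_rewrites.insert(local.clone(), remote.clone());
    match remote {
        // Regular reference: rewrite to the remote name, as before.
        MultiPartTableReference::TableReference(r) => r,
        // Multi-part reference: keep the local name for now.
        MultiPartTableReference::Multi(_) => local.clone(),
    }
}
```

Keeping the local name in the plan means the rest of the logical-plan rewrite does not need to know about names with more than three parts.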

Once the unparser has run and generated the sqlparser::ast, we make a second pass over the AST if there are any multi-level table references. This new code modifies the AST in place to perform the final rewrites.
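The second pass can be pictured with a toy AST. The ObjectName, Expr and Select types below are deliberately tiny stand-ins, not the real sqlparser::ast definitions, and rewrite_in_place is a hypothetical function; the sketch only shows the idea of replacing table names and column qualifiers in place.

```rust
use std::collections::HashMap;

// Toy stand-ins for AST nodes; the real `sqlparser::ast` types are much richer.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectName(pub Vec<String>);

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    // A qualified column such as `table.column`.
    CompoundIdentifier(Vec<String>),
    BinaryOp { left: Box<Expr>, right: Box<Expr> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Select {
    pub projection: Vec<Expr>,
    pub from: Vec<ObjectName>,
}

// Hypothetical second pass: `rewrites` maps a local name to its remote parts.
pub fn rewrite_in_place(select: &mut Select, rewrites: &HashMap<Vec<String>, Vec<String>>) {
    for name in &mut select.from {
        if let Some(remote) = rewrites.get(&name.0) {
            name.0 = remote.clone();
        }
    }
    for expr in &mut select.projection {
        rewrite_expr(expr, rewrites);
    }
}

fn rewrite_expr(expr: &mut Expr, rewrites: &HashMap<Vec<String>, Vec<String>>) {
    match expr {
        Expr::CompoundIdentifier(idents) => {
            // The last ident is the column; everything before it is the qualifier.
            let replacement = idents.split_last().and_then(|(column, qualifier)| {
                rewrites.get(qualifier).map(|remote| {
                    let mut replaced = remote.clone();
                    replaced.push(column.clone());
                    replaced
                })
            });
            if let Some(replaced) = replacement {
                *idents = replaced;
            }
        }
        Expr::BinaryOp { left, right } => {
            rewrite_expr(left, rewrites);
            rewrite_expr(right, rewrites);
        }
    }
}
```

The sketch covers both table names in FROM and qualified column expressions, since both refer to the table by name.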

I also did some restructuring and moved the rewrite logic (both the existing logical plan based one, and the new sqlparser::ast one) into separate modules to better structure it.

🔨 Related Issues

🤔 Concerns

I considered implementing this entirely in the DataFusion unparser by leveraging the newly added ability to unparse a custom node: apache/datafusion#13880. That would work well for the TableScans, but many expressions also carry a TableReference (for example, fully qualified column names such as SELECT table.column FROM table). Unless we rewrote the entire unparser, it wouldn't be feasible to do this using the MultiPartTableReference type directly.

@phillipleblanc phillipleblanc requested a review from a team January 2, 2025 15:26
@phillipleblanc phillipleblanc self-assigned this Jan 2, 2025
@phillipleblanc phillipleblanc changed the title Implement support for querying deeply nested remote table names (i.e. SELECT * FROM one.two.three.four.five) feat: Implement support for querying deeply nested remote table names (i.e. SELECT * FROM one.two.three.four.five) Jan 2, 2025
@phillipleblanc phillipleblanc changed the title feat: Implement support for querying deeply nested remote table names (i.e. SELECT * FROM one.two.three.four.five) feat: implement support for querying deeply nested remote table names (i.e. SELECT * FROM one.two.three.four.five) Jan 3, 2025
@phillipleblanc phillipleblanc merged commit 691226f into spiceai-43 Jan 3, 2025
17 checks passed
@phillipleblanc phillipleblanc deleted the phillip/241231-rewrite-table-scan-string branch January 3, 2025 05:40