Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snowflake column identifier #520

Conversation

pawel-big-lebowski
Copy link
Contributor

@pawel-big-lebowski pawel-big-lebowski commented Jun 9, 2022

Signed-off-by: Pawel Leszczynski [email protected]

resolves #519

According to documentation, Snowflake allows:

  • query data staged in files with an identifier of a form [<alias>.]$<file_col_num>[.<element>]
  • copy into table with an column identifier containing $ character like select t.$1,t.$2,t.$3 from @~/datafile.csv.gz t;
  • get-path like select v:attr[0].name from vartab;

Although the feature seems to be snowflake specific, it cannot be implemented within snowflake dialect. It's because tokens starting with $ character are tokenized into Token::Placeholder. That's why parser's change is required.

Initial OpenLineage project issue: OpenLineage/OpenLineage#814.

@coveralls
Copy link

coveralls commented Jun 12, 2022

Pull Request Test Coverage Report for Build 2487464140

  • 38 of 42 (90.48%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 89.813%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/mod.rs 12 14 85.71%
src/parser.rs 22 24 91.67%
Totals Coverage Status
Change from base Build 2450394803: 0.02%
Covered Lines: 8755
Relevant Lines: 9748

💛 - Coveralls

@alamb
Copy link
Contributor

alamb commented Jun 12, 2022

There appear to be a few CI failures on this PR

Signed-off-by: Pawel Leszczynski <[email protected]>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @pawel-big-lebowski I will try and find time to review this PR tomorrow.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @pawel-big-lebowski -- I spent some time researching this feature.

First, I think it might help this PR if you split it into 2 pieces so we can discuss each feature ($identifiers and get path) separately.

Also, 🤔 it seems as though snowflake also supports identifiers like column$FILENAME -- is this something you want to explore?

https://docs.snowflake.com/en/user-guide/querying-metadata.html#metadata-columns

@@ -217,6 +217,18 @@ pub enum Expr {
Identifier(Ident),
/// Multi-part identifier, e.g. `table_alias.column` or `schema.table.col`
CompoundIdentifier(Vec<Ident>),
/// Multi-part identifier with a column number, e.g. [alias.]$file_col_num[.element] (snowflake)
CompoundIdentifierWithColumnNumber {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of supporting things like a.$1 -- what would you think about modeling this as a a two part CompoundIdent identifier ["a", "$1"] rather than a new variant?

I think the parser would still have to be changed, but I think that approach would keep the AST and the consumption of the AST smaller with fewer dialect specific structures

@@ -99,6 +99,12 @@ fn test_single_table_in_parenthesis() {
);
}

#[test]
fn test_snowflake_variables() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have a test that the parsed structure is actually what is expected -- for example https://github.com/sqlparser-rs/sqlparser-rs/blob/c884fbc/tests/sqlparser_common.rs#L138-L168

@alamb
Copy link
Contributor

alamb commented Aug 11, 2022

Marking this as a draft to show it is waiting some changes so I can easily filter out which PRs need review. Please mark it as "ready for review" when it is next ready.

@alamb alamb marked this pull request as draft August 11, 2022 10:47
@AugustoFKL
Copy link
Contributor

@alamb @pawel-big-lebowski is closing this for now because it's almost a year without an update.

@pawel-big-lebowski, ping me if you will work on this code specifically, if so, I open the PR again :)

@AugustoFKL AugustoFKL closed this Feb 25, 2023
@alamb
Copy link
Contributor

alamb commented Feb 26, 2023

Thanks for cleaning this up @AugustoFKL

@coveralls
Copy link

coveralls commented Oct 4, 2024

Pull Request Test Coverage Report for Build 2487464140

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 38 of 42 (90.48%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 89.813%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/ast/mod.rs 12 14 85.71%
src/parser.rs 22 24 91.67%
Totals Coverage Status
Change from base Build 2450394803: 0.02%
Covered Lines: 8755
Relevant Lines: 9748

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

snowflake column identifier for copy into / query data staged in files
4 participants