Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create test pattern for Spans #1563

Open
Tracked by #1548 ...
alamb opened this issue Nov 26, 2024 · 5 comments
Open
Tracked by #1548 ...

Create test pattern for Spans #1563

alamb opened this issue Nov 26, 2024 · 5 comments

Comments

@alamb
Copy link
Contributor

alamb commented Nov 26, 2024

Part of #1548

Usecase

We would like to add span information through out the rest of the AST structures (see docs in #1549) and list on #1548

In order to support this feature reasonably, we need to be able to test the feature and avoid regressions

Proposal

I think we need some sort of span test.

I recommend

  1. A new test binary like tests/sqlparser_spans.rs

Then add a test that

  1. Parses a SQL string
  2. Check the spans of the AST nodes match what is expected

I think we'll have to use judgement on how fine grained the location information
can / should be.

@alamb
Copy link
Contributor Author

alamb commented Nov 30, 2024

@lustefaniak , given your comments on

We have an extensive internal test suite living in a downstream project using a fork of this library.

I wonder if you have any suggestions here (or maybe a code pointer for the downstream tests that we could possible start looking at?)

@graup
Copy link
Contributor

graup commented Jan 10, 2025

Hi! Just a small note since I started playing around with this. I assume it's because it's not implemented yet but anyway to keep track: it seems currently that Function args don't have a span, and also the Function span only covers the function name, not its arguments and parentheses. So for CURRENT_DATE('UTC'), the ast node for the Function expr has a span containing only CURRENT_DATE (and func.args has an empty span). I'd expect the function expr span to cover the whole expression including the parentheses.

/edit to add: I guess it's because Spanned for Value is not implemented yet. But even if it were, I'm not sure the function span will cover all the text until the closing brace, since it's just union'ing the spans of the parsed elements, which wouldn't include the braces...? I have a feeling Spanned for FunctionArguments needs to be extended to the open and closing braces in the List and Subquery case. @Nyrox wdyt?

Very tangential, it would be useful if there was a helper function to convert line-column-based locations to byte offsets. (My use case is implementing a fixer without rebuilding the SQL from the AST, just applying fixes locally to individual nodes in the source text to keep unaffected parts unchanged in the original format.)

@lustefaniak
Copy link
Contributor

Hi @alamb, what I did downstream couldn't be ported immediately as it is a much higher level testing, which I think might not work well in the low-level library. I work on the result of the AST analysis, storing locations into higher-level concepts like columns and dependency chains. Categorising locations into specific categories:

  SOURCE_LOCATION_KIND_PROJECTION,
  SOURCE_LOCATION_KIND_IDENTIFIER,
  SOURCE_LOCATION_KIND_EXPRESSION,
  SOURCE_LOCATION_KIND_SELECTION etc.

I assert that the captured "location" is precisely a specific string.

    assert_location_contains(
        sql,
        &get_column_dep_locations(&res, "t", "started_at"),
        &snip(LocationKind::Identifier, "started_at"),
    );
    assert_location_contains(
        sql,
        &get_column_dep_locations(&res, "t", "started_at"),
        &snip(
            LocationKind::Expression,
            "GREATEST(finished_at, started_at)",
        ),
    );
    assert_location_contains(
        sql,
        &get_column_dep_locations(&res, "t", "started_at"),
        &snip(
            LocationKind::Projection,
            "GREATEST(finished_at, started_at) as t",
        ),
    );

On the one hand, it is a very high level without looking into details of all the AST nodes; on the other hand, I found working with line/char super hard to grasp and error-prone and led to not the level of testing I wanted. Ultimately, I wanted the best experience for our users, and snipped-based testing worked pretty well.

@Nyrox
Copy link
Contributor

Nyrox commented Jan 13, 2025

/edit to add: I guess it's because Spanned for Value is not implemented yet. But even if it were, I'm not sure the function span will cover all the text until the closing brace, since it's just union'ing the spans of the parsed elements, which wouldn't include the braces...? I have a feeling Spanned for FunctionArguments needs to be extended to the open and closing braces in the List and Subquery case. @Nyrox wdyt?

@graup Yes you are right. There is a lot of missing cases like these. For FunctionArgumentList, adding the parenthesis as AttachedToken would be straight forward and should be done. I never quite figured out what the best to do it for enum tuple variants like Subquery is though.

So for CURRENT_DATE('UTC'), the ast node for the Function expr has a span containing only CURRENT_DATE (and func.args has an empty span). I'd expect the function expr span to cover the whole expression including the parentheses.

Yeah this is definitely the Span for Value missing. Getting that in is pretty high priority imo.
Not sure if the best way here is to actually implemented Spanned for every Value or to wrap every usage of value in a WithSpan like wrapper. Both are unfortunately quite breaking changes.

@graup
Copy link
Contributor

graup commented Jan 13, 2025

Thanks for the reply @Nyrox . It seems if we just implement AttachedToken for FunctionArgumentList's parentheses, the missing spans for Value wouldn't matter much for my case, as the union'ed Function span including the right parenthesis would also include anything between the function name and the right parenthesis.

I looked into the code a bit and understand your struggle, somehow FunctionArguments needs to gain knowledge about its parentheses, kind of like Cte has a closing_paren_token, right? Turning the enum FunctionArguments into a struct with at least { body, closing_paren_token } would be a breaking change but might be the only correct way to do this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants