[SPARK-50965][SQL][TESTS] Making sure that multiple parameterized queries work on SparkConnect #49628

Closed

Contributor

@viktorluc-db viktorluc-db commented Jan 23, 2025

What changes were proposed in this pull request?

Tests only.

Why are the changes needed?

Making sure that multiple parametrization nodes in the parsed logical plan are handled properly. Multiple parametrization nodes arise when different DataFrames, each built from a parameterized query, are combined with a union over Spark Connect. This was not supported previously; #49442 added support for it, so tests for the feature were needed.
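
To make the scenario concrete, here is a minimal sketch of the pattern being tested: several positional-parameter queries, each built with its own `spark.sql(sqlText, args)` call, combined with `union`. It is an illustration only, not the code added to `ClientE2ETestSuite`. The object name, the helper `unionOfParameterizedQueries`, and the `sc://localhost:15002` address are assumptions. Running it requires a Spark Connect server at that address, the Spark Connect Scala client on the classpath, and a Spark version that includes the support added in #49442.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionOfParameterizedQueries {

  // Hypothetical helper: builds n single-row DataFrames, each from its own
  // positional-parameter query, and unions them into one DataFrame.
  def unionOfParameterizedQueries(spark: SparkSession, n: Int): DataFrame = {
    require(n > 0, "need at least one query")
    (0 until n)
      .map(i => spark.sql("SELECT ?", Array(i)))
      .reduce((left, right) => left.union(right))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a Spark Connect server is listening at this address.
    val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()
    try {
      val df = unionOfParameterizedQueries(spark, 10)
      // union does not guarantee row order, so sort before comparing.
      val values = df.collect().map(_.getInt(0)).sorted.toSeq
      assert(values == (0 until 10), s"unexpected result: $values")
    } finally {
      spark.close()
    }
  }
}
```

Each `spark.sql` call carries its own argument array, so the unioned plan contains one parameterized subquery per input DataFrame rather than a single parameterized query at the root.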

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tests in ClientE2ETestSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@viktorluc-db
Contributor Author

@vladimirg-db Can you take a look?

@viktorluc-db
Contributor Author

@HyukjinKwon Could you please take a look?

Member

@MaxGekk MaxGekk left a comment

and this feature was not supported previously, but this PR managed to support it

This PR is about testing, correct?

If it tests only, add the tag [TESTS]

@HyukjinKwon HyukjinKwon changed the title from "[SPARK-50965][SQL] Making sure that multiple parameterized queries work on SparkConnect" to "[SPARK-50965][SQL][TESTS] Making sure that multiple parameterized queries work on SparkConnect" on Jan 24, 2025

@viktorluc-db
Contributor Author

@MaxGekk Yes, it is about testing. I accidentally left out the link to your PR in the sentence you quoted. Now it's fixed.

@viktorluc-db
Contributor Author

@vladimirg-db Addressed the comments. PTAL.

Contributor

@vladimirg-db vladimirg-db left a comment

LGTM after resolving comments, thanks for working on this!

val temp = spark.sql("SELECT ?", Array(i))
df = df.union(temp)
}
checkAnswer(df, (0 until 10).map(i => Row(i)))
Contributor

This usually comes from extending QueryTest. @MaxGekk is this a valid usage?

Member

Let's follow the existing convention, and extend QueryTest in which Spark's session can be bound to a remote Spark session.

@viktorluc-db
Contributor Author

Thanks @MaxGekk! Addressed your comment.

@MaxGekk
Member

MaxGekk commented Jan 24, 2025

@viktorluc-db Could you resolve the conflicts, please?

@viktorluc-db
Contributor Author

@MaxGekk Done.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for adding this test coverage. I left a few minor comments. The test body itself looks good to me.

@viktorluc-db
Contributor Author

@dongjoon-hyun Addressed your comments. Thanks!

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @viktorluc-db, @vladimirg-db, @MaxGekk.
Merged to master.

@dongjoon-hyun
Member

There was a conflict on branch-4.0. Could you make a backport PR, @viktorluc-db?

dongjoon-hyun pushed a commit that referenced this pull request Jan 29, 2025
…d queries work on SparkConnect

### What changes were proposed in this pull request?

Tests only. This is the same PR as [this](#49628). The only difference is that this one is for `branch-4.0`.

### Why are the changes needed?

Making sure that multiple parametrization nodes in the parsed logical plan are handled properly. Multiple parametrization nodes arise when different DataFrames, each built from a parameterized query, are combined with a union over Spark Connect. This was not supported previously; [#49442](#49442) added support for it, so tests for the feature were needed.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tests in `ClientE2ETestSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49731 from viktorluc-db/parametrization_nodes.

Authored-by: viktorluc-db <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>