-
Notifications
You must be signed in to change notification settings - Fork 583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: JOIN should require ON condition #1552
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! cc @alamb
MySQL actually does allow joins with no ON/USING, so this could break some (admittedly weird) workflows. |
What I found
cc @findepi |
so maybe we should fix this in DF only? |
as I know, in df, we don’t parse SQL directly; we only work with statements. |
It could be implemented for the dialects that (don't) support the joins without |
So we could:
|
Ah yeah using a dialect method for this makes sense in that case |
/// Verifies whether the provided `JoinOperator` is supported by this SQL dialect. | ||
/// Returns `true` if the `JoinOperator` is supported, otherwise `false`. | ||
fn verify_join_operator(&self, _join_operator: &JoinOperator) -> bool { | ||
true | ||
} | ||
|
||
/// Verifies if the given `JoinOperator`'s constraint is valid for this SQL dialect. | ||
/// Returns `true` if the join constraint is valid, otherwise `false`. | ||
fn verify_join_constraint(&self, join_operator: &JoinOperator) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so since the behavior seems to vary a bit across dialects, I'm thinking it could make sense after all if we let the parse continue to be permissive in syntax and downstream crates can perform the additional checks in the cases where specific combinations need to be enforced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your idea, and it makes sense. But the "issue" is based on the idea that the parser should restrict cases like “INNER JOIN” without an ON condition.
For me, as a developer, it would feel strange if I pick a parser, for example, the PostgreSQL dialect, and it allows “ANTI JOIN” without any error. This would mean I have to check all the statements again to find mistakes. It seems like a trade-off, and we need to decide which approach works best.
Or are you suggesting moving this specific implementation directly to GenericDialect
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it is indeed a tradeoff, the expectation currently is that downstream crates further validate the output of the parser against any dialect specific requirements/invariants, see note here for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion:
- if there are any dialects that support
<left> JOIN <right>
without anON
clause then so should sqlparser - If there are no dialects that support such syntax, then erroring is a good idea
I have not done the research to know if there are any dialects that support such syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, DataFusion processes SELECT ... FROM l JOIN r
as a CROSS JOIN
for all dialects, including DuckDB, where an INNER JOIN
explicitly requires an ON
clause.
@Dandandan proposed addressing this issue in sqlparser-rs
, which seems reasonable to me. However, after discussing it with @iffyio, I see
indeed a tradeoff, the expectation currently is that downstream crates further validate the output of the parser
My questions are:
- Should this behavior be considered a bug?
- If yes, at which level should it be addressed:
datafusion
orsqlparser-rs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Should this behavior be considered a bug?
I don't have a strong opinion -- is it causing anyone problems? It seems like the ramification of allowing JOIN
... as CROSS JOIN
's largest implication is that now DataFusion has some new dialect.
If it isn't causing problems, my suggestion is do nothing until someone has a concrete usecase
Converting to draft to show we aren't necessairly planning to merge this |
Closes: #1550