feat: add hint for missing fields #14521
Thank you @Lordworms -- this looks really nice ❤️
I had one question, but otherwise this looks ready to me
cc @adriangb
statement ok
create table a(timestamp int, birthday int);

query error DataFusion error: Schema error: No field named timetamp\. Did you mean 'a\.timestamp'\?\.
nice!
@@ -90,16 +90,16 @@ drop table case_insensitive_test
 statement ok
 CREATE TABLE test("Column1" string) AS VALUES ('content1');

-statement error DataFusion error: Schema error: No field named column1. Valid fields are test\."Column1"\.
+statement error DataFusion error: Schema error: No field named column1\. Valid fields are test\."Column1"\.
Shouldn't these errors also contain the nice new message?
It seems we have to match against the full name, and the threshold is currently 0.5. In this case the names compared are 'test.Column1' and 'column1', so the score happens to be exactly 0.5, which is not over the threshold.
Amazing work! It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in #13664 while we're at it? I'd also suggest a test showing that a threshold of
query error DataFusion error: Schema error: No field named timetamp\. Did you mean 'a\.timestamp'\?\.
select timetamp from a;

query error DataFusion error: Schema error: No field named dadsada\. Valid fields are a\.timestamp, a\.birthday\.
still old message 🤔
currently we would only do 'Did you mean' with a match over 0.5
right so in this case there were no matches over 0.5 and thus all of the fields were shown
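To make the threshold behaviour discussed above concrete, here is a minimal, self-contained sketch. It is not the PR's implementation: the function names `levenshtein`, `similarity`, and `field_not_found_message` are hypothetical, and the score is assumed to be a normalized Levenshtein similarity (1 minus the edit distance divided by the longer length), compared case-sensitively against the full qualified name.

```rust
/// Classic Levenshtein edit distance, computed over `char`s with two rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + if ca == cb { 0 } else { 1 };
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Assumed scoring: 1.0 means identical, 0.0 means nothing in common.
fn similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

/// Hypothetical helper: suggest the best candidate scoring strictly over
/// 0.5, otherwise fall back to listing every valid field.
fn field_not_found_message(field: &str, valid_fields: &[&str]) -> String {
    let best = valid_fields
        .iter()
        .map(|candidate| (similarity(field, candidate), *candidate))
        .filter(|(score, _)| *score > 0.5)
        .max_by(|x, y| x.0.total_cmp(&y.0));
    match best {
        Some((_, name)) => format!("No field named {field}. Did you mean '{name}'?."),
        None => format!(
            "No field named {field}. Valid fields are {}.",
            valid_fields.join(", ")
        ),
    }
}
```

Under these assumptions, `timetamp` against `a.timestamp` has distance 3 over length 11 (about 0.73), so a suggestion is produced. `column1` against `test.Column1` has distance 6 over length 12, exactly 0.5, so it falls back to the full list, as does `dadsada`.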
527fbd1 to 1f4a95b
FYI @eliaperantoni for your comments / suggestions
I can refactor that
Amazing! I love that Diagnostic is gaining so much traction ❤️. I have one comment, but otherwise looks perfect to me.
4032d58 to adfee55
-assert_eq!(
-    diag.notes[0].message,
-    "possible reference to 'first_name' in table 'a'"
-);
-assert_eq!(
-    diag.notes[1].message,
-    "possible reference to 'first_name' in table 'b'"
-);
+assert_eq!(diag.notes[0].message, "possible column a.first_name");
+assert_eq!(diag.notes[1].message, "possible column b.first_name");
My feeling is that ambiguity might be okay, since the final choice is up to the user. What we definitely need to avoid is Levenshtein producing a false positive that clears the 0.5 threshold. In that case the closest column name would be proposed (which can be totally different from what the user typed) while the other fields are hidden. If that happens, the user has to take an extra step, such as running EXPLAIN, to find the schema. I'm not sure how to test such a scenario, though.
Ok if you say non-exact matches should be shown too, I'm on board.
datafusion/common/src/column.rs
Outdated
.map_err(|e| match &e {
    DataFusionError::SchemaError(
        SchemaError::FieldNotFound {
            field,
            valid_fields,
        },
        _,
    ) => {
        let mut diagnostic = Diagnostic::new_error(
            format!("column '{}' not found", &field.name()),
            field.spans().first(),
        );
        add_possible_columns_to_diag(&mut diagnostic, field, valid_fields);
        e.with_diagnostic(diagnostic)
    }
    _ => e,
})
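The pattern in the snippet above (borrow the error to inspect it, build a diagnostic from the borrowed fields, then move the error into the wrapper) can be tried in isolation. The sketch below uses stand-in types only: `QueryError`, this `Diagnostic` struct, and `attach_diagnostic` are hypothetical and are not DataFusion's definitions. It also lists every valid field as a note, a simplification; whatever filtering `add_possible_columns_to_diag` performs is not reproduced here.

```rust
// Stand-in types for illustration only (NOT DataFusion's real definitions).
#[derive(Debug)]
struct Diagnostic {
    message: String,
    notes: Vec<String>,
}

#[derive(Debug)]
enum QueryError {
    FieldNotFound {
        field: String,
        valid_fields: Vec<String>,
    },
    Other(String),
    /// An error wrapped together with the diagnostic attached to it.
    WithDiagnostic(Diagnostic, Box<QueryError>),
}

impl QueryError {
    /// Consumes the error and wraps it with a diagnostic.
    fn with_diagnostic(self, diagnostic: Diagnostic) -> Self {
        QueryError::WithDiagnostic(diagnostic, Box::new(self))
    }
}

/// Hypothetical helper mirroring the `map_err` shape from the review snippet.
fn attach_diagnostic(result: Result<u32, QueryError>) -> Result<u32, QueryError> {
    // Matching on `&e` only borrows the error, so `e` can still be moved
    // once the borrowed bindings are no longer used.
    result.map_err(|e| match &e {
        QueryError::FieldNotFound {
            field,
            valid_fields,
        } => {
            let diagnostic = Diagnostic {
                message: format!("column '{field}' not found"),
                // Simplification: every valid field becomes a note.
                notes: valid_fields
                    .iter()
                    .map(|f| format!("possible column {f}"))
                    .collect(),
            };
            e.with_diagnostic(diagnostic)
        }
        // All other errors pass through unchanged.
        _ => e,
    })
}
```

Errors other than `FieldNotFound` and successful results are returned untouched, so the helper can wrap any fallible lookup without changing its behaviour on the happy path.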
Thank you ❤️
2a052a6 to a6a7d70
I merged up from main to resolve a conflict with this branch
Thanks everyone! This is epic