Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add hint for missing fields #14521

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Lordworms
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Feb 6, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Lordworms -- this looks really nice ❤️

I had one question, but otherwise this looks ready to me

cc @adriangb

statement ok
create table a(timestamp int, birthday int);

query error DataFusion error: Schema error: No field named timetamp\. Did you mean 'a\.timestamp'\?\.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@@ -90,16 +90,16 @@ drop table case_insensitive_test
statement ok
CREATE TABLE test("Column1" string) AS VALUES ('content1');

statement error DataFusion error: Schema error: No field named column1. Valid fields are test\."Column1"\.
statement error DataFusion error: Schema error: No field named column1\. Valid fields are test\."Column1"\.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should't these errors also contain the nice new message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we have to use full name to do match and the threshold is 0.5 now but in this case it is 'test.Column1' and column1 so the score is happen to be 0.5

@adriangb
Copy link
Contributor

adriangb commented Feb 6, 2025

Amazing work!

It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in #13664 while we're at it?

I'd also suggest a test showing that a threshold of 0.5 is reasonable. Maybe a table with multiple columns that are increasingly further apart by eyeball (timesamp, timeamp, timp, ts, tokens, amp, foo?) and see which ones get returned and which don't.

query error DataFusion error: Schema error: No field named timetamp\. Did you mean 'a\.timestamp'\?\.
select timetamp from a;

query error DataFusion error: Schema error: No field named dadsada\. Valid fields are a\.timestamp, a\.birthday\.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still old message 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently we would only do 'Did you mean' with a match over 0.5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right so in this case there were no matches over 0.5 and thus all of the fields were shown

@alamb
Copy link
Contributor

alamb commented Feb 6, 2025

It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in #13664 while we're at it?

FYI @eliaperantoni for your comments / suggestions

@Lordworms
Copy link
Contributor Author

threshold of 0.5 is reasonable. Maybe a table with multiple columns that are increasingly further apart by eyeball (timesamp, timeamp, timp, ts, tokens, amp, foo?) and see which ones get returned and which don't.

Amazing work!

It seems like these will just be part of the existing error message? Wouldn't it make sense to integrate with the new APIs in #13664 while we're at it?

I can refactor that

I'd also suggest a test showing that a threshold of 0.5 is reasonable. Maybe a table with multiple columns that are increasingly further apart by eyeball (timesamp, timeamp, timp, ts, tokens, amp, foo?) and see which ones get returned and which don't.

@github-actions github-actions bot added the sql SQL Planner label Feb 7, 2025
Copy link
Contributor

@eliaperantoni eliaperantoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! I love that Diagnostic is gaining so much traction ❤️. I have one comment, but otherwise looks perfect to me.

datafusion/common/src/column.rs Outdated Show resolved Hide resolved
datafusion/common/src/error.rs Outdated Show resolved Hide resolved
fix bugs
Comment on lines -204 to +205
assert_eq!(
diag.notes[0].message,
"possible reference to 'first_name' in table 'a'"
);
assert_eq!(
diag.notes[1].message,
"possible reference to 'first_name' in table 'b'"
);
assert_eq!(diag.notes[0].message, "possible column a.first_name");
assert_eq!(diag.notes[1].message, "possible column b.first_name");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happened if a very similar column (e.g. first_neme) was in scope. I guess they would see "possible column a.first_neme" even though there wasn't really ambiguity there because the planner didn't match that column.

I don't think this is desirable. What do you think @comphead @alamb

Comment on lines 304 to 320
.map_err(|e| match &e {
DataFusionError::SchemaError(
SchemaError::FieldNotFound {
field,
valid_fields,
},
_,
) => {
let mut diagnostic = Diagnostic::new_error(
format!("column '{}' not found", &field.name()),
field.spans().first(),
);
add_possible_columns_to_diag(&mut diagnostic, field, valid_fields);
e.with_diagnostic(diagnostic)
}
_ => e,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add hint to errors for missing fields based on Levenshtein distance
5 participants