Early exit on column normalisation to improve DataFrame performance #14636
Got +38% increase in
FYI @Omega359
I'll check this out tomorrow. we've been chatting about our approaches on #14563
I think this is a reasonable change. I think it could be incorporated into my upcoming PR to enhance things a bit, especially since I think it may help other dataframe functions such as select(exprs)
Nice thank you! But let's maybe keep PRs atomic? I plan to do one more (the one I described as a third in the issue), I don't think they overlap with each other?
Of course we can have them atomic :)
let column = column.into();
if column.relation.is_some() {
    // column is already normalized
    return Ok(column);
}
This looks like a no brainer performance boost. Thanks!
I agree -- thank you @blaginin @timsaucer and @Omega359
🚀
Which issue does this PR close?
Related to #14563 (probably more PRs to come)
Rationale for this change
Currently, when normalizing a column, we always call plan.using_columns(), which is recursive and very expensive, and which may not be needed if the column is already normalized.
What changes are included in this PR?
Exit early if the column already has a relation set. Also, set the relation when with_column_renamed is called.
Are these changes tested?
Extended a test to assert references
Are there any user-facing changes?
No
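For readers who want to see the early-exit idea in isolation, here is a minimal, self-contained sketch. It is not DataFusion code: `Column`, `Plan`, `resolve`, and the `lookups` counter are hypothetical stand-ins, and `resolve` only plays the role of the expensive step (in the PR, the work around `plan.using_columns()`). The one thing it shares with the PR is the check on `column.relation.is_some()` before any expensive work runs.

```rust
use std::cell::Cell;

// Hypothetical, simplified stand-in for a column reference.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    relation: Option<String>, // table qualifier, if already known
    name: String,
}

// Hypothetical stand-in for a logical plan.
struct Plan {
    // (relation, column name) pairs visible to the plan
    fields: Vec<(String, String)>,
    // counts how many times the "expensive" lookup ran
    lookups: Cell<usize>,
}

impl Plan {
    // Plays the role of the expensive, recursive resolution step.
    fn resolve(&self, name: &str) -> Option<Column> {
        self.lookups.set(self.lookups.get() + 1);
        self.fields
            .iter()
            .find(|(_, n)| n.as_str() == name)
            .map(|(r, n)| Column {
                relation: Some(r.clone()),
                name: n.clone(),
            })
    }

    fn normalize(&self, column: Column) -> Result<Column, String> {
        if column.relation.is_some() {
            // Already qualified: return early and skip the lookup entirely.
            return Ok(column);
        }
        self.resolve(&column.name)
            .ok_or_else(|| format!("column '{}' not found", column.name))
    }
}

fn main() {
    let plan = Plan {
        fields: vec![("t".to_string(), "a".to_string())],
        lookups: Cell::new(0),
    };

    // A qualified column takes the early exit: no lookup is performed.
    let qualified = Column { relation: Some("t".to_string()), name: "a".to_string() };
    assert_eq!(plan.normalize(qualified.clone()), Ok(qualified));
    assert_eq!(plan.lookups.get(), 0);

    // An unqualified column still goes through the lookup.
    let bare = Column { relation: None, name: "a".to_string() };
    let resolved = plan.normalize(bare).unwrap();
    assert_eq!(resolved.relation.as_deref(), Some("t"));
    assert_eq!(plan.lookups.get(), 1);
}
```

The counter is only there to make the skipped work observable; the same reasoning is why setting the relation in with_column_renamed matters, since columns that already carry a relation are the ones that hit the early exit.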