-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse last projection layer when renaming columns #14684
base: main
Are you sure you want to change the base?
Conversation
|
Quite a speed improvement! I'll look at the code in a bit |
.with_column_renamed("c4", "boom")?; | ||
|
||
assert_eq!("\ | ||
Projection: aggregate_test_100.c1 AS one, aggregate_test_100.c2 AS two, aggregate_test_100.c3, aggregate_test_100.c2 + aggregate_test_100.c3 AS sum AS total\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AS sum AS total is valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the thorough reading 🙏 AS sum AS total
happens because there's an alias over an alias - I believe it's harmless since it just wraps the original expression.
There's a function unalias_nested
that can resolve this, but I don't want to call it because it's recursive and expensive. A potentially better option IMO is to reuse the existing alias in alias_qualified
and alias
if it's already there (similar to this PR). This feels like a separate improvement from this PR though
SchemaError::FieldNotFound { .. }, | ||
_, | ||
)) => { | ||
self.plan = LogicalPlan::Projection( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the field is not found why are we not just returning Ok(self)? I am unsure why we would need to add a projection for the plan in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to do that because self
was destructed and so I need to put plan
back. try_new_with_schema
is quite cheap and this shouldn't be happening often anyway
}) | ||
.collect(); | ||
LogicalPlan::Projection(Projection::try_new(expr, input)?) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a lot of duplication here. I wonder if there is a way we could reduce that? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very fair! I also dislike repeating qualifier_rename, field_rename
twice, but it's happening because they reference plan
and must come after its destruction (hence needed twice)
Which issue does this PR close?
Related to #14563
Rationale for this change
Currently, when
with_column_renamed
is called, a new Projection layer is always created. For example, in thedataframe
benchmark, logical plan looks like this:These layers do not affect the query itself (as they are removed by
optimize_projections
), but they make renaming new columns (and optimization itself) quite slow.What changes are included in this PR?
I added an optimization for one edge case - when there's already a projection on top and we're adding a new one. This is a common scenario, especially when renaming many columns (as in the
dataframe
benchmark). Instead of adding a new projection layer on top, we replace the existing one if possible.Are these changes tested?
Extended test case
Are there any user-facing changes?
No