Integrate Analyzer within LogicalPlan building stage #14618
Comments
+1 for the change.
+1, I'd be happy to help here. I can start by taking on moving … I opened #14634 to track this.
I would not call them "optimizations". If we do this issue (+1 from me), we should be able to do
Even without introducing a new IR, we could just make the current LogicalPlan (LP) the IR. So if LP is to be the real IR, we should also plan on making Expr a real expression:
Another important consideration, I think, is that DataFusion is used from non-SQL contexts (for example, the DataFrame API, and when other languages are compiled into LogicalPlans). The type coercion logic specifically, and likely the other analyzer rules, need to be applied to LogicalPlans that come from these non-SQL sources as well.
Good point @alamb. This would mean that the LogicalPlan, being the public API for syntactic query building, is inherently not "fully resolved". That suggests creating a new IR might actually be an easier option than "healing" LP to become a fully resolved IR itself.
I agree. In SQL, the transformation follows this path:
SQL → Expr → LogicalPlan
For DataFrame operations, the transformation is:
Expr → LogicalPlan
The same principles that apply to the Expr → LogicalPlan transformation in the SQL case should also be considered for DataFrame operations. We need to ensure consistency in handling this transformation.
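To illustrate the consistency point, here is a toy sketch in which a SQL front end and a DataFrame-style front end both produce `Expr` values and then call the same construction function, so a mandatory check placed there (here, resolving column names against a schema) covers both paths. The names `sql_to_expr`, `col`, `lit`, `build_projection`, and the one-operator "SQL" grammar are hypothetical simplifications, not DataFusion APIs.

```rust
// Hypothetical sketch: types and functions are invented for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
struct Projection {
    exprs: Vec<Expr>,
}

/// SQL path (SQL -> Expr): understands only "<term> + <term>" or a single term.
fn sql_to_expr(sql: &str) -> Expr {
    // A term is an integer literal if it parses as one, otherwise a column name.
    let term = |s: &str| {
        let s = s.trim();
        s.parse::<i64>()
            .map(Expr::Lit)
            .unwrap_or_else(|_| Expr::Col(s.to_string()))
    };
    match sql.split_once('+') {
        Some((l, r)) => Expr::Add(Box::new(term(l)), Box::new(term(r))),
        None => term(sql),
    }
}

/// DataFrame-style path: the caller builds the Expr directly.
fn col(name: &str) -> Expr {
    Expr::Col(name.to_string())
}

fn lit(v: i64) -> Expr {
    Expr::Lit(v)
}

/// Shared Expr -> LogicalPlan step. Mandatory checks live here, so they
/// run no matter which front end produced the expressions.
fn build_projection(schema: &[&str], exprs: Vec<Expr>) -> Result<Projection, String> {
    fn check(schema: &[&str], e: &Expr) -> Result<(), String> {
        match e {
            Expr::Col(c) if !schema.contains(&c.as_str()) => Err(format!("unknown column {c}")),
            Expr::Add(l, r) => {
                check(schema, l)?;
                check(schema, r)
            }
            _ => Ok(()),
        }
    }
    for e in &exprs {
        check(schema, e)?;
    }
    Ok(Projection { exprs })
}

fn main() {
    let schema = ["a", "b"];
    let from_sql = build_projection(&schema, vec![sql_to_expr("a + 1")]);
    let from_df = build_projection(&schema, vec![Expr::Add(Box::new(col("a")), Box::new(lit(1)))]);
    // Both front ends end up with the same plan.
    assert_eq!(from_sql, from_df);
    // An unresolvable column is rejected during construction, whichever path produced it.
    println!("{:?}", build_projection(&schema, vec![sql_to_expr("z + 1")]));
}
```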
A new IR might work. I think @wiedld and I were trying to do the "healing" (well, really checking if the … ). It isn't yet widely used, but maybe it could be more and more formalized 🤔
Is your feature request related to a problem or challenge?
The steps in the logical layer are SQL -> LogicalPlan -> Analyzer -> Optimizer.
The Analyzer currently contains 5 rules.
The role of the Analyzer is unclear to me. Having two types of "optimization" after the plan is completed doesn’t seem necessary. Instead, we should have one optimization step during plan construction and another after the plan is finalized. I believe these rules can be placed either in the SQL → LogicalPlan building stage or in the optimizer.
If a rule MUST be executed for plan validity (e.g. TypeCoercion), it should be applied during the plan creation stage, not after the plan is completed. However, if the rule is OPTIONAL for plan completion, it should be applied in the optimizer.
I propose removing the concept of the Analyzer and integrating it into the SQL → LogicalPlan stage. Specifically, TypeCoercion should be applied before the plan is finalized (#14380).
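A minimal sketch of applying TypeCoercion at plan construction time rather than in a later Analyzer pass. `DataType`, `Expr`, `LogicalPlan`, `PlanBuilder`, and the widening order Float64 > Int64 > Int32 are hypothetical toy definitions for illustration only, not DataFusion's actual types or coercion rules. What it shows is that the builder inserts casts before the plan node exists, so every plan it returns is already type-consistent.

```rust
// Toy types for illustration only; these are NOT DataFusion's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Literal(i64, DataType),
    Cast(Box<Expr>, DataType),
    Plus(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column(_, t) | Expr::Literal(_, t) | Expr::Cast(_, t) => *t,
            Expr::Plus(l, r) => widest(l.data_type(), r.data_type()),
        }
    }
}

/// Hypothetical widening rule: Float64 > Int64 > Int32.
fn widest(a: DataType, b: DataType) -> DataType {
    use DataType::*;
    match (a, b) {
        (Float64, _) | (_, Float64) => Float64,
        (Int64, _) | (_, Int64) => Int64,
        _ => Int32,
    }
}

/// Wraps `expr` in a cast only if its type differs from `to`.
fn cast_if_needed(expr: Expr, to: DataType) -> Expr {
    if expr.data_type() == to {
        expr
    } else {
        Expr::Cast(Box::new(expr), to)
    }
}

/// Recursively inserts casts so both sides of every `Plus` share one type.
fn coerce(expr: Expr) -> Expr {
    match expr {
        Expr::Plus(l, r) => {
            let (l, r) = (coerce(*l), coerce(*r));
            let target = widest(l.data_type(), r.data_type());
            Expr::Plus(
                Box::new(cast_if_needed(l, target)),
                Box::new(cast_if_needed(r, target)),
            )
        }
        Expr::Cast(inner, t) => Expr::Cast(Box::new(coerce(*inner)), t),
        other => other,
    }
}

#[derive(Debug)]
enum LogicalPlan {
    Scan(String),
    Projection(Vec<Expr>, Box<LogicalPlan>),
}

struct PlanBuilder {
    plan: LogicalPlan,
}

impl PlanBuilder {
    fn scan(table: &str) -> Self {
        PlanBuilder { plan: LogicalPlan::Scan(table.to_string()) }
    }

    /// Coercion happens here, during construction, not in a later pass.
    fn project(self, exprs: Vec<Expr>) -> Self {
        let exprs = exprs.into_iter().map(coerce).collect();
        PlanBuilder { plan: LogicalPlan::Projection(exprs, Box::new(self.plan)) }
    }

    fn build(self) -> LogicalPlan {
        self.plan
    }
}

fn main() {
    let a_plus_one = Expr::Plus(
        Box::new(Expr::Column("a".into(), DataType::Int32)),
        Box::new(Expr::Literal(1, DataType::Int64)),
    );
    // The Int32 column is wrapped in a cast to Int64 before the plan is returned.
    let plan = PlanBuilder::scan("t").project(vec![a_plus_one]).build();
    println!("{plan:?}");
}
```

Because coercion lives in `project`, every caller that goes through the builder gets it automatically. The trade-off is that it can no longer be skipped the way an Analyzer rule can, which is consistent with the proposal that rules required for plan validity belong in construction.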
Before moving TypeCoercion into the builder, ExpandWildcardRule needs to be relocated first. The remaining three rules can be moved either into the builder or into the optimizer.
Describe the solution you'd like
Requirement
Rules in the Analyzer are optional, allowing users to choose whether to apply them or to add custom rules. This flexibility should be preserved, so that each rule remains optional and customizable even after being moved out of the Analyzer.
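One way to keep rules optional and customizable after moving them out of the Analyzer is to let the builder own a configurable rule list. The sketch below is hypothetical: `BuildRule`, `ExpandWildcard`, `DedupColumns`, and `ProjectionBuilder` are invented for illustration, and projections are plain column-name strings rather than real expressions. `ExpandWildcard` is only a toy stand-in for ExpandWildcardRule.

```rust
// Hypothetical sketch: none of these names are DataFusion APIs.
use std::collections::HashSet;

/// A rewrite applied while a projection is being built.
trait BuildRule {
    fn name(&self) -> &'static str;
    fn rewrite(&self, schema: &[String], exprs: Vec<String>) -> Vec<String>;
}

/// Toy wildcard expansion: replaces "*" with every schema column.
struct ExpandWildcard;

impl BuildRule for ExpandWildcard {
    fn name(&self) -> &'static str {
        "expand_wildcard"
    }
    fn rewrite(&self, schema: &[String], exprs: Vec<String>) -> Vec<String> {
        exprs
            .into_iter()
            .flat_map(|e| if e == "*" { schema.to_vec() } else { vec![e] })
            .collect()
    }
}

/// Example user-supplied rule: drops repeated column names, keeping the first.
struct DedupColumns;

impl BuildRule for DedupColumns {
    fn name(&self) -> &'static str {
        "dedup_columns"
    }
    fn rewrite(&self, _schema: &[String], exprs: Vec<String>) -> Vec<String> {
        let mut seen = HashSet::new();
        exprs.into_iter().filter(|e| seen.insert(e.clone())).collect()
    }
}

struct ProjectionBuilder {
    schema: Vec<String>,
    rules: Vec<Box<dyn BuildRule>>,
}

impl ProjectionBuilder {
    /// Starts with a default rule set, which callers may extend or clear.
    fn new(schema: Vec<String>) -> Self {
        let mut rules: Vec<Box<dyn BuildRule>> = Vec::new();
        rules.push(Box::new(ExpandWildcard));
        ProjectionBuilder { schema, rules }
    }

    fn with_rule(mut self, rule: Box<dyn BuildRule>) -> Self {
        self.rules.push(rule);
        self
    }

    fn without_rules(mut self) -> Self {
        self.rules.clear();
        self
    }

    fn rule_names(&self) -> Vec<&'static str> {
        self.rules.iter().map(|r| r.name()).collect()
    }

    /// Applies each rule in registration order.
    fn project(&self, exprs: Vec<String>) -> Vec<String> {
        self.rules
            .iter()
            .fold(exprs, |acc, rule| rule.rewrite(&self.schema, acc))
    }
}

fn main() {
    let schema = vec!["a".to_string(), "b".to_string()];
    let exprs = vec!["*".to_string(), "a".to_string()];

    let default = ProjectionBuilder::new(schema.clone());
    let custom = ProjectionBuilder::new(schema.clone()).with_rule(Box::new(DedupColumns));
    let bare = ProjectionBuilder::new(schema).without_rules();

    println!("{:?}", default.project(exprs.clone())); // ["a", "b", "a"]
    println!("{:?}", custom.project(exprs.clone())); // ["a", "b"]
    println!("{:?}", bare.project(exprs)); // ["*", "a"]
    println!("{:?}", custom.rule_names()); // ["expand_wildcard", "dedup_columns"]
}
```

Applying rules in registration order keeps ordering constraints (for instance, wildcard expansion before anything that inspects individual columns) explicit and under the user's control.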
Tasks
- ExpandWildcardRule in SQL->LogicalPlan stage (Move ExpandWildcardRule into Logical Plan construction #14634)
- TypeCoercion in SQL->LogicalPlan stage
- … Analyzer internally.

Describe alternatives you've considered
No response
Additional context
No response