-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve EnforceSorting docs. #14673
base: main
Are you sure you want to change the base?
Improve EnforceSorting docs. #14673
Conversation
// TODO(xudong963): the plans are not mutated, only the `data` attribute is set. | ||
// Therefore this should be called before this function. | ||
node_and_ctx.update_plan_from_children() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update this after @xudong963 's PR #14650 merges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI maybe we could do something like I suggest here
To avoid having to remember to call update_plan_from_children
as much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this TODO comment, since we'll have a race anyways btwn which PR merges first. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wiedld -- this looks great
cc @xudong963
/// whether we elect to transform [`CoalescePartitionsExec`] + [`SortExec`] cascades | ||
/// into [`SortExec`] + [`SortPreservingMergeExec`] cascades, which enables us to | ||
/// perform sorting in parallel. | ||
/// If `repartition_sorts` is enabled, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is greaet -- thank you @wiedld
I am not sure it will be rendered as a doc if it is listed on the impl
-- perhaps it would be better here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to fix up these docs more than this, since the flowcharts refer to the Done. @alamb you may want to take another look.parallelize_sorts
part of EnforceSorting, and not to everything that this optimization pass does.
// TODO(xudong963): the plans are not mutated, only the `data` attribute is set. | ||
// Therefore this should be called before this function. | ||
node_and_ctx.update_plan_from_children() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI maybe we could do something like I suggest here
To avoid having to remember to call update_plan_from_children
as much
/// A node context object beneficial for writing optimizer rules. | ||
/// This context encapsulating an [`ExecutionPlan`] node with a payload. | ||
/// | ||
/// Since each wrapped node has it's children within both the [`PlanContext.plan.children()`], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
this is indeed quite subtle -- it would be awesome if we could find some way to make it harder to forget to call update_plan_from_children
/// whether we elect to transform [`CoalescePartitionsExec`] + [`SortExec`] cascades | ||
/// into [`SortExec`] + [`SortPreservingMergeExec`] cascades, which enables us to | ||
/// perform sorting in parallel. | ||
/// Performs optimizations based upon a series of subrules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
/// sort first on a per-partition basis, thereby parallelizing the sort. | ||
/// | ||
/// | ||
/// The outcome is that plans of the form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A great reading experience, thanks @wiedld
Which issue does this PR close?
Helps with the docs effort #7013.
Rationale for this change
Noticed while reviewing #14650, that some of the EnforceSorting docs could be improved.
What changes are included in this PR?
Only docs. No code.
Are these changes tested?
N/A
Are there any user-facing changes?
N/A