Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6652] RelDecorrelator can't decorrelate query with limit 1 #4181

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

suibianwanwank
Copy link
Contributor

This PR handles decorrelate since the query contains limit 1. There are two main types of conversions:

  1. If sorted with no OFFSET and FETCH = 1, and only one collation field, rewrite the Sort as Aggregate using MIN/MAX function.
Example:
  Sort(sort0=[$0], dir0=[ASC], fetch=[1])
    input
Rewrite to:
  Aggregate(group=(corVar), agg=[min($0))
  1. For the more general case,rewrite the sort to be
Aggregate(group=(corVar.. , field..))
  project(first_value(field) over (partition by corVar order by (sort collation)))
     input

@caicancai
Copy link
Member

If no one reviews it, I will start reviewing it this weekend, it may take some time.

@suibianwanwank
Copy link
Contributor Author

If no one reviews it, I will start reviewing it this weekend, it may take some time.

Thank you. If you have any questions or need further clarification, feel free to ask.

Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to review some more later, after I understand the relationship between order by and min/max.
Perhaps it's enough to check that the orderby does null handling in the expected way.

// MIN/MAX returns NULL, while LIMIT 1 returns 0 rows.
// However, in the decorrelate, we add correlated variables to the group list
// to ensure equivalence when Correlate is transformed to Join. When the group list
// is non-empty, MIN/MAX will also return 0 rows if input with 0 rows.(This behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if the input has 0 rows."

I think that the references to the other calcite issue should not be in this comment - they will become obsolete if the issue is closed, and they don't really clarify what is going on here.

//
// Rewrite logic:
//
// If sorted with no OFFSET and FETCH = 1, and only one collation field,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the check for fetch == 1 is not here (so you can mention that it's enforced by the caller).
Moreover, for symmetry, should this comment be within the if branch?

}

final int newIdx = requireNonNull(frame.oldToNewOutputs.get(collation.getFieldIndex()));
RelBuilder.AggCall aggCall = relBuilder.push(frame.r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering: sorting has collation, but min/max do not.
For example, sorting can specify what happens to nulls.

Copy link
Contributor

@rubenada rubenada Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my exact same thought after a quick look at the PR... is the proposed conversion 100% equivalent in all cases, also when nulls are involved? What about when the original Sort is nulls-first and the relevant field has null values?

Copy link
Contributor Author

@suibianwanwank suibianwanwank Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as mentioned in JIRA, I missed this. This conversion is only valid if nulls-last.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have some unit tests with nulls-first, to see the expected result being the conversion not being applied?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that MAX works for NULL LAST only, and MIN for NULL FIRST.
Am I wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For MIN/MAX, null values are ignored in the databases I know (I didn't find a corresponding description in the sql standard). So I think both asc and desc should be null last, i.e.return null only if all values are null.

@suibianwanwank
Copy link
Contributor Author

suibianwanwank commented Feb 7, 2025

@mihaibudiu @rubenada Thanks for your feedback! I've updated the code and added null value test, thanks to review when you have time.

@@ -795,6 +801,146 @@ private static void shiftMapping(Map<Integer, Integer> mapping, int startIndex,
return null;
}

private @Nullable Frame decorrelateFetchOneSort(Sort sort, final Frame frame) {
Copy link
Contributor

@rubenada rubenada Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm a bit too paranoid, but just to be on the safe side, I'd propose the new methods (decorrelateFetchOneSort and decorrelateSortAsAggregate) to be protected instead of private; the reason is that if, for whatever reason, a downstream project wants to "switch off" this conversion, they could simply extend the RelDecorreletor with their own child class and override these methods to simply return null, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need a safe way to avoid errors. Usually the user uses the decorrelate by calling decorrelateQuery. So I think is it possible to add something like SqlToRelConverter#CONFIG (which can be combined with [CALCITE-6674]) to choose whether to enable it or not. WDYT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a cleaner solution, but IMO maybe a bit overkill for this particular purpose. I think protected methods would suffice to provide a way to "fallback". Let's see if there are other opinions on this subject.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I've made the changes. Thanks for the suggestion.

Copy link
Contributor

@mihaibudiu mihaibudiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming that the quidem test results have been checked with another database as well.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants