[Feature Request] Add a cluster setting for memory threshold to pick OrdinalsCollector in Cardinality aggregation #15269

rishabhmaurya · 2024-08-15T19:11:21Z

Is your feature request related to a problem? Please describe

Currently, the choice between OrdinalsCollector and DirectCollector is made dynamically, based on the number of ordinals to collect and the memory overhead. If the memory overhead of OrdinalsCollector is too high, DirectCollector is used. Because of https://issues.apache.org/jira/browse/LUCENE-9663, a few users are reporting a regression in the Cardinality aggregation: DirectCollector became slower after prefix compression was replaced with LZ4 compression for the terms dictionary in Lucene 8.9.
The fix proposed here is to use OrdinalsCollector more often. It collects the ordinals into a bitset first and then performs the term lookups in postCollect() for the segment, which is a lot faster.
However, we don't have a way to control when OrdinalsCollector is picked in OpenSearch.
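
To make the difference between the two collection styles concrete, here is a minimal, self-contained sketch. It is not the OpenSearch implementation: `TermsDict`, `collectDirect`, and `collectOrdinals` are hypothetical names, and a plain `Set<String>` stands in for the HyperLogLog++ sketch the real aggregation feeds. It only illustrates why marking ordinals in a bitset and resolving each distinct ordinal once can need far fewer terms-dictionary lookups than resolving a term for every document.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

final class OrdinalsSketch {
    /** Hypothetical stand-in for a segment's terms dictionary. */
    static final class TermsDict {
        private final String[] terms;
        int lookups = 0; // counts the (potentially expensive) ordinal-to-term resolutions

        TermsDict(String... terms) {
            this.terms = terms;
        }

        String lookupOrd(int ord) {
            lookups++;
            return terms[ord];
        }
    }

    /** Direct style: resolve the term for every collected document. */
    static Set<String> collectDirect(TermsDict dict, int[] docOrds) {
        Set<String> distinct = new HashSet<>();
        for (int ord : docOrds) {
            distinct.add(dict.lookupOrd(ord));
        }
        return distinct;
    }

    /** Ordinals style: mark ordinals in a bitset, then resolve each distinct one once. */
    static Set<String> collectOrdinals(TermsDict dict, int[] docOrds) {
        BitSet visited = new BitSet();
        for (int ord : docOrds) {
            visited.set(ord); // cheap per-document work, no dictionary access
        }
        Set<String> distinct = new HashSet<>();
        // The "postCollect" step: one lookup per distinct ordinal in the segment.
        for (int ord = visited.nextSetBit(0); ord >= 0; ord = visited.nextSetBit(ord + 1)) {
            distinct.add(dict.lookupOrd(ord));
        }
        return distinct;
    }

    public static void main(String[] args) {
        int[] docOrds = {0, 1, 1, 2, 0, 1}; // one ordinal per document
        TermsDict direct = new TermsDict("a", "b", "c");
        TermsDict ordinals = new TermsDict("a", "b", "c");
        System.out.println(collectDirect(direct, docOrds).size() + " terms, " + direct.lookups + " lookups");
        System.out.println(collectOrdinals(ordinals, docOrds).size() + " terms, " + ordinals.lookups + " lookups");
    }
}
```

With six documents over three distinct terms, the direct style performs six lookups and the ordinals style three. The price is the bitset, which needs one bit per possible ordinal in the segment.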

Describe the solution you'd like

Introduce a dynamic memory-threshold setting for OrdinalsCollector: if its memory usage is under this threshold, always use OrdinalsCollector. This logic can be added here.
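
A rough sketch of how such a threshold could sit in front of the existing heuristic is shown below. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the one-bit-per-ordinal estimate is a simplification, and the 25% ratio used for the fallback is only a stand-in for the current relative check and should be verified against the actual source. It does not use the OpenSearch settings API.

```java
final class CollectorChoice {
    enum Kind { ORDINALS, DIRECT }

    /** Simplified estimate (assumption): a bitset with one bit per ordinal, in bytes. */
    static long ordinalsMemoryOverhead(long maxOrd) {
        return (maxOrd + 7) / 8;
    }

    /**
     * @param maxOrd            number of distinct ordinals in the segment
     * @param countsMemoryBytes memory used by the cardinality sketch itself
     * @param thresholdBytes    value of the proposed dynamic setting (hypothetical)
     */
    static Kind pick(long maxOrd, long countsMemoryBytes, long thresholdBytes) {
        long ordinalsBytes = ordinalsMemoryOverhead(maxOrd);
        // Proposed rule: under the configured threshold, always use ordinals.
        if (ordinalsBytes < thresholdBytes) {
            return Kind.ORDINALS;
        }
        // Fallback: a relative heuristic, assumed here to allow 25% extra memory.
        if (ordinalsBytes < countsMemoryBytes / 4) {
            return Kind.ORDINALS;
        }
        return Kind.DIRECT;
    }

    public static void main(String[] args) {
        long counts = 16_384; // illustrative sketch size in bytes
        System.out.println(pick(8_000_000, counts, 0));         // threshold disabled
        System.out.println(pick(8_000_000, counts, 2_000_000)); // 2 MB threshold
    }
}
```

A threshold of 0 leaves the existing behaviour unchanged in this sketch, which would make the setting safe to introduce with a conservative default.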

Related component

Search:Aggregations

Describe alternatives you've considered

Using eager_global_ordinals or a murmur hash, but these are index-time settings and can impact indexing performance. This is discussed in more detail here.

Additional context

No response


mch2 · 2024-08-21T16:09:43Z

Thanks @rishabhmaurya, this seems like something we should do. Removing untriaged.

maitreya2954 · 2024-08-28T20:56:07Z

@mch2 I would like to work on this. I am new to contributing to OpenSearch, so I would appreciate any guidance.

jainankitk · 2024-08-28T20:58:51Z

Thanks @maitreya2954 for volunteering, assigned it to you!

rishabhmaurya · 2024-08-28T21:40:47Z

@maitreya2954 Appreciate you picking this up. Here is what we can do -

Introduce execution_hint for the cardinality aggregation, similar to the terms aggregation, with the values direct and ordinal to start with.
The default behaviour should be the current behaviour. When execution_hint is provided, override the collector to be picked here and don't honour this condition here.
Add unit tests for all 3 cases.
Add documentation here by opening a documentation issue on this repo.
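
The override described in the steps above could look roughly like the following. This is a sketch under stated assumptions, not the code from the eventual PR: `Hint`, `Kind`, `parse`, and `pick` are hypothetical names, the hint values `direct` and `ordinal` are taken from this comment, and the existing memory-based decision is reduced to a single boolean.

```java
import java.util.Locale;

final class ExecutionHintSketch {
    enum Hint {
        DIRECT, ORDINAL;

        /** Returns null when no hint was supplied; rejects unknown values. */
        static Hint parse(String value) {
            if (value == null) {
                return null;
            }
            return valueOf(value.toUpperCase(Locale.ROOT)); // throws IllegalArgumentException if unknown
        }
    }

    enum Kind { DIRECT, ORDINALS }

    /**
     * @param hint                     parsed execution_hint, or null if absent
     * @param heuristicPrefersOrdinals outcome of the existing memory-based check
     */
    static Kind pick(Hint hint, boolean heuristicPrefersOrdinals) {
        if (hint == Hint.DIRECT) {
            return Kind.DIRECT;   // explicit hint wins over the heuristic
        }
        if (hint == Hint.ORDINAL) {
            return Kind.ORDINALS; // explicit hint wins over the heuristic
        }
        // No hint: keep the current behaviour.
        return heuristicPrefersOrdinals ? Kind.ORDINALS : Kind.DIRECT;
    }

    public static void main(String[] args) {
        System.out.println(pick(Hint.parse("ordinal"), false));
        System.out.println(pick(Hint.parse("direct"), true));
        System.out.println(pick(Hint.parse(null), true));
    }
}
```

The three branches correspond to the three cases the unit tests need to cover: direct, ordinal, and no hint. A real implementation would also have to decide what an ordinal hint means for a field that has no ordinals, which this sketch leaves out.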

If you're blocked, don't hesitate to ask for pointers. Thank you.

maitreya2954 · 2024-09-09T14:56:03Z

@rishabhmaurya I added an execution hint field to the cardinality agg. However, I am stuck on adding unit tests.

Add unit tests for all 3 cases.

3 cases being: direct, ordinal and when no value is provided. right?

Here's my question: What type of query should I test after setting the execution hint. I can see many different type of queries being tested here. Should I just do MatchAllDocsQuery?

rishabhmaurya · 2024-09-09T16:15:11Z

@maitreya2954 That's great that you were able to add execution hint.

3 cases being: direct, ordinal and when no value is provided. right?

yes.

What type of query should I test after setting the execution hint. I can see many different type of queries being tested here. Should I just do MatchAllDocsQuery?

You can test with both a match_all query and a filter using a term query on a field other than the cardinality aggregation field.

rishabhmaurya · 2024-09-09T16:23:36Z

@maitreya2954 how were you thinking about checking which collector is picked from tests?

One way could be to add a method to this class that returns the collector picked for a given segment, and add logic here to store its name/class -

OpenSearch/test/framework/src/main/java/org/opensearch/search/aggregations/AggregatorTestCase.java

Line 1207 in 978d14e

    
           public LeafBucketCollector getLeafCollector(LeafReaderContext ctx) throws IOException {
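
One possible shape for that idea, sketched with stand-in types rather than the real OpenSearch test framework: every name here (`Collector`, `RecordingAggregator`, `pickedCollectors`) is hypothetical, and the segment is reduced to an integer index. The wrapper simply records the class name of whatever collector is chosen each time a leaf collector is requested, so a test can assert on it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

final class CollectorRecorderSketch {
    /** Hypothetical stand-ins for the real collector classes. */
    interface Collector {}
    static final class DirectCollector implements Collector {}
    static final class OrdinalsCollector implements Collector {}

    static final class RecordingAggregator {
        private final IntFunction<Collector> picker;          // the selection logic under test
        private final List<String> picked = new ArrayList<>(); // one entry per segment visited

        RecordingAggregator(IntFunction<Collector> picker) {
            this.picker = picker;
        }

        /** Mirrors getLeafCollector: choose a collector for the segment and remember its class. */
        Collector getLeafCollector(int segmentIndex) {
            Collector collector = picker.apply(segmentIndex);
            picked.add(collector.getClass().getSimpleName());
            return collector;
        }

        /** Test-only accessor for the collectors picked so far. */
        List<String> pickedCollectors() {
            return List.copyOf(picked);
        }
    }

    public static void main(String[] args) {
        RecordingAggregator agg = new RecordingAggregator(
            seg -> seg % 2 == 0 ? new OrdinalsCollector() : new DirectCollector());
        agg.getLeafCollector(0);
        agg.getLeafCollector(1);
        System.out.println(agg.pickedCollectors());
    }
}
```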

maitreya2954 · 2024-09-13T20:13:03Z

@rishabhmaurya Here's the PR: #15764

Documentation PR: opensearch-project/documentation-website#8265

rishabhmaurya · 2024-09-13T20:49:52Z

@maitreya2954 great, thanks! I will take a look shortly. Meanwhile, you can work on fixing the failing gradle checks.

getsaurabh02 · 2024-10-02T16:37:29Z

Thanks @maitreya2954 for raising the PR. Is it something we can target for 2.18?
cc: @rishabhmaurya

maitreya2954 · 2024-10-03T18:26:09Z

@rishabhmaurya From the conversation in the PR, I am assuming we are going to add the cluster setting for the memory threshold as well. Can you point me to a previous PR that added a cluster setting, for reference?

@getsaurabh02 I am not sure how long the new changes might take. @rishabhmaurya can we target 2.18?

sandeshkr419 · 2024-12-04T17:24:38Z

@maitreya2954 Are you still working on it?

maitreya2954 · 2024-12-16T15:50:55Z

@sandeshkr419 Yes, I will start working on it; I was busy with something else. Since I am very new to this project, can you point me to a previous PR that added a cluster setting? I will refer to that and make the changes. Thank you.

asimmahmood1 · 2025-02-17T19:50:53Z

Plan to follow up to #17312

rishabhmaurya added the enhancement, good first issue, and untriaged labels Aug 15, 2024

rishabhmaurya added this to Search Project Board Aug 15, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Aug 15, 2024

github-actions bot added the Search:Aggregations label Aug 15, 2024

mch2 removed the untriaged label Aug 21, 2024

jainankitk assigned maitreya2954 Aug 28, 2024

maitreya2954 linked a pull request Sep 5, 2024 that will close this issue

Introduce execution hint for Cardinality aggregation #15764

Open

3 tasks

This was referenced Feb 3, 2025

[Feature Request] Use of Binary DocValue for high cardinality fields to improve aggregations performance #16837

Open

Introduce execution hint for Cardinality aggregation #17301

Closed

Introduce execution_hint for Cardinality aggregation #17312

Open

getsaurabh02 added this to Performance Roadmap Feb 17, 2025

github-project-automation bot moved this to Todo in Performance Roadmap Feb 17, 2025
