-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] Add a cluster setting for memory threshold to pick OrdinalsCollector in Cardinality aggregation #15269
Comments
Thanks @rishabhmaurya seems like something we should do, am removing untriaged. |
@mch2 I would like to work on this. I am new to contributing to OpenSearch. So, would appreciate any guidance. |
Thanks @maitreya2954 for volunteering, assigned it to you! |
@maitreya2954 Appreciate you picking this up. Here is what we can do -
If you're blocked don't hesitate to ask for pointers. Thank you. |
@rishabhmaurya I added execution hint field to the cardinality agg. However, I am stuck at adding unit tests.
3 cases being: Here's my question: What type of query should I test after setting the execution hint. I can see many different type of queries being tested here. Should I just do |
@maitreya2954 That's great that you were able to add execution hint.
yes.
You can test for both match_all query and a filter using term query on a field other than cardinality aggregation field. |
@maitreya2954 how were you thinking about checking which collector is picked from tests? One way could add a method to this class to get the current collector picked for a given segment and add a logic here to store the name/class - OpenSearch/test/framework/src/main/java/org/opensearch/search/aggregations/AggregatorTestCase.java Line 1207 in 978d14e
|
@rishabhmaurya Here's the PR: #15764 Documentation PR: opensearch-project/documentation-website#8265 |
@maitreya2954 great, thanks! I will take a look shortly. Meanwhile you can work on fixing gradle checks which are failing. |
Thanks @maitreya2954 for raising the PR. Is it something we can target for 2.18? |
@rishabhmaurya From the conversation in the PR, I am assuming we are going to add the cluster setting as well for the memory threshold. Can you guide me to a previous PR that added a cluster setting for reference. @getsaurabh02 I am not sure, how long the new changes might take. @rishabhmaurya can we target for 2.18. |
@maitreya2954 Are you still working on it? |
@sandeshkr419 Yes, I will start working was busy with something else. Since, I am very new to this project, can you guide me to a previous PR that added a cluster setting. I will refer that and make changes. Thank you |
Plan to follow up to #17312 |
Is your feature request related to a problem? Please describe
Currently the logic to pick
Ordinals
vsDirectCollector
is dynamic and based on the number of ordinals to collect and memory overhead. If memory overhead is high forOrdinalsCollector
, thenDirectCollector
is used. Due to https://issues.apache.org/jira/browse/LUCENE-9663 few users are reporting regression in Cardinality aggregation because of slowerDirectCollector
after replacing prefix compression with LZ4 compression for terms dictionary in lucene 8.9.Fix proposed here is to use OrdinalsCollector more often which will collect the ordinals into a bitset first and then performs term lookup in
postCollect()
of segment and that's a lot faster.However, we don't have a way to control picking up
OrdinalsCollector
in OpenSearch.Describe the solution you'd like
Introduce a memory threshold dynamic setting which OrdinalsCollector can use and if its usage is under this threshold, always use OrdinalsCollector. This logic can be added here.
Related component
Search:Aggregations
Describe alternatives you've considered
Use of eager_global_ordinals or murmur hash, but its an index time setting and can have impact on indexing performance. Its discussed in more detail here.
Additional context
No response
The text was updated successfully, but these errors were encountered: