The calendar_interval parameter is not currently supported in the date histogram aggregation. This issue outlines the challenges and drafts possible solutions.
Unlike fixed_interval, calendar_interval can produce intervals of different sizes, depending on which timestamp ranges the months, years, etc. map to.
Fixed Interval DateHistogram
Currently the date histogram collection reuses the histogram implementation, which collects sparsely by default. That means we don't allocate e.g. a Vec upfront; instead we collect into a HashMap.
For every timestamp, we truncate it down to the start timestamp of its bucket and collect into that bucket.
This behavior allows for "drill-down", where we apply a filter and get a high resolution histogram. Preallocating over min-max of the column may OOM in these cases.
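The sparse collection described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function names `bucket_start` and `collect_sparse` are made up for this example, the key is assumed to be the bucket's start timestamp, and a plain document count stands in for whatever the real bucket holds.

```rust
use std::collections::HashMap;

/// Truncate a timestamp down to the start of its fixed-size bucket.
/// `rem_euclid` keeps the result correct for timestamps before the epoch.
/// (Hypothetical helper, named for this sketch only.)
fn bucket_start(ts: i64, interval: i64) -> i64 {
    ts - ts.rem_euclid(interval)
}

/// Count values per bucket. Only buckets that actually receive a value are
/// allocated, so memory scales with the number of hit buckets rather than
/// with the min-max span of the column.
fn collect_sparse(timestamps: &[i64], interval: i64) -> HashMap<i64, u64> {
    let mut buckets = HashMap::new();
    for &ts in timestamps {
        *buckets.entry(bucket_start(ts, interval)).or_insert(0) += 1;
    }
    buckets
}
```

With an interval of 3_600_000 (one hour in milliseconds), the timestamps 0, 10, 3_600_000, 3_600_001 and -1 land in three buckets keyed by 0, 3_600_000 and -3_600_000, regardless of how far apart the minimum and maximum of the column are.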
Calendar Aware DateHistogram
With the calendar aware date histogram we have two value spaces: the data stored as UTC and the data converted into a timezone. We want to avoid converting every fetched timestamp into its timezone-specific counterpart. Ideally the bucket boundaries themselves should account for the timezone.
The simplest solution for calendar_interval would be to reuse the range aggregation by preallocating the ranges. This has two problems:
Filter + high resolution may OOM due to too many buckets
A binary_search to find the bucket may be slow
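To make the range-based approach concrete, here is a rough sketch that preallocates month boundaries and locates a bucket by binary search. It is only an illustration under several assumptions: timestamps are in seconds, everything is in UTC (no timezone handling), and the helpers `days_from_civil`, `month_starts` and `find_bucket` are names invented for this example. The date arithmetic uses the well-known days-from-civil algorithm for the proleptic Gregorian calendar.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (days-from-civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Preallocate the start timestamp (seconds, UTC) of every month in the
/// inclusive year range. This is the upfront allocation that may become
/// too large for fine-grained intervals over a wide value range.
fn month_starts(first_year: i64, last_year: i64) -> Vec<i64> {
    let mut starts = Vec::new();
    for year in first_year..=last_year {
        for month in 1..=12 {
            starts.push(days_from_civil(year, month, 1) * 86_400);
        }
    }
    starts
}

/// Find the index of the bucket containing `ts` with a binary search over
/// the sorted bucket starts. Returns None if `ts` is before the first bucket.
fn find_bucket(starts: &[i64], ts: i64) -> Option<usize> {
    // Number of bucket starts that are <= ts; the last of them is our bucket.
    starts.partition_point(|&start| start <= ts).checked_sub(1)
}
```

For 2024 the boundaries show the varying bucket sizes: January spans 31 days and February 29 days, so the bucket cannot be derived from a single division as in the fixed interval case, and each lookup costs a binary search over all preallocated boundaries.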
Potential Solutions
A multi-level data structure that preallocates the top-level and is lazy on lower levels
Group buckets into fixed-interval ranges and, inside each group, use an algorithm similar to the current one, where we truncate to the closest bucket with the help of some per-group metadata
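One possible reading of the second idea, sketched under loudly stated assumptions: the time axis is cut into fixed-width groups no wider than the shortest calendar bucket, so each group contains at most one calendar boundary. The per-group metadata is then just the bucket active at the group start plus that optional boundary, and a lookup becomes one division and one comparison instead of a binary search. All names here (`Group`, `GroupedBuckets`, `bucket_of`, `collect`) are hypothetical, the last entry of `boundaries` is treated as an exclusive end, and the group metadata is still preallocated, so this sketch only addresses the lookup cost, not the memory concern.

```rust
use std::collections::HashMap;

/// Metadata for one fixed-width group of the time axis (hypothetical layout).
struct Group {
    /// Index of the calendar bucket active at the start of the group.
    first_bucket: usize,
    /// The single calendar boundary strictly inside the group, if any.
    boundary: Option<i64>,
}

struct GroupedBuckets {
    origin: i64,
    end: i64,
    group_width: i64,
    groups: Vec<Group>,
    /// Counts are still collected sparsely, keyed by bucket index.
    counts: HashMap<usize, u64>,
}

impl GroupedBuckets {
    /// `boundaries` must be sorted and non-empty; the last entry is an exclusive end.
    /// `group_width` must not exceed the shortest bucket, so that a group
    /// never contains more than one boundary.
    fn new(boundaries: &[i64], group_width: i64) -> Self {
        let origin = boundaries[0];
        let end = *boundaries.last().unwrap();
        let num_groups = (end - origin + group_width - 1) / group_width;
        let mut groups = Vec::with_capacity(num_groups as usize);
        let mut bucket = 0usize;
        for g in 0..num_groups {
            let start = origin + g * group_width;
            // Advance to the bucket that contains the start of this group.
            while bucket + 1 < boundaries.len() && boundaries[bucket + 1] <= start {
                bucket += 1;
            }
            let boundary = boundaries
                .get(bucket + 1)
                .copied()
                .filter(|&b| b < start + group_width);
            groups.push(Group { first_bucket: bucket, boundary });
        }
        GroupedBuckets { origin, end, group_width, groups, counts: HashMap::new() }
    }

    /// Constant-time lookup: pick the group by division, then compare once.
    fn bucket_of(&self, ts: i64) -> Option<usize> {
        if ts < self.origin || ts >= self.end {
            return None;
        }
        let group = &self.groups[((ts - self.origin) / self.group_width) as usize];
        Some(match group.boundary {
            Some(b) if ts >= b => group.first_bucket + 1,
            _ => group.first_bucket,
        })
    }

    fn collect(&mut self, ts: i64) {
        if let Some(bucket) = self.bucket_of(ts) {
            *self.counts.entry(bucket).or_insert(0) += 1;
        }
    }
}
```

As a toy example in day units, the boundaries 0, 31, 60, 91 (January, a 29-day February, March) with a group width of 28 yield four groups; day 30 maps to bucket 0, day 31 to bucket 1, and day 60 to bucket 2. Making the group metadata itself lazy would be a further step toward the multi-level structure from the first idea.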
PSeitz changed the title from "calendar_aware interval in datehistogram" to "calendar_interval in datehistogram" on Jul 26, 2024