You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe
As part of tiered caching's 3.0 changes (#17303) we want to expose some new stats for the IndicesRequestCache.
Most importantly, we should report the fraction of requests that are or aren't cacheable as decided in IndicesService.canCache(). This can help users decide whether to enable tiered caching (if the cacheable fraction is low, it may not help much). It can also help them decide whether to change indices.requests.cache.maximum_cacheable_size to allow some size > 0 queries into the cache, and it can help verify that more queries are entering the cache after making that change.
We also might expose some caching latency stat. This could expose issues, for example if the disk cache is performing poorly.
Tracking these is simple but exposing them in an API has some complications. Currently (for the IRC) there are 2 relevant APIs: _nodes/stats/indices/request_cache and nodes/stats/caches/request_cache.
The first API is older and is tracked by ShardRequestCache, which has callbacks that the IndicesRequestCache class directly calls. Responses can be aggregated by index or shard. We plan to remove this code in 3.0, and ideally, hitting this API would just return the exact same content as the second API. So, we shouldn't add new functionality here.
The second API is newer and the stats are tracked by the ICache implementation itself. CacheService holds a map from cache type (for example REQUEST_CACHE) to ICache instance, and when the API is hit, the instance provides its stats. This allows more info, for example, aggregating by tier in the TieredSpilloverCache. In the future, other caches which use the ICache framework will get a _nodes/stats/caches/... API as well. Responses can be aggregated by index or shard (for the IRC), and ICache implementations can allow other dimensions like tier.
However, the cacheable-fraction stat is specific to the request cache, because it relies on IndicesService's logic. It's not like hits, misses, etc which make sense for every cache and are already tracked by the ICache implementation. So the ICache implementation can't be the one responsible for tracking it. It also doesn't really make sense to aggregate it by the levels we can use to aggregate the rest of the cache stats response. For example it makes no sense to aggregate this value by cache tier. It would make sense to aggregate by index or shard, but I'm not sure it would be useful and it would make tracking more complex.
Caching latency on the other hand might make sense to be tracked by every ICache implementation, in the way hits, misses, etc, are right now.
Describe the solution you'd like
I'm not totally happy with any solution I've come up with so please leave feedback if you have ideas.
The best solution I've come up with is to allow some non-aggregateable fields in the newer cache stats API. These would be specific to each cache type (REQUEST_CACHE, QUERY_CACHE, etc) and would not be tracked by the ICache implementation. They would be tracked at the CacheService level. IndicesService already requires a CacheService to be passed in through the constructor so it can instantiate its IRC in the first place, and the same will be true for any other services that need to instantiate an ICache, so communicating the stats to the CacheService shouldn't be difficult.
For the request cache these extra fields would be something like "total_requests" and "cacheable_requests". We'd expect "cacheable_requests" to equal hits + misses so it may not be necessary. So the response from _nodes/stats/caches/request_cache might look like:
"request_cache" : {
"size_in_bytes" : 5000000,
"evictions" : 500,
"hit_count" : 700,
"miss_count" : 800,
"item_count" : 400,
"total_requests": 4300, // new field
"cacheable_requests": 1500, // new field
"store_name" : "tiered_spillover"
}
And if you aggregated by some value, like _nodes/stats/caches/request_cache?level=tier, that aggregation would show up underneath the total values like it does now, but the new extra fields wouldn't appear in the aggregation:
With this setup it's simple to add new non-generic stats for other future caches. For example in the query cache if we decided to track something around the frequency-based entry policy, which similarly doesn't make sense to aggregate by shard/index/tier, that could just show up in the response body like it does for the request cache:
If we decided caching latency made sense for all ICaches to track we could just add some cumulative total_caching_latency into the StatsHolder objects, and it'd show up in both the total and aggregated sections of the response.
Related component
Search:Performance
Describe alternatives you've considered
Have ShardRequestCache track it directly. This allows aggregation by shard/index along with the rest of the old cache stats API. But aggregation by these levels probably isn't useful, and we're getting rid of this code very soon anyway. It's also not easily extendable to new caches/stats and couldn't easily appear in the new cache stats API.
Use some notification-based method rather than tracking these stats in memory and exposing them through an API. I thought I'd heard the otel plugin could be used like this for stats, but from what I can tell it's just for more detailed request traces? Maybe I'm missing something here.
Have IndicesService track the cacheability stats itself, and break it down by index/shard so it can be aggregated according to whatever levels the overall API call requests. Then provide it to CacheService somehow on API call. I think this would be way overcomplicating it for little gain, and it wouldn't work for all combinations of levels (what happens when a user aggregates by shard+tier? Do we make two separate response sections, one for stats that can be aggregated by tier, and one for the cacheability stats that can't?) This is also not easily extensible.
Additional context
No response
The text was updated successfully, but these errors were encountered:
Is your feature request related to a problem? Please describe
As part of tiered caching's 3.0 changes (#17303) we want to expose some new stats for the IndicesRequestCache.
Most importantly, we should report the fraction of requests that are or aren't cacheable as decided in
IndicesService.canCache()
. This can help users decide whether to enable tiered caching (if the cacheable fraction is low, it may not help much). It can also help them decide whether to changeindices.requests.cache.maximum_cacheable_size
to allow some size > 0 queries into the cache, and it can help verify that more queries are entering the cache after making that change.We also might expose some caching latency stat. This could expose issues, for example if the disk cache is performing poorly.
Tracking these is simple but exposing them in an API has some complications. Currently (for the IRC) there are 2 relevant APIs:
_nodes/stats/indices/request_cache
andnodes/stats/caches/request_cache
.The first API is older and is tracked by
ShardRequestCache
, which has callbacks that theIndicesRequestCache
class directly calls. Responses can be aggregated by index or shard. We plan to remove this code in 3.0, and ideally, hitting this API would just return the exact same content as the second API. So, we shouldn't add new functionality here.The second API is newer and the stats are tracked by the
ICache
implementation itself.CacheService
holds a map from cache type (for exampleREQUEST_CACHE
) toICache
instance, and when the API is hit, the instance provides its stats. This allows more info, for example, aggregating by tier in the TieredSpilloverCache. In the future, other caches which use theICache
framework will get a_nodes/stats/caches/...
API as well. Responses can be aggregated by index or shard (for the IRC), andICache
implementations can allow other dimensions liketier
.However, the cacheable-fraction stat is specific to the request cache, because it relies on
IndicesService
's logic. It's not like hits, misses, etc which make sense for every cache and are already tracked by theICache
implementation. So theICache
implementation can't be the one responsible for tracking it. It also doesn't really make sense to aggregate it by the levels we can use to aggregate the rest of the cache stats response. For example it makes no sense to aggregate this value by cache tier. It would make sense to aggregate by index or shard, but I'm not sure it would be useful and it would make tracking more complex.Caching latency on the other hand might make sense to be tracked by every
ICache
implementation, in the way hits, misses, etc, are right now.Describe the solution you'd like
I'm not totally happy with any solution I've come up with so please leave feedback if you have ideas.
The best solution I've come up with is to allow some non-aggregateable fields in the newer cache stats API. These would be specific to each cache type (
REQUEST_CACHE
,QUERY_CACHE
, etc) and would not be tracked by theICache
implementation. They would be tracked at theCacheService
level.IndicesService
already requires aCacheService
to be passed in through the constructor so it can instantiate its IRC in the first place, and the same will be true for any other services that need to instantiate anICache
, so communicating the stats to theCacheService
shouldn't be difficult.For the request cache these extra fields would be something like "total_requests" and "cacheable_requests". We'd expect "cacheable_requests" to equal hits + misses so it may not be necessary. So the response from
_nodes/stats/caches/request_cache
might look like:And if you aggregated by some value, like
_nodes/stats/caches/request_cache?level=tier
, that aggregation would show up underneath the total values like it does now, but the new extra fields wouldn't appear in the aggregation:With this setup it's simple to add new non-generic stats for other future caches. For example in the query cache if we decided to track something around the frequency-based entry policy, which similarly doesn't make sense to aggregate by shard/index/tier, that could just show up in the response body like it does for the request cache:
If we decided caching latency made sense for all
ICache
s to track we could just add some cumulativetotal_caching_latency
into the StatsHolder objects, and it'd show up in both the total and aggregated sections of the response.Related component
Search:Performance
Describe alternatives you've considered
ShardRequestCache
track it directly. This allows aggregation by shard/index along with the rest of the old cache stats API. But aggregation by these levels probably isn't useful, and we're getting rid of this code very soon anyway. It's also not easily extendable to new caches/stats and couldn't easily appear in the new cache stats API.otel
plugin could be used like this for stats, but from what I can tell it's just for more detailed request traces? Maybe I'm missing something here.IndicesService
track the cacheability stats itself, and break it down by index/shard so it can be aggregated according to whatever levels the overall API call requests. Then provide it to CacheService somehow on API call. I think this would be way overcomplicating it for little gain, and it wouldn't work for all combinations of levels (what happens when a user aggregates by shard+tier? Do we make two separate response sections, one for stats that can be aggregated by tier, and one for the cacheability stats that can't?) This is also not easily extensible.Additional context
No response
The text was updated successfully, but these errors were encountered: