Introduce node level circuit breaker settings for k-NN #2509
base: main
Signed-off-by: Mark Wu <[email protected]>
*/
private String getKnnCircuitBreakerLimitForNode(Settings settings) {
// Get this node's circuit breaker tier attribute
String tierAttribute = clusterService.localNode().getAttributes().get(KNN_CIRCUIT_BREAKER_TIER);
The build failures look very similar to #2223:
java.lang.AssertionError: should not be called by a cluster state applier. reason [the applied cluster state is not yet available]
» at org.opensearch.cluster.service.ClusterApplierService.assertNotCalledFromClusterStateApplier(ClusterApplierService.java:443)
» at org.opensearch.cluster.service.ClusterApplierService.state(ClusterApplierService.java:229)
» at org.opensearch.cluster.service.ClusterService.state(ClusterService.java:182)
» at org.opensearch.cluster.service.ClusterService.localNode(ClusterService.java:166)
» at org.opensearch.knn.index.KNNSettings.getKnnCircuitBreakerLimitForNode(KNNSettings.java:652)
Since node attributes are not currently dynamically configurable, we can cache the value once at node initialization, when we refresh the cache. This prevents the settings update thread from reading the attributes through the cluster state.
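A minimal sketch of the caching idea, assuming a hypothetical `NodeTierCache` holder that is not part of the plugin. It takes a plain attribute map instead of calling `clusterService.localNode()`, so the only place that touches node attributes is the one-time `capture` call; the attribute key `knn_cb_tier` comes from the test setup in this PR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical holder for this node's circuit breaker tier.
 * The attribute is read once and every later lookup is served from the cached value,
 * so update callbacks never need to go through the cluster state.
 */
final class NodeTierCache {
    // Attribute key as used in the PR's test setup (node.attr.knn_cb_tier).
    static final String TIER_ATTRIBUTE = "knn_cb_tier";

    // null means "not captured yet"; Optional.empty() means "captured, attribute not set".
    private final AtomicReference<Optional<String>> cachedTier = new AtomicReference<>();

    /** Call once from a thread that is allowed to read node attributes. Later calls are ignored. */
    void capture(Map<String, String> nodeAttributes) {
        cachedTier.compareAndSet(null, Optional.ofNullable(nodeAttributes.get(TIER_ATTRIBUTE)));
    }

    /** Safe to call from any thread; returns empty until capture() has run. */
    Optional<String> tier() {
        Optional<String> tier = cachedTier.get();
        return tier == null ? Optional.empty() : tier;
    }
}
```

Because the first captured value wins, a later `capture` call with different attributes leaves the cached tier unchanged, which matches the assumption that node attributes are static for the lifetime of the node.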
Signed-off-by: Mark Wu <[email protected]>
@@ -106,5 +109,26 @@ public void initialize(ThreadPool threadPool, ClusterService clusterService, Cli
}
};
this.threadPool.scheduleWithFixedDelay(runnable, TimeValue.timeValueSeconds(CB_TIME_INTERVAL), ThreadPool.Names.GENERIC);

// Update when node is fully joined
clusterService.addLifecycleListener(new LifecycleListener() {
Node attributes aren't available when the cache manager computes its size during initialization. Because the circuit breaker now depends on node attributes, we may need to check whether it needs to be updated after the node has bootstrapped.
I only see one other example of a listener being attached in this codebase. I attached this one during KnnCircuitBreaker initialization because the node needs to refresh the cache to recompute its size and to cache the attribute value. Any feedback on this approach would be appreciated.
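The ordering problem described here can be sketched with stand-in types. `StartListener`, `FakeNodeService`, and `RefreshingBreaker` below are hypothetical and only mimic the shape of a lifecycle callback; they are not the OpenSearch `ClusterService` or `LifecycleListener` classes, and the limit values are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Stand-in for a lifecycle callback; not the OpenSearch LifecycleListener class. */
interface StartListener {
    void afterStart();
}

/** Stand-in for a service whose node attributes only become readable once it has started. */
final class FakeNodeService {
    private final List<StartListener> listeners = new ArrayList<>();
    private String tierAttribute; // null until start() runs

    void addListener(StartListener listener) {
        listeners.add(listener);
    }

    String tierAttribute() {
        return tierAttribute;
    }

    void start(String tier) {
        this.tierAttribute = tier;
        for (StartListener listener : listeners) {
            listener.afterStart();
        }
    }
}

/** Hypothetical breaker: starts with the cluster limit, then refreshes once the node is up. */
final class RefreshingBreaker {
    private volatile String limit;

    RefreshingBreaker(FakeNodeService node, String clusterLimit, Map<String, String> tierLimits) {
        // At construction time the tier attribute is not readable yet, so use the cluster limit.
        this.limit = clusterLimit;
        // Re-evaluate once the node reports that it has started.
        node.addListener(() -> {
            String tier = node.tierAttribute();
            String tierLimit = tier == null ? null : tierLimits.get(tier);
            if (tierLimit != null) {
                limit = tierLimit;
            }
        });
    }

    String limit() {
        return limit;
    }
}
```

In this sketch the breaker reports the cluster-wide limit until `start` fires, then switches to the tier limit if one is configured for the node's tier. Registering the listener in the constructor mirrors the choice of attaching it during circuit breaker initialization.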
Description
The k-NN plugin currently uses a single cluster-wide circuit breaker limit, which is a poor fit when nodes have different memory capacities.
Solution
Added node-specific circuit breaker limits based on node attributes. This allows heterogeneous circuit breaker limits across the nodes of a cluster.
Usage
opensearch.yml:
Implementation
groupSetting for dynamic limit configurations
Testing
Modified the KNNCircuitBreakerIT integration tests to cover the node-level circuit breaker.
Used OpenSearch Benchmark (OSB) to run a modified low-load test with and without the node-level circuit breaker.
Added node.attr.knn_cb_tier = 'integ' to build.gradle.
Initial state (no limits set):
After setting node limit to 500000kb:
Setting the cluster limit to 5kb did not affect the node with a specific limit, which confirms that the node-level limit overrides the cluster-level one.
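The override behavior observed in this test can be expressed as a small resolution rule. The sketch below is an assumption about the precedence only; `CircuitBreakerLimitResolver` and its parameters are hypothetical names, not the plugin's code, and the limits are kept as raw strings without parsing.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver: a limit configured for the node's tier wins over the cluster limit. */
final class CircuitBreakerLimitResolver {
    private CircuitBreakerLimitResolver() {}

    /**
     * @param nodeTier     tier attribute of this node, empty if the node has none
     * @param tierLimits   configured limits keyed by tier name
     * @param clusterLimit cluster-wide fallback limit
     */
    static String resolve(Optional<String> nodeTier, Map<String, String> tierLimits, String clusterLimit) {
        // Optional.map yields empty when the tier has no configured limit, so we fall back.
        return nodeTier.map(tierLimits::get).orElse(clusterLimit);
    }
}
```

With a tier of `integ` mapped to `500000kb` and a cluster limit of `5kb`, `resolve` returns `500000kb`; a node with no tier, or with a tier that has no configured limit, gets `5kb`.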
Related Issues
Resolves #2263
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.