-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate QuantizationStateCache teardown from KNNTestCase #2406
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: owenhalpert <[email protected]>
terminate(threadPool); | ||
QuantizationStateCache.getInstance().close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call to close()
(which solely cancels the maintenance) isn't necessary because of how the RescheduledRunnable is implemented. When we terminate the threadpool on line 46, the next task is rejected and the task is canceled anyway.
I thought I'd include it for clarity and to immediately cancel the task, but I wanted to flag it to see if anyone had thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have it for code clarity. My concern is the setThreadPool(null)
pattern.
I know it exists solely for the tests but the threadpool termination and close() should ideally take care of all the cleanup.
Going one step further - I think the setThreadPool
should be on a final variable with a non-null check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess NativeMemoryCacheManager will always exist because it is created as part of plugin initialization. Thats why the NativeMemoryCacheManager.getInstance().close()
works butQuantizationStateCacheManager.getInstance().close();
doesn't
Why cant we just have something like public boolean quantizationCacheEnabled;
as part of KNNTestCase.java
and inside the resetState()
have a check like -
// Clean up the cache
NativeMemoryCacheManager.getInstance().invalidateAll();
NativeMemoryCacheManager.getInstance().close();
if (quantizationCacheEnabled) {
QuantizationStateCacheManager.getInstance().close();
}
Only in the tests where we actually have QuantizationStateCacheManager created, we can go set this value to true
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]>
… schedule tasks on the shutdown pools Signed-off-by: owenhalpert <[email protected]>
@@ -54,6 +54,7 @@ public void tearDown() throws Exception { | |||
client().admin().cluster().updateSettings(clusterUpdateSettingsRequest).get(); | |||
NativeMemoryCacheManager.getInstance().close(); | |||
terminate(threadPool); | |||
NativeMemoryCacheManager.setThreadPool(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think we need this explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the close of NativeMemoryCacheManager should have this. Because that is where the executor.shutdown()
is called. The same place should be used to cleanup all tasks. I see the maintenanceTask.cancel()
is called. Post that we should clear out the threadpool; no?
Actually, this should not even be required as @kotwanikunal mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gankris96 the only consideration there is that the threadpool is set only once in KNNPlugin where we have access to pass in OpenSearch's threadpool. Once close() is called and that is set to null, we would lose that reference unless KNNPlugin.createComponents
is called again. I would need to investigate the impact of getting rid of that reference fully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is that the threadpool is getting shutdown in the tests before you clear out the QSCache object. That is what causes the threadpool != null
check to succeed but the mockNode.shutDown()
terminates the threadpool.
For example
mockNode.close(); |
@@ -42,8 +42,10 @@ public void setThreadPool() { | |||
} | |||
|
|||
@After | |||
public void terminateThreadPool() { | |||
public void stopMaintenance() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can use cleanup() or tearDown() as a generic method name for test class cleanup.
Signed-off-by: owenhalpert <[email protected]>
Description
QuantizationStateCacheManager.getInstance()
will create an instance of a QuantizationStateCache if one is not already present. Since the cache is not created by default (like the NativeMemoryCache is), the call toQuantizationStateCacheManager.getInstance().close()
in KNNTestCase will wastefully create instances in many cases. There are very few test classes in which a QuantizationStateCache is created, so it will be better to move this logic somewhere more specific.Importantly, a side effect of the current flow is KNNWeightTests mocks KNNSettings but doesn't mock the QS cache settings, so when the cache is created due to the call above, a NullPointerException is thrown and all the tests in the class fail when running in isolation:
java.lang.NullPointerException: Cannot invoke "org.opensearch.core.common.unit.ByteSizeValue.getKb()" because the return value of "org.opensearch.knn.index.KNNSettings.getSettingValue(String)" is null
This error doesn't show up when running the full suite of tests, presumably because the default setting is applied elsewhere.
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.