Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate QuantizationStateCache teardown from KNNTestCase #2406

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

owenhalpert
Copy link
Contributor

Description

QuantizationStateCacheManager.getInstance() will create an instance of a QuantizationStateCache if one is not already present. Since the cache is not created by default (like the NativeMemoryCache is), the call to QuantizationStateCacheManager.getInstance().close() in KNNTestCase will wastefully create instances in many cases. There are very few test classes in which a QuantizationStateCache is created, so it will be better to move this logic somewhere more specific.

Importantly, a side effect of the current flow is KNNWeightTests mocks KNNSettings but doesn't mock the QS cache settings, so when the cache is created due to the call above, a NullPointerException is thrown and all the tests in the class fail when running in isolation:
java.lang.NullPointerException: Cannot invoke "org.opensearch.core.common.unit.ByteSizeValue.getKb()" because the return value of "org.opensearch.knn.index.KNNSettings.getSettingValue(String)" is null

This error doesn't show up when running the full suite of tests, presumably because the default setting is applied elsewhere.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • [NA] API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • [NA] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

terminate(threadPool);
QuantizationStateCache.getInstance().close();
Copy link
Contributor Author

@owenhalpert owenhalpert Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call to close() (which solely cancels the maintenance) isn't necessary because of how the RescheduledRunnable is implemented. When we terminate the threadpool on line 46, the next task is rejected and the task is canceled anyway.

I thought I'd include it for clarity and to immediately cancel the task, but I wanted to flag it to see if anyone had thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have it for code clarity. My concern is the setThreadPool(null) pattern.
I know it exists solely for the tests but the threadpool termination and close() should ideally take care of all the cleanup.

Going one step further - I think the setThreadPool should be on a final variable with a non-null check.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess NativeMemoryCacheManager will always exist because it is created as part of plugin initialization. Thats why the NativeMemoryCacheManager.getInstance().close() works butQuantizationStateCacheManager.getInstance().close(); doesn't

Why cant we just have something like public boolean quantizationCacheEnabled; as part of KNNTestCase.java
and inside the resetState() have a check like -

// Clean up the cache
        NativeMemoryCacheManager.getInstance().invalidateAll();
        NativeMemoryCacheManager.getInstance().close();
        if (quantizationCacheEnabled) {
            QuantizationStateCacheManager.getInstance().close();
        }

Only in the tests where we actually have QuantizationStateCacheManager created, we can go set this value to true

@@ -54,6 +54,7 @@ public void tearDown() throws Exception {
client().admin().cluster().updateSettings(clusterUpdateSettingsRequest).get();
NativeMemoryCacheManager.getInstance().close();
terminate(threadPool);
NativeMemoryCacheManager.setThreadPool(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think we need this explicitly.

Copy link

@Gankris96 Gankris96 Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the close of NativeMemoryCacheManager should have this. Because that is where the executor.shutdown() is called. The same place should be used to cleanup all tasks. I see the maintenanceTask.cancel() is called. Post that we should clear out the threadpool; no?

Actually, this should not even be required as @kotwanikunal mentioned

Copy link
Contributor Author

@owenhalpert owenhalpert Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gankris96 the only consideration there is that the threadpool is set only once in KNNPlugin where we have access to pass in OpenSearch's threadpool. Once close() is called and that is set to null, we would lose that reference unless KNNPlugin.createComponents is called again. I would need to investigate the impact of getting rid of that reference fully.

Copy link

@Gankris96 Gankris96 Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that the threadpool is getting shutdown in the tests before you clear out the QSCache object. That is what causes the threadpool != null check to succeed but the mockNode.shutDown() terminates the threadpool.

For example

is where the node is closed, releasing all non-active threadpool assignments. In your case it wouldn't be active since QuantizationStateCache has not been initiailzed but you are setting it on a static variable.

@@ -42,8 +42,10 @@ public void setThreadPool() {
}

@After
public void terminateThreadPool() {
public void stopMaintenance() throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use cleanup() or tearDown() as a generic method name for test class cleanup.

Signed-off-by: owenhalpert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants