2386/integrate jvector knn engine #2505

sam-herman · 2025-02-07T01:57:46Z

Description

This change integrates jVector into the k-NN plugin as a JVM-based vector format.
jVector provides a DiskANN search implementation written purely in Java, so it requires no native dependencies.

Related Issues

Resolves #2386

Check List

[x] New functionality includes testing.
[x] New functionality has been documented.
[x] API changes companion pull request created.
[x] Commits are signed per the DCO using --signoff.
[x] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Samuel Herman <[email protected]>

navneet1v · 2025-02-07T02:14:37Z

@sam-herman I think, based on our discussion, we were planning not to make JVector part of the original build and code, but to vend it as a separate engine whose code resides in a separate module with its own custom build process.

The main reason for choosing that approach was:

Adding engines directly to the k-NN plugin is not scalable.

What is the reason for choosing this approach? Are you putting this up as a draft PR to provide context on the interfaces we need to change?

sam-herman · 2025-02-07T02:38:43Z

Hi @navneet1v, I totally agree with the scalability issues mentioned; if we keep adding random libraries, the plugin will get bloated. I think we also mentioned that we have two options:

Provide benchmarks to show that it has some added value.
In the absence of benchmarks demonstrating added value, we'll make it part of an "extras" modules folder.

I am working on getting those benchmarks out next week, so how about we treat it as a draft for now to review the correctness of the integration?
Once the benchmarks are out, we can decide whether to refactor it into a different module or keep it as a default.
Moreover, I made some sample files that I wanted the reviewers to consider as suggestions for abstractions that would make the project more concise. So I think it's fair to say we can treat it as a draft at the moment to review the approach.

navneet1v · 2025-02-07T02:40:06Z

So I think it's fair to say we can treat it as a draft at the moment to review the approach.

Thanks for the confirmation.

sam-herman · 2025-02-07T15:17:26Z

build.gradle

@@ -331,7 +335,7 @@ task windowsPatches(type:Exec) {
 task cmakeJniLib(type:Exec) {
    workingDir 'jni'
    def args = []
-    args.add("cmake")
+    args.add("/opt/homebrew/bin/cmake")


I will fix this one

sam-herman · 2025-02-07T15:18:33Z

demo.sh

Will move this to resources or as part of a README.md for jVector demonstration

sam-herman · 2025-02-07T15:21:01Z

src/main/java/org/opensearch/knn/index/codec/WrapperCodecForKNNPlugin.java

This file is not used right now, but it illustrates a suggestion for a more concise abstraction that would also allow selecting the compound format based on the engine.
It is also more consistent with how suppliers and constructors are passed in OpenSearch core, and, by being self-contained, it avoids back-and-forth references between class and enum.
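
To make the suggestion concrete, here is a minimal sketch of the general idea. It is not the contents of WrapperCodecForKNNPlugin.java: every name in it (`EngineCodecSketch`, `CompoundFormatStub`, `Engine`, `fromName`) is hypothetical, and the stub interface stands in for Lucene's `CompoundFormat` so that the example compiles on its own. Each engine constant carries its own supplier, so the format can be selected from the engine without a separate class-to-enum lookup.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class EngineCodecSketch {
    /** Hypothetical stand-in for a compound format; NOT Lucene's CompoundFormat. */
    interface CompoundFormatStub {
        String name();
    }

    static final class DefaultCompoundFormatStub implements CompoundFormatStub {
        @Override
        public String name() {
            return "default";
        }
    }

    static final class JVectorCompoundFormatStub implements CompoundFormatStub {
        @Override
        public String name() {
            return "jvector";
        }
    }

    /** Hypothetical engine enum: each constant owns the supplier of its format. */
    enum Engine {
        DEFAULT(DefaultCompoundFormatStub::new),
        JVECTOR(JVectorCompoundFormatStub::new);

        private final Supplier<CompoundFormatStub> compoundFormatSupplier;

        Engine(Supplier<CompoundFormatStub> compoundFormatSupplier) {
            this.compoundFormatSupplier = compoundFormatSupplier;
        }

        /** Creates a fresh format instance for this engine. */
        CompoundFormatStub newCompoundFormat() {
            return compoundFormatSupplier.get();
        }

        /** Case-insensitive lookup by engine name; throws on unknown names. */
        static Engine fromName(String name) {
            return valueOf(name.toUpperCase(Locale.ROOT));
        }
    }

    public static void main(String[] args) {
        CompoundFormatStub format = Engine.fromName("jvector").newCompoundFormat();
        System.out.println(format.name());
    }
}
```

Passing constructor references as suppliers keeps the engine-to-format mapping in one place, which is the self-contained property described above.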

jmazanec15

Do we need the updates to all of the Gradle wrapper files?

jmazanec15 · 2025-02-12T18:34:39Z

build.gradle

@@ -316,6 +316,10 @@ dependencies {
    }
    testFixturesImplementation "org.opensearch:common-utils:${version}"
    implementation 'com.github.oshi:oshi-core:6.4.13'
+
+    implementation 'io.github.jbellis:jvector:4.0.0-beta.2-SNAPSHOT'
+    implementation 'org.agrona:agrona:1.20.0'


Where is this used? I didn't see it in the codec.

tjake · 2025-02-13

This should be runtimeOnly.

sam-herman · 2025-02-14

It's a transitive dependency of jvector; I had to include it because it doesn't resolve automatically. I can make it runtimeOnly because it's not used explicitly.

jmazanec15 · 2025-02-12T18:37:46Z

src/main/java/org/opensearch/knn/index/codec/jvector/JVectorCompoundFormat.java

+// TODO: This needs to be moved under the same package name as the Lucene internal package name for {@link Lucene90CompoundReader}
+// this way the internal package constants can be accessed directly and we can avoid duplicating them.
+@Log4j2
+public class JVectorCompoundFormat extends CompoundFormat {


Can you remind me if this is still needed after our discussion? I remember that @navneet1v mentioned on Slack that we don't need this. It seems like this is the only reason we'd need to create a custom codec for jvector as opposed to just the knnvectorsformat. So, I'd like to get rid of it if possible.

sam-herman · 2025-02-14

It can potentially be removed after RandomAccessReader for jVector is delegated to IndexInput.
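
As a rough illustration of that kind of delegation, here is a minimal adapter sketch. It deliberately does not use the real jVector `RandomAccessReader` or Lucene `IndexInput` signatures: `SeekableInput` and `VectorReader` are hypothetical stand-ins, and `ByteBufferInput` is only an in-memory substitute so the example runs without either library.

```java
import java.nio.ByteBuffer;

public class DelegatingReaderSketch {
    /** Hypothetical stand-in for a seekable index input; NOT Lucene's IndexInput. */
    interface SeekableInput {
        void seek(long position);
        int readInt();
    }

    /** Hypothetical stand-in for a graph library's reader; NOT jVector's interface. */
    interface VectorReader {
        void seek(long offset);
        float readFloat();
        void readFully(float[] destination);
    }

    /** In-memory input backed by a ByteBuffer, used only to make the sketch runnable. */
    static final class ByteBufferInput implements SeekableInput {
        private final ByteBuffer buffer;

        ByteBufferInput(ByteBuffer buffer) {
            // duplicate() shares the bytes but keeps an independent position
            this.buffer = buffer.duplicate();
        }

        @Override
        public void seek(long position) {
            buffer.position(Math.toIntExact(position));
        }

        @Override
        public int readInt() {
            return buffer.getInt();
        }
    }

    /** Adapter: every reader call is forwarded to the underlying input. */
    static final class DelegatingVectorReader implements VectorReader {
        private final SeekableInput input;

        DelegatingVectorReader(SeekableInput input) {
            this.input = input;
        }

        @Override
        public void seek(long offset) {
            input.seek(offset);
        }

        @Override
        public float readFloat() {
            // Floats are stored as their raw 32-bit pattern
            return Float.intBitsToFloat(input.readInt());
        }

        @Override
        public void readFully(float[] destination) {
            for (int i = 0; i < destination.length; i++) {
                destination[i] = readFloat();
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.allocate(12);
        data.putFloat(1.5f).putFloat(2.5f).putFloat(3.5f);

        VectorReader reader = new DelegatingVectorReader(new ByteBufferInput(data));
        reader.seek(4); // skip the first float
        float[] values = new float[2];
        reader.readFully(values);
        System.out.println(values[0] + " " + values[1]); // prints "2.5 3.5"
    }
}
```

With an adapter of this shape, the reader no longer opens files on its own; all I/O goes through whatever input it is handed.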

sam-herman requested review from heemin32, navneet1v, VijayanB, vamshin, jmazanec15, naveentatikonda, junqiu-lei, martin-gaievski, ryanbogan, luyuncheng, shatejas, 0ctopus13prime and Vikasht34 as code owners February 7, 2025 01:57

sam-herman force-pushed the 2386/integrate-jvector-knn-engine branch from 3e86bfc to 125e473 Compare February 7, 2025 02:03

introduce jVector to the supported KNN engines

f9c1120

Signed-off-by: Samuel Herman <[email protected]>

sam-herman force-pushed the 2386/integrate-jvector-knn-engine branch from 125e473 to f9c1120 Compare February 7, 2025 02:04

sam-herman mentioned this pull request Feb 12, 2025

[FEATURE] Integrate Jvector engine as another vector engine of choice #2386

Open
