2386/integrate jvector knn engine #2505
3e86bfc to 125e473
Signed-off-by: Samuel Herman <[email protected]>
125e473 to f9c1120
@sam-herman I think, based on our discussion, we planned to keep JVector out of the original build and code, and instead vend it as a separate engine whose code resides in a separate module with its own custom build process. The main reason for choosing this approach was:
What is the reason for choosing the approach in this PR instead? Are you putting this up as a draft PR to provide context on the interfaces we need to change?
Hi @navneet1v, I totally agree with the scalability issues mentioned; if we keep adding random libraries, the plugin will get bloated. I think we also mentioned that we have two options:
I am working on getting those benchmarks out next week, so how about we treat it as a draft for now to review the correctness of the integration?
Thanks for the confirmation.
@@ -331,7 +335,7 @@ task windowsPatches(type:Exec) {
 task cmakeJniLib(type:Exec) {
 workingDir 'jni'
 def args = []
-args.add("cmake")
+args.add("/opt/homebrew/bin/cmake")
I will fix this one
Will move this to resources, or make it part of a README.md for jVector demonstration.
This file is not used right now, but it illustrates a suggestion for a more concise abstraction that would also allow selecting the compound format based on the engine.
It is also more consistent with how suppliers and constructors are passed in OpenSearch core and, by being self-contained, avoids back-and-forth references between the class and the enum.
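As a rough illustration of the idea, here is a minimal sketch in which each engine constant carries its own supplier, so the compound format can be chosen per engine without a separate lookup from class to enum and back. All names here (`EngineFormatSketch`, `Engine`, `CompoundFormatLike`, and the two format classes) are hypothetical stand-ins and not the classes in this PR; `CompoundFormatLike` only stands in for Lucene's `CompoundFormat` to keep the example self-contained.

```java
import java.util.function.Supplier;

public class EngineFormatSketch {
    /** Hypothetical stand-in for Lucene's CompoundFormat, used only to keep the sketch self-contained. */
    interface CompoundFormatLike {
        String name();
    }

    /** Hypothetical default format. */
    static final class DefaultCompoundFormat implements CompoundFormatLike {
        @Override
        public String name() {
            return "default-compound";
        }
    }

    /** Hypothetical jVector-specific format. */
    static final class JVectorLikeCompoundFormat implements CompoundFormatLike {
        @Override
        public String name() {
            return "jvector-compound";
        }
    }

    /** Each engine holds the supplier for its own compound format. */
    enum Engine {
        DEFAULT(DefaultCompoundFormat::new),
        JVECTOR(JVectorLikeCompoundFormat::new);

        private final Supplier<CompoundFormatLike> compoundFormatSupplier;

        Engine(Supplier<CompoundFormatLike> compoundFormatSupplier) {
            this.compoundFormatSupplier = compoundFormatSupplier;
        }

        /** Creates a fresh format instance for this engine. */
        CompoundFormatLike newCompoundFormat() {
            return compoundFormatSupplier.get();
        }
    }

    public static void main(String[] args) {
        for (Engine engine : Engine.values()) {
            System.out.println(engine + " -> " + engine.newCompoundFormat().name());
        }
    }
}
```

Passing constructor references keeps the mapping in one place: adding an engine means adding one enum constant, with no second table to keep in sync.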
Do we need the updates to all of the Gradle wrapper files?
@@ -316,6 +316,10 @@ dependencies {
 }
 testFixturesImplementation "org.opensearch:common-utils:${version}"
 implementation 'com.github.oshi:oshi-core:6.4.13'
+
+
+implementation 'io.github.jbellis:jvector:4.0.0-beta.2-SNAPSHOT'
+implementation 'org.agrona:agrona:1.20.0'
Where is this used? I didn't see it in the codec.
should be runtimeOnly
It's a transitive dependency of jvector; I had to include it because it doesn't resolve automatically. I can make it runtimeOnly because it's not used explicitly.
// TODO: This needs to be moved under the same package name as the Lucene internal package name for {@link Lucene90CompoundReader}
// this way the internal package constants can be accessed directly and we can avoid duplicating them.
@Log4j2
public class JVectorCompoundFormat extends CompoundFormat {
Can you remind me if this is still needed after our discussion? I remember that @navneet1v mentioned on Slack that we don't need this. It seems like this is the only reason we'd need to create a custom codec for jvector, as opposed to just the KnnVectorsFormat. So I'd like to get rid of it if possible.
It can potentially be removed after RandomAccessReader for jVector is delegated to IndexInput.
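To make the delegation idea concrete, here is a minimal sketch of an adapter that forwards every read to an underlying seekable input. It is only an illustration under stated assumptions: `SeekableInput` is a hypothetical stand-in for the small part of Lucene's `IndexInput` used here, `VectorReader` is a hypothetical stand-in for the reader abstraction jVector consumes, and neither reproduces the real interfaces or the code in this PR.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ReaderDelegationSketch {
    /** Hypothetical stand-in for the subset of Lucene's IndexInput used in this sketch. */
    interface SeekableInput {
        void seek(long position);
        int readInt();
    }

    /** Hypothetical stand-in for the reader abstraction that jVector consumes. */
    interface VectorReader {
        void seek(long offset);
        float readFloat();
        void readFully(float[] dest);
    }

    /** Adapter: owns no file handle of its own and forwards every read to the wrapped input. */
    static final class DelegatingVectorReader implements VectorReader {
        private final SeekableInput in;

        DelegatingVectorReader(SeekableInput in) {
            this.in = in;
        }

        @Override
        public void seek(long offset) {
            in.seek(offset);
        }

        @Override
        public float readFloat() {
            // Floats are read as their raw 32-bit representation.
            return Float.intBitsToFloat(in.readInt());
        }

        @Override
        public void readFully(float[] dest) {
            for (int i = 0; i < dest.length; i++) {
                dest[i] = readFloat();
            }
        }
    }

    /** In-memory input backed by a ByteBuffer, used only to exercise the adapter. */
    static final class BufferInput implements SeekableInput {
        private final ByteBuffer buffer;

        BufferInput(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        @Override
        public void seek(long position) {
            buffer.position((int) position);
        }

        @Override
        public int readInt() {
            return buffer.getInt();
        }
    }

    /** Reads one vector of the given dimension starting at the given byte offset. */
    static float[] readVector(VectorReader reader, long offset, int dimension) {
        float[] vector = new float[dimension];
        reader.seek(offset);
        reader.readFully(vector);
        return vector;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(3 * Float.BYTES);
        buffer.putFloat(1.0f).putFloat(2.5f).putFloat(-3.0f);
        VectorReader reader = new DelegatingVectorReader(new BufferInput(buffer));
        System.out.println(Arrays.toString(readVector(reader, 0, 3)));
    }
}
```

With an adapter of this shape the vector reader goes through the same input abstraction as the rest of the segment, which is presumably why a custom compound format would no longer be required.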
Description
This change integrates jVector as a JVM-based vector format into the k-NN plugin.
jVector introduces DiskANN search that is implemented purely in Java and doesn't require native dependencies.
Related Issues
Resolves #2386
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.