Add support to use cosine similarity in faiss engine #2519

Draft
wants to merge 3 commits into
base: 2.17

VijayanB
Member

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

opensearch-trigger-bot bot and others added 3 commits February 11, 2025 13:06
…) (opensearch-project#2375)

* Have one score definition for cosinesimilarity

Currently, the cosine similarity score is calculated differently
depending on the search type: script score, approximate search, and
exact search each use a different formula to convert distance into a
cosine similarity that is aligned with the OpenSearch score. To keep
this consistent, all search types will use one definition: the one
Lucene uses as its standard definition of cosine similarity.
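
As an illustration of a single shared score definition, here is a minimal sketch. It assumes the Lucene definition referred to above is score = (1 + cosine) / 2, clamped at 0; the commit message does not spell the formula out, and the function names `cosine_similarity` and `cosine_score` are hypothetical, not taken from the plugin.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity of two non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_score(a, b):
    """Map cosine similarity to a non-negative score.

    Assumption: score = (1 + cosine) / 2, clamped at 0, so that
    identical directions score 1.0 and opposite directions score 0.0.
    """
    return max((1.0 + cosine_similarity(a, b)) / 2.0, 0.0)
```

Using one function like this for script score, approximate search, and exact search would make the three search types return the same score for the same pair of vectors.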

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 84cfa8e)

Co-authored-by: Vijayan Balasubramanian <[email protected]>
…) (opensearch-project#2412)

* Add cosine similarity support for faiss engine

The FAISS engine doesn't support cosine similarity natively.
However, we can use inner product to achieve the same result, because
the inner product of normalized vectors is equal to their cosine
similarity. Hence, before both ingestion and search, normalize the
input vector, and add it to the faiss index with inner product as
the type.
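
The equivalence this relies on can be checked with a small sketch. This is plain Python, not the plugin's code; the helper names `l2_normalize`, `inner_product`, and `cosine` are hypothetical, and rejecting zero vectors is an assumption made only for this example.

```python
import math


def l2_normalize(v):
    """Scale v to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        # Assumption for this sketch: a zero vector has no direction,
        # so it cannot be normalized.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


doc = [3.0, 4.0]
query = [4.0, 3.0]

# Normalize once at ingestion (doc) and once at search (query);
# the inner product of the normalized vectors matches the cosine.
ip = inner_product(l2_normalize(doc), l2_normalize(query))
```

Here `ip` and `cosine(doc, query)` agree up to floating-point rounding, which is why an inner product index over normalized vectors can stand in for a cosine index.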

Since we will be storing normalized vectors in segments, the
original vectors can be retrieved from source. By saving normalized
vectors, we don't have to normalize again whenever segments are
merged. This keeps force merge time and search latency competitive,
at the cost of additional latency during indexing (the one time we
normalize).

We also support radial search for cosine similarity.
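
To illustrate what radial search means on top of this scheme, here is a brute-force sketch, not the faiss or plugin implementation. It assumes stored vectors are already normalized, as described above, and that the radius is expressed as a minimum cosine similarity; `radial_search` and `min_similarity` are hypothetical names.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def radial_search(stored_normalized, query, min_similarity):
    """Return (doc_id, similarity) for every stored vector whose cosine
    similarity to the query is at least min_similarity, best first.

    stored_normalized holds unit-length vectors, so the inner product
    with the normalized query is the cosine similarity.
    """
    q = l2_normalize(query)
    hits = []
    for doc_id, vec in enumerate(stored_normalized):
        similarity = sum(x * y for x, y in zip(q, vec))
        if similarity >= min_similarity:
            hits.append((doc_id, similarity))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


stored = [l2_normalize(v) for v in ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0])]
hits = radial_search(stored, [2.0, 0.0], min_similarity=0.5)
```

Unlike a top-k search, the number of hits is not fixed: it depends on how many stored vectors fall inside the threshold.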

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Balasubramanian <[email protected]>