[RFC] Partial loading with FAISS engine. #2401
Comments
@0ctopus13prime this is an interesting finding. I think we should dig more on this. Let's have a GH issue on this for further deep-dive. cc: @vamshin, @jmazanec15, @kotwanikunal
@0ctopus13prime Great work on the RFC. The code dive and walkthrough are amazing.
In this case, the Lucene numbers you are comparing with are standard Lucene runs, which are also covered as part of https://opensearch.org/benchmarks/, right?
Would this mean Lucene should perform better (given that it does less work) w.r.t. FAISS? I know there are fundamental differences, like a Java vs. C++ implementation, but purely in terms of work complexity, Lucene should be more optimal, right?
@navneet1v, @kotwanikunal Sorry for the confusion, but let me rerun the benchmark and post the results to make sure all engines have the same recall and precision.
@navneet1v
I had a naive question on this -- from this dashboard, the Lucene 10m Cohere perf seems to be worse than the FAISS 10m Cohere perf on both search and indexing, which is the opposite of what we're discussing here. Are those perf runs unrelated to this discussion, or is there another piece I am missing?
@0ctopus13prime Thanks for the great and detailed writeup. I have a couple of questions here:
Any reason not to use the FAISS search algorithm and instead use the Lucene one under the hood? It seems to me that it will be confusing for users to see different results based on different loading modes, especially when the data representation and engine are the same. Also, it looks like for cases with a single segment, the recall is also getting affected? One use case I am thinking of is changing the partial loading mode while tiering the index to a cheaper tier (i.e., a warm tier). It will be confusing if search returns different results in different tiers.
In a similar tiering use case, it will be useful to update the partial loading mode setting dynamically, to avoid downtime while tiering indices. Otherwise, live tiering will be difficult to achieve, as one needs to close the index, and the index will not be searchable during that time. We can treat the field-level option as an override of the index-level setting. Also, I am not sure the field level is really needed currently, since it will be complicated for a user to reason about each field. They will think from a use-case perspective and make trade-offs between performance and cost based on the resources available to support the use case. So providing it at the index level will fit most users' requirements. Another way to look at it: today users make similar choices when picking the FAISS vs. Lucene engine during index creation and understand the trade-off upfront, so they can make the same trade-off at the index level with the loading mode.
I don't think closing and updating the index is necessary. Right now we don't allow any mapping attribute of the k-NN field to be updatable, but we can very well do that and avoid this opening and closing of the index.
Also, I kind of agree with @sohami on a few points here. I would like to keep this setting at the index level to start with and move it to the field level only when there is a customer need. Putting this at the field level sounds like a very attractive option, but updating the value is always painful, as it requires building a complex mapping request.
Just to clarify here @sohami, users don't make the engine choice at the index level; they make the choice at the field level. In an index you can very well have multiple vector fields with a different engine on each field.
+1 on this. But while reading the RFC, I see that in some places you have mentioned that we should use the Lucene search algorithm and in other places adopting the FAISS search algorithm. Can you please clarify this? Ref the quoted sentences from the RFC below.
Along with that, and I think we have discussed this in the past,
But having said that, there are pros to using the Lucene HNSWGraphSearcher algo, which are:
I believe we should dig a little deeper here to see what the right strategy should be. On the performance benchmarks, I am having a hard time understanding what is being benchmarked and how to interpret the results, so can you please add the following details, which will help in understanding the benchmarks properly:
Probably a separate discussion, but why is it at the field level? Do users really use a different engine per field?
Since the k-NN plugin supports multiple vector fields, putting the engine at the index level is not a good option, since it limits usability. A simple example: a user may want to use a native engine with the IVF algorithm and quantization for a higher-dimensional vector field, but the Lucene engine for a smaller-dimensional one. There are multiple other use cases out there. Feel free to cut a GH issue to kick off the conversation if you think the engine should be an index setting. :) It would be an interesting discussion to have.
@0ctopus13prime Maybe this could be something that can be used if a Java wrapper for FAISS is available. We can probably see if the partial load mechanism can be added there directly.
Lucene does not prefetch during search for vectors, I think. Vector search in Lucene specifically uses the madvise MADV_RANDOM system call to tell the kernel not to prefetch, optimizing for the random-access nature of HNSW graph traversal.
We should rethink this if-else structure; it can quickly become hard to maintain. A strategy pattern would be helpful here.
Have you looked into introducing an interface here, considering these are two different implementations? If not, we should look into it instead of branching with if-else and adding code in the same file.
Any effect on NativeEngineKnnVectorQuery with partial loading?
Hi @sohami, regarding the setting, I'm open to having it at either the field level or the index level and will follow other people's opinions on that. For this one:
Answer: I'm sure people will pick up the difference very quickly. Fundamentally, it is not that different from how Lucene uses Directory as a storage layer under the hood. Users who understand the current OpenSearch KNN will quickly grasp the concept of 'partial loading' - using IndexInput to load bytes vs. loading everything off-heap.
Answer: If we pour fairly big data into a single segment, recall is likely to be affected. For that case, the queue size during search should be adjusted. What I can guarantee with partial loading is that it will always produce the same result as long as the same hyperparameters that were used for FAISS are provided.
For the setting: I'm truly open to putting it at either the field level or the index level. Will follow other folks' opinions.
Answer: No, it is not likely. The FAISS file format will guarantee compatibility (my question), so the already-working logic is guaranteed to locate the file and load it. The issue is when a new file format is introduced. In that case, we need to catch up, but that is different from the breaking change you mentioned.
Answer: This is not true. The multi-leaf KNN collector can benefit both cases regardless - the Lucene HNSWGraphSearcher or the FAISS-ported one. But as we discussed in the past, I'm hoping to have the FAISS-ported version to make sure that, regardless of mode, users always get the same results as far as possible. One of the callouts brought up in the past discussion was the keep-up cost of Lucene API changes over time - we want to avoid this.
Answer: Sorry, but could you drop the question directly here? I can answer it.
Answer: No, it's a wrapper, and we could leverage it to minimize the JNI layer, but it still loads everything into memory.
@shatejas
Answer: Sure, happy to discuss in the PR.
Answer: Sure, I think we already started the convo; happy to discuss more. But I generally agree with you.
Answer: I don't think there will be a side effect on it. I wish we could have the collector enabled within NativeEngineKnnVectorQuery.
Partial Loading Low Level Design
1. Objective
The OpenSearch KNN plugin supports three engines: NMSLIB, FAISS, and Lucene.
The first two, the native engines NMSLIB and FAISS, require all vector-related data structures (such as HNSW graphs) to be loaded into memory for search operations.
For large workloads, this memory cost can quickly become substantial if quantization techniques are not applied.
Therefore, 'Partial Loading' must be enabled as an option in native engines to control the available memory for KNN search.
The objective of partial loading is twofold:
If we look closely, an HNSW graph mainly consists of the following:
From the above items, main memory is dominated by the full-precision vectors:
4 bytes * the number of vectors * the number of dimensions
The way FAISS stores these vectors is in a flat index; during serialization and deserialization, these vectors are written to and read from the file and placed in main memory, which increases memory consumption.
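For example, 10 million 768-dimensional float vectors alone occupy 4 bytes × 10,000,000 × 768 ≈ 30.7 GB of memory.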
GH Issue : #1693
2. Scope
The partial loading LLD described in this issue is specifically tailored for FAISS HNSW. However, the core concept can be easily generalized and extended to support other types of vector search.
3. PARTIAL_LOADING_MODE in Mappings
I propose adding a `PARTIAL_LOADING_MODE` setting to the mappings (e.g., have it at the field level). To deliver an "out-of-the-box" experience for users, we can begin with straightforward options and progressively introduce more advanced configurations.
Proposed Modes:
(Default) BEST_PERFORMANCE (or simply DISABLED):
Partial loading is disabled by default, ensuring the engine loads full graphs into memory for optimal performance.
MEMORY_EFFICIENT:
The engine minimizes memory usage when serving search requests by delegating to `IndexInput` to fetch the required bytes during a search. Performance depends on the configured `IndexInput` type. For instance, with `MMapDirectory`, which maps index files using `mmap`, this design proposes directly invoking the system call from the C++ layer, bypassing multiple JNI calls to retrieve bytes.
4. Low Level Design
We have two approaches for achieving partial loading: running the HNSW search on the FAISS index in Java (Section 4.1), or implementing partial loading inside the C++ layer (Section 4.2).
Despite the differences between these two options, the underlying concept remains the same: during search operations, the core component relies on a provided IndexInput to fetch bytes as needed, instead of loading all bytes into memory upfront, which is the default behavior. The choice between the Java and C++ implementations should prioritize maximizing productivity while maintaining optimal performance.
I strongly recommend proceeding with the Java-based option for the following four reasons:
1. The Java version is much faster than the C++ version, mainly due to JNI call overheads.
For the 1M dataset, the Java version showed a median throughput of 157.81 ops/s, while the C++ version showed 115.75 ops/s - the Java version is 36% faster.
With the 10M dataset, the Java version was 137% faster: it showed 15.11 ops/s versus 6.36 ops/s for C++. For more details, please refer to Section 5, Performance Benchmark.
2. Minimal Code Changes Compared to the C++ Option
Implementing partial loading in C++ requires significantly more effort compared to Java. In C++, integrating partial loading logic into the existing FAISS components is challenging. FAISS tightly couples its search and storage mechanisms, making it infeasible to enable partial loading transparently with simple patches. To achieve this, we would need to extract the minimum required HNSW search algorithms from FAISS and adapt them for partial loading.
This complexity arises because FAISS lacks a clear separation between search and storage. In contrast, Lucene’s architecture inherently supports partial loading through its separation of concerns and its flexible Directory implementation. This makes implementing partial loading in Lucene as simple as switching the Directory.
Consequently, introducing partial loading in FAISS would require fundamental changes that compromise some of its core design principles. In comparison, implementing partial loading in Java is far less intrusive!
While supporting partial loading for IVF still requires additional implementation, keeping the logic on the Java side is undoubtedly simpler and more maintainable than handling it in C++.
3. Debugging is Significantly Easier
Debugging in Java is far simpler and more efficient compared to C++. Debugging C++ can be cumbersome, especially when dealing with JNI bindings and JVM shared libraries, where issues often become complex and difficult to trace. It's widely accepted that debugging Java is much more straightforward than debugging C++. I'll stop here.
4. Miscellaneous Benefits
**"Free Lunch" from `prefetch` and Vector Calculations**
Lucene contributors continuously work on improving the system. By leveraging Lucene's HNSW searching logic to operate on the FAISS index, we can benefit from these improvements without additional effort. One good example is the prefetch optimization during search, which triggers the underlying storage to load bytes asynchronously (a good opportunity for the kernel or a network file system daemon to load bytes in parallel). This reduces I/O wait times and decreases search latency.
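For illustration, recent Lucene versions expose an advisory prefetch hook on `IndexInput`; a minimal sketch, assuming a Lucene version that ships this API:

```java
import org.apache.lucene.store.IndexInput;
import java.io.IOException;

final class NeighborPrefetcher {
    // Hint the storage layer to load a byte range asynchronously before it is
    // actually read, overlapping I/O with distance computations.
    static void prefetchNeighbors(IndexInput input, long neighborListOffset, long byteLength)
            throws IOException {
        input.prefetch(neighborListOffset, byteLength); // advisory; may be a no-op
    }
}
```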
Simpler Unit Testing
Adding unit tests becomes significantly easier, leading to better productivity and maintainability. Mocking in the JVM ecosystem is far simpler and requires much less effort compared to testing in C++. Additionally, flaky tests are easier to identify and resolve in Java, further streamlining the development process.
4.1. [Recommended] Run HNSW Search on FAISS index using Java.
4.1.1. Low Level Design Overview
4.1.2. Index Loading
We use a lazy loading approach for vector indices. Initially, the system checks the cache manager and loads the index only if it is absent from the cache. This loading process differs from the baseline behavior, which fully loads the index into memory.
Instead of reading the entire file, we record the starting offset of each section and skip directly to the next section. When a specific section is accessed later, the system calculates the absolute offset in the file by adding the section's starting offset to the given relative offset and fetches the required bytes.
To achieve this, we need to introduce a new component capable of interpreting the FAISS index file layout. This component will handle the conversion of relative positions into absolute offsets and mark the starting offsets of regions within the load method of the IndexLoadStrategy.
This section will not cover index setting changes related to introducing the partial loading mode, as the focus here is on the core components.
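As a rough sketch of this idea (class and method names here are illustrative, not the actual implementation), the loader records a section's absolute start offset and skips over it; during search, a relative position is translated into an absolute file offset and only the needed bytes are fetched:

```java
import org.apache.lucene.store.IndexInput;
import java.io.IOException;

final class VectorSectionReader {
    private final IndexInput input;
    private final long sectionStart; // absolute start offset of the flat-vector section
    private final int dimension;

    VectorSectionReader(IndexInput input, long sectionStart, int dimension) {
        this.input = input;
        this.sectionStart = sectionStart;
        this.dimension = dimension;
    }

    // Loading phase: record the section's start offset, then skip it entirely.
    static long markAndSkip(IndexInput input, long sectionByteSize) throws IOException {
        long start = input.getFilePointer();
        input.seek(start + sectionByteSize); // skip instead of reading into memory
        return start;
    }

    // Search phase: absolute offset = section start + relative offset of the vector.
    float[] readVector(int vectorOrdinal) throws IOException {
        float[] vector = new float[dimension];
        input.seek(sectionStart + (long) vectorOrdinal * dimension * Float.BYTES);
        input.readFloats(vector, 0, dimension);
        return vector;
    }
}
```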
`IndexLoadStrategy::load`
In the `load` function, we initialize the partial loading components and store them in a `PartialLoadingContext`. During this process, we save only the starting offsets of each section rather than fully loading the data. The `PartialLoadingContext` will then be used during the search phase.
[New Component] FaissIndex
This class primarily focuses on marking the offsets of regions within the FAISS index file. (You can find the code in read_index.)
Most regions will be skipped during loading, with only their start offsets marked. However, three specific regions listed below will be loaded into memory due to their relatively small size, typically amounting to at most a few megabytes. Lucene follows the same approach by loading these regions into memory:
- (`double[]`): Usually just a few bytes.
- (`int[]`): A few bytes.
- (`long[]`): Generally a few kilobytes; at most a few megabytes.
[New Component] PartialLoadingContext
We will introduce the `PartialLoadingContext` to `IndexAllocation`. This new component will include the `FaissIndex` along with all necessary information required for partial loading.
4.1.3. Vector Search Part
During search, if partial loading is enabled, we can delegate the search to `FaissIndex`. Otherwise, it will fall back to the default behavior. In the case of `FaissHNSWFlatIndex`, it will execute the HNSW searching logic ported from FAISS to Java. Other index types, such as IVF, will be supported shortly.
Currently, the vector search is triggered in the `KNNWeight::scorer` method. After acquiring results from a single segment, we convert the results into a `KNNScorer` and return them. In the short term, I propose extending the `KNNWeight::doANNSearch` method to branch based on whether partial loading is enabled.
For the long term, we should consider refactoring this logic into `KNNQuery` instead of keeping it in `KNNWeight`.
For parallel search, Lucene uses `MultiLeafKnnCollector` to collect results, allowing each leaf-level collector to exchange eligible minimum scores (e.g., the inverse of distance) at the global level, helping to terminate the loop early and improving performance. However, since the `scorer` method is called at the leaf level (`IndexSearcher::searchLeaf`), it will not benefit from such a collector during parallel searches.
KNNWeight
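A hedged sketch of that short-term branch (a fragment; all names besides `doANNSearch` are illustrative):

```java
// Inside KNNWeight (sketch): branch per segment on the partial loading mode.
// PartialLoadingMode, searchStrategy, and doFullyLoadedANNSearch are illustrative.
private Map<Integer, Float> doANNSearch(final LeafReaderContext context) throws IOException {
    if (partialLoadingMode == PartialLoadingMode.MEMORY_EFFICIENT) {
        // Java-side search that fetches bytes on demand via IndexInput.
        return searchStrategy.searchLeaf(context, knnQuery);
    }
    // Default behavior: fully loaded off-heap index searched through JNI.
    return doFullyLoadedANNSearch(context);
}
```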
[New Component] MemoryEfficientPartialLoadingSearchStrategy
For scalability, we define a partial-loading search strategy that performs vector search based on its mode. Currently, we only have the `MEMORY_EFFICIENT` mode, but this strategy pattern allows us to easily add more modes in the future.
The `MemoryEfficientPartialLoadingSearchStrategy` constructs both the `KnnCollector` and the `RandomVectorScorer`, and then passes them to `FaissIndex` to initiate the search. Note that both components are defined in the Lucene core package.
The following code illustrates the "happy path" scenario, where there are no parent IDs or `filterIdsBitSet`. This example is simplified for clarity, but it demonstrates how each case can be implemented in the future.
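Since the original snippet did not survive formatting, here is a hedged reconstruction of the idea, assuming Lucene's `TopKnnCollector` and `RandomVectorScorer` as shipped in recent versions and reusing the illustrative `VectorSectionReader` from the loading sketch above; the `FaissIndex` accessors are stand-ins:

```java
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.KnnCollector;
import org.apache.lucene.search.TopKnnCollector;
import org.apache.lucene.util.hnsw.RandomVectorScorer;
import java.io.IOException;

// Happy path only: no parent IDs, no filterIdsBitSet.
static void searchHappyPath(FaissIndex faissIndex, VectorSectionReader vectorReader,
                            float[] queryVector, int k) throws IOException {
    KnnCollector collector = new TopKnnCollector(k, Integer.MAX_VALUE);

    RandomVectorScorer scorer = new RandomVectorScorer() {
        @Override
        public float score(int ord) throws IOException {
            // Fetch only this vector's bytes through IndexInput, then score it.
            float[] vector = vectorReader.readVector(ord);
            return VectorSimilarityFunction.EUCLIDEAN.compare(queryVector, vector);
        }

        @Override
        public int maxOrd() {
            return faissIndex.getTotalNumberOfVectors(); // illustrative accessor
        }
    };

    faissIndex.searchLeaf(collector, scorer);
}
```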
[New Component] FaissHNSWFlatIndex::searchLeaf
`FaissHNSWFlatIndex` prepares Lucene HNSW graph adapters and delegates the execution of the HNSW search algorithm to Lucene. Lucene defines an HNSW graph interface in its core package. The idea is to create a `LuceneFaissHnswGraph` class that wraps the `FaissHNSW` to implement the required contracts, allowing Lucene's `HnswGraphSearcher` to navigate the graph and compute scores. Similarly, this approach can be extended to implement specific vector search algorithms in the various `FaissIndex` subclasses, such as the IVF index.
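A minimal sketch of that adapter idea against Lucene's `HnswGraph` contract; the `FaissHNSW` accessors here are hypothetical stand-ins for the parsed FAISS sections:

```java
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.hnsw.HnswGraph;
import java.io.IOException;

final class LuceneFaissHnswGraph extends HnswGraph {
    private final FaissHNSW faissHnsw; // hypothetical: parsed FAISS HNSW section
    private int[] neighbors;           // neighbor list of the node last sought
    private int cursor;

    LuceneFaissHnswGraph(FaissHNSW faissHnsw) {
        this.faissHnsw = faissHnsw;
    }

    @Override
    public void seek(int level, int target) throws IOException {
        // Load the neighbor list of `target` at `level` on demand via IndexInput.
        neighbors = faissHnsw.neighborsAt(level, target);
        cursor = 0;
    }

    @Override
    public int nextNeighbor() {
        return cursor < neighbors.length ? neighbors[cursor++] : DocIdSetIterator.NO_MORE_DOCS;
    }

    @Override
    public int size() {
        return faissHnsw.totalNumberOfVectors();
    }

    @Override
    public int numLevels() {
        return faissHnsw.maxLevel() + 1;
    }

    @Override
    public int entryNode() {
        return faissHnsw.entryPoint();
    }

    @Override
    public NodesIterator getNodesOnLevel(int level) {
        int[] nodes = faissHnsw.nodesAt(level); // hypothetical accessor
        return new ArrayNodesIterator(nodes, nodes.length);
    }
}
```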
4.1.4. Pros / Cons
Pros
Cons
With the same index, partial loading may yield different results compared to when it is disabled. While the overall HNSW search logic is similar between Lucene and FAISS, there is a slight difference in the loop termination conditions: Lucene halts the loop when the best score in the candidate heap is no longer better than the minimum eligible score [Ref], whereas FAISS continues until the candidate heap is exhausted [Ref].
However, recall and precision will remain at the same level. If this becomes a significant concern, I can port the FAISS HNSW search logic to Java. I've tested this as well, and it yielded the same performance as Lucene.
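To make the difference concrete, here is a hedged paraphrase of the two termination conditions (a fragment with illustrative helper names, not the actual library code):

```java
// Lucene-style: stop as soon as the best remaining candidate cannot
// improve on the current top-k results.
while (candidates.size() > 0) {
    if (candidates.topScore() < results.minCompetitiveScore()) {
        break; // early termination
    }
    expandNeighbors(candidates.pop());
}

// FAISS-style: keep expanding until the candidate heap is exhausted.
while (candidates.size() > 0) {
    expandNeighbors(candidates.pop());
}
```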
4.2. [Not Recommended] Implement Partial Loading in C++
(If you're not interested in what would change in C++, you can skip this section and move on to the performance section.)
In this section, we will discuss how partial loading can be achieved in C++. The underlying concept remains the same: whenever bytes are requested, we rely on Lucene's `IndexInput` to fetch them.
However, for this approach, we need to duplicate the FAISS core logic and introduce several modifications, for the reasons outlined below. This is not something that can be accomplished with simple patches.
4.2.1. Why it is difficult to introduce changes in FAISS
1. Absence of a storage concept in FAISS.
In its current implementation, FAISS directly reads data from a `faiss::IOReader` into a `std::vector` (READXBVECTOR). There is no concept of "storage" in FAISS; everything is expected to reside in memory. Partial loading is the opposite of this approach: bytes are loaded on demand from the underlying storage, which means the resulting bytes could end up scattered across memory.
The challenge is that it's not straightforward to extend the core components to switch between full-loading and partial-loading modes. A `std::vector` should be used when partial loading is disabled, but when it's enabled, a `Storage` type wrapping Lucene's `IndexInput` must be used instead. Implementing this change would require widespread modifications throughout the codebase. (Chain explosion!)
Furthermore, since FAISS doesn't have a clear distinction between updating and searching, these fundamental changes would also affect the index writing logic. FAISS's in-memory data structures can be updated while being searched in parallel. This interdependence makes it difficult to modify core components without bringing additional changes into other parts of the system.
While a storage concept could solve these challenges, introducing such a change would require agreement and collaboration from the FAISS team. Until that happens, it would be unwise to implement disruptive changes that would likely introduce unwanted side effects in other areas of the system.
2. FAISS's strong 'faith' that bytes will be in a contiguous memory space.
FAISS passes `size_t*` or `float*` pointers around freely, with the firm belief that everything will be stored in a contiguous block of memory. This assumption is deeply embedded in its design, particularly in the distance computation logic, which simply looks up vectors using `&ptr[index]`. To implement partial loading, this approach would need to change significantly to rely on a `Storage` system that fetches the required bytes on demand.
Beyond the technical challenges, I also question whether many contributors would fully understand the complexities of partial loading in C++. This could hinder the speed at which we can add new features or maintain the system in a robust manner.
Therefore, if we are determined to implement partial loading in C++, the only viable approach would be to extract the HNSW searching logic from FAISS and tailor it specifically for partial loading. However, as FAISS continues to evolve, we would need to constantly update our partial loading implementation to keep pace. Given that only a few contributors have a deep understanding of FAISS's internal workings, it is likely that we would fall behind. Falling behind on maintaining such critical logic can lead to technical debt, blocking further changes and ultimately resulting in deprecation.
For this reason, I believe it is not worth the risk to pursue this approach. Developing logic that may be deprecated in the future does not make sense.
4.2.2. Low Level Design Overview
4.2.3. [New Component] FaissIndexInputStorage
FAISS HNSW heavily relies on `std::vector<T>` for its search operations, using it as a memory block to manage various data structures, such as float vectors and adjacency (neighbor) lists. During the loading phase, FAISS uses `IndexInput` to fetch bytes and populate the memory regions managed by `std::vector<T>`.
By introducing a generalized `Storage<T>` implementation with functionality equivalent to `std::vector<T>`, we could abstract the storage layer in FAISS and replace `std::vector<T>` with this more flexible solution. This would provide a more adaptable way to manage memory while preserving the functionality of the existing system.
`Storage` is an abstract layer that provides loading APIs with offsets. It will replace `std::vector` in `faiss::Index`. Since we will use static polymorphism to introduce this abstract layer, it will be passed as a template type. `Storage` must be templated with a data type `T`, such as `long`, and must have the following four methods:
`markSection(FaissOpenSearchIOReader*, size_t nitems)`
Uses the provided reader to mark the start offset of each section and skip to the next section. `nitems` refers to the number of items of data type `T`. For example, if `T` is `long` and `nitems` is 10, it requires skipping `10 * 8` bytes.
`setAtomicUnit(int32_t atomic_unit)`
Sets the `atomic_unit` to guarantee that an atomic unit of data can be loaded. For example, with a dimension of 768 for a float vector, `atomic_unit` must be set to `3072` (i.e., `768 * sizeof(float)` bytes).
`const T& operator[](const size_t index) const`
Loads bytes, the size of which equals the atomic unit, located at the given index in the file.
`size()`
Returns the total byte size of the storage.
Ex: [Baseline] HNSW Graph Traversal
Ex: [Partial Loading] HNSW Graph Traversal
4.2.4. Index Loading
Similar to the first approach, we duplicate FAISS's `read_index` method. When partial loading is enabled, it marks the starting offsets of each section in the file and moves on to the next section. If partial loading is disabled, the flow falls back to the default behavior, which loads the entire index into memory.
`IndexLoadStrategy::load`
org_opensearch_knn_jni_FaissService.cpp
Here, we create a C++ struct, `PartialLoadingContext`, which wraps the passed Java instance of `PartialLoadingContext`. This struct provides a set of convenient methods that internally handle the necessary JNI calls.
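For reference, a hedged sketch of the Java-side object that this C++ struct would wrap; the `copyBytes` signature is an illustrative example of the kind of method invoked through JNI:

```java
import org.apache.lucene.store.IndexInput;
import java.io.IOException;

// Sketch: the Java instance wrapped by the C++ PartialLoadingContext.
// Native code calls back into copyBytes(...) to pull byte ranges on demand.
public final class PartialLoadingContext {
    private final IndexInput indexInput;

    public PartialLoadingContext(IndexInput indexInput) {
        this.indexInput = indexInput;
    }

    // Called from C++ via JNI: copy `length` bytes starting at absolute `offset`.
    public void copyBytes(long offset, byte[] destination, int length) throws IOException {
        indexInput.seek(offset);
        indexInput.readBytes(destination, 0, length);
    }
}
```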
faiss_wrapper.h
Here, depending on whether partial loading is enabled, the loading function branches. If partial loading is enabled, it calls `partialLoadingFaissIndexLoad` to load the index partially. Otherwise, it falls back to the default behavior, which loads the full index into memory.
partialloading/faiss_index_load.h
Similar to what we did in `FaissIndex::load` in Java, we need to mark the starting offset of each section in the FAISS index.
4.2.5. Vector Search Part
Once an index is partially loaded in C++, we can perform the vector search in the same way we did in Java. The search process in C++ is more complicated than in Java, as it does not delegate to Lucene's `HnswGraphSearcher`. Instead, we need to duplicate the FAISS search logic, specifically tailored for partial loading. However, the underlying idea remains the same: depending on the partial loading mode, we create a search strategy and proceed with the vector search.
KNNWeight::doANNSearch
faiss_wrapper.cpp
[New Component] partialloading/memory_efficient_partial_loading_search_strategy
`MemoryEfficientPartialLoadingSearchStrategy` allows each index to continue the vector search and fetch bytes on demand. The actual byte fetching is done through JNI calls that invoke Lucene's `IndexInput`.
[New Component] OpenSearchIndexIDMapTemplate
This is the top-level index that delegates the search to its nested index (currently `IndexHNSWFlat`) and then transforms the result IDs.
[New Component] OpenSearchIndexHNSWFlat
Eventually, it will reach the `hnswSearch` method, which performs vector search by traversing the HNSW graph. All byte fetching will be handled transparently by each `Storage` implementation.
[New Component] OpenSearchHNSW
For simplicity, I will not go into the details of the HNSW search algorithm. In short, the algorithm consists of two parts: a greedy descent through the upper levels to find a good entry point, followed by a best-first search over the bottom level whose candidate queue is bounded by `efSearch`.
For more details, refer to the greedy_update_nearest and search_from_candidates methods in FAISS.
[New Component] OpenSearchFlatL2Dis
This is the L2 distance calculator created by `OpenSearchIndexFlatL2`. As the name suggests, it calculates the L2 distance (i.e., Euclidean distance) between the given query vector and a vector stored within the index.
It fetches vectors via `Storage`, loading bytes equal to the size configured through `setAtomicUnit`. This ensures that a single vector is fully loaded. For example, for a 768-dimensional float vector, 3072 bytes (i.e., `sizeof(float) * 768`) will be loaded at once.
4.2.6. Pros / Cons
Pros
Cons
5. Performance Benchmark
Vector search performance is primarily influenced by three factors:
Partial loading has little to no impact on this part.
Partial loading mainly affects the first two factors. In my testing, I observed that the more time it takes to load bytes from storage (e.g., when using a network file system with `NioFSDirectory`, though those results aren't included here), the harder it becomes to isolate and analyze the performance impact of partial loading.
To mitigate this, I used `MMapDirectory` to load the entire index into memory, minimizing the interaction with the Directory and enabling me to assess the worst-case performance. This provides a lower bound on latency for comparison with Lucene as a baseline.
5.1. Benchmark Environment
-Xms100g -Xmx100g
5.2. 1M Summary
Throughput
Latency
5.3. 10M Summary
For some reason, I was consistently getting low recall (around 10%) after indexing the 10M dataset on a single instance and performing a force merge with `max_segment=1`. Despite trying various `efConstruction` values, the results remained the same.
However, when I allowed the engine to keep around 70 segments instead of merging everything into one giant segment, recall improved to over 80%. In this case, Lucene outperformed FAISS by roughly 1.8x (31.42 ops/s vs. 17.31 ops/s). Interestingly, 'Partial Loading - Java' performed only 7% to 14% slower than 'Full Loading FAISS' (15.09 ops/s vs. 17.31 ops/s). I suspect this performance difference stems from how the index is organized.
I dug deeper into this and discovered that the number of vectors visited per query in FAISS is roughly twice that of Lucene. Lucene visits around 30,000 vectors per query, while FAISS visits between 60,000 and 75,000. This explains Lucene's faster performance, as it visits far fewer vectors and performs less distance computation. I suspect this difference is due to the quality of the built index, and I believe that with a similarly high-quality index, 'Partial Loading - Java' will deliver performance comparable to Lucene's.
It seems odd that partial loading performed similarly with a 1M dataset but experienced a significant performance drop with a 10M dataset.
Nonetheless, I believe this experiment still provides a useful proxy (though not a perfect one), showing that if we implement partial loading in Java, we can achieve performance similar to 'Full Loading FAISS', assuming IO costs are not a significant factor. (Note that as more IO is involved, the performance gap narrows; in this analysis, we are focusing on the lower bound of performance.)
Throughput
Latency
6. Conclusion
With the partial loading design, we can selectively decide whether to load all bytes into memory. When the mode is `MEMORY_EFFICIENT`, we can fetch bytes via `IndexInput` on demand, eliminating the need to allocate memory space in the application.
From the benchmark results, we observe three things.
7. Discussion Points
7.1. `getSizeInKB`
For the `MEMORY_EFFICIENT` partial loading mode, do we have to return 0 in the `getSizeInKB` method in `NativeMemoryAllocation`? I believe so for most cases, but `FaissIndexInputStorage<T>` really depends on the `IndexInput` implementation, which can allocate memory blocks and hold them internally. This may be rare, but it is technically not zero.
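A hedged illustration of the question (field names are illustrative):

```java
// Sketch: with MEMORY_EFFICIENT mode, (almost) nothing is pinned off-heap.
public int getSizeInKB() {
    if (partialLoadingMode == PartialLoadingMode.MEMORY_EFFICIENT) {
        // Bytes are fetched on demand through IndexInput; report zero,
        // modulo whatever the IndexInput implementation buffers internally.
        return 0;
    }
    return fullyLoadedIndexSizeInKB;
}
```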
7.2. Setting Update
To update the partial loading mode, the user must close the index first, update the mode, and then reopen the index. I don't think it necessarily needs to be dynamically configurable, but I'm open to making it dynamic.