-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Using KNN codec in conjunction with custom codec #2414
Comments
@bugmakerrrrrr thanks for raising the issue. I think me and @jmazanec15 have discussed this sometime back when he was proposing the the derived source change. and this is exactly we found if shard moves from 1 node to other node or you close or open the index there will be problems. Let me read your solution in more details. To comment more or if we are missing anything. Because the solution works then I will be pretty excited. |
@bugmakerrrrrr I read your whole solution and I kind of understood what you are trying to do. The approach of writing the default delegate codec in segment info will diverge the KNNCodec and other Opensearch default codec. We already have a lot of differences there in terms of new codecs(created for star tree index etc.) not being compatible with k-NN codec. I think we should think of more elegant solution here. Having said that I am aligned on having a consistent behavior by ensuring that we throw exceptions if custom codec is not provided. @jmazanec15 WDYT? |
@navneet1v Yes, I'm also concerned about compatibility issues if the restrictions on custom codecs are completely liberalized. But in most cases where KNN is used, we only care about the stored field format (for cost reasons). I'm wondering if we can determine if a custom codec can be used with KNN codec by checking if the other formats except stored field format it provides is the same as the default delegate. |
can you please elaborate on this. I didn't get your point completely here. |
@navneet1v @bugmakerrrrrr Sorry, missed this one. I think we do need to start writing this information to the segment info (both actual and delegate) - this approach makes sense. This will allow us to properly load via SPI. Otherwise, Im not sure there is anything else we can do with a no-arg constructor. |
@bugmakerrrrrr Do you have a PoC? If so, could you share via draft PR? |
Was thinking more about this - how would we write custom attributes, yet use the delegate format? This approach does seem to similar to how per field attributes are used for per field formats. I think for the long term, we should just implement a per field knn vector format and then, instead of writing delegates name to the segment info, create a codec that uses the delegate's name as an identifier. Our custom codec will ensure that the per knn vector format is serialized as a field attribute during write, so any codec that implements per field correctly will be able to read with this format. For derived source feature, I think we would need to migrate functionality to use your sole StoredFieldsVisitor approach, given there is no per field StoredFieldsFormat. |
We should make sure the
Indeed, I think your approach is neater and compatible with the derived source feature. I am willing to give it a try. |
@jmazanec15 I just realized that the KNN codec relies on the custom |
Hmmm I think we should see if we can get rid of the custom compound format. @navneet1v I think had an idea for it. |
Is your feature request related to a problem?
One of our customers uses the following index settings, which combines the KNN codec with the zstd codec.
Everything works fine when the shard is running all the time. However, when the index is recovered after a server reboot, the following exception is thrown.
This is because when indices are created, we use the parametric constructor to initialize the KNN codec, so we can get the correct delegate code.
k-NN/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java
Lines 32 to 34 in f5cf255
And when indices are recovering, we initialize the KNN codec using a parameterless constructor, so the delegate codec we get will always be the default one, which doesn't match the codec used for writing, and thus throws an exception.
k-NN/src/main/java/org/opensearch/knn/index/codec/KNN990Codec/KNN990Codec.java
Lines 29 to 31 in f5cf255
What solution would you like?
We can put the delegate codec name to the segment info attribute during writing
SegmentInfo
data. And when getting stored fields reader, we first read the delegate codec name from theSegmentInfo
and then use the correct delegate codec to generate the stored fields reader.Note: since we can't get a delegate for
SegmentInfoFormat
, we must use the default delegate codec'sSegmentInfoFormat
.What alternatives have you considered?
If we don't want the user to use a custom codec, then we can check the
index.codec
configuration when fetching the codec service factory and throw an exception if it's notdefault
orbest_compression
. Instead of having unintended behavior during runtime.k-NN/src/main/java/org/opensearch/knn/plugin/KNNPlugin.java
Lines 285 to 290 in f5cf255
Do you have any additional context?
Add any other context or screenshots about the feature request here.
The text was updated successfully, but these errors were encountered: