-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid negative distances when using E4M3 cosine indices by renormalizing. #75
Conversation
bf95642
to
22b4287
Compare
@psobot I fixed the java tests and included a version bump/doc gen commit. LMK if you want me to do the latter in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks @dylanrb123!
Just to talk through the implications of this change:
- Existing indices could have returned zero distances during queries due to precision errors when computing inner product
- With this change, querying existing Cosine E4M3 indices with new vectors can now return negative distances, as the normalized vectors within them will not be re-normalized. Distances will be slightly increased as the query vector is properly normalized, however.
- Newly-created indices after this change lands will never return negative distances.
One last thing I'm worried about: this change also potentially introduces quality loss when fetching a vector from the index and then passing it back in, as the vector will be re-normalized and normalize<E4M3>(x)
is not idempotent. That might be something we want to think through carefully before landing this change.
7dca6dd
to
bfc6bc7
Compare
bfc6bc7
to
7dca6dd
Compare
Closing in favour of #80 due to reduced recall of this approach. |
No description provided.