Throw custom RecallError/RecallException when the number of requested neighbors cannot be returned #88

Merged (2 commits) on Oct 2, 2024

stephen29xie
Contributor

@stephen29xie stephen29xie commented Sep 23, 2024

Description

A Voyager index is constructed probabilistically, and the resulting graph can become disconnected. During insertion, a node that would exceed the maximum number of neighbors prunes some of them via a neighbor selection heuristic. A node can be pruned by all of its neighbors, which splits the graph into multiple components.

The maximum number of neighbors a query can return is the size of the component that contains the entry point. For example, querying for k neighbors where k equals the size of the index is not possible in a disconnected graph, because not every node can be reached.
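To illustrate the bound described above, here is a minimal sketch in plain Python. It does not use Voyager and ignores the layered structure of a real index; the toy graph, `reachable_count`, and `toy_graph` are hypothetical names made up for this example. It only shows that a traversal starting at the entry point can never visit more nodes than the entry point's component contains.

```python
from collections import deque


def reachable_count(neighbors, entry_point):
    """Count the nodes reachable from entry_point by following neighbor links."""
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


# Hypothetical 6-node graph: nodes 0-3 link to each other, while nodes 4 and 5
# have been pruned by every other node and only link to each other.
toy_graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
    4: [5],
    5: [4],
}

max_results = reachable_count(toy_graph, entry_point=0)
print(f"index size = {len(toy_graph)}, reachable from entry point = {max_results}")
```

With the entry point at node 0, a query for k = 6 could return at most 4 results in this toy graph, which is the situation the new error reports.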

Constructing the index with a higher M value improves recall, because M controls how many neighbors a node can have. Allowing more neighbors per node lowers the probability of a disconnected graph. Note that a higher M value also increases construction time.

Related Issues

#38

Changes Made

C++

  • Throw a custom RecallError when the number of requested neighbors cannot be returned.
    • The error message tells users that they can reconstruct the index with a higher M value to increase recall.

Python

  • Registered a custom exception translator that translates the C++ RecallError into a RecallError in the Python bindings.
    • Example error message: voyager.RecallError: Fewer than expected results were retrieved; only found 10584356 of 10779975 requested neighbors. Reconstruct the index with a higher M value to increase recall.
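As a rough sketch of how calling code might react to this error, the snippet below uses a stand-in `RecallError` class so that it runs without Voyager installed. The stand-in's base class, `check_result_count`, and `found_and_requested` are assumptions made for illustration and are not part of the Voyager API. The message text is copied from the example above, and parsing it is brittle if the wording ever changes.

```python
import re


class RecallError(Exception):
    """Stand-in for voyager.RecallError (assumption: the real base class may differ)."""


def check_result_count(found, requested):
    """Hypothetical model of the check: raise when fewer results than requested were found."""
    if found < requested:
        raise RecallError(
            f"Fewer than expected results were retrieved; only found {found} "
            f"of {requested} requested neighbors. Reconstruct the index with "
            "a higher M value to increase recall."
        )
    return found


def found_and_requested(error):
    """Extract (found, requested) from the message, or None if the format differs."""
    match = re.search(r"only found (\d+) of (\d+) requested", str(error))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


try:
    check_result_count(found=10584356, requested=10779975)
except RecallError as err:
    counts = found_and_requested(err)
    print(f"query fell short: {counts}")
```

In real code the `except` clause would name `voyager.RecallError` instead of the stand-in, and the handler would typically decide whether to accept fewer results or to rebuild the index with a higher M value.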

Java

  • Implemented a custom com.spotify.voyager.jni.exception.RecallException class that is thrown in Java when the native code throws a RecallError.
    • Example error message: Exception com.spotify.voyager.jni.exception.RecallException: Fewer than expected results were retrieved; only found 10584356 of 10779975 requested neighbors. Reconstruct the index with a higher M value to increase recall.
  • Updated and regenerated the Java docs.

Testing

  • Added C++ tests that reproduce the error.

Checklist

  • My code follows the code style of this project.
  • I have added and/or updated appropriate documentation (if applicable).
  • All new and existing tests pass locally with these changes.
  • I have run static code analysis (if available) and resolved any issues.
  • I have considered backward compatibility (if applicable).
  • I have confirmed that this PR does not introduce any security vulnerabilities.

Additional Comments

@stephen29xie stephen29xie changed the title [WIP] Throw custom RecallError when the number of requested neighbors cannot be returned Sep 25, 2024
@psobot
Member

psobot commented Sep 25, 2024

Looks good so far, but worth mentioning that we'll need to register a custom exception handler for pybind11 (and likely similar code for the Java bindings too) if we want RecallError to be visible to users. At the moment, Python users will just see RuntimeError.

@stephen29xie stephen29xie marked this pull request as ready for review September 26, 2024 20:44
Member

@psobot psobot left a comment

LGTM, nice work @stephen29xie!

Contributor

@dylanrb123 dylanrb123 left a comment

Looks good, nice work

@stephen29xie stephen29xie changed the title from "Throw custom RecallError when the number of requested neighbors cannot be returned" to "Throw custom RecallError/RecallException when the number of requested neighbors cannot be returned" Oct 2, 2024
@stephen29xie stephen29xie merged commit 4dc7b6d into main Oct 2, 2024
57 checks passed
@stephen29xie stephen29xie deleted the stephenx/recall branch October 2, 2024 04:09
3 participants