Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Expose extend() in C API #276

Open
wants to merge 12 commits into
base: branch-24.12
Choose a base branch
from

Conversation

ajit283
Copy link
Contributor

@ajit283 ajit283 commented Aug 6, 2024

Add functionality to add additional vectors after build to C API

@ajit283 ajit283 requested a review from a team as a code owner August 6, 2024 15:18
Copy link

copy-pr-bot bot commented Aug 6, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Aug 6, 2024
@benfred benfred added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Aug 6, 2024
@benfred
Copy link
Member

benfred commented Aug 6, 2024

/ok to test

Copy link
Member

@benfred benfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding this functionality! This change looks good to me -

cpp/src/neighbors/cagra_c.cpp Outdated Show resolved Hide resolved
Comment on lines +96 to +97
auto extend_params = cuvs::neighbors::cagra::extend_params();
extend_params.max_chunk_size = params.max_chunk_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be changed in this PR (especially since the rest of cagra/ivf-* doesn't follow this right now) - but I think we should start re-using the C-api parameter structs in the C++ api , like we are doing in the distances here

using DistanceType = cuvsDistanceType;
. This means we don't have to have two different identical structs in the C and C++ api, and copy between them you are doing here

cuvsCagraExtendParams_t extend_params;
cuvsCagraExtendParamsCreate(&extend_params);
extend_params->max_chunk_size = 100;
cuvsCagraExtend(res, extend_params, &additional_dataset_tensor, index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for debugging the segfault - does this still occur when called with the original dataset - like :

Suggested change
cuvsCagraExtend(res, extend_params, &additional_dataset_tensor, index);
cuvsCagraExtend(res, extend_params, &dataset_tensor, index);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still segfaults unfortunately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajit283 I think @benfred might be right about this being from the number of vectors in the dataset. Can we generate a matrix using the make_blobs from raft::random ? Then we can just generate a larger dataset and bypass the issue.

@cjnolet
Copy link
Member

cjnolet commented Aug 14, 2024

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Aug 14, 2024

@ajit283 I think we discussed generating a bigger dataset for the test case here. You could even use raft::random::make_blobs to accomplish this.

@ajit283
Copy link
Contributor Author

ajit283 commented Aug 14, 2024

👍 will look into this tomorrow

@ajit283 ajit283 requested review from a team as code owners August 28, 2024 13:16
@ajit283 ajit283 requested a review from bdice August 28, 2024 13:16
@cjnolet
Copy link
Member

cjnolet commented Sep 4, 2024

/ok to test

@ajit283 ajit283 requested a review from a team as a code owner September 25, 2024 17:53
@cjnolet
Copy link
Member

cjnolet commented Sep 27, 2024

@ajit283 can you fix the conflicts in this PR?

@github-actions github-actions bot removed the CMake label Sep 27, 2024
@ajit283
Copy link
Contributor Author

ajit283 commented Sep 27, 2024

Done! We should look into the tests for this one.

  • The previous approach was to test against a set of precomputed queries, expected neighbors and distances.
  • This does not work anymore as we generate the dataset dynamically using make_blobs(). Right now I am adding a static set of vectors to the dataset using extend(), then query these vectors and check if I get them back with distance 0. This shows that extend() works, but is generally a regression compared to the previous tests.
  • My ideas for improving the tests would be either to remove make_blobs() and use a larger static dataset, or to calculate expected neighbors and distances at runtime using brute-force.

What do you think?

@cjnolet
Copy link
Member

cjnolet commented Sep 27, 2024

/ok to test

@ajit283 ajit283 changed the base branch from branch-24.10 to branch-24.12 October 9, 2024 16:48
@cjnolet
Copy link
Member

cjnolet commented Nov 4, 2024

/ok to test

@ajit283
Copy link
Contributor Author

ajit283 commented Nov 8, 2024

@cjnolet I added a test using the pairwise distance matrix as discussed in the call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change Python
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants