Revise to cuvs ivf_pq, add cosine for ivf_pq and support long_max indices for all ann algorithms. #757

Merged: 3 commits, Nov 6, 2024

Commits on Nov 6, 2024

  1. squashed and rebased

    support derived class and cuvs ivf_pq
    
    Signed-off-by: Jinfeng <[email protected]>
    
    add cosine testing for ivf_pq
    
    replace cuml ivfpq with cuvs ivf_pq
    
    fix handling of fewer than k items probed and support long label dtype when creating a Spark DataFrame
    
    normalize the dataset to unit norm for inner_product distances to avoid multi-GPU (mg) failures
    
    increase ivf_pq quantization to make its recall more stable
    
    remove normalization, as it transforms the dataset and leads to lower recall
    
    add a case for when fewer than k items are probed
    lijinf2 committed Nov 6, 2024
    Commit 3180953
  2. rebased and second squash:

    improve test case for fewer than k items probed
    
    fix bug related to CPUNN
    
    revise per comments
    
    fix create_pyspark_dataframe so that it works with CuPy (cp) arrays as input
    
    fix label bug in create_pyspark_dataframe
    
    fix bug caught by tests of the CPUNearestNeighbors model
    
    add refine to knn.py for ivf_pq
    
    in progress for checkout
    
    add debug info
    
    get the ivf_pq cosine test passing by increasing the dataset std to make it separable
    
    get ivf_pq working after using refine
    
    remove unnecessary test for refine
    
    get refine working for fewer than k items probed
    
    replace df.withColumn with df.select to fix a slowdown for DataFrames initialized from a wide pd.DataFrame
    
    revise comment to make it clearer
    lijinf2 committed Nov 6, 2024
    Commit f5aed5b
  3. ensure spark returns are consistent with cuvs when handling fewer than k items probed
    
    awaiting future updates that consolidate the behaviors of ivfflat, ivfpq, and refine
    lijinf2 committed Nov 6, 2024
    Commit 8f290f2
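Commits in this PR add cosine distance for ivf_pq and briefly normalized the dataset to unit norm for inner_product distances before that step was removed. Both rest on the identity that cosine distance equals one minus the inner product of unit-normalized vectors. The NumPy sketch below illustrates only that identity; `cosine_distance_via_ip` is a hypothetical helper, not an API of this project or of cuVS.

```python
import numpy as np


def cosine_distance_via_ip(queries: np.ndarray, dataset: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cosine distance computed as 1 - inner product of unit vectors."""
    # Scale every row to unit L2 norm; assumes no all-zero rows.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    x = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)
    # After normalization, the inner product equals the cosine similarity.
    return 1.0 - q @ x.T


queries = np.array([[1.0, 0.0], [1.0, 1.0]])
dataset = np.array([[2.0, 0.0], [0.0, 3.0]])
dist = cosine_distance_via_ip(queries, dataset)

# Cross-check against the textbook formula 1 - (q . x) / (|q| |x|).
expected = np.array(
    [[1.0 - (q @ x) / (np.linalg.norm(q) * np.linalg.norm(x)) for x in dataset] for q in queries]
)
assert np.allclose(dist, expected)
```

Note that normalizing in place changes the stored vectors themselves, which is why the commit log records dropping it after it lowered recall; the identity still explains how an inner-product search relates to cosine on normalized data.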
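Several commits deal with searches in which fewer than k items are probed, and the PR title mentions supporting long_max indices. One plausible reading is that unfilled result slots carry the int64 maximum as a placeholder index. That sentinel, and the `count_valid_neighbors` and `valid_neighbors` helpers below, are assumptions for illustration only; check the cuVS documentation for the exact index and distance values it returns. The sketch shows how a consumer could tell real neighbors from padding while leaving the k-wide result arrays untouched.

```python
import numpy as np

# Assumed sentinel for "no neighbor found" (int64 max, i.e. "long_max").
LONG_MAX = np.iinfo(np.int64).max


def count_valid_neighbors(indices: np.ndarray) -> np.ndarray:
    """Hypothetical helper: number of real (non-sentinel) neighbors per query row."""
    return (indices != LONG_MAX).sum(axis=1)


def valid_neighbors(indices: np.ndarray, distances: np.ndarray, row: int):
    """Hypothetical helper: (ids, distances) of one query with padded slots dropped."""
    mask = indices[row] != LONG_MAX
    return indices[row][mask], distances[row][mask]


# k = 3; the second query probed only one item, so two slots are padding.
indices = np.array([[4, 7, 1], [2, LONG_MAX, LONG_MAX]], dtype=np.int64)
distances = np.array([[0.1, 0.5, 0.9], [0.3, np.inf, np.inf]], dtype=np.float32)

counts = count_valid_neighbors(indices)  # array([3, 1])
ids, dists = valid_neighbors(indices, distances, row=1)  # ids == array([2])
```

Filtering only at read time keeps the returned arrays the same shape as the raw search output, so any consistency with what the library itself returns is preserved.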
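One commit adds a refine step after ivf_pq search in knn.py. Refinement in this sense re-ranks a larger candidate list from the approximate index by exact distances and keeps the best k, which compensates for errors introduced by product quantization. The NumPy sketch below assumes the squared Euclidean metric and an int64-max padding sentinel for missing candidates; `refine_sqeuclidean` is a hypothetical name, and this is not the cuVS implementation.

```python
import numpy as np

# Assumed sentinel for padded / missing candidate slots (int64 max).
LONG_MAX = np.iinfo(np.int64).max


def refine_sqeuclidean(dataset, queries, candidates, k):
    """Hypothetical refine: re-rank candidate ids by exact squared L2 and keep the top k.

    Rows with fewer than k usable candidates are padded with LONG_MAX / inf.
    """
    n_queries = queries.shape[0]
    out_idx = np.full((n_queries, k), LONG_MAX, dtype=np.int64)
    out_dist = np.full((n_queries, k), np.inf, dtype=np.float64)
    for i in range(n_queries):
        cand = candidates[i]
        cand = cand[cand != LONG_MAX]  # skip padded candidate slots
        if cand.size == 0:
            continue
        diffs = dataset[cand] - queries[i]
        exact = np.einsum("ij,ij->i", diffs, diffs)  # exact squared L2 per candidate
        order = np.argsort(exact, kind="stable")[:k]
        out_idx[i, : order.size] = cand[order]
        out_dist[i, : order.size] = exact[order]
    return out_idx, out_dist


dataset = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
queries = np.array([[0.9, 0.1], [0.0, 0.0]])
# The second query has a single candidate, i.e. fewer than k items were probed.
candidates = np.array(
    [[3, 0, 1, LONG_MAX], [2, LONG_MAX, LONG_MAX, LONG_MAX]], dtype=np.int64
)
idx, dist = refine_sqeuclidean(dataset, queries, candidates, k=2)
```

Searching the approximate index for more than k candidates (for example 2k) and refining down to k is a common way to trade extra exact-distance work for higher recall; rows that still lack k candidates keep their padding.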