Binary vector format for flat and hnsw vectors #14078
base: main
Conversation
Hi @benwtrent, I am the first author of the RaBitQ paper and its extended version. As your team knows, our RaBitQ method brings breakthrough performance on binary quantization and scalar quantization. We notice that in this pull request you mention a method which individually optimizes the lower bound and upper bound of scalar quantization. This idea is highly similar to our idea of individually looking for the optimal rescaling factor of scalar quantization as described in our extended RaBitQ paper, which we shared with your team in October 2024. An intuitive explanation can be found in our recent blog. The mathematical equivalence between these two ideas is listed in Remark 2.

In addition, the contribution of our RaBitQ has not been properly acknowledged in several other places. For example, in a previous post from Elastic - Better Binary Quantization (BBQ) in Lucene and Elasticsearch - the major features of BBQ are introduced, yet it is not made clear that all of these features originate from our RaBitQ paper. In a press release, Elastic claims that "Elasticsearch's new BBQ algorithm redefines vector quantization"; however, BBQ is not a brand new method, but a variant of RaBitQ with some minor adaptation.

We note that when a breakthrough is made, it is always easy to derive variants of it or to restate the method in different language. One should not claim a variant to be a new method with a new name and ignore the contribution of the original method. We hope that you will understand our concern and acknowledge the contributions of our RaBitQ and its extension properly in your pull requests and/or blogs.
* </tr>
* <tr>
* <td>{@link org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat Vector values}</td>
* <td>.vec, .vem, .veq, vex</td>
Should we also add veb and vemb files to the list?
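If the suggestion above is adopted, the row might read something like the following (a sketch only; the exact file list should follow whatever the new format actually writes):

```java
 * <tr>
 * <td>{@link org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat Vector values}</td>
 * <td>.vec, .vem, .veq, vex, .veb, .vemb</td>
 * </tr>
```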
import org.apache.lucene.index.SegmentWriteState;

/**
 * Copied from Lucene, replace with Lucene's implementation sometime after Lucene 10 Codec for
Also should we remove this line?
return "Lucene102BinaryQuantizedVectorsFormat(name=" | ||
+ NAME | ||
+ ", flatVectorScorer=" | ||
+ scorer |
nit: should we also add + ", rawVectorFormat=" + rawVectorFormat?
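A sketch of what the suggested toString could look like, assuming the raw format field is indeed named rawVectorFormat as in the comment above (illustrative only, not the code in this PR):

```java
@Override
public String toString() {
  return "Lucene102BinaryQuantizedVectorsFormat(name="
      + NAME
      + ", flatVectorScorer="
      + scorer
      + ", rawVectorFormat="
      + rawVectorFormat
      + ")";
}
```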
@gaoj0017 Thank you for your feedback! Truly, y'all inspired us to improve scalar quantization. RaBitQ showed that it is possible to achieve a 32x reduction with high recall without product quantization. And, to my knowledge, we have attributed inspiration where the particulars of the algorithm were used. As for this change, it is not mathematically the same as, nor derived from, y'all's new or old paper. Indeed, your new paper is interesting, provides the same flexibility across bit sizes, and shows that it's possible. However, we haven't tested it, nor implemented it. Here are some details about this implementation: https://www.elastic.co/search-labs/blog/scalar-quantization-optimization
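For readers skimming the thread, here is a minimal, hypothetical sketch of what "individually optimizing the lower and upper bound of scalar quantization" per vector can look like. The brute-force grid search and the names (OptimizedIntervalSketch, optimizeInterval) are illustrative assumptions; the actual optimization is the one described in the linked blog, not this code.

```java
// Hypothetical sketch, not the code in this PR: for a single vector, search for
// the quantization interval [a, b] that minimizes the squared reconstruction
// error, instead of using one global interval for the whole dataset.
final class OptimizedIntervalSketch {

  /** Returns {a, b}, the best interval found by a coarse grid search. */
  static float[] optimizeInterval(float[] v, int bits) {
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    for (float x : v) {
      min = Math.min(min, x);
      max = Math.max(max, x);
    }
    int maxLevel = (1 << bits) - 1;
    float bestA = min;
    float bestB = max;
    double bestErr = Double.POSITIVE_INFINITY;
    // Try shrinking each endpoint toward the middle in 5% steps.
    for (int i = 0; i <= 9; i++) {
      for (int j = 0; j <= 9; j++) {
        float a = min + (max - min) * 0.05f * i;
        float b = max - (max - min) * 0.05f * j;
        if (b <= a) {
          continue;
        }
        float step = (b - a) / maxLevel;
        double err = 0;
        for (float x : v) {
          int q = Math.round((x - a) / step);
          q = Math.max(0, Math.min(maxLevel, q)); // clamp to the grid
          double r = x - (a + q * step);          // reconstruction error
          err += r * r;
        }
        if (err < bestErr) {
          bestErr = err;
          bestA = a;
          bestB = b;
        }
      }
    }
    return new float[] {bestA, bestB};
  }
}
```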
@benwtrent Thanks for your reply.

First, in the blog - Better Binary Quantization at Elastic and Lucene - the BBQ method is a variant of our RaBitQ with no major differences. The claimed major features of BBQ all originate from our RaBitQ paper (as we explained in our last reply). There is only one attribution to our method, where it is mentioned (in one sentence) that BBQ is based on some inspirations from RaBitQ. We think this attribution is not sufficient - it should be made clear that the mentioned features of BBQ all originate from RaBitQ.

Second, for the new method described in this pull request, there is no attribution to our extended RaBitQ method at all - we note that we shared the extended RaBitQ paper with your team more than 2 months ago. To our understanding, the method is highly similar to our extended RaBitQ at its core (which also supports quantizing a vector to 1-bit, 2-bit, ... per dimension). They share the major idea of optimizing the scalar quantization method by trying different parameters. In your new blog, it is mentioned that "Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization." This is not true, since our extended RaBitQ corresponds to an optimized scalar quantization method. Given that our extended RaBitQ method is prior art for the method introduced in the blog and our method was known to your team more than 2 months ago, you should not have ignored it. Discussions of the differences between the two methods, if any, should be clearly explained, and experiments comparing the two methods should be provided as well.
@gaoj0017 it sounds to me as if your concern is about lack of attribution in the blog post you mentioned, and doesn't really relate to this pull request (code change) - is that accurate?
+1 for proper attribution. We should give credit where credit is due. The evolution of this PR clearly began with the RaBitQ paper, as seen in the opening comment on the original PR as well as the original issue. Specifically for the open source changes proposed here (this pull request suggesting changes to Lucene's ASL2 licensed source code):
Linking to the papers that inspired important changes in Lucene is not only for proper attribution but also so users have a deep resource they can fall back on to understand the algorithm, understand how tunable parameters are expected to behave, etc. It's an important part of the documentation too! Also, future developers can re-read the paper and study Lucene's implementation and maybe find bugs / improvement ideas.

For the Elastic specific artifacts (blog posts, press releases, tweets, etc.): I would agree that Elastic should also attribute properly, probably with an edit/update/sorry-about-the-oversight sort of addition? But I do not (no longer) work at Elastic, so this is merely my (external) opinion! Perhaps a future blog post, either from Elastic or someone else, could correct the mistake (missed attribution).

Finally, thank you to @gaoj0017 and team for creating RaBitQ and publishing these papers -- this is an impactful vector quantization algorithm that can help the many Lucene/OpenSearch/Solr/Elasticsearch users building semantic / LLM engines these days.
To head this off, this implementation is not an evolution of RaBitQ in any way. It's intellectually dishonest to say it's an evolution of RaBitQ. I know that's pedantic, but it's a fact. This is the next step of the global vector quantization optimization already done in Lucene. Instead of global, it's local, and it utilizes anisotropic quantization. I am still curious as to what in particular is considered built on RaBitQ here. Just because things reach the same ends (various bit-level quantization) doesn't mean they are the same. We can say "so this idea is unique from RaBitQ in these ways" to keep attribution, but it seems weird to call out another algorithm simply to say this one is different. I agree, the Elastic stuff should be discussed and fixed in a different forum.
Hi @msokolov, the discussion here is not only about the blog posts but also related to this pull request. In this pull request (and its related blogs), a new method is claimed without properly acknowledging the contributions/inspirations from our extended RaBitQ method, as we explained in our last reply. Besides, we believe this discussion is relevant to the Lucene community because Lucene is a collaborative project, used and contributed to by many teams beyond Elastic.

Thanks @mikemccand for your kind words - we truly appreciate them! It is also an honor to us that RaBitQ and the extended RaBitQ are seen as impactful in improving industry productivity.

Our responses to @benwtrent are as follows. Point 2 in our last reply: In the related blog that describes the method in this pull request, it states "Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization" - this is not true, since this target has already been achieved by our extended RaBitQ method. Your team should not have made this claim while ignoring our extended RaBitQ method, since we circulated the extended RaBitQ paper to your team more than three months ago. In addition, our extended RaBitQ method proposed the idea of searching for the optimal parameters of scalar quantization for each vector (for details, please refer to our blog). The method in this pull request has adopted a highly similar idea. For this reason, we request that in any existing and potentially future channels introducing the method in this PR, proper acknowledgement of our extended RaBitQ method should be made.
In my capacity as the Lucene PMC Chair (and with explicit acknowledgment of my current employment with Elastic, as of the date of this writing), I want to emphasize that proper attribution and acknowledgment should be provided for all contributions, as applicable, in accordance with best practices. While the inclusion of links to external blogs and prior works serves to provide helpful context regarding the broader landscape, it would be of greater value to explicitly delineate which specific elements within this pull request are directly related to the RaBitQ method or its extension.
Just sticking purely to the issues raised regarding this PR and the blog Ben linked explaining the methodology...
This comment relates to the fact that RaBitQ, as you yourself describe it in both your papers, is motivated by seeking a form of product quantization (PQ) for which one can compute the dot product directly rather than via lookup. Your papers make minimal reference to scalar quantisation (SQ) other than to say the method is a drop-in replacement. If you strongly take issue with the statement based on this clarification, we can further clarify it in the blog. I still feel this is separate from this PR and it seems better to discuss it in a separate forum. I would also reiterate that conceptually our approach is much closer to our prior work on int4 SQ we blogged about last April, which is what inspired it more directly.
I would argue that finding the nearest point on the sphere is exactly equivalent to the standard process in SQ of finding the nearest grid point to a vector. Perhaps it would be more accurate to say you've ported SQ to work with spherical geometry, although as before the more natural motivation, and the one you yourselves adopt, is in terms of PQ. This isn't related to optimising hyperparameters of SQ IMO. You could argue perhaps that arranging for both codebook centres and corpus vectors to be uniformly distributed on the sphere constitutes this sort of optimization, although it would not be standard usage. At best you could say it indirectly arranges for raw vectors to wind up close, in some average sense, to the quantized vectors.

However, I'd take issue with this statement because a single sample of a random rotation does not ensure that the corpus vectors are uniformly distributed on the sphere: applying a single random rotation to, for example, a set of points which are concentrated somewhere on the sphere doesn't change this. You would have to use different samples for different vectors, but this eliminates the performance advantages. Incidentally, I think this is the reason it performs significantly worse on GIST, and indeed part of the reason why we found small improvements across the board for binary.

(Tangentially, it feels like a whitening pre-conditioner might actually be of more benefit to the performance of RaBitQ. I also can't help but feel some combination of hyperparameter optimization and normalization will yield even further improvements, but I haven't been able to get this to work out yet.)
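For reference, the standard per-component "nearest grid point" step being discussed above, written in notation chosen here purely for illustration (interval [a, b], B bits, vector components x_i):

```latex
\Delta = \frac{b - a}{2^{B} - 1}, \qquad
q_i = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x_i - a}{\Delta}\right),\; 0,\; 2^{B} - 1\right), \qquad
\hat{x}_i = a + q_i \,\Delta .
```

Choosing each q_i this way minimizes |x_i - \hat{x}_i| component-wise, so the quantized vector is the nearest point of the grid in Euclidean distance, which is the sense of "nearest grid point" used in the comment above.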
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
After Elastic's last round of replies, the Elastic team reached out to us for clarification on the issues via Zoom meetings. In the meetings, they promised to fix the misattribution, so we suspended our reply and waited for their amendment. Recently, Elastic made some edits to the BBQ blog and this pull request. However, (1) the updated blog still positions BBQ as a new method (even though BBQ largely follows our RaBitQ method, with differences only in implementation), and (2) the updated PR does not position our extended RaBitQ method as prior art for their method (even though the method is similar to our extended RaBitQ method at its core and our method is prior art). As academic researchers, we are tired of writing emails asking for proper attribution round after round, and hence we would like to speak out for ourselves through this channel.

Point 1 in our reply: This pull request is a direct follow-up to the preceding pull request of the so-called "BBQ" method. Thus, this point should not be ignored and excluded from the discussion. We emphasize again that all the key features of the so-called "BBQ" method follow our RaBitQ method, with only minor modifications.

Point 2 in our last reply: For the community's information, we would like to provide the following timeline.
In summary, our requirements are two-fold: (1) Elastic should position BBQ as a variant of RaBitQ, not as a new innovation; and (2) Elastic should acknowledge our extended RaBitQ method as prior art which covers similar ideas to the OSQ method and was published months earlier, not as parallel work. As for future communications, we may not be able to respond to every future response from Elastic, because we feel they can keep arguing and changing their arguments - in this thread, Elastic has changed their attribution of OSQ several times (from the inspiration of RaBitQ, to LVQ and SCANN, and to their own earlier PR). As academic researchers, we should be focusing more on research. No matter how they argue, they cannot deny the following two points:
We would also consider reaching out to more channels to speak out for ourselves if necessary.
This pull request relates only to OSQ, and thus the proper scope of discussion is the concerns raised around its attribution. We have pursued multiple conversations and discussions in order to resolve the various concerns amicably, and we continue to desire collaboration and open discourse with respect to each party's innovations in this problem space. We would like to reiterate that we highly rate the research in both RaBitQ and its extension. However, we also maintain, as we have in private communications, that OSQ is not a derivative of extended RaBitQ.
This goes to the crux of the disagreement around attribution. In the following we describe the two schemes, extended RaBitQ [1] and OSQ [2], and restate the points we have made privately regarding their differences. Both methods centre the data, which to the best of our knowledge was originally proposed in LVQ [3]. Extended RaBitQ proposes a method for constructing a codebook for which one can compute the dot product efficiently using integer arithmetic. In a nutshell it proceeds as follows:
At its heart it is a variant of product quantization which allows for efficient similarity calculations and was positioned as such in the original paper. The key ingredients of OSQ are as follows:
At an algorithmic level there is no similarity between the methods other than that they both centre the data. At a conceptual level, RaBitQ was developed as a variant of product quantization and OSQ was developed as an approach to hyperparameter optimization for per-vector scalar quantization. For the record, we were inspired to revisit scalar quantization in light of the success of binary RaBitQ, and we were exploring ideas prior to any communication between our teams regarding extended RaBitQ. The actual inspiration for the approach was our previous work on hyperparameter optimisation for quantization intervals and LVQ, and we attribute this PR accordingly.

Retrospectively, and in parallel to our work, the authors of RaBitQ explored the relationship between extended RaBitQ and scalar quantization directly ([6], published 12/12/2024). They show that the act of normalising the vectors, finding the nearest centroid, then undoing the normalisation can be thought of as a process of rescaling the quantization grid so that the nearest quantized vector winds up closer to the original floating point vector. Whilst this bears some relation to optimising b - a, they never formulate this process as an optimisation problem and at no point show exactly what quantity they in fact optimise. In our view, this misses the key point that the important idea is the formulation. As such it is very clear that they do not and could not introduce any different optimisation objective, such as an anisotropic loss in the dot product (sketched after this comment). Even if they did, OSQ optimises both a and b directly, which gives it an extra degree of freedom, and develops an effective, non-trivial, and completely distinct solution for this problem.

As we have communicated privately, we dispute that OSQ is a variant of extended RaBitQ. We laid out in broadly similar terms exactly why we disagree at a technical level (along the lines of the discussion in this comment). We have received no further technical engagement with the points we have raised and have therefore reached an impasse. Regarding this point:
we at no point intended to imply that RaBitQ could not be extended to support more than 1 bit, nor is that the only motivation for our work. The main original motivations are all the benefits we see in being able to achieve high-quality representations using a variant of vanilla scalar quantization, such as:
We performed and published a study evaluating binary variants and showed consistent improvements in comparison to binary RaBitQ [2]. The study we performed is perfectly in the spirit of evaluating competing methods and covers multiple datasets and embedding models. Our initial focus was on binary quantization, since this is the only work we were releasing in product at that point. We privately discussed plans with the authors of RaBitQ to perform benchmarking for higher bit counts compared to extended RaBitQ and also to perform larger-scale benchmarks. We were unaware of [4] at the time the blog [2] was written, and we communicated privately that we plan to publish a survey which includes a discussion of both works as well as other pertinent work in the field. We have furthermore stated privately that this feels like a more appropriate forum in which to discuss the two methods.

[1] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search, https://arxiv.org/pdf/2409.09913
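For context on the anisotropic dot-product loss mentioned above, here is the usual form in the spirit of the ScaNN line of work (notation is mine, not taken from either paper, and the weight \eta is a free parameter): the residual r = x - \hat{x} is split into the component parallel to x and the component orthogonal to it, and the parallel component, which is what most perturbs dot products with queries aligned to x, is weighted more heavily.

```latex
r_{\parallel} = \frac{\langle r, x \rangle}{\lVert x \rVert^{2}}\, x, \qquad
r_{\perp} = r - r_{\parallel}, \qquad
\ell(x, \hat{x}) = \eta\, \lVert r_{\parallel} \rVert^{2} + \lVert r_{\perp} \rVert^{2}, \quad \eta > 1 .
```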
This provides a binary quantized vector format for flat and HNSW vectors. The key ideas are:
This gives Lucene a single scalar quantization format that supports high-quality vector retrieval, even down to a single bit.
The ideas here build on a couple of foundations:
Lucene wouldn't be the only "Optimized Quantile LVQ" technique on the block, as the original RaBitQ authors have extended their technique, though the loss function and optimization are different: https://arxiv.org/abs/2409.09913
For all similarity types, on disk it looks like this:
During segment merge and HNSW building, another temporary file is written containing the query-quantized vectors over the configured centroids. One downside is that this temporary file will actually be larger than the regular vector index. This is because we use asymmetric quantization to keep good information around. But once the merge is complete, this file is deleted. I think this can eventually be removed.
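To illustrate the kind of asymmetric scoring described above: the document side keeps 1 bit per dimension while the query side keeps more bits, so the integer dot product can be computed as a sum of weighted popcounts over the query's bit planes. The class and method names, and the multi-bit query assumption, are illustrative only, not the scorer in this PR.

```java
import java.util.BitSet;

// Illustrative sketch, not the scorer in this PR: documents hold 1 bit per
// dimension; the query keeps several bits per dimension, stored as bit planes
// (plane j holds bit j of every query component).
final class AsymmetricScoreSketch {

  /** Integer dot product between a multi-bit query and a 1-bit document vector. */
  static long dotProduct(BitSet[] queryPlanes, BitSet docBits) {
    long sum = 0;
    for (int j = 0; j < queryPlanes.length; j++) {
      BitSet overlap = (BitSet) queryPlanes[j].clone();
      overlap.and(docBits); // dimensions where the doc bit and query bit j are both 1
      sum += (long) overlap.cardinality() << j; // plane j contributes with weight 2^j
    }
    return sum;
  }
}
```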
Here are the results for Recall@10|50
Even with the optimization step, indexing time with HNSW is only marginally increased.
The consistent improvement in recall and the flexibility across bit counts make this format and quantization technique much preferred.
Eventually, we should consider moving scalar quantization to utilize this new optimized quantizer. However, the on-disk format and scoring would change, so I didn't do that in this PR.
supersedes: #13651
Co-Authors: @tveasey @john-wagster @mayya-sharipova @ChrisHegarty