Binary vector format for flat and hnsw vectors #14078
base: main
Conversation
Hi @benwtrent, I am the first author of the RaBitQ paper and its extended version. As your team knows, our RaBitQ method brings breakthrough performance on binary quantization and scalar quantization. We notice that in this pull request you mention a method which individually optimizes the lower bound and upper bound of scalar quantization. This idea is highly similar to our idea of individually looking for the optimal rescaling factor of scalar quantization as described in our extended RaBitQ paper, which we shared with your team in October 2024. An intuitive explanation can be found in our recent blog. The mathematical equivalence between these two ideas is listed in Remark 2.

In addition, the contribution of our RaBitQ has not been properly acknowledged in several other places. For example, in a previous post from Elastic - Better Binary Quantization (BBQ) in Lucene and Elasticsearch - the major features of BBQ are introduced, yet it is not made clear that all of these features originate from our RaBitQ paper. In a press release, Elastic claims that "Elasticsearch's new BBQ algorithm redefines vector quantization"; however, BBQ is not a brand new method, but a variant of RaBitQ with some minor adaptation.

We note that when a breakthrough is made, it is always easy to derive variants of it or to restate the method in different language. One should not claim a variant to be a new method with a new name and ignore the contribution of the original method. We hope that you will understand our concern and acknowledge the contributions of our RaBitQ and its extension properly in your pull requests and/or blogs.
* </tr>
* <tr>
* <td>{@link org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat Vector values}</td>
* <td>.vec, .vem, .veq, vex</td>
Should we also add veb and vemb files to the list?
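If the suggestion above is adopted, the row might read something like the following (a sketch only; the exact file list should follow whatever the new format actually writes):

```java
 * <tr>
 * <td>{@link org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat Vector values}</td>
 * <td>.vec, .vem, .veq, vex, .veb, .vemb</td>
 * </tr>
```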
import org.apache.lucene.index.SegmentWriteState;

/**
 * Copied from Lucene, replace with Lucene's implementation sometime after Lucene 10 Codec for
Also should we remove this line?
return "Lucene102BinaryQuantizedVectorsFormat(name=" | ||
+ NAME | ||
+ ", flatVectorScorer=" | ||
+ scorer |
nit: should we also add + ", rawVectorFormat=" + rawVectorFormat?
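A sketch of what the suggested toString could look like, assuming the raw format field is indeed named rawVectorFormat as in the comment above (illustrative only, not the code in this PR):

```java
@Override
public String toString() {
  return "Lucene102BinaryQuantizedVectorsFormat(name="
      + NAME
      + ", flatVectorScorer="
      + scorer
      + ", rawVectorFormat="
      + rawVectorFormat
      + ")";
}
```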
@gaoj0017 Thank you for your feedback! Truly, y'all inspired us to improve scalar quantization. RaBitQ showed that it is possible to achieve a 32x reduction with high recall without product quantization. And, to my knowledge, we have attributed inspiration where the particulars of the algorithm were used. As for this change, it is not mathematically the same as, nor derived from, y'all's new or old paper. Indeed, your new paper is interesting, provides the same flexibility across bit sizes, and shows that it's possible. However, we haven't tested it, nor implemented it. Here are some details about this implementation: https://www.elastic.co/search-labs/blog/scalar-quantization-optimization
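For readers skimming the thread, here is a minimal, hypothetical sketch of what "individually optimizing the lower and upper bound of scalar quantization" per vector can look like. The brute-force grid search and the names (OptimizedIntervalSketch, optimizeInterval) are illustrative assumptions; the actual optimization is the one described in the linked blog, not this code.

```java
// Hypothetical sketch, not the code in this PR: for a single vector, search for
// the quantization interval [a, b] that minimizes the squared reconstruction
// error, instead of using one global interval for the whole dataset.
final class OptimizedIntervalSketch {

  /** Returns {a, b}, the best interval found by a coarse grid search. */
  static float[] optimizeInterval(float[] v, int bits) {
    float min = Float.POSITIVE_INFINITY;
    float max = Float.NEGATIVE_INFINITY;
    for (float x : v) {
      min = Math.min(min, x);
      max = Math.max(max, x);
    }
    int maxLevel = (1 << bits) - 1;
    float bestA = min;
    float bestB = max;
    double bestErr = Double.POSITIVE_INFINITY;
    // Try shrinking each endpoint toward the middle in 5% steps.
    for (int i = 0; i <= 9; i++) {
      for (int j = 0; j <= 9; j++) {
        float a = min + (max - min) * 0.05f * i;
        float b = max - (max - min) * 0.05f * j;
        if (b <= a) {
          continue;
        }
        float step = (b - a) / maxLevel;
        double err = 0;
        for (float x : v) {
          int q = Math.round((x - a) / step);
          q = Math.max(0, Math.min(maxLevel, q)); // clamp to the grid
          double r = x - (a + q * step);          // reconstruction error
          err += r * r;
        }
        if (err < bestErr) {
          bestErr = err;
          bestA = a;
          bestB = b;
        }
      }
    }
    return new float[] {bestA, bestB};
  }
}
```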
@benwtrent Thanks for your reply.

First, in the blog - Better Binary Quantization at Elastic and Lucene - the BBQ method is a variant of our RaBitQ with no major differences. The claimed major features of BBQ all originate from our RaBitQ paper (as we explained in our last reply). There is only one attribution to our method, where it is mentioned (in one sentence) that BBQ is based on some inspirations from RaBitQ. We think this attribution is not sufficient - it should be made clear that the mentioned features of BBQ all originate from RaBitQ.

Second, for the new method described in this pull request, there is no attribution to our extended RaBitQ method at all - we note that we shared the extended RaBitQ paper with your team more than 2 months ago. To our understanding, the method is highly similar to our extended RaBitQ at its core (which also supports quantizing a vector to 1-bit, 2-bit, ... per dimension). They share the major idea of optimizing the scalar quantization method by trying different parameters. In your new blog, it is mentioned that "Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization." This is not true, since our extended RaBitQ corresponds to an optimized scalar quantization method. Given that our extended RaBitQ method is prior art for the method introduced in the blog and our method was known to your team more than 2 months ago, you should not have ignored it. Discussions of the differences between the two methods, if any, should be clearly explained, and experiments comparing the two methods should be provided as well.
@gaoj0017 it sounds to me as if your concern is about lack of attribution in the blog post you mentioned, and doesn't really relate to this pull request (code change) - is that accurate?
+1 for proper attribution. We should give credit where credit is due. The evolution of this PR clearly began with the RaBitQ paper, as seen in the opening comment on the original PR as well as the original issue. Specifically for the open source changes proposed here (this pull request suggesting changes to Lucene's ASL2 licensed source code):
Linking to the papers that inspired important changes in Lucene is not only for proper attribution but also so users have a deep resource they can fall back on to understand the algorithm, understand how tunable parameters are expected to behave, etc. It's an important part of the documentation too! Also, future developers can re-read the paper and study Lucene's implementation and maybe find bugs / improvement ideas.

For the Elastic specific artifacts (blog posts, press releases, tweets, etc.): I would agree that Elastic should also attribute properly, probably with an edit/update/sorry-about-the-oversight sort of addition? But I do not (no longer) work at Elastic, so this is merely my (external) opinion! Perhaps a future blog post, either from Elastic or someone else, could correct the mistake (missed attribution).

Finally, thank you to @gaoj0017 and team for creating RaBitQ and publishing these papers -- this is an impactful vector quantization algorithm that can help the many Lucene/OpenSearch/Solr/Elasticsearch users building semantic / LLM engines these days.
To head this off, this implementation is not an evolution of RaBitQ in any way. It's intellectually dishonest to say it's an evolution of RaBitQ. I know that's pedantic, but it's a fact. This is the next step of the global vector quantization optimization already done in Lucene. Instead of global, it's local, and it utilizes anisotropic quantization. I am still curious as to what in particular is considered built on RaBitQ here. Just because things reach the same ends (various bit-level quantization) doesn't mean they are the same. We can say "so this idea is unique from RaBitQ in these ways" to keep attribution, but it seems weird to call out another algorithm simply to say this one is different. I agree, the Elastic stuff should be discussed and fixed in a different forum.
Hi @msokolov, the discussion here is not only about the blog posts but also related to this pull request. In this pull request (and its related blogs), a new method is claimed without properly acknowledging the contributions/inspirations from our extended RaBitQ method, as we explained in our last reply. Besides, we believe this discussion is relevant to the Lucene community because Lucene is a collaborative project, used and contributed to by many teams beyond Elastic.

Thanks @mikemccand for your kind words - we truly appreciate them! It is also an honor to us that RaBitQ and the extended RaBitQ are seen as impactful in improving industry productivity.

Our responses to @benwtrent are as follows. Point 2 in our last reply: In the related blog that describes the method in this pull request, it states "Although the RaBitQ approach is conceptually rather different to scalar quantization, we were inspired to re-evaluate whether similar performance could be unlocked for scalar quantization" - this is not true, since this target has already been achieved by our extended RaBitQ method. Your team should not have made this claim while ignoring our extended RaBitQ method, since we circulated the extended RaBitQ paper to your team more than three months ago. In addition, our extended RaBitQ method proposed the idea of searching for the optimal parameters of scalar quantization for each vector (for details, please refer to our blog). The method in this pull request has adopted a highly similar idea. For this reason, we request that in any existing and potentially future channels introducing the method in this PR, proper acknowledgement of our extended RaBitQ method should be made.
In my capacity as the Lucene PMC Chair (and with explicit acknowledgment of my current employment with Elastic, as of the date of this writing), I want to emphasize that proper attribution and acknowledgment should be provided for all contributions, as applicable, in accordance with best practices. While the inclusion of links to external blogs and prior works serves to provide helpful context regarding the broader landscape, it would be of greater value to explicitly delineate which specific elements within this pull request are directly related to the RaBitQ method or its extension.
Just sticking purely to the issues raised regarding this PR and the blog Ben linked explaining the methodology...
This comment relates to the fact that RaBitQ, as you yourself describe it in both your papers, is motivated by seeking a form of product quantization (PQ) for which one can compute the dot product directly rather than via lookup. Your papers make minimal reference to scalar quantisation (SQ) other than to say the method is a drop-in replacement. If you strongly take issue with the statement based on this clarification, we can further clarify it in the blog. I still feel this is separate from this PR and it seems better to discuss it in a separate forum. I would also reiterate that conceptually our approach is much closer to our prior work on int4 SQ we blogged about last April, which is what inspired it more directly.
I would argue that finding the nearest point on the sphere is exactly equivalent to the standard process in SQ of finding the nearest grid point to a vector. Perhaps it would be more accurate to say you've ported SQ to work with spherical geometry, although as before the more natural motivation, and the one you yourselves adopt, is in terms of PQ. This isn't related to optimising hyperparameters of SQ IMO. You could argue perhaps that arranging for both codebook centres and corpus vectors to be uniformly distributed on the sphere constitutes this sort of optimization, although it would not be standard usage. At best you could say it indirectly arranges for raw vectors to wind up close, in some average sense, to the quantized vectors.

However, I'd take issue with this statement because a single sample of a random rotation does not ensure that the corpus vectors are uniformly distributed on the sphere: applying a single random rotation to, for example, a set of points which are concentrated somewhere on the sphere doesn't change this. You would have to use different samples for different vectors, but this eliminates the performance advantages. Incidentally, I think this is the reason it performs significantly worse on GIST, and indeed part of the reason why we found small improvements across the board for binary.

(Tangentially, it feels like a whitening pre-conditioner might actually be of more benefit to the performance of RaBitQ. I also can't help but feel some combination of hyperparameter optimization and normalization will yield even further improvements, but I haven't been able to get this to work out yet.)
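For reference, the standard per-component "nearest grid point" step being discussed above, written in notation chosen here purely for illustration (interval [a, b], B bits, vector components x_i):

```latex
\Delta = \frac{b - a}{2^{B} - 1}, \qquad
q_i = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{x_i - a}{\Delta}\right),\; 0,\; 2^{B} - 1\right), \qquad
\hat{x}_i = a + q_i \,\Delta .
```

Choosing each q_i this way minimizes |x_i - \hat{x}_i| component-wise, so the quantized vector is the nearest point of the grid in Euclidean distance, which is the sense of "nearest grid point" used in the comment above.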
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
After Elastic's last round of replies, the Elastic team reached out to us for clarification on the issues via Zoom meetings. In the meetings, they promised to fix the misattribution, so we suspended our reply and waited for their amendment. Recently, Elastic made some edits to the BBQ blog and this pull request. However, (1) the updated blog still positions BBQ as a new method (even though BBQ largely follows our RaBitQ method, with differences only in implementation), and (2) the updated PR does not position our extended RaBitQ method as prior art for their method (even though the method is similar to our extended RaBitQ method at its core and our method is prior art). As academic researchers, we are tired of writing emails asking for proper attribution round after round, and hence we would like to speak out for ourselves through this channel.

Point 1 in our reply: This pull request is a direct follow-up to the preceding pull request of the so-called "BBQ" method. Thus, this point should not be ignored and excluded from the discussion. We emphasize again that all the key features of the so-called "BBQ" method follow our RaBitQ method, with only minor modifications.

Point 2 in our last reply: For the community's information, we would like to provide the following timeline.
In summary, our requirements are two-fold: (1) Elastic should position BBQ as a variant of RaBitQ, not as a new innovation; and (2) Elastic should acknowledge our extended RaBitQ method as prior art which covers similar ideas to the OSQ method and was published months earlier, not as parallel work. As for future communications, we may not be able to respond to every future response from Elastic, because we feel they can keep arguing and changing their arguments - in this thread, Elastic has changed their attribution of OSQ several times (from the inspiration of RaBitQ, to LVQ and SCANN, and to their own earlier PR). As academic researchers, we should be focusing more on research. No matter how they argue, they cannot deny the following two points:
We would also consider reaching out to more channels to speak out for ourselves if necessary.
This pull request relates only to OSQ, and thus the proper scope of discussion is the concerns raised around its attribution. We have pursued multiple conversations and discussions in order to resolve the various concerns amicably, and we continue to desire collaboration and open discourse with respect to each party's innovations in this problem space. We would like to reiterate that we highly rate the research in both RaBitQ and its extension. However, we also maintain, as we have in private communications, that OSQ is not a derivative of extended RaBitQ.
This goes to the crux of the disagreement around attribution. In the following we describe the two schemes, extended RaBitQ [1] and OSQ [2], and restate the points we have made privately regarding their differences. Both methods centre the data, which to the best of our knowledge was originally proposed in LVQ [3]. Extended RaBitQ proposes a method for constructing a codebook for which one can compute the dot product efficiently using integer arithmetic. In a nutshell it proceeds as follows:
At its heart it is a variant of product quantization which allows for efficient similarity calculations and was positioned as such in the original paper. The key ingredients of OSQ are as follows:
At an algorithmic level there is no similarity between the methods other than that they both centre the data. At a conceptual level, RaBitQ was developed as a variant of product quantization and OSQ was developed as an approach to hyperparameter optimization for per-vector scalar quantization. For the record, we were inspired to revisit scalar quantization in light of the success of binary RaBitQ, and we were exploring ideas prior to any communication between our teams regarding extended RaBitQ. The actual inspiration for the approach was our previous work on hyperparameter optimisation for quantization intervals and LVQ, and we attribute this PR accordingly.

Retrospectively, and in parallel to our work, the authors of RaBitQ explored the relationship between extended RaBitQ and scalar quantization directly ([6], published 12/12/2024). They show that the act of normalising the vectors, finding the nearest centroid, then undoing the normalisation can be thought of as a process of rescaling the quantization grid so that the nearest quantized vector winds up closer to the original floating point vector. Whilst this bears some relation to optimising b - a, they never formulate this process as an optimisation problem and at no point show exactly what quantity they in fact optimise. In our view, this misses the key point that the important idea is the formulation. As such it is very clear that they do not and could not introduce any different optimisation objective, such as an anisotropic loss in the dot product (sketched after this comment). Even if they did, OSQ optimises both a and b directly, which gives it an extra degree of freedom, and develops an effective, non-trivial, and completely distinct solution for this problem.

As we have communicated privately, we dispute that OSQ is a variant of extended RaBitQ. We laid out in broadly similar terms exactly why we disagree at a technical level (along the lines of the discussion in this comment). We have received no further technical engagement with the points we have raised and have therefore reached an impasse. Regarding this point:
we at no point intended to imply that RaBitQ could not be extended to support more than 1 bit, nor is that the only motivation for our work. The main original motivations are all the benefits we see in being able to achieve high-quality representations using a variant of vanilla scalar quantization, such as:
We performed and published a study evaluating binary variants and showed consistent improvements in comparison to binary RaBitQ [2]. The study we performed is perfectly in the spirit of evaluating competing methods and covers multiple datasets and embedding models. Our initial focus was on binary quantization, since this is the only work we were releasing in product at that point. We privately discussed plans with the authors of RaBitQ to perform benchmarking for higher bit counts compared to extended RaBitQ and also to perform larger-scale benchmarks. We were unaware of [4] at the time the blog [2] was written, and we communicated privately that we plan to publish a survey which includes a discussion of both works as well as other pertinent work in the field. We have furthermore stated privately that this feels like a more appropriate forum in which to discuss the two methods.

[1] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search, https://arxiv.org/pdf/2409.09913
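For context on the anisotropic dot-product loss mentioned above, here is the usual form in the spirit of the ScaNN line of work (notation is mine, not taken from either paper, and the weight \eta is a free parameter): the residual r = x - \hat{x} is split into the component parallel to x and the component orthogonal to it, and the parallel component, which is what most perturbs dot products with queries aligned to x, is weighted more heavily.

```latex
r_{\parallel} = \frac{\langle r, x \rangle}{\lVert x \rVert^{2}}\, x, \qquad
r_{\perp} = r - r_{\parallel}, \qquad
\ell(x, \hat{x}) = \eta\, \lVert r_{\parallel} \rVert^{2} + \lVert r_{\perp} \rVert^{2}, \quad \eta > 1 .
```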
This provides a binary quantized vector format for flat and HNSW vectors. The key ideas are:
This gives Lucene a single scalar quantization format that supports high-quality vector retrieval, even down to a single bit.
The ideas here build on a couple of foundations:
Lucene wouldn't be the only "Optimized Quantile LVQ" technique on the block, as the original RaBitQ authors have extended their technique, though the loss function and optimization are different: https://arxiv.org/abs/2409.09913
For all similarity types, on disk it looks like this:
During segment merge and HNSW building, another temporary file is written containing the query-quantized vectors over the configured centroids. One downside is that this temporary file will actually be larger than the regular vector index. This is because we use asymmetric quantization to keep good information around. But once the merge is complete, this file is deleted. I think this can eventually be removed.
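To illustrate the kind of asymmetric scoring described above: the document side keeps 1 bit per dimension while the query side keeps more bits, so the integer dot product can be computed as a sum of weighted popcounts over the query's bit planes. The class and method names, and the multi-bit query assumption, are illustrative only, not the scorer in this PR.

```java
import java.util.BitSet;

// Illustrative sketch, not the scorer in this PR: documents hold 1 bit per
// dimension; the query keeps several bits per dimension, stored as bit planes
// (plane j holds bit j of every query component).
final class AsymmetricScoreSketch {

  /** Integer dot product between a multi-bit query and a 1-bit document vector. */
  static long dotProduct(BitSet[] queryPlanes, BitSet docBits) {
    long sum = 0;
    for (int j = 0; j < queryPlanes.length; j++) {
      BitSet overlap = (BitSet) queryPlanes[j].clone();
      overlap.and(docBits); // dimensions where the doc bit and query bit j are both 1
      sum += (long) overlap.cardinality() << j; // plane j contributes with weight 2^j
    }
    return sum;
  }
}
```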
Here are the results for Recall@10|50
Even with the optimization step, indexing time with HNSW is only marginally increased.
The consistent improvement in recall and the flexibility across bit counts make this format and quantization technique much preferred.
Eventually, we should consider moving scalar quantization to utilize this new optimized quantizer. However, the on-disk format and scoring would change, so I didn't do that in this PR.
supersedes: #13651
Co-Authors: @tveasey @john-wagster @mayya-sharipova @ChrisHegarty