vecindex: add support for vector index prefix columns #142050

Open: wants to merge 1 commit into master

andy-kimball (Contributor)

The CREATE VECTOR INDEX syntax allows indexing over multiple columns, as long as the vector column to be indexed is the last column in the index definition. The other "prefix" columns can be used to partition the index by tenants, regions, users, etc. The execution engine encodes prefix columns as a byte slice and passes it as a parameter to vector index operations like Insert and Search. While the index itself treats these bytes as an opaque "TreeKey", the CRDB Store implementation incorporates these prefix bytes into KV keys.

This prefixing mechanism has the effect of separating the index into distinct K-means trees, each identified by a unique TreeKey. CRDB partitioning can then control where those trees are located; e.g. an app could store each user's indexed photo embeddings in a region close to that user.
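The effect of TreeKey separation on Insert and Search can be sketched with a toy Go model: each distinct TreeKey names its own tree, and both operations only ever touch that one tree. This is an illustration under stated assumptions. The type and method names (`prefixedIndex`, `Insert`, `Search`) are hypothetical and do not reflect the actual vecindex API. Each "tree" here is a flat list searched exhaustively, not a real K-means tree.

```go
package main

import (
	"fmt"
	"sort"
)

// vector is a hypothetical embedding type.
type vector []float32

// tree is a toy stand-in for one K-means tree: a flat list of (id, vector)
// pairs that is searched exhaustively.
type tree struct {
	ids  []string
	vecs []vector
}

// prefixedIndex routes every operation to the tree named by its TreeKey,
// treating the TreeKey bytes as an opaque map key.
type prefixedIndex struct {
	trees map[string]*tree
}

func newPrefixedIndex() *prefixedIndex {
	return &prefixedIndex{trees: make(map[string]*tree)}
}

// Insert adds a vector to the tree for treeKey, creating that tree on first use.
func (idx *prefixedIndex) Insert(treeKey []byte, id string, v vector) {
	t, ok := idx.trees[string(treeKey)]
	if !ok {
		t = &tree{}
		idx.trees[string(treeKey)] = t
	}
	t.ids = append(t.ids, id)
	t.vecs = append(t.vecs, v)
}

// sqDist returns the squared Euclidean distance between a and b.
func sqDist(a, b vector) float64 {
	var s float64
	for i := range a {
		d := float64(a[i] - b[i])
		s += d * d
	}
	return s
}

// Search returns up to k ids nearest to q, looking only inside the tree
// for treeKey; vectors stored under other TreeKeys are never considered.
func (idx *prefixedIndex) Search(treeKey []byte, q vector, k int) []string {
	t, ok := idx.trees[string(treeKey)]
	if !ok {
		return nil
	}
	order := make([]int, len(t.ids))
	for i := range order {
		order[i] = i
	}
	sort.Slice(order, func(a, b int) bool {
		return sqDist(t.vecs[order[a]], q) < sqDist(t.vecs[order[b]], q)
	})
	if k > len(order) {
		k = len(order)
	}
	out := make([]string, k)
	for i := 0; i < k; i++ {
		out[i] = t.ids[order[i]]
	}
	return out
}

func main() {
	idx := newPrefixedIndex()
	idx.Insert([]byte("user-1"), "photo-a", vector{0, 0})
	idx.Insert([]byte("user-2"), "photo-b", vector{0.1, 0})
	idx.Insert([]byte("user-1"), "photo-c", vector{5, 5})
	fmt.Println(idx.Search([]byte("user-1"), vector{0, 0}, 2)) // [photo-a photo-c]
}
```

Here `photo-b` is closer to the query than `photo-c`, but it lives in user-2's tree, so a search scoped to user-1 never sees it. Because each tree is independent, where it is stored can be decided per TreeKey.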

Epic: CRDB-42943

Release note: None


blathers-crl bot commented Feb 26, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
