Update vector collection tuning tips and describe efSearch hint [AI-195] [AI-192] #1521
Changes from all commits
8ba7a75
dd39b6d
8476db7
bd4ffae
ba08fbc
4ab11dd
ee6a581
8bd114e
cbc1d87
18288d0
ce86547
8ecff5e
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -84,7 +84,7 @@ The default search algorithm is a two-stage search which works as follows: | |
|
||
At each stage, aggregation is based on score and only the best results are retained. | ||
|
||
Two important parameters in this search algorithm determine the amount of data sent between the members and the quality of the final result. These parameters are as follows: | ||
Two parameters in this search algorithm determine the amount of data sent between the members and the quality of the final result. These parameters are as follows: | ||
|
||
- `partitionLimit` - number of search results obtained from each partition | ||
- `memberLimit` - number of search results returned from member to coordinator | ||
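The role of the two limits can be illustrated with a minimal sketch. It is not the product's implementation: the data layout (members holding partitions holding `(score, key)` pairs), the function names, and the assumption that a higher score is better are all hypothetical and chosen only for illustration.

```python
import heapq


def top_n(scored, n):
    """Keep the n best (score, key) pairs, assuming a higher score is better."""
    return heapq.nlargest(n, scored)


def two_stage_search(members, top_k, partition_limit, member_limit):
    """Hypothetical model of a two-stage search.

    members: list of members; each member is a list of partitions;
    each partition is a list of (score, key) candidates.
    """
    member_results = []
    for partitions in members:
        # Stage 1: every partition contributes at most partition_limit results,
        # then the member keeps at most member_limit of them.
        local = []
        for partition in partitions:
            local.extend(top_n(partition, partition_limit))
        member_results.append(top_n(local, member_limit))
    # Stage 2: the coordinator merges the member results and keeps top_k.
    merged = [item for result in member_results for item in result]
    return top_n(merged, top_k)


# Illustrative data: two members with two partitions each.
members = [
    [[(0.9, "a"), (0.8, "b")], [(0.7, "c")]],
    [[(0.95, "d"), (0.1, "e")], [(0.85, "f"), (0.6, "g")]],
]
narrow = two_stage_search(members, top_k=4, partition_limit=1, member_limit=2)
wide = two_stage_search(members, top_k=4, partition_limit=2, member_limit=2)
```

In this toy data set, `narrow` misses entry `b` because its partition may return only one result, while `wide` recovers it at the cost of moving more candidates between stages.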
|
@@ -93,12 +93,22 @@ To allow the system to return enough results, the following conditions must be s | |
|
||
- `partitionLimit * partitionCount >= topK`, `partitionLimit <= topK` | ||
- `memberLimit * memberCount >= topK`, `memberLimit <= topK` | ||
- `efSearch >= partitionLimit`, if `partitionLimit` is not configured explicitly this applies to the default `partitionLimit` value | ||
AFAIS the default value of

the exact formula is complicated (https://github.com/hazelcast/hazelcast-mono/pull/3258). I described it generally:
|
||
|
||
By default, `partitionLimit` and `memberLimit` are equal to `topK`. While this satisfies the inequalities given above, it can result in the processing of more results than requested. | ||
This improves the overall quality of the results but can have a significant performance overhead because more entries are fetched from each partition of the index and sent between the members. | ||
By default, `memberLimit` is equal to `topK` and `partitionLimit` is calculated based on `topK` and cluster configuration (number of partitions) | ||
in a way that is unlikely to cause quality degradation. | ||
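The conditions listed above can be checked mechanically before running a search. The helper below is a sketch only: `check_limits` and its argument names are hypothetical and not part of any Hazelcast API, and the values must be supplied by the caller.

```python
def check_limits(top_k, partition_limit, partition_count,
                 member_limit, member_count, ef_search):
    """Return the list of violated conditions (empty when all of them hold)."""
    problems = []
    if partition_limit * partition_count < top_k:
        problems.append("partitionLimit * partitionCount < topK")
    if partition_limit > top_k:
        problems.append("partitionLimit > topK")
    if member_limit * member_count < top_k:
        problems.append("memberLimit * memberCount < topK")
    if member_limit > top_k:
        problems.append("memberLimit > topK")
    if ef_search < partition_limit:
        problems.append("efSearch < partitionLimit")
    return problems


# Illustrative values: 271 partitions, 3 members, topK of 10.
ok = check_limits(top_k=10, partition_limit=1, partition_count=271,
                  member_limit=10, member_count=3, ef_search=10)
bad = check_limits(top_k=10, partition_limit=20, partition_count=271,
                   member_limit=2, member_count=3, ef_search=5)
```

Here `ok` is empty, while `bad` reports three violations: `partitionLimit` exceeds `topK`, the members together cannot return `topK` results, and `efSearch` is smaller than `partitionLimit`.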
|
||
[TIP] | ||
==== | ||
Consider tuning `efSearch` based on quality and throughput/latency requirements. | ||
==== | ||
|
||
[NOTE] | ||
==== | ||
The heuristics for `partitionLimit` assume that data (vectors) is distributed uniformly across partitions. If this is not the case, for example if the closest neighbours reside in only one or a few partitions, the default value of `partitionLimit` may negatively impact search quality. In such a case, consider increasing `partitionLimit`. | |
|
||
NOTE: Consider tuning `partitionLimit` based on quality and latency requirements. The number of partitions must also be considered and updated as required when making adjustments to `partitionLimit`. For further information on the implications of the partition count, see <<partition-count-impact, Partition Count Impact>>. | ||
`memberLimit` is less critical for overall behavior if there are only a few members. | ||
==== | ||
|
||
[graphviz] | ||
.... | ||
|
@@ -164,7 +174,7 @@ It is used where the cluster has only a single member, or can be enabled using s | |
A single-stage search request is executed in parallel on all partitions (on their owners) | ||
and partition results are aggregated directly on the coordinator member to produce the final result. | ||
|
||
This search algorithm uses the `partitionLimit` parameter, which behaves in the same way as for two-stage search. | ||
This search algorithm uses `efSearch` and `partitionLimit` parameters, which behave in the same way as for two-stage search. | ||
|
||
[graphviz] | ||
.... | ||
|
@@ -221,7 +231,7 @@ The number of partitions has a big impact on the performance of the vector colle | |
After this point, more partitions will not significantly improve ingestion speed. | ||
If there are fewer partitions than number of cores, not all available resources will be utilized during ingestion because updates on a given partition are executed by single thread. | ||
- *similarity search*: in general, having fewer partitions results in better search performance and reduced latency. | ||
However, the impact on quality/recall is complicated and depends also on `partitionLimit`. | ||
However, the impact on quality/recall is complicated and also depends on the `efSearch` and `partitionLimit` values. | |
- *migration*: avoid partitions with a large memory size, including metadata, vectors and vector index internal representation. | ||
In general, the recommendation is for a partition size of around 50-100MB per partition, which results in fast migrations and small pressure on heap during migration. | ||
However, for vector search, the partition size can be increased above that general recommendation provided that there is enough heap memory for migrations (see below). | ||
|
@@ -230,21 +240,24 @@ The number of partitions has a big impact on the performance of the vector colle | |
NOTE: It is not possible to change the number of partitions for an existing cluster. | ||
|
||
[CAUTION] | ||
.For this Beta version, the following apply: | ||
.For this Beta version, the following recommendations apply: | ||
==== | ||
. The default value of 271 partitions can result in inefficient vector similarity searches. | ||
We recommend that you tune the number of partitions for use in clusters with vector collections. | ||
|
||
. The entire collection partition is migrated as a single chunk. | ||
The entire collection partition is migrated as a single chunk. | ||
If using partitions that are larger than the recommended size, ensure that you have sufficient heap memory to run migrations. The amount of heap memory required is approximately the size of the vector collection partition multiplied by the number of parallel migrations. | ||
To decrease pressure on heap memory, you can decrease the number of parallel migrations using `hazelcast.partition.max.parallel.migrations` and `hazelcast.partition.max.parallel.replications`. | ||
==== | ||
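The heap estimate described above is simple arithmetic and can be sketched as follows. The function name is hypothetical, and deriving the partition size as collection size divided by partition count is an assumption that holds only if data is spread evenly across partitions.

```python
def migration_heap_estimate_mb(collection_size_mb, partition_count,
                               parallel_migrations):
    """Approximate heap needed for migrations, in MB.

    Assumes the collection is spread evenly across partitions, so that one
    partition holds collection_size_mb / partition_count.
    """
    if partition_count <= 0 or parallel_migrations < 0:
        raise ValueError("partition_count must be positive and "
                         "parallel_migrations must not be negative")
    partition_size_mb = collection_size_mb / partition_count
    return partition_size_mb * parallel_migrations


# Illustrative numbers only: a 27,100 MB collection in 271 partitions
# (100 MB per partition) with 10 parallel migrations.
estimate = migration_heap_estimate_mb(27_100, 271, 10)
```

With these illustrative numbers the estimate is 1000 MB. Halving the number of parallel migrations halves the estimate, which is the effect of lowering the two properties mentioned above.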
|
||
== Tuning tips | ||
|
||
1. For searches with small `topK` (for example, 10) it may be beneficial to artificially increase `topK`, adjust `partitionLimit` accordingly, and discard extra results. If you need 10 results, a good starting point for tuning could be `topK=100` and a `partitionLimit` between 50 and 100. While this will make the search slower, it will also improve quality, sometimes significantly. Overall, this setup can be more efficient than increasing index build parameters (`max-degree`, `ef-construction`) which results in slower index builds and searches. With a very small `topK` or `partitionLimit`, the search algorithm is less able to escape local minima and find the best results. | |
2. Vector deduplication does not incur significant overhead for uploads (usually less than 1%) and searches. You may consider disabling it to get slightly better performance and smaller memory usage if your dataset does not contain duplicated vectors. However, be aware that in the presence of many duplicated vectors with deduplication disabled, a similarity search may return poor quality results. | ||
3. For a given query, each vector index partition is searched by 1 thread. The number of concurrent partition searches is configured by specifying a pool size for `hz:query` executor, which by default has 16 threads per member. If optimizing for search, we recommend setting the `hz:query` pool size to be that of the physical core count of your host machines: this will result in a good balance between search throughput and CPU utilization. Setting `hz:query` to have a pool size greater than that of the physical core count will not deliver a significant increase in throughput but it will increase total CPU utilization. The `hz:query` pool size can be changed as follows: | ||
1. Enable xref:vector-collections.adoc#jvm-configuration[Vector API]. | ||
2. Prefer the DOT metric with normalized vectors over the COSINE metric if your use case does not require the COSINE metric. | ||
3. Adjust `efSearch` to achieve the desired balance between throughput/latency and precision. | ||
By default `efSearch = topK`. | ||
For searches with small `topK` (for example, 1 - 10), it may be beneficial to use a larger value to get better precision. | ||
For large `topK` (for example 100), a smaller `efSearch` value will give better performance with only a potentially small and acceptable decrease in precision. | ||
4. Test if adjusting `efSearch` gives satisfactory results before increasing index build parameters (`max-degree`, `ef-construction`) which would result in slower index builds and searches, and a larger index. | ||
5. Vector deduplication does not incur significant overhead for uploads (usually less than 1%) and searches. You may consider disabling it to get slightly better performance and smaller memory usage if your dataset does not contain duplicated vectors. However, be aware that in the presence of many duplicated vectors with deduplication disabled, a similarity search may return poor quality results. | ||
6. For a given query, each vector index partition is searched by one thread. The number of concurrent partition searches is configured by specifying a pool size for `hz:query` executor, which by default has 16 threads per member. If optimizing for search, we recommend setting the `hz:query` pool size to be that of the physical core count of your host machines; this will result in a good balance between search throughput and CPU utilization. Setting `hz:query` to have a pool size greater than that of the physical core count will not deliver a significant increase in throughput but it will increase total CPU utilization. The `hz:query` pool size can be changed as follows: | ||
+ | ||
[tabs] | ||
==== | ||
|
@@ -285,4 +298,6 @@ hazelcast: | |
---- | ||
==== | ||
+ | ||
4. If there are fewer partitions than available cores, not all cores will be used for a single search execution. This is acceptable if you are focused on throughput, as in general fewer partitions means you need fewer resources. However, if you want to achieve the best latency for a single client, it is better to distribute the search to as many cores as possible, which requires having at least as many partitions as cores in the cluster. | |
7. Decreasing the number of partitions can improve query performance but has a xref:partition-count-impact[significant impact on the entire cluster]. | |
8. If there are fewer partitions than available cores, not all cores will be used for a single search execution. This is acceptable if you are focused on throughput, as in general fewer partitions means you need fewer resources. However, if you want to achieve the best latency for a single client, it is better to distribute the search to as many cores as possible, which requires having at least as many partitions as cores in the cluster. | |
9. The `vectorCollection.searchIndexVisitedNodes` metric can help you understand vector search performance. If the ratio of the number of nodes visited per search to the collection size is high, this may indicate that the vector index is not beneficial in that case. |
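The ratio mentioned in the last tip can be computed as shown in this small sketch. The function name is hypothetical, and it assumes you have already derived an average number of visited nodes per search from the metric; how the metric is aggregated is not covered here, and no threshold for "high" is implied.

```python
def visited_fraction(visited_nodes_per_search, collection_size):
    """Fraction of the collection that an average search visits in the index."""
    if collection_size <= 0:
        raise ValueError("collection_size must be positive")
    return visited_nodes_per_search / collection_size


# Illustrative numbers only: 5,000 visited nodes in a 1,000,000-vector collection.
fraction = visited_fraction(5_000, 1_000_000)
```

A value close to 1 would mean that a search touches most of the collection, which is the situation the tip warns about.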
What is the format of this value? i.e. how should the user use this hint?
Can we add a type or default to the table overall?
Also we should explicitly introduce the examples in general e.g. The following code example shows how to add search options and hints

If you add these hints to the example (not sure if you should combine them though) then this would suffice rather than adding detail about type of value needed.

I added type in 8bd114e and example in 18288d0.
Most hints should not normally be used, except `efSearch`, which may be promoted to a full-fledged SearchOption in the future, and `partitionLimit` in case of skewed distribution. Others are useful for advanced tuning and benchmarking.