Update vector collection tuning tips and describe efSearch hint [AI-195] [AI-192] #1521
Some ideas...
1. For searches with small `topK` (for example, 10), it may be beneficial to artificially increase `topK`, adjust `partitionLimit` accordingly, and discard extra results. If you need 10 results, a good starting point for tuning could be `topK=100` and a `partitionLimit` between 50 and 100. While this will make the search slower, it will also improve quality, sometimes significantly. Overall, this setup can be more efficient than increasing index build parameters (`max-degree`, `ef-construction`), which results in slower index builds and searches. With a very small `topK` or `partitionLimit`, the search algorithm is less able to escape local minima and find the best results.
2. Vector deduplication does not incur significant overhead for uploads (usually less than 1%) and searches. You may consider disabling it to get slightly better performance and smaller memory usage if your dataset does not contain duplicated vectors. However, be aware that in the presence of many duplicated vectors with deduplication disabled, a similarity search may return poor quality results.
3. For a given query, each vector index partition is searched by 1 thread. The number of concurrent partition searches is configured by specifying a pool size for `hz:query` executor, which by default has 16 threads per member. If optimizing for search, we recommend setting the `hz:query` pool size to be that of the physical core count of your host machines: this will result in a good balance between search throughput and CPU utilization. Setting `hz:query` to have a pool size greater than that of the physical core count will not deliver a significant increase in throughput but it will increase total CPU utilization. The `hz:query` pool size can be changed as follows:
1. Enable Vector API.
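Idea 1 above (request a larger `topK`, then discard the extra results) could be sketched roughly as follows. This is only an illustration: `OverFetchSearch`, `Hit`, and `searchWithOverFetch` are hypothetical names, the `search` function stands in for the real similarity search call, and the sketch assumes that a higher score means a closer match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper, not part of the Hazelcast API. */
final class OverFetchSearch {

    /** A scored hit; this sketch assumes a higher score means a closer match. */
    record Hit(String key, double score) {}

    /**
     * Calls the search with an inflated topK and keeps only the best `wanted` hits.
     * `search` is a stand-in for the real search call; its argument is the topK to request.
     */
    static List<Hit> searchWithOverFetch(IntFunction<List<Hit>> search, int wanted, int inflatedTopK) {
        if (wanted <= 0 || inflatedTopK < wanted) {
            throw new IllegalArgumentException("need 0 < wanted <= inflatedTopK");
        }
        // Copy so that sorting does not touch the list returned by the search.
        List<Hit> hits = new ArrayList<>(search.apply(inflatedTopK));
        // Best score first, then drop everything past the number of results actually needed.
        hits.sort(Comparator.<Hit>comparingDouble(Hit::score).reversed());
        return List.copyOf(hits.subList(0, Math.min(wanted, hits.size())));
    }
}
```

With the starting point suggested above, a caller that needs 10 results would pass `wanted = 10` and `inflatedTopK = 100`.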
It would be good to mention how to enable the Vector API, by using `--add-modules jdk.incubator.vector`, or link to the appropriate document.
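As a small aid for that step, an application could check at startup whether the JVM was launched with `--add-modules jdk.incubator.vector`. The sketch below uses only the standard `ModuleLayer` API; the class and method names are made up for illustration and are not part of Hazelcast.

```java
/** Hypothetical startup check; class and method names are illustrative only. */
final class VectorApiCheck {

    static final String VECTOR_MODULE = "jdk.incubator.vector";

    /** Returns true when the incubator Vector API module is resolved in the boot layer. */
    static boolean vectorApiEnabled() {
        return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
    }

    public static void main(String[] args) {
        if (vectorApiEnabled()) {
            System.out.println("Vector API module is available");
        } else {
            // Incubator modules are not resolved by default, so the flag must be passed explicitly.
            System.out.println("Vector API module is missing; start the JVM with --add-modules " + VECTOR_MODULE);
        }
    }
}
```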
@@ -93,12 +93,24 @@ To allow the system to return enough results, the following conditions must be s

- `partitionLimit * partitionCount >= topK`, `partitionLimit <= topK`
- `memberLimit * memberCount >= topK`, `memberLimit <= topK`
- `efSearch >= partitionLimit`; if `partitionLimit` is not configured explicitly, this applies to the default `partitionLimit` value
AFAIS the default value of `partitionLimit` is not mentioned anywhere.
the exact formula is complicated (https://github.com/hazelcast/hazelcast-mono/pull/3258). I described it generally: `partitionLimit` is calculated based on `topK` and cluster configuration (number of partitions)
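For readers who want to sanity-check a set of values against the conditions in the hunk above, a helper along these lines might be useful. It is a plain arithmetic sketch: `SearchLimitCheck` and `violations` are hypothetical names, all values must be supplied explicitly, and it does not compute the default `partitionLimit`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that checks the documented conditions; not part of the Hazelcast API. */
final class SearchLimitCheck {

    /** Returns the conditions that the given values violate; an empty list means all hold. */
    static List<String> violations(int topK, int partitionLimit, int partitionCount,
                                   int memberLimit, int memberCount, int efSearch) {
        List<String> problems = new ArrayList<>();
        // Products are computed as long to avoid int overflow with large limits.
        if ((long) partitionLimit * partitionCount < topK) {
            problems.add("partitionLimit * partitionCount >= topK");
        }
        if (partitionLimit > topK) {
            problems.add("partitionLimit <= topK");
        }
        if ((long) memberLimit * memberCount < topK) {
            problems.add("memberLimit * memberCount >= topK");
        }
        if (memberLimit > topK) {
            problems.add("memberLimit <= topK");
        }
        if (efSearch < partitionLimit) {
            problems.add("efSearch >= partitionLimit");
        }
        return problems;
    }
}
```

For example, `topK=100`, `memberLimit=20` on a 3-member cluster violates `memberLimit * memberCount >= topK`, because 20 * 3 is only 60.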
4. Before increasing index build parameters (`max-degree`, `ef-construction`), which would result in slower index builds and searches and a larger index, test if adjusting `efSearch` gives satisfactory results.
5. Vector deduplication does not incur significant overhead for uploads (usually less than 1%) and searches. You may consider disabling it to get slightly better performance and smaller memory usage if your dataset does not contain duplicated vectors. However, be aware that in the presence of many duplicated vectors with deduplication disabled, a similarity search may return poor quality results.
6. For a given query, each vector index partition is searched by 1 thread. The number of concurrent partition searches is configured by specifying a pool size for `hz:query` executor, which by default has 16 threads per member. If optimizing for search, we recommend setting the `hz:query` pool size to be that of the physical core count of your host machines: this will result in a good balance between search throughput and CPU utilization. Setting `hz:query` to have a pool size greater than that of the physical core count will not deliver a significant increase in throughput but it will increase total CPU utilization. The `hz:query` pool size can be changed as follows:
Suggested change:
- 6. For a given query, each vector index partition is searched by 1 thread. The number of concurrent partition searches is configured by specifying a pool size for `hz:query` executor, which by default has 16 threads per member. If optimizing for search, we recommend setting the `hz:query` pool size to be that of the physical core count of your host machines: this will result in a good balance between search throughput and CPU utilization. Setting `hz:query` to have a pool size greater than that of the physical core count will not deliver a significant increase in throughput but it will increase total CPU utilization. The `hz:query` pool size can be changed as follows:
+ 6. For a given query, each vector index partition is searched by one thread. The number of concurrent partition searches is configured by specifying a pool size for `hz:query` executor, which by default has 16 threads per member. If optimizing for search, we recommend setting the `hz:query` pool size to be that of the physical core count of your host machines: this will result in a good balance between search throughput and CPU utilization. Setting `hz:query` to have a pool size greater than that of the physical core count will not deliver a significant increase in throughput but it will increase total CPU utilization. The `hz:query` pool size can be changed as follows:
How is `hz:query` changed? It would be good to link to the relevant doc.
there is an example below
Mostly minor edits, but a few questions on the table values. I have not suggested any changes to parts of these pages that are not in this PR, but I do see a few issues that we can come back to in a later update, e.g. page name vs. title and explicitly introducing code samples (helps with Ask AI).
@@ -718,6 +719,9 @@ You can use hints to fine-tune search precision, especially with smaller `limit`
|===
|Hint|Description

|efSearch
|Size of list of potential candidates during search. Larger value results in better precision but slower execution.
What is the format of this value, i.e. how should the user use this hint?
Can we add a type or default to the table overall?
Also, we should explicitly introduce the examples in general, e.g. "The following code example shows how to add search options and hints".
If you add these hints to the example (not sure if you should combine them though) then this would suffice rather than adding detail about type of value needed.
Co-authored-by: Oliver Howell <[email protected]>
…tor-tuning-6.0-part
Thanks for addressing comments - not sure if you've finished edits but approving from docs perspective
There were important changes in 6.0 impacting vector collection tuning: introduction of `efSearch` hint and default value for `partitionLimit`.
After https://hazelcast.atlassian.net/browse/AI-192 mutating operations do not fail during optimization.