User story
The existing erase_if and export_batch_if APIs accept a customized predicate functor that decides whether each item is erased or exported.
From HKV's perspective, a value is an array of type V that only stores the embedding, so the predicate functor interface ignores the value.
However, HKV users also want to erase or export items after evaluating the value. std::erase_if(std::map) offers a similar usage, since its predicate sees the whole element.
We therefore decided to develop new APIs, export_batch_if_v2 and erase_if_v2, to support this feature.
We also think it is more general for HKV to evaluate the whole item [key, score, value] rather than only [key, score].
We will keep supporting export_batch_if and erase_if in the short term, but will deprecate them in the future.
Design
User predicate functor
The user provides a functor whose template parameters, inputs, and return value match the following code.
Note that the device functor assumes each thread handles one key-value pair.
GroupSize is used when users want to evaluate the value with multiple threads; HKV supports using a cooperative group so that the threads process a value cooperatively. GroupSize is configured by HKV, so users do not need to instantiate the device function; instantiating the struct is enough.
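As a rough illustration of the shape described above, a functor might look like the sketch below. This is an assumption-laden sketch, not HKV's actual interface: the names `SketchPredFunctor`, `Tile`, and `SKETCH_DEVICE`, the argument order, and the `unsigned int` type of GroupSize are all hypothetical, and the exact signature should be taken from the HKV headers. The host fallback exists only so the predicate logic can be compiled and checked without nvcc.

```cpp
#include <cstdint>

#if defined(__CUDACC__)
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
#define SKETCH_DEVICE __forceinline__ __device__
// Under nvcc, the group is a cooperative-groups tile of GroupSize threads.
template <unsigned int GroupSize>
using Tile = cg::thread_block_tile<GroupSize>;
#else
#define SKETCH_DEVICE inline
// Hypothetical host stand-in: a "group" made of a single thread.
template <unsigned int GroupSize>
struct Tile {
  unsigned int thread_rank() const { return 0; }
  unsigned int size() const { return 1; }
  bool any(bool predicate) const { return predicate; }
};
#endif

// Hypothetical functor: the struct is templated on the key, value-element,
// and score types; operator() is templated on GroupSize, which the library
// (not the user) would choose when it instantiates the device function.
template <class K, class V, class S>
struct SketchPredFunctor {
  S threshold;  // user-chosen state carried by the functor

  template <unsigned int GroupSize>
  SKETCH_DEVICE bool operator()(const K& key, const V* value, const S& score,
                                const Tile<GroupSize>& g) const {
    (void)key;
    (void)value;
    (void)g;  // key, value, and group are unused in this skeleton
    // Returning true selects the item for erase/export.
    return score >= threshold;
  }
};
```

In this sketch the user only writes `SketchPredFunctor<K, V, S> pred{threshold};` and hands `pred` to the table; choosing GroupSize is left to the caller of `operator()`.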
Two use cases are provided here:
Use case 1: evaluate key, score and partial value.
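A minimal sketch of what such a predicate might look like, assuming an integer key type and the same hypothetical signature as above. `PartialValuePred`, its members `pattern`, `threshold`, and `min_first`, and the `Tile`/`SKETCH_DEVICE` helpers are illustrative names, not part of HKV. Only the first element of the value is inspected, so a single thread per key-value pair is enough and `g` is unused.

```cpp
#include <cstdint>

#if defined(__CUDACC__)
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
#define SKETCH_DEVICE __forceinline__ __device__
template <unsigned int GroupSize>
using Tile = cg::thread_block_tile<GroupSize>;
#else
#define SKETCH_DEVICE inline
// Hypothetical host stand-in: a "group" made of a single thread.
template <unsigned int GroupSize>
struct Tile {
  unsigned int thread_rank() const { return 0; }
  unsigned int size() const { return 1; }
  bool any(bool predicate) const { return predicate; }
};
#endif

// Hypothetical predicate for use case 1: match on a key bit pattern, a score
// threshold, and the first element of the value.
template <class K, class V, class S>
struct PartialValuePred {
  K pattern;    // key must share at least one bit with this mask
  S threshold;  // score must be strictly greater than this
  V min_first;  // value[0] must be strictly greater than this

  template <unsigned int GroupSize>
  SKETCH_DEVICE bool operator()(const K& key, const V* value, const S& score,
                                const Tile<GroupSize>& g) const {
    (void)g;  // one thread evaluates the whole condition
    return ((key & pattern) != 0) && (score > threshold) &&
           (value[0] > min_first);
  }
};
```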
When users want to evaluate the value using more than one thread, the parameter g comes in handy.
Use case 2: evaluate the whole value and return true if any element is not 0.
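One way this could be written is sketched below, under the same assumed signature. `AnyNonZeroPred` and its `dim` member are hypothetical; in particular, how the functor learns the embedding length is an assumption made here for illustration. Each thread in the group strides over the value, and a group-wide vote (`any`) combines the per-thread results so every thread returns the same answer.

```cpp
#include <cstdint>

#if defined(__CUDACC__)
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
#define SKETCH_DEVICE __forceinline__ __device__
template <unsigned int GroupSize>
using Tile = cg::thread_block_tile<GroupSize>;
#else
#define SKETCH_DEVICE inline
// Hypothetical host stand-in: a "group" made of a single thread, so the
// strided loop below visits every element.
template <unsigned int GroupSize>
struct Tile {
  unsigned int thread_rank() const { return 0; }
  unsigned int size() const { return 1; }
  bool any(bool predicate) const { return predicate; }
};
#endif

// Hypothetical predicate for use case 2: true if any element is nonzero.
template <class K, class V, class S>
struct AnyNonZeroPred {
  std::uint32_t dim;  // assumption: embedding length is stored in the functor

  template <unsigned int GroupSize>
  SKETCH_DEVICE bool operator()(const K& key, const V* value, const S& score,
                                const Tile<GroupSize>& g) const {
    (void)key;
    (void)score;
    bool found = false;
    // Thread r checks elements r, r + size, r + 2 * size, ...
    for (std::uint32_t i = g.thread_rank(); i < dim; i += g.size()) {
      if (value[i] != V(0)) {
        found = true;
        break;
      }
    }
    // Group-wide vote: true if any thread saw a nonzero element.
    return g.any(found) != 0;
  }
};
```

The early `break` only leaves the loop, so every thread still reaches the vote, which keeps the group call convergent.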
APIs