-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-44393: [C++][Compute] Swizzle vector functions #44394
base: main
Are you sure you want to change the base?
Conversation
|
Hi, @felipecrv @pitrou . Could you help to review this? I think this is a necessary building block for special forms (#41094 (comment)). Much appreciated! |
@zanmato1984 I'm not doing code reviews with the same frequency lately, but I might take a look at this one during the week. |
/// | ||
/// For indices[i] = x, reverse_indices[x] = i. And reverse_indices[x] = null if x does | ||
/// not appear in the input indices. For indices[i] = x where x < 0 or x >= output_length, | ||
/// it is ignored. If multiple indices point to the same value, the last one is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this explanation is confusing, but we can work on this later.
Hi @felipecrv , the renaming is done. Would you like to proceed with the review? Thank you. |
/// in the input indices. For indices[i] = x where x < 0 or x > max_index, values[i] | ||
/// is ignored. If multiple indices point to the same value, the last one is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// in the input indices. For indices[i] = x where x < 0 or x > max_index, values[i] | |
/// is ignored. If multiple indices point to the same value, the last one is used. | |
/// in the input indices. For indices[i] = x where x < 0 or x > max_index, values[i] | |
/// is ignored. If multiple indices point to the same value, the last one is used. |
Can you explain the point of the lenient behavior wrt. negative indices and the max_index option?
Is there a use case that this enables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No there isn't a particular case in my mind that index < 0
or index > max_index
can be very useful, so we can alternatively throw exceptions if we see such values, or even not detect them at all.
(And just in case you are curious about the option max_index
itself, it is essential for cases that the input indices are "sparse", please see my other comment #44394 (comment) for details of such usage.)
Thank you.
ArrayKernelExec GenerateInteger(detail::GetTypeId get_id) { | ||
template <template <typename...> class Generator, typename Type0, | ||
typename KernelType = ArrayKernelExec, typename... Args> | ||
KernelType GenerateInteger(detail::GetTypeId get_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this change is for generate exec_chunked
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Just slightly extending it like GenerateNumeric.
/// the same type as the input indices, otherwise must be integer types. An invalid | ||
/// error will be reported if this type is not able to store the length of the input | ||
/// indices. | ||
std::shared_ptr<DataType> output_type = NULLPTR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should nullable being considered here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be decided during the actual computation. If there are "holes" in the output, validity buffer will be allocated and filled on demand.
Rationale for this change
For background please see #44393.
When implementing the "scatter" function requested in #44393, I found it also useful to make it a public vector API. After a painful thinking, I decided to name it "permute". And when implementing permute, I found it fairly easy to implement it by first computing the "reverse indices" of the positions, and then invoking the existing "take", where I think "reverse_indices" itself can also be a useful public vector API. Thus the PR categorized them as "placement functions".
What changes are included in this PR?
Implement placement API
reverse_indices
andpermute
, wherepermute(values, indices)
is implemented astake(values, reverse_indices(indices))
.I also put a small UT
Permute.IfElse
demonstrating howpermute
can serve as a building block of implementing special forms.Are these changes tested?
UT included.
Are there any user-facing changes?
Yes, new public APIs added. Documents updated.