GH-36905: [C++] Add support for SparseUnion to selection functions #36906
It seems that it reuses the DenseUnion approach, but it would be more efficient to reuse the Struct approach. What do you think?
Right, I've changed it to the struct approach. There is still room for improvement for SparseUnion, though: at a given slot, the children not selected by the type code can hold any value, so we don't have to call take with the same indices for every child. I've left a TODO comment in the code for this.
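A minimal sketch of the struct approach, using a hypothetical `SparseUnion` model made of plain `std::vector`s (not Arrow's `SparseUnionArray` or its builders): the type codes are taken, and then every child is taken with the same indices. Because each child of a sparse union has the same length as the union itself, one index array is valid for all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a sparse union array (not Arrow's API):
// one type code per slot, and every child has the same length as the union.
struct SparseUnion {
  std::vector<std::int8_t> type_codes;
  std::vector<std::vector<int>> children;
};

// "Struct approach": take the type codes, then take *every* child with the
// same indices, as one would take each field of a struct array.
SparseUnion TakeSparseUnion(const SparseUnion& values,
                            const std::vector<std::int64_t>& indices) {
  const auto length = static_cast<std::int64_t>(values.type_codes.size());
  SparseUnion out;
  out.type_codes.reserve(indices.size());
  for (std::int64_t i : indices) {
    assert(i >= 0 && i < length);  // out-of-bounds indices are not handled here
    out.type_codes.push_back(values.type_codes[static_cast<std::size_t>(i)]);
  }
  out.children.resize(values.children.size());
  for (std::size_t c = 0; c < values.children.size(); ++c) {
    out.children[c].reserve(indices.size());
    for (std::int64_t i : indices) {
      // Slots where child c is not selected are copied too; readers of a
      // sparse union simply ignore those values.
      out.children[c].push_back(values.children[c][static_cast<std::size_t>(i)]);
    }
  }
  return out;
}
```

The copies of unselected slots are harmless but unnecessary, which is the inefficiency the TODO points at.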
We don't, but would it improve anything to use different indices for each child?
int8_t child_id = typed_values.child_id(index);
child_id_buffer_builder_.UnsafeAppend(type_codes_[child_id]);
This seems to be doing a pointless back-and-forth between type codes and child ids?
Suggested change:
-int8_t child_id = typed_values.child_id(index);
-child_id_buffer_builder_.UnsafeAppend(type_codes_[child_id]);
+child_id_buffer_builder_.UnsafeAppend(typed_values.type_code(index));
Fixed
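To see why the round trip is redundant, here is a small sketch built on a hypothetical `UnionTypeInfo` model (it is not Arrow's `UnionType`): mapping a type code to its child id and then back through the declared type codes always yields the original code, so the stored type code can be appended directly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model of union type metadata (not Arrow's UnionType): the type
// declares one type code per child, and a reverse table maps codes to child ids.
struct UnionTypeInfo {
  std::vector<std::int8_t> type_codes;  // child id -> type code
  std::array<int, 128> child_ids{};     // type code -> child id (-1 if unused)

  explicit UnionTypeInfo(std::vector<std::int8_t> codes)
      : type_codes(std::move(codes)) {
    child_ids.fill(-1);
    for (int c = 0; c < static_cast<int>(type_codes.size()); ++c) {
      assert(type_codes[c] >= 0);  // union type codes are non-negative
      child_ids[type_codes[c]] = c;
    }
  }
};

// The original loop: type code -> child id -> type code again.
std::vector<std::int8_t> TakeCodesViaChildId(
    const UnionTypeInfo& info, const std::vector<std::int8_t>& codes,
    const std::vector<std::int64_t>& indices) {
  std::vector<std::int8_t> out;
  out.reserve(indices.size());
  for (std::int64_t i : indices) {
    int child_id = info.child_ids[codes[i]];
    out.push_back(info.type_codes[child_id]);
  }
  return out;
}

// The suggested loop: append the stored type code directly.
std::vector<std::int8_t> TakeCodesDirect(const std::vector<std::int8_t>& codes,
                                         const std::vector<std::int64_t>& indices) {
  std::vector<std::int8_t> out;
  out.reserve(indices.size());
  for (std::int64_t i : indices) out.push_back(codes[i]);
  return out;
}
```

Both functions produce the same buffer; the direct version just skips two table lookups per slot.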
@@ -863,6 +920,22 @@ Status DenseUnionFilterExec(KernelContext* ctx, const ExecSpan& batch, ExecResul
  return FilterExec<DenseUnionSelectionImpl>(ctx, batch, out);
}

Status SparseUnionFilterExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
Nit: move this into vector_filter_internal.cc alongside StructFilterExec? (They can probably also share some code...)
Moved, and extracted a FilterWithTakeExec.
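A minimal sketch of the idea behind `FilterWithTakeExec`, written in plain C++ rather than Arrow's kernel API: the selection mask is converted to indices, and the existing take logic does the rest. All function names below are hypothetical, and Arrow's null-selection handling is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert a boolean selection mask into the positions of
// the selected slots.
std::vector<std::int64_t> SelectionToIndices(const std::vector<bool>& mask) {
  std::vector<std::int64_t> indices;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) indices.push_back(static_cast<std::int64_t>(i));
  }
  return indices;
}

// Stand-in for a type-specific take kernel.
template <typename T>
std::vector<T> TakeValues(const std::vector<T>& values,
                          const std::vector<std::int64_t>& indices) {
  std::vector<T> out;
  out.reserve(indices.size());
  for (std::int64_t i : indices) out.push_back(values[static_cast<std::size_t>(i)]);
  return out;
}

// Filter expressed as "compute indices, then take": any type whose take
// already handles all of its children gets filter without extra per-type code.
template <typename T>
std::vector<T> FilterWithTake(const std::vector<T>& values,
                              const std::vector<bool>& mask) {
  assert(values.size() == mask.size());
  return TakeValues(values, SelectionToIndices(mask));
}
```

This is why struct and sparse union filtering can share one helper: only the take step is type-specific.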
.Value(&indices));

Datum result;
RETURN_NOT_OK(Take(batch[0].array.ToArrayData(), Datum(indices),
Perhaps we can call SparseUnionTakeExec directly instead of going through the function lookup and execution machinery again?
Done
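The reviewer's point is about overhead rather than results: calling the generic `Take` entry point again repeats function lookup and dispatch, while calling the take kernel directly skips it. The toy registry below is hypothetical and far simpler than Arrow's function registry and executor; it only shows that both call paths compute the same output, with the registry path paying for a lookup and type-erased call on every invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Values = std::vector<int>;
using Indices = std::vector<std::int64_t>;
using KernelFn = std::function<Values(const Values&, const Indices&)>;

// Stand-in for a concrete kernel such as a sparse-union take (hypothetical).
Values TakeKernel(const Values& values, const Indices& indices) {
  Values out;
  out.reserve(indices.size());
  for (std::int64_t i : indices) out.push_back(values.at(static_cast<std::size_t>(i)));
  return out;
}

// Toy registry: calling by name costs a hash lookup plus type-erased dispatch
// on every invocation, on top of the kernel's own work.
class Registry {
 public:
  void Add(const std::string& name, KernelFn fn) { fns_[name] = std::move(fn); }

  Values Call(const std::string& name, const Values& v, const Indices& idx) const {
    auto it = fns_.find(name);
    if (it == fns_.end()) throw std::invalid_argument("unknown function: " + name);
    return it->second(v, idx);
  }

 private:
  std::unordered_map<std::string, KernelFn> fns_;
};
```

Inside a kernel that already knows the input type, the direct call is the natural choice.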
docs/source/cpp/compute.rst
+---------------+--------+--------------+--------------+--------------+-------------------------+-----------+
| take          | Binary | Any          | Integer      | Input type 1 | :struct:`TakeOptions`   | \(1) \(4) |
| take          | Binary | Any          | Integer      | Input type 1 | :struct:`TakeOptions`   | \(4)      |
I suppose this row:
| take | Binary | Any | Integer | Input type 1 | :struct:`TakeOptions` | \(4) |
should be:
| take | Binary | Any | Integer | Input type 1 | :struct:`TakeOptions` | \(3) |
Fixed, and the previous line was also wrong.
Judging from https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc#L369, we can make the unneeded indices the same as the needed ones, so that accessing
Yes, this is quite subtle. Generating the per-child index arrays would cost much more, so I'm not sure it would be beneficial in the end.
OK, I'll remove that comment.
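The trade-off can be made concrete with a hedged sketch. The function name, the plain `std::vector` model, and the placeholder policy for unneeded slots are all hypothetical and are not taken from the PR or from Arrow's take kernel: building a separate index array per child needs an extra pass over all indices for each child, which is the cost weighed against any gain from skipping unneeded values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the alternative the TODO described: one index array
// per child instead of one shared array. For output slot j, child c only needs
// the value at indices[j] if the type code there selects c; otherwise any
// in-bounds index works. This sketch reuses the last index child c really
// needed (0 before the first one), which is just one possible policy.
std::vector<std::vector<std::int64_t>> PerChildIndices(
    const std::vector<std::int8_t>& type_codes,   // input type code per slot
    const std::vector<std::int8_t>& child_codes,  // child id -> type code
    const std::vector<std::int64_t>& indices) {
  std::vector<std::vector<std::int64_t>> out(child_codes.size());
  // Extra cost: a full pass over the indices for every child, plus one new
  // index array per child, before any child values are taken.
  for (std::size_t c = 0; c < child_codes.size(); ++c) {
    out[c].reserve(indices.size());
    std::int64_t last_needed = 0;
    for (std::int64_t i : indices) {
      assert(i >= 0 && static_cast<std::size_t>(i) < type_codes.size());
      if (type_codes[static_cast<std::size_t>(i)] == child_codes[c]) last_needed = i;
      out[c].push_back(last_needed);
    }
  }
  return out;
}
```

With a shared index array this whole step disappears, which supports keeping the simpler design.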
- avoid copying type codes array
- remove skipping of sliced tests on dense unions
- add union-take test with null indices
+1, thanks a lot for this @js8544
Thanks for the improvements!
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit ebcf7bc. There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details.
…ons (apache#36906)

* Closes: apache#36905

Lead-authored-by: Jin Shang
Co-authored-by: Antoine Pitrou
Signed-off-by: Antoine Pitrou
Rationale for this change
Dense unions are already supported in Take, Filter and DropNull but sparse ones are not.
What changes are included in this PR?
Add kernels for sparse unions to those functions.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.