-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38432: [C++][Parquet] Trying to use original logic of BinaryBuilder #38437
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
@ursabot please benchmark |
Benchmark runs are scheduled for commit 2ff9ab4. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
2ff9ab4
to
eba5044
Compare
@ursabot please benchmark |
Benchmark runs are scheduled for commit eba5044. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 2ff9ab4. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit eba5044. There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. |
Looking at the benchmark run, there is a clear improvement in the affected So this seems to solve the issue that is discussed in #38432 |
I didn't find some stable improvement from local benchmark, profile and others But maybe we can first move forward. |
I've revert some BinaryHelper change. I don't know the reason because I don't have the regression workload. Maybe you can take a look |
|
std::optional<int64_t> estimated_remaining_data_length = {}) { | ||
Status PrepareNextInput(int64_t next_value_length) { | ||
if (ARROW_PREDICT_FALSE(!CanFit(next_value_length))) { | ||
// This element would exceed the capacity of a chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part doesn't call RETURN_NOT_OK(acc_->builder->Reserve(entries_remaining_));
, this keeps some logic same as before https://github.com/apache/arrow/pull/14341/files , it leave Prepare()
to allocate the data and don't push when chunk is switched. But I don't know if this is right
return Status::OK(); | ||
} | ||
|
||
Status PrepareNextInputWithEstimatedLength(int64_t next_value_length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split into two functions to make it fast and clear
@@ -1915,7 +1923,6 @@ class DictByteArrayDecoderImpl : public DictDecoderImpl<ByteArrayType>, | |||
int32_t indices[kBufferSize]; | |||
|
|||
ArrowBinaryHelper<ByteArrayType> helper(out, num_values); | |||
RETURN_NOT_OK(helper.Prepare()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I think this Prepare is ok, original code doesn't have this Prepare, perhaps it would make entries large?
@@ -1983,7 +1990,6 @@ class DictByteArrayDecoderImpl : public DictDecoderImpl<ByteArrayType>, | |||
int values_decoded = 0; | |||
|
|||
ArrowBinaryHelper<ByteArrayType> helper(out, num_values); | |||
RETURN_NOT_OK(helper.Prepare(len_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
Can you please explain clearly why you're doing these changes? It's not obvious at all that they are correct. |
That's because:
|
I've created a simpler patch for this. |
Rationale for this change
Do some changes mentioned in #38432
What changes are included in this PR?
std::optional
argument, it might be used in loop, so separate itPrepare
.Are these changes tested?
Already has tests
Are there any user-facing changes?
no