GH-38432: [C++][Parquet] Try to fix performance regression in the DictByteArrayDecoderImpl #38784
Conversation
Thanks for working on this @mapleFU!
@ursabot please benchmark
Benchmark runs are scheduled for commit c75517b. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Force-pushed from c75517b to a7633d0
Co-authored-by: Gang Wu <[email protected]>
Thanks for your patience. Conbench analyzed the 5 benchmarking runs that have been run so far on PR commit c75517b. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
How should we interpret the benchmark report to verify the regression is fixed?
The problem is that the machine
Would you mind helping re-run or triggering that run? @jorisvandenbossche 🤔
@ursabot please benchmark |
Benchmark runs are scheduled for commit f74b4c1. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
@mapleFU Can you please make your PR message more descriptive? It should help understand what this PR is about. |
Also, I don't understand why this would fix dictionary decoding, and why this would be ok for non-dictionary decoding. This lacks a serious analysis IMHO. |
Thanks for your patience. Conbench analyzed the 5 benchmarking runs that have been run so far on PR commit f74b4c1. There were 6 benchmark results indicating a performance regression:
The full Conbench report has more details.
This is easy because:
@ursabot please benchmark |
Benchmark runs are scheduled for commit 60cdb80. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
@jorisvandenbossche I think the problem is fixed, aha. I'll merge this later if there are no negative comments.
RETURN_NOT_OK(acc_->builder->Reserve(entries_remaining_));
  }
  return Status::OK();
}
What difference does it make to have a separate method?
https://godbolt.org/z/5bf1K55a5
I found it might not be optimized in the hot path. `Prepare` is not in the hot path, but `PrepareNextInput` is.
@mapleFU Please don't merge a change if you don't understand how it works.
Note that @jorisvandenbossche above didn't test the same changes...
I've tried benchmarking here: #38784 (comment)
Are you able to measure a difference between those two versions of the code? Is there a micro-benchmark?
Sure, please wait for a minute. Also I'm trying on LLVM-17; maybe I can simplify the case using godbolt.
Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 60cdb80. There were 2 benchmark results indicating a performance regression:
The full Conbench report has more details.
The current benchmark is out. @jorisvandenbossche would you mind taking a look? @pitrou https://godbolt.org/z/5bf1K55a5 I used the same compiler options, and the
Strangely, on the latest version I don't directly see any effect locally, but at least the online benchmarks now seem to confirm some improvement (several of the similar benchmarks are consistent): for example https://conbench.ursa.dev/compare/benchmark-results/0655f5af97857ba780005944ff57195f...06560d57f1f772cc800009452aab4863/
The difference between the current version and the 13.0.0 version is that it calls more
This patch also separates
You can decide how to handle this patch later. Maybe I'm going crazy after spending so long without knowing why, but at least we found that:
Will wait until Monday night to see any further requests for review.
@pitrou Would you like to check this in?
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 6070815. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
GH-38432: [C++][Parquet] Try to fix performance regression in the DictByteArrayDecoderImpl (#38784)

### Rationale for this change

Do some changes mentioned in #38432. I believe this might fix #38577.

Problem 1: `BinaryHelper` might call `Prepare()` and `Prepare(estimated-output-binary-length)` for data. This might be because:
1. For Plain-encoded ByteArray, `len_` is similar to the data-page size, so the `Reserve` amount is related to it.
2. For Dict encoding, the data page is just an RLE-encoded page, so its `len_` might not be directly related to the output binary size.

Problem 2: `Prepare` uses `::arrow::kBinaryMemoryLimit` as the minimum value; we should use `this->chunk_space_remaining_`.

Problem 3: `std::optional<int64_t>` is hard to optimize for some compilers.

### What changes are included in this PR?

Document the behavior of `BinaryHelper` and try to fix it.

### Are these changes tested?

No

### Are there any user-facing changes?

Regression fixes

* Closes: #38432

Lead-authored-by: mwish <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: Gang Wu <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>