GH-38432: [C++][Parquet] Try to fix performance regression in the DictByteArrayDecoderImpl #38784

Merged
merged 11 commits into from
Nov 27, 2023

Conversation

mapleFU
Member

@mapleFU mapleFU commented Nov 19, 2023

Rationale for this change

Make some of the changes mentioned in #38432.

I believe this might fix #38577

Problem1:

The BinaryHelper might call Prepare() or Prepare(estimated-output-binary-length) for the data. The estimate behaves differently depending on the encoding:

  1. For plain-encoded ByteArray, len_ is close to the data-page size, so reserving based on it is reasonable.
  2. For dictionary encoding, the data page is just an RLE-encoded page of indices, so its len_ may not be directly related to the size of the decoded binary output.
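
A minimal sketch of that distinction (hedged; the function and parameter names below are hypothetical, not the actual BinaryHelper code):

#include <cstdint>

// Sketch only: why reserving based on the page payload size (len_) is
// reasonable for plain-encoded ByteArray pages but not for dictionary pages.
int64_t EstimatedOutputBytes(bool is_plain_encoded, int64_t page_payload_len,
                             int64_t known_output_bytes) {
  if (is_plain_encoded) {
    // Plain ByteArray pages store a 4-byte length prefix plus the value bytes,
    // so the payload size tracks the decoded output size closely.
    return page_payload_len;
  }
  // Dictionary data pages hold only RLE/bit-packed indices, so their payload
  // size says little about how many bytes the decoded values will occupy.
  return known_output_bytes;
}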

Problem2:

Prepare uses ::arrow::kBinaryMemoryLimit as the minimum value; it should use this->chunk_space_remaining_ instead.
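
Roughly, the intended clamping looks like the following sketch (illustrative only; estimated_binary_length is a placeholder name, while chunk_space_remaining_, acc_ and the builder are the members referenced elsewhere in this PR):

// Clamp the reservation by the space left in the current output chunk rather
// than by the global ::arrow::kBinaryMemoryLimit.
const int64_t to_reserve =
    std::min<int64_t>(estimated_binary_length, this->chunk_space_remaining_);
RETURN_NOT_OK(acc_->builder->ReserveData(to_reserve));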

Problem3:

std::optional<int64_t> is hard for some compilers to optimize.
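
A small standalone illustration of the concern (hypothetical functions, not the Arrow code):

#include <cstdint>
#include <optional>

// A defaulted std::optional parameter means every call site materializes an
// optional (engaged flag plus payload), which some compilers fail to fold away
// on a hot path.
int64_t WithOptional(int64_t len, std::optional<int64_t> estimate = std::nullopt) {
  return estimate.value_or(len);
}

// A plain-integer overload gives the caller nothing extra to construct.
int64_t WithoutOptional(int64_t len) { return len; }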

What changes are included in this PR?

Document the behavior of BinaryHelper and try to fix it.

Are these changes tested?

No

Are there any user-facing changes?

Regression fixes

@mapleFU
Member Author

mapleFU commented Nov 19, 2023

@rok @pitrou I tried to figure out how the regression happened and added a comment about it. Would you mind taking a look?

@rok
Member

rok commented Nov 20, 2023

Thanks for working on this @mapleFU !

@github-actions github-actions bot added the awaiting changes label and removed the awaiting review label Nov 20, 2023
@mapleFU
Member Author

mapleFU commented Nov 20, 2023

@ursabot please benchmark

@ursabot

ursabot commented Nov 20, 2023

Benchmark runs are scheduled for commit c75517b. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@github-actions github-actions bot added the awaiting change review label and removed the awaiting changes label Nov 20, 2023
@wgtmac wgtmac changed the title from "GH-38432: [C++][Parquet] Encoding: Dict Arrow Decoder Regression Fix" to "GH-38432: [C++][Parquet] Fix regression in the DictByteArrayDecoderImpl" Nov 20, 2023

Thanks for your patience. Conbench analyzed the 5 benchmarking runs that have been run so far on PR commit c75517b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Member

@wgtmac wgtmac left a comment


How should we interpret the benchmark report to verify the regression is fixed?

@jorisvandenbossche
Member

How should we interpret the benchmark report to verify the regression is fixed?

The problem is that the machine ursa-i9-9960x is currently not running (not fully sure why), and it's on this one that the affected benchmark runs (see #38437 (comment) in the previous PR, where I linked to the relevant benchmark run back then).

@mapleFU
Member Author

mapleFU commented Nov 20, 2023

Would you mind helping re-run or trigger that benchmark? @jorisvandenbossche 🤔

@mapleFU
Member Author

mapleFU commented Nov 20, 2023

@ursabot please benchmark

@ursabot

ursabot commented Nov 20, 2023

Benchmark runs are scheduled for commit f74b4c1. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou
Member

pitrou commented Nov 20, 2023

@mapleFU Can you please make your PR message more descriptive? It should help understand what this PR is about.

@pitrou
Member

pitrou commented Nov 20, 2023

Also, I don't understand why this would fix dictionary decoding, and why this would be ok for non-dictionary decoding. This lacks a serious analysis IMHO.


Thanks for your patience. Conbench analyzed the 5 benchmarking runs that have been run so far on PR commit f74b4c1.

There were 6 benchmark results indicating a performance regression:

The full Conbench report has more details.

@mapleFU
Member Author

mapleFU commented Nov 20, 2023

Also, I don't understand why this would fix dictionary decoding, and why this would be ok for non-dictionary decoding. This lacks a serious analysis IMHO.

This is straightforward:

  1. len_ is the size of the page's data payload.
  2. PlainByteArrayDecoder's len_ is similar to the final output size (each record carries a length prefix plus its bytes). But for the dictionary decoder, len_ may be unrelated to the decoded ByteArray size, which can make the program reserve unnecessary memory. Moreover, it doesn't decrease len_ after each decode.
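
As a made-up numeric example: a dictionary-encoded data page holding 100,000 RLE/bit-packed indices might only be a few kilobytes, while the dictionary strings those indices expand to could occupy many megabytes once decoded, so a reservation sized from that page's len_ can be far off the real output size.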

@mapleFU
Member Author

mapleFU commented Nov 24, 2023

@ursabot please benchmark

@ursabot

ursabot commented Nov 24, 2023

Benchmark runs are scheduled for commit 60cdb80. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@mapleFU mapleFU marked this pull request as ready for review November 24, 2023 15:03
@mapleFU
Member Author

mapleFU commented Nov 24, 2023

@jorisvandenbossche I think the problem is fixed, aha

See https://conbench.ursa.dev/compare/runs/f12445ad2fb84f32a924a483ab75e0db...a478ff692d97498eade4936be3d54fcf/

I'll merge this later if there are no objections.

RETURN_NOT_OK(acc_->builder->Reserve(entries_remaining_));
}
return Status::OK();
}
Member


What difference does it make to have a separate method?

Member Author

@mapleFU mapleFU Nov 24, 2023


https://godbolt.org/z/5bf1K55a5

I found that it might not get optimized on the hot path. Prepare is not in the hot path, but PrepareNextInput is.

@pitrou
Member

pitrou commented Nov 24, 2023

@mapleFU Please don't merge a change if you don't understand how it works.

@pitrou
Member

pitrou commented Nov 24, 2023

Note that @jorisvandenbossche above didn't test the same changes...

@mapleFU
Member Author

mapleFU commented Nov 24, 2023

#38784 (comment)

@pitrou I think the critical reason is here. The compiler has a hard time optimizing:

Status PrepareNextInput(int64_t, std::optional<int64_t> = std::nullopt)

The function above is called frequently on a hot path. The other two reasons relate to #38577.
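
Roughly, the split being discussed looks like the following sketch (an illustration under that assumption, not the exact code in this PR; PushChunk is a hypothetical helper name, while acc_, builder and chunk_space_remaining_ are the members referenced elsewhere in the thread):

// Cold path: called once per chunk, so constructing the std::optional here is
// harmless.
Status Prepare(std::optional<int64_t> estimated_data_length = std::nullopt) {
  if (estimated_data_length.has_value()) {
    RETURN_NOT_OK(acc_->builder->ReserveData(
        std::min<int64_t>(*estimated_data_length, this->chunk_space_remaining_)));
  }
  return Status::OK();
}

// Hot path: called per decoded value, with a plain int64_t parameter so the
// caller has nothing extra to construct or branch on.
Status PrepareNextInput(int64_t next_value_length) {
  if (ARROW_PREDICT_FALSE(next_value_length > this->chunk_space_remaining_)) {
    RETURN_NOT_OK(PushChunk());  // hypothetical: roll over to a new output chunk
  }
  return Status::OK();
}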

@mapleFU
Member Author

mapleFU commented Nov 24, 2023

I've tried a benchmark here: #38784 (comment)

@pitrou
Member

pitrou commented Nov 24, 2023

@pitrou I think the critical reason is here. The compiler is hard to optimize:

Are you able to measure a difference between those two versions of the code? Is there a micro-benchmark?

@mapleFU
Member Author

mapleFU commented Nov 24, 2023

Sure, please wait for a minute.

Also, I'm trying LLVM 17; maybe I can simplify the case using Godbolt.


Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 60cdb80.

There were 2 benchmark results indicating a performance regression:

The full Conbench report has more details.

@mapleFU
Member Author

mapleFU commented Nov 24, 2023

The current benchmark is out. @jorisvandenbossche would you mind taking a look?

@pitrou https://godbolt.org/z/5bf1K55a5 I used the same compiler options, and the std::optional version seems to emit more instructions on the hot path?

@github-actions github-actions bot added the awaiting changes label and removed the awaiting merge label Nov 24, 2023
@jorisvandenbossche
Member

Strangely, on the latest version I don't directly see any effect locally, but at least the online benchmarks now seem to confirm some improvement (several of the similar benchmarks are consistent): for example https://conbench.ursa.dev/compare/benchmark-results/0655f5af97857ba780005944ff57195f...06560d57f1f772cc800009452aab4863/

@mapleFU
Member Author

mapleFU commented Nov 24, 2023

The difference between the current version and the 13.0.0 version is that it calls BinaryBuilder->Reserve() more often. I don't think that should make performance worse, so I reverted some changes in 60cdb80. Maybe we can reconsider this later.

This patch also separates out PrepareNextInput. Maybe it's a compiler-specific problem: I checked a similar case in quick-bench, and removing the std::optional<> argument doesn't benefit performance with gcc 12.3 and C++17. However, the benchmark here shows that performance does improve. Maybe we can ask some C++ experts for help.

You can decide how to handle this patch later. Maybe I'm going a bit crazy from spending so long on this without knowing why, but at least we found that:

  1. Reserve might be related to the problem, but on ursabot it doesn't affect performance much.
  2. PrepareNextInput in the patch might prevent the compiler from doing some optimizations in the ursabot benchmark.

@mapleFU
Member Author

mapleFU commented Nov 25, 2023

Will wait until Monday night to see if there are any further requests for review.

@mapleFU
Member Author

mapleFU commented Nov 27, 2023

@pitrou Would you like to check this in?

@pitrou pitrou changed the title from "GH-38432: [C++][Parquet] Trying to Fix regression in the DictByteArrayDecoderImpl" to "GH-38432: [C++][Parquet] Try to fix performance regression in the DictByteArrayDecoderImpl" Nov 27, 2023
@pitrou pitrou merged commit 6070815 into apache:main Nov 27, 2023
39 of 40 checks passed
@pitrou pitrou removed the awaiting changes label Nov 27, 2023
@mapleFU mapleFU deleted the dict-decoder-regression-fix branch November 27, 2023 17:27

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 6070815.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

raulcd pushed a commit that referenced this pull request Nov 28, 2023
…tByteArrayDecoderImpl (#38784)

dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…he DictByteArrayDecoderImpl (apache#38784)


Successfully merging this pull request may close these issues.

Reading parquet file behavior change from 13.0.0 to 14.0.0
[C++] Parquet reading performance regressions
7 participants