[C++] Parquet reading performance regressions #38432
Although I have to add that we also have "dataset-read" benchmarks (https://conbench.ursa.dev/c-benchmarks/dataset-read), which also benchmark Parquet reading, and there I don't directly see any visible regressions (although those read from S3, and might be less sensitive to small regressions in the reading part). Also, the "file-read" benchmark is actually using
@jorisvandenbossche I can't seem to find the exact benchmark that is producing this metric.
No, that benchmark is just using a default
Emmm, is there any profiling information from the benchmark? Or what's the simplest way to reproduce this problem? I think it might be related to caching, since pre-buffering on an SSD might not be a good approach. I don't think encoding is related.
To be clear, there are potentially two separate issues (maybe it was not the best idea to open a single issue for them): one related to the encoding change, and one related to the pre_buffer/caching change.
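To isolate the caching side, a minimal sketch of reading the same file with pre-buffering toggled on and off (assuming a local file path; this is not the benchmark's own code):

```cpp
#include <memory>
#include <string>

#include "arrow/io/file.h"
#include "arrow/result.h"
#include "arrow/status.h"
#include "arrow/table.h"
#include "parquet/arrow/reader.h"
#include "parquet/properties.h"

// Read a Parquet file with pre-buffering toggled, to compare the two code paths.
arrow::Status ReadWithPreBuffer(const std::string& path, bool pre_buffer) {
  ARROW_ASSIGN_OR_RAISE(auto input, arrow::io::ReadableFile::Open(path));

  parquet::ArrowReaderProperties arrow_props;
  arrow_props.set_pre_buffer(pre_buffer);  // enables/disables read coalescing

  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(input));
  builder.properties(arrow_props);

  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.Build(&reader));

  std::shared_ptr<arrow::Table> table;
  return reader->ReadTable(&table);
}
```

Timing the two calls on the same file (with the page cache dropped in between) should show whether the pre_buffer path alone accounts for the bump on local disk.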
Looking at https://github.com/apache/arrow/pull/14341/files - we abstracted some logic to
I'll try to run a micro-benchmark with https://github.com/apache/arrow/pull/14341/files
I found Updated: Oh I found
Looking at the commits in the vicinity, it seems most likely that #14341 caused this, and looking at the timeline it doesn't seem like noise (unless something changed with the benchmarking infra). Conbench comparison.
#38437
On eba5044 I ran:
And actually got a small regression (but could be noise):
Because the DictDecodingByteArray benchmark itself is unstable... T_T
After reverting some changes, it seems the dataset-read doesn't get faster. Maybe we can profile the two runs in detail.
I posted on the PR as well (#38437 (comment)), but from what I can see, the
…the DictByteArrayDecoderImpl (#38784)

### Rationale for this change

Do some changes mentioned in #38432. I believe this might fix #38577.

Problem 1: `BinaryHelper` might call `Prepare()` and `Prepare(estimated-output-binary-length)` for data. This might be because:
1. For Plain-encoded ByteArray, `len_` is similar to the data-page size, so the `Reserve` is relevant.
2. For Dict encoding, the data page is just an RLE-encoded page, so its `len_` might not be directly related to the output binary size.

Problem 2: `Prepare` uses `::arrow::kBinaryMemoryLimit` as the bound; we should use `this->chunk_space_remaining_`.

Problem 3: `std::optional<int64_t>` is hard to optimize for some compilers.

### What changes are included in this PR?

Document the behavior of BinaryHelper, and try to fix it.

### Are these changes tested?

No

### Are there any user-facing changes?

Regression fixes

* Closes: #38432

Lead-authored-by: mwish <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: Gang Wu <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
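For context on the reservation issue described in that commit message, here is a rough, hypothetical illustration of Problem 2 as I read it; the names `ReserveHintOld`/`ReserveHintNew` and `kGlobalBinaryMemoryLimit` are made up and do not mirror the actual `BinaryHelper`/`DictByteArrayDecoderImpl` code:

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical illustration only; not the actual Arrow internals.
// Assumption: the global binary memory limit is on the order of INT32_MAX bytes.
constexpr int64_t kGlobalBinaryMemoryLimit = std::numeric_limits<int32_t>::max() - 1;

// Before: the reservation hint was bounded only by the global binary memory
// limit, so a dictionary-encoded page could ask to reserve far more output
// space than the current output chunk can still hold.
int64_t ReserveHintOld(int64_t estimated_output_bytes) {
  return std::min(estimated_output_bytes, kGlobalBinaryMemoryLimit);
}

// After: bound the reservation by the space remaining in the current chunk,
// keeping it proportional to what can actually still be appended.
int64_t ReserveHintNew(int64_t estimated_output_bytes, int64_t chunk_space_remaining) {
  return std::min(estimated_output_bytes, chunk_space_remaining);
}
```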
Looking at some of the Parquet read benchmarks we have (https://conbench.ursa.dev/c-benchmarks/file-read), there are two small regressions observable over the last few months.
For example, zooming in on that time for nyctaxi_2010 with snappy compression (https://conbench.ursa.dev/benchmark-results/0653775157e07b6980005bce6e59a2ad/):
And for fanniemae (https://conbench.ursa.dev/benchmark-results/0653774cc429733d8000b2db91fa63fa/); here only the first bump is noticeable:
There are two small bumps, which I think are caused by the following two PRs, in order of the bumps:
This PR added a new encoding option, so my expectation is that it shouldn't slow down the default path (the new encoding isn't used)? (cc @rok)
While this change is mostly useful for cloud filesystems, and could be detrimental for a fast local disk, the general assumption was that the effect on local disk would not be significant, so we could simply enable it for all filesystems. But based on this benchmark, it seems the impact might actually not be fully insignificant? Should we consider making the default for pre_buffer depend on the type of filesystem?
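One way a filesystem-dependent default could look from the caller's side; the helper below is hypothetical, not an existing Arrow API:

```cpp
#include <memory>
#include <string>

#include "arrow/filesystem/filesystem.h"
#include "parquet/properties.h"

// Hypothetical helper sketching a filesystem-dependent default for pre_buffer.
parquet::ArrowReaderProperties MakeReaderProperties(
    const std::shared_ptr<arrow::fs::FileSystem>& fs) {
  parquet::ArrowReaderProperties props;
  // Pre-buffering coalesces reads, which helps high-latency stores like S3,
  // but may add overhead on a fast local SSD.
  props.set_pre_buffer(fs->type_name() != "local");
  return props;
}
```

Whether the check should be an explicit allow-list of remote filesystems, or a property exposed by the filesystem itself, is part of the open question above.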