GH-32863: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer #14341
Note: this should support both BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY (see PARQUET-2231).
(Can this patch go ahead now?)
Though this patch is still a draft, I ran it for fun. There seems to be an issue here. Maybe that helps.
I'm going to merge this now.
That's a long time, bravo!
Thanks all for helping this along! I'm very happy we got this in!
Congratulations, @rok!!!
Great work everybody, congrats!!
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 94bd0d2. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
We should probably update the Python docstring as well: arrow/python/pyarrow/parquet/core.py, lines 822 to 827 in fe750ed
(which was clearly already outdated before this PR as well! ;))
Feel free to open a PR :-)
I created an issue :) #37312
RETURN_NOT_OK(helper.builder->ReserveData(
    std::min<int64_t>(len_, helper.chunk_space_remaining)));
@pitrou Why did this code previously call ReserveData with min(len_, helper.chunk_space_remaining)? Wouldn't len_ alone be too large here?
I don't understand: are you commenting on the removed code? I'd rather not try to understand code that was removed months ago...
See #38437 for context, where this code is being partially added back.
I don't fully understand it either T_T.
I was just confused by this change, so I tried to understand the original code to find out why it caused the regression and what I should do to fix it.
…t writer (apache#14341)
This is to add DELTA_BYTE_ARRAY encoder.
* Closes: apache#32863
Lead-authored-by: Rok Mihevc
Co-authored-by: Rok
Co-authored-by: Antoine Pitrou
Co-authored-by: Gang Wu
Co-authored-by: mwish
Co-authored-by: Will Jones
Signed-off-by: Antoine Pitrou
This adds a DELTA_BYTE_ARRAY encoder.
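For readers unfamiliar with the encoding: according to the Parquet format specification, DELTA_BYTE_ARRAY (also called incremental encoding or front coding) stores, for each value, the length of the prefix it shares with the previous value plus the remaining suffix. The prefix lengths are then written with DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY. The sketch below shows only the prefix/suffix split and its inverse, assuming plain std::string values. FrontCoded, FrontEncode and FrontDecode are hypothetical names for illustration, not Arrow's API, and the bit-packing stages are omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical illustration of the front-coding step behind DELTA_BYTE_ARRAY.
// Not Arrow's API: a real encoder would further encode prefix_lengths with
// DELTA_BINARY_PACKED and suffixes with DELTA_LENGTH_BYTE_ARRAY.
struct FrontCoded {
  std::vector<int32_t> prefix_lengths;  // bytes shared with the previous value
  std::vector<std::string> suffixes;    // remaining bytes of each value
};

FrontCoded FrontEncode(const std::vector<std::string>& values) {
  FrontCoded out;
  std::string_view previous;  // empty before the first value
  for (const std::string& value : values) {
    std::string_view current(value);
    // Length of the common prefix between the previous and current value.
    const size_t limit = std::min(previous.size(), current.size());
    size_t prefix = 0;
    while (prefix < limit && previous[prefix] == current[prefix]) ++prefix;
    out.prefix_lengths.push_back(static_cast<int32_t>(prefix));
    out.suffixes.emplace_back(current.substr(prefix));
    previous = current;  // points into `values`, which outlives this loop
  }
  return out;
}

std::vector<std::string> FrontDecode(const FrontCoded& coded) {
  std::vector<std::string> values;
  values.reserve(coded.suffixes.size());
  std::string previous;
  for (size_t i = 0; i < coded.suffixes.size(); ++i) {
    // Rebuild the value from the first prefix_lengths[i] bytes of the
    // previous value followed by the stored suffix.
    std::string value =
        previous.substr(0, static_cast<size_t>(coded.prefix_lengths[i])) +
        coded.suffixes[i];
    values.push_back(value);       // copy into the output first
    previous = std::move(value);   // then keep it as the next prefix source
  }
  return values;
}
```

With the specification's own example values "axis", "axle", "babble", "babyhood", FrontEncode yields prefix lengths 0, 2, 0, 3 and suffixes "axis", "le", "babble", "yhood"; sorted or otherwise similar strings share long prefixes, which is where this encoding saves space.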