-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: add sizing explanation to bloom filter docs in parquet #5705
Conversation
Added documentation detailing the sizing of bloom filters in the parquet crate.
//! ```text | ||
//! b = NP2(m/8) / 32 | ||
//! ``` | ||
//! | ||
//! Where, `NP2` denotes *the next power of two*, and `m` is divided by 8 to be represented as bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be a more idiomatic way of expressing the next power of two as a function than NP2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 I don't know of one off the top of my head
Looks like Clippy is unhappy after 1.78. I will likely wait for that to get fixed upstream and then merge/rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//! ```text | ||
//! b = NP2(m/8) / 32 | ||
//! ``` | ||
//! | ||
//! Where, `NP2` denotes *the next power of two*, and `m` is divided by 8 to be represented as bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 I don't know of one off the top of my head
I believe the clippy failure is unrelated (it is due to the new version of rust, nothing in this PR) |
Updated the bloom filter module doc comment to clarify that the metadata stores the offset/length of the bloom filter, and not the bloom filter in its entirety. Co-authored-by: Andrew Lamb <[email protected]>
PR to fix clippy is here: #5710 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thanks again @hiltontj
I merged up from main to get the clippy fixes The archery integration test failure does not appear to be related: #5719 |
Thanks again @hiltontj ! |
Thanks for closing it out @alamb! |
Added documentation detailing the sizing of bloom filters in the parquet crate.
Which issue does this PR close?
There is currently no issue for this change.
Rationale for this change
There is a storage penalty for using bloom filters in parquet, so I felt having the explanation of how they are sized, based on the chosen FPP and NDV parameters, would be helpful to users.
What changes are included in this PR?
This only includes documentation changes to the
bloom_filter
module within theparquet
crate.Are there any user-facing changes?
This is a documentation change, so will be user-facing.