-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove AsContiguousFn #346
Conversation
for (field_idx, field_dtype) in field_dtypes.iter().enumerate() { | ||
let mut field_chunks = Vec::new(); | ||
for chunk in &chunks { | ||
field_chunks.push(chunk.field(field_idx).expect("structarray should contain field")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expect is only really valid if you assert all dtypes are equal at the start of this function. Maybe you check that elsewhere, in which case just document the assumed precondition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by document do you mean in the expect()
string, or document as a comment and then call unwrap
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assumed you meant the former since it seems there's a general preference for expect over unwrap, in which case this is done
vortex-array/src/validity.rs
Outdated
// Else, construct the boolean buffer | ||
let mut buffer = BooleanBufferBuilder::new(validities.iter().map(|v| v.len()).sum()); | ||
for validity in validities { | ||
let present = validity.to_present_null_buffer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not a fan of this to_present_null_buffer
function, it can very easily lead to unnecessary allocations. Can just match on the returned optional instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so if I remove the use of to_present_null_buffer
, I pretty much have to write
let present = match validity {
LogicalValidity::AllValid(count) => {
BooleanBuffer::new_set(count)
}
LogicalValidity::AllInvalid(count) => {
BooleanBuffer::new_unset(count)
}
LogicalValidity::Array(array) => {
array.into_array().flatten_bool().expect("validity must flatten to BoolArray").boolean_buffer()
}
};
buffer.append_buffer(&present);
Which is basically identical to the implementation of to_present_null_buffer
, except without the wrapping NullBuffer
. I'm not sure where the unnecessary allocations happen
@@ -29,6 +30,11 @@ impl EncodingCompression for FoREncoding { | |||
return None; | |||
} | |||
|
|||
// For all-null, cannot encode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this, arrays that have all-nulls would fail on the compute_as_cast
a few lines below
Just pushed one more patch
|
@@ -11,6 +12,7 @@ use crate::{Array, IntoArray}; | |||
|
|||
/// The set of encodings that can be converted to Arrow with zero-copy. | |||
pub enum Flattened { | |||
Null(ConstantArray), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works for now, we may just want a FLUP issue to add a NullArray so we're 1:1 with Arrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: #348
Since #346, we've been canonicalizing chunked ExtensionArray's incorrectly. Rather than unwrapping each chunk to its backing storage array we've just been building a new chunkedarray of the ExtensionArray chunks. Before #480 , this was actually causing DateTimePartsCompressor to fail in can_compress, so we weren't going down the compression codepath. ## The Fix Fix `into_canonical` so that when we encounter a `ChunkedArray(ExtensionArray(storage))` to yield an `ExtensionArray(ChunkedArray(storage))` where each chunk is canonicalized first (e.g. if you have chunks of DateTimePartsArray you will end up with chunks of ExtensionArray(i64)). Fix the DateTimePartsCompressor to canonicalize its input, to handle that case where it may be a ChunkedArray.
Per the comments on #341 , this PR removes the AsContiguousFn trait and as_contiguous function entirely, and instead pushes the relevant details into the implementation of
ArrayFlatten::flatten
for theChunkedArray
type.A few questions I've bumped into while doing this refactor:
List<Struct>
orList<List>