forked from apache/arrow
apacheGH-35598: [MATLAB] Add a public `Valid` property to the `MATLAB arrow.array.<Array>` classes to query Null values (i.e. validity bitmap support) (apache#35655)

### Rationale for this change

Currently, the `arrow.array.<Array>` classes do not support querying the Null values (i.e. validity bitmap) on an Arrow array. Support for encoding Null values is an important part of the Arrow memory format, so the MATLAB Interface to Arrow should support it.

There are likely multiple different APIs that the MATLAB interface should have to support Null values robustly. However, to focus on incremental delivery, we can start by adding a public `Valid` property to the `arrow.array.<Array>` classes, which returns a `logical` array indicating which elements of the given array are valid (i.e. not null).

### What changes are included in this PR?

1. Added a new public property `Valid` to the `arrow.array.Array` superclass.
2. Implemented basic null value handling for `arrow.array.Float64Array` (i.e. treat `NaN` values in the input MATLAB array as null values in the corresponding `arrow.array.Float64Array`).
3. Implemented null value substitution (i.e. substitute null values with `NaN`) for `Float64Array` in the `toMATLAB` and `double` conversion methods.

Example of creating an `arrow.array.Float64Array` from a MATLAB `double` array containing `NaN` values:

```matlab
>> matlabArray = [1, 2, NaN, 4, NaN]'

matlabArray =

     1
     2
   NaN
     4
   NaN

>> arrowArray = arrow.array.Float64Array(matlabArray)

arrowArray =

[
  1,
  2,
  null,
  4,
  null
]

>> arrowArray.Valid

ans =

  5×1 logical array

   1
   1
   0
   1
   0

>> all(~isnan(matlabArray) == arrowArray.Valid)

ans =

  logical

   1
```

### Are these changes tested?

Yes, we have added the following test points for the `Valid` property of `arrow.array.Float64Array`:

1. `ValidBasic`
2. `ValidNoNulls`
3. `ValidAllNulls`
4. `ValidEmpty`

### Are there any user-facing changes?

Yes. There is now a public property `Valid` on the `arrow.array.Float64Array` class, which is a MATLAB `logical` array encoding the null values in the underlying Arrow array, where `true` indicates that an element is valid (i.e. not null) and `false` indicates that an element is invalid (i.e. null).

### Future Directions

1. Implement more null value related methods like `isvalid`, `isnull`, `packagedValidityBitmap`, etc.
2. Add null value (i.e. `Valid` property) support to the rest of the `arrow.array.Array` subclasses.

### Notes

1. Thank you to @sgilmore10 for your help with this pull request!

Lead-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: sgilmore10 <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
1 parent d14b42a, commit 05fe0d2
Showing 11 changed files with 269 additions and 9 deletions.
63 changes: 63 additions & 0 deletions
matlab/src/cpp/arrow/matlab/bit/bit_pack_matlab_logical_array.cc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
#include <cmath> // std::ceil | ||
|
||
#include <arrow/util/bit_util.h> | ||
#include <arrow/util/bitmap_generate.h> | ||
|
||
#include "arrow/matlab/bit/bit_pack_matlab_logical_array.h" | ||
|
||
namespace arrow::matlab::bit { | ||
|
||
// Calculate the number of bytes required in the bit-packed validity buffer. | ||
int64_t bitPackedLength(int64_t num_elements) { | ||
// Since MATLAB logical values are encoded using a full byte (8 bits), | ||
// we can divide the number of elements in the logical array by 8 to get | ||
// the bit packed length. | ||
return static_cast<int64_t>(std::ceil(num_elements / 8.0)); | ||
} | ||
|
||
// Pack an unpacked MATLAB logical array into a bit-packed arrow::Buffer. | ||
arrow::Result<std::shared_ptr<arrow::Buffer>> bitPackMatlabLogicalArray(const ::matlab::data::TypedArray<bool> matlab_logical_array) { | ||
// Get the number of elements in the unpacked MATLAB logical array. | ||
const auto unpacked_buffer_length = matlab_logical_array.getNumberOfElements(); | ||
|
||
// Compute the bit packed length from the unpacked length. | ||
const auto packed_buffer_length = bitPackedLength(unpacked_buffer_length); | ||
|
||
ARROW_ASSIGN_OR_RAISE(auto packed_validity_bitmap_buffer, arrow::AllocateResizableBuffer(packed_buffer_length)); | ||
|
||
// Get an iterator to the raw bool data behind the MATLAB logical array. | ||
auto unpacked_bool_data_iterator = matlab_logical_array.cbegin(); | ||
|
||
// Iterate over the mxLogical array and write bit-packed bools to the arrow::Buffer. | ||
// Call into a loop-unrolled Arrow utility for better performance when bit-packing. | ||
auto generator = [&]() -> bool { return *(unpacked_bool_data_iterator++); }; | ||
const int64_t start_offset = 0; | ||
|
||
auto mutable_data = packed_validity_bitmap_buffer->mutable_data(); | ||
|
||
arrow::internal::GenerateBitsUnrolled(mutable_data, start_offset, unpacked_buffer_length, generator); | ||
|
||
return packed_validity_bitmap_buffer; | ||
} | ||
|
||
} |
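The packing step in this file can be sketched without Arrow or MATLAB dependencies. The following is a minimal illustration in standard C++, not the code from this commit: `bytesForBits` and `packBits` are hypothetical names, a plain loop stands in for `arrow::internal::GenerateBitsUnrolled`, and a `std::vector<std::uint8_t>` holding 0 or 1 per element stands in for the MATLAB logical array. It assumes Arrow's least-significant-bit-first bitmap layout, where element `i` lives in bit `i % 8` of byte `i / 8`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bytes needed to hold num_bits bits,
// rounded up. Pure integer arithmetic, no conversion through double.
inline std::size_t bytesForBits(std::size_t num_bits) {
  return num_bits / 8 + (num_bits % 8 != 0 ? 1 : 0);
}

// Hypothetical helper: pack one-byte-per-element flags (0 or nonzero)
// into an LSB-first bitmap. Unused trailing bits stay zero.
inline std::vector<std::uint8_t> packBits(
    const std::vector<std::uint8_t>& unpacked) {
  std::vector<std::uint8_t> packed(bytesForBits(unpacked.size()), 0);
  // Sanity check: the packed buffer has room for every element.
  assert(packed.size() * 8 >= unpacked.size());
  for (std::size_t i = 0; i < unpacked.size(); ++i) {
    if (unpacked[i] != 0) {
      packed[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }
  }
  return packed;
}
```

For the validity pattern from the commit message, `[1 1 0 1 0]`, `packBits` yields a single byte with value 11 (binary `00001011`). The integer rounding in `bytesForBits` is one alternative to `std::ceil(num_elements / 8.0)`.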
30 changes: 30 additions & 0 deletions
matlab/src/cpp/arrow/matlab/bit/bit_pack_matlab_logical_array.h
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
#pragma once | ||
|
||
#include <arrow/buffer.h> | ||
#include <arrow/result.h> | ||
|
||
#include "MatlabDataArray.hpp" | ||
|
||
namespace arrow::matlab::bit { | ||
// Calculate the number of bytes required in the bit-packed validity buffer. | ||
int64_t bitPackedLength(int64_t num_elements); | ||
// Pack an unpacked MATLAB logical array into a bit-packed arrow::Buffer. | ||
arrow::Result<std::shared_ptr<arrow::Buffer>> bitPackMatlabLogicalArray(const ::matlab::data::TypedArray<bool> matlab_logical_array); | ||
} |
41 changes: 41 additions & 0 deletions
matlab/src/cpp/arrow/matlab/bit/bit_unpack_arrow_buffer.cc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
#include "arrow/matlab/bit/bit_unpack_arrow_buffer.h" | ||
|
||
#include "arrow/util/bitmap_visit.h" | ||
|
||
namespace arrow::matlab::bit { | ||
::matlab::data::TypedArray<bool> bitUnpackArrowBuffer(const std::shared_ptr<arrow::Buffer>& packed_buffer, int64_t length) { | ||
const auto packed_buffer_ptr = packed_buffer->data(); | ||
|
||
::matlab::data::ArrayFactory factory; | ||
|
||
const auto array_length = static_cast<size_t>(length); | ||
|
||
auto unpacked_buffer = factory.createBuffer<bool>(array_length); | ||
auto unpacked_buffer_ptr = unpacked_buffer.get(); | ||
auto visitFcn = [&](const bool is_valid) { *unpacked_buffer_ptr++ = is_valid; }; | ||
|
||
const int64_t start_offset = 0; | ||
arrow::internal::VisitBitsUnrolled(packed_buffer_ptr, start_offset, length, visitFcn); | ||
|
||
::matlab::data::TypedArray<bool> unpacked_matlab_logical_Array = factory.createArrayFromBuffer({array_length, 1}, std::move(unpacked_buffer)); | ||
|
||
return unpacked_matlab_logical_Array; | ||
} | ||
} |
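The unpacking direction in this file can be sketched the same way. This is a minimal standard C++ illustration under stated assumptions, not the commit's implementation: `unpackBits` is a hypothetical name, a plain loop stands in for `arrow::internal::VisitBitsUnrolled`, and the output `std::vector<std::uint8_t>` (one byte per element) stands in for the MATLAB logical array created through `ArrayFactory`. As in Arrow, the bitmap is read least-significant bit first, and the element count must be passed in because the buffer itself does not record how many of its trailing bits are meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand the first `length` bits of an LSB-first
// bitmap into one byte per element (1 = valid, 0 = null).
inline std::vector<std::uint8_t> unpackBits(
    const std::vector<std::uint8_t>& packed, std::size_t length) {
  // The caller must not ask for more bits than the buffer holds.
  assert(length <= packed.size() * 8);
  std::vector<std::uint8_t> unpacked(length);
  for (std::size_t i = 0; i < length; ++i) {
    unpacked[i] = static_cast<std::uint8_t>((packed[i / 8] >> (i % 8)) & 1u);
  }
  return unpacked;
}
```

Given a one-byte bitmap with value 11 and a length of 5, `unpackBits` returns `{1, 1, 0, 1, 0}`, which matches the `Valid` output shown in the commit message.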
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
#pragma once | ||
|
||
#include "arrow/buffer.h" | ||
|
||
#include "MatlabDataArray.hpp" | ||
|
||
namespace arrow::matlab::bit { | ||
::matlab::data::TypedArray<bool> bitUnpackArrowBuffer(const std::shared_ptr<arrow::Buffer>& packed_buffer, int64_t length); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ | |
|
||
properties (Hidden, SetAccess=private) | ||
MatlabArray | ||
NullSubstitionValue = NaN; | ||
end | ||
|
||
methods | ||
|
@@ -29,13 +30,23 @@ | |
|
||
validateattributes(data, "double", ["2d", "nonsparse", "real"]); | ||
if ~isempty(data), validateattributes(data, "double", "vector"); end | ||
obj@arrow.array.Array("Name", "arrow.array.proxy.Float64Array", "ConstructorArguments", {data, opts.DeepCopy}); | ||
% Extract missing (i.e. null) values. | ||
% TODO: Determine a more robust approach to handling "detection" of null values. | ||
% For example - add a name-value pair to allow clients to choose which values | ||
% should be considered null (if any). | ||
validElements = ~isnan(data); | ||
obj@arrow.array.Array("Name", "arrow.array.proxy.Float64Array", "ConstructorArguments", {data, opts.DeepCopy, validElements}); | ||
% Store a reference to the array if not doing a deep copy | ||
if (~opts.DeepCopy), obj.MatlabArray = data; end | ||
end | ||
|
||
function data = double(obj) | ||
data = obj.Proxy.toMATLAB(); | ||
data = obj.toMATLAB(); | ||
end | ||
|
||
function matlabArray = toMATLAB(obj) | ||
matlabArray = obj.Proxy.toMATLAB(); | ||
matlabArray(~obj.Valid) = obj.NullSubstitionValue; | ||
end | ||
end | ||
end |
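The two MATLAB steps in this file, deriving validity with `~isnan(data)` and overwriting null slots in `toMATLAB`, can be mirrored in a short standard C++ sketch. The function names `validityFromNaN` and `substituteNulls` are hypothetical and exist only for illustration; they are not part of the MATLAB interface or of Arrow. The sketch assumes one validity byte per element. Substituting on the way out matters because the Arrow format does not require a null slot's stored value to hold anything in particular.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper mirroring `validElements = ~isnan(data)`:
// an element is valid (1) exactly when it is not NaN.
inline std::vector<std::uint8_t> validityFromNaN(
    const std::vector<double>& data) {
  std::vector<std::uint8_t> valid(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    valid[i] = std::isnan(data[i]) ? 0 : 1;
  }
  return valid;
}

// Hypothetical helper mirroring `matlabArray(~obj.Valid) = NaN`:
// every slot marked invalid is overwritten with the substitute value.
inline std::vector<double> substituteNulls(
    std::vector<double> values, const std::vector<std::uint8_t>& valid,
    double substitute = std::numeric_limits<double>::quiet_NaN()) {
  assert(values.size() == valid.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (valid[i] == 0) {
      values[i] = substitute;
    }
  }
  return values;
}
```

With the input `[1, 2, NaN, 4, NaN]`, `validityFromNaN` marks the third and fifth elements as null, and `substituteNulls` returns `NaN` in those positions regardless of what the values buffer held there.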