apacheGH-37597: [MATLAB] Add `toMATLAB` method to `arrow.array.ChunkedArray` class (apache#37613)

### Rationale for this change

Currently, there is no way to easily convert an `arrow.array.ChunkedArray` into a corresponding MATLAB array, other than (1) manually iterating chunk by chunk, (2) calling `toMATLAB` on each chunk, and then (3) concatenating all of the converted chunks together into one contiguous MATLAB array.

It would be helpful to add a `toMATLAB` method to `arrow.array.ChunkedArray` that abstracts away all of these steps.
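For comparison, the manual workaround described above can be sketched in plain MATLAB. To keep the sketch runnable without the Arrow MATLAB interface, the cell array `chunkValues` (a hypothetical name) holds plain column vectors that stand in for the results of calling `toMATLAB` on each chunk. With a real `ChunkedArray`, the loop body would call `toMATLAB(c.chunk(ii))` instead.

```matlab
% Stand-ins (assumption): each cell holds what toMATLAB would return for one
% chunk of a ChunkedArray. No Arrow objects are involved here.
chunkValues = {[1; 2; NaN; 4; 5], [6; 7; 8; 9; NaN; 11]};

% Steps (1) and (2): iterate chunk by chunk and collect each converted chunk.
% With a real ChunkedArray c, this would be converted{ii} = toMATLAB(c.chunk(ii)).
converted = cell(1, numel(chunkValues));
for ii = 1:numel(chunkValues)
    converted{ii} = chunkValues{ii};
end

% Step (3): concatenate the converted chunks into one contiguous column vector.
data = vertcat(converted{:});
```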

### What changes are included in this PR?

1. Added `toMATLAB` method to `arrow.array.ChunkedArray` class
2. Added a `preallocateMATLABArray` abstract method to the `arrow.type.Type` class. The `toMATLAB` method of `ChunkedArray` uses it to pre-allocate a MATLAB array of the expected class and shape. This is necessary to ensure `toMATLAB` returns the correct MATLAB array when the `ChunkedArray` has zero chunks: if `toMATLAB` instead stored the result of calling `toMATLAB` on each chunk in a `cell` array and then concatenated the values, it would return a 0x0 `double` array for a zero-chunk `ChunkedArray`, whatever its `Type`. Pre-allocating avoids this issue.
3. Implemented `preallocateMATLABArray` for all `arrow.type.Type` subclasses.
4. Added an abstract class `arrow.type.NumericType` that all classes representing numeric data types inherit from. `NumericType` implements `preallocateMATLABArray` for its subclasses.
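The zero-chunk problem behind item 2 can be reproduced in plain MATLAB. This sketch assumes a `ChunkedArray` whose `Type` would map to the MATLAB class `int32` (a hypothetical choice); it compares concatenating an empty `cell` array with pre-allocating by class name and length, as `NumericType` does.

```matlab
% Zero chunks: there is nothing to convert, so the cell array stays empty.
converted = {};

% Concatenation-based approach: vertcat with no inputs returns [] (0x0 double),
% losing both the expected class and the column shape.
concatenated = vertcat(converted{:});

% Pre-allocation approach: the expected class name and a length of 0
% produce a 0x1 array of that class.
expectedClass = 'int32';   % assumption: the ChunkedArray's Type maps to int32
preallocated = zeros([0 1], expectedClass);
```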

### Are these changes tested?

Yes. Added unit tests to `tChunkedArray.m`.

### Are there any user-facing changes?

Yes. Users can now call `toMATLAB` on `ChunkedArray`s.

**Example**

```matlab

>> a = arrow.array([1 2 NaN 4 5]);
>> b = arrow.array([6 7 8 9 NaN 11]);
>> c = arrow.array.ChunkedArray.fromArrays(a, b);
>> data = toMATLAB(c)

data =

     1
     2
   NaN
     4
     5
     6
     7
     8
     9
   NaN
    11

```

* Closes: apache#37597

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
sgilmore10 authored Sep 7, 2023
1 parent 3b4a6b1 commit 65e2f22
Showing 19 changed files with 352 additions and 102 deletions.
11 changes: 11 additions & 0 deletions matlab/src/matlab/+arrow/+array/ChunkedArray.m
@@ -60,6 +60,17 @@
            array = traits.ArrayConstructor(proxy);
        end

        function data = toMATLAB(obj)
            data = preallocateMATLABArray(obj.Type, obj.Length);
            startIndex = 1;
            for ii = 1:obj.NumChunks
                chunk = obj.chunk(ii);
                endIndex = startIndex + chunk.Length - 1;
                data(startIndex:endIndex) = toMATLAB(chunk);
                startIndex = endIndex + 1;
            end
        end

        function tf = isequal(obj, varargin)
            narginchk(2, inf);

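The `startIndex`/`endIndex` bookkeeping in `toMATLAB` can be traced with plain MATLAB arrays. This sketch assumes three already-converted `int32` chunks, one of them empty, standing in for real Arrow chunks, and pre-allocates the output the way `NumericType.preallocateMATLABArray` would for an `int32` type.

```matlab
% Stand-ins (assumption): converted chunks of an int32 ChunkedArray,
% including an empty chunk in the middle.
chunks = {int32([1; 2; 3]), zeros(0, 1, 'int32'), int32([4; 5])};

% Total number of elements across all chunks (the role of obj.Length).
totalLength = sum(cellfun(@numel, chunks));

% Pre-allocate an int32 column vector of the final size.
data = zeros([totalLength 1], 'int32');

startIndex = 1;
for ii = 1:numel(chunks)
    chunk = chunks{ii};
    % For an empty chunk, endIndex = startIndex - 1, so the index range
    % below is empty and nothing is written.
    endIndex = startIndex + numel(chunk) - 1;
    data(startIndex:endIndex) = chunk;
    startIndex = endIndex + 1;
end
```

Because every chunk is copied into a buffer that already has the right class, the result keeps the `int32` class instead of depending on what concatenation would produce.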
6 changes: 6 additions & 0 deletions matlab/src/matlab/+arrow/+type/BooleanType.m
@@ -32,4 +32,10 @@
            groups = matlab.mixin.util.PropertyGroup(targets);
        end
    end

    methods(Hidden)
        function data = preallocateMATLABArray(~, length)
            data = false([length 1]);
        end
    end
end
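A small illustration of why `BooleanType` pre-allocates with `false` rather than `zeros`: indexed assignment converts the right-hand side to the class of the existing array, so logical values written into a `double` buffer come back as `double`. The logical vector below is a stand-in (an assumption) for one converted chunk.

```matlab
% Stand-in (assumption): logical values that toMATLAB would return for one chunk.
chunkValues = [true; false];

% A zeros-based buffer is double, so the assigned logicals become doubles.
doubleBuffer = zeros([3 1]);
doubleBuffer(1:2) = chunkValues;

% A false-based buffer (as BooleanType uses) keeps the logical class.
logicalBuffer = false([3 1]);
logicalBuffer(1:2) = chunkValues;
```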
6 changes: 6 additions & 0 deletions matlab/src/matlab/+arrow/+type/DateType.m
@@ -42,4 +42,10 @@
        end
    end

    methods(Hidden)
        function data = preallocateMATLABArray(~, length)
            data = NaT([length 1]);
        end
    end

end
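For `DateType`, the buffer is a `datetime` column of `NaT` values, so it already has the right class and shape before any chunk is copied in, and a length of 0 still yields a 0x1 `datetime`. A minimal sketch with stand-in `datetime` values (an assumption about what the converted chunks look like):

```matlab
% Pre-allocate a 3x1 datetime column of NaT values (as DateType does).
data = NaT([3 1]);

% Stand-in (assumption): converted values of a single two-element chunk.
chunkValues = datetime(2023, 9, 7) + days([0; 1]);
data(1:2) = chunkValues;

% Zero-length case: still a datetime column, not a 0x0 double.
emptyData = NaT([0 1]);
```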
14 changes: 4 additions & 10 deletions matlab/src/matlab/+arrow/+type/Float32Type.m
@@ -1,3 +1,5 @@
%FLOAT32TYPE Type class for float32 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,23 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Float32Type < arrow.type.FixedWidthType
%FLOAT32TYPE Type class for float32 data.
classdef Float32Type < arrow.type.NumericType

methods
function obj = Float32Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.Float32Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end

methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
14 changes: 4 additions & 10 deletions matlab/src/matlab/+arrow/+type/Float64Type.m
@@ -1,3 +1,5 @@
%FLOAT64TYPE Type class for float64 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,23 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Float64Type < arrow.type.FixedWidthType
%FLOAT64Type Type class for float64 data.
classdef Float64Type < arrow.type.NumericType

methods
function obj = Float64Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.Float64Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end

methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
14 changes: 4 additions & 10 deletions matlab/src/matlab/+arrow/+type/Int16Type.m
@@ -1,3 +1,5 @@
%INT16TYPE Type class for int16 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,23 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Int16Type < arrow.type.FixedWidthType
%INT16TYPE Type class for int8 data.
classdef Int16Type < arrow.type.NumericType

methods
function obj = Int16Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.Int16Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end

methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
14 changes: 4 additions & 10 deletions matlab/src/matlab/+arrow/+type/Int32Type.m
@@ -1,3 +1,5 @@
%INT32TYPE Type class for int32 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,23 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Int32Type < arrow.type.FixedWidthType
%INT32TYPE Type class for int32 data.
classdef Int32Type < arrow.type.NumericType

methods
function obj = Int32Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.Int32Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end

methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
14 changes: 4 additions & 10 deletions matlab/src/matlab/+arrow/+type/Int64Type.m
@@ -1,3 +1,5 @@
%INT64TYPE Type class for int64 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,23 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Int64Type < arrow.type.FixedWidthType
%INT64TYPE Type class for int64 data.
classdef Int64Type < arrow.type.NumericType

methods
function obj = Int64Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.Int64Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end

methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
14 changes: 4 additions & 10 deletions matlab/src/matlab/+arrow/+type/Int8Type.m
@@ -1,3 +1,5 @@
%INT8TYPE Type class for int8 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,23 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Int8Type < arrow.type.FixedWidthType
%INT8TYPE Type class for int8 data.
classdef Int8Type < arrow.type.NumericType

methods
function obj = Int8Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.Int8Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end

methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
43 changes: 43 additions & 0 deletions matlab/src/matlab/+arrow/+type/NumericType.m
@@ -0,0 +1,43 @@
%NUMERICTYPE Type class for numeric data

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef NumericType < arrow.type.FixedWidthType

    methods
        function obj = NumericType(proxy)
            arguments
                proxy(1, 1) libmexclass.proxy.Proxy
            end

            obj@arrow.type.FixedWidthType(proxy);
        end
    end

    methods(Hidden)
        function data = preallocateMATLABArray(obj, length)
            traits = arrow.type.traits.traits(obj.ID);
            data = zeros([length 1], traits.MatlabClassName);
        end
    end

    methods (Access=protected)
        function groups = getDisplayPropertyGroups(~)
            targets = "ID";
            groups = matlab.mixin.util.PropertyGroup(targets);
        end
    end
end
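`NumericType.preallocateMATLABArray` relies on `zeros` accepting a class name as its last argument. The sketch below loops over the MATLAB class names that the numeric Arrow types in this commit presumably map to; that mapping is an assumption here, since the real code reads it from `arrow.type.traits.traits(obj.ID).MatlabClassName`. The length of 4 is likewise hypothetical.

```matlab
% Assumed MATLAB class names for the numeric Arrow types shown in this commit.
classNames = {'single', 'double', 'int8', 'int16', 'int32', 'int64', ...
              'uint16', 'uint32'};
len = 4;   % hypothetical ChunkedArray length

buffers = cell(size(classNames));
for k = 1:numel(classNames)
    % zeros(sz, className) returns a len-by-1 column of the requested class.
    buffers{k} = zeros([len 1], classNames{k});
end
```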
6 changes: 6 additions & 0 deletions matlab/src/matlab/+arrow/+type/StringType.m
@@ -32,5 +32,11 @@
            groups = matlab.mixin.util.PropertyGroup(targets);
        end
    end

    methods(Hidden)
        function data = preallocateMATLABArray(~, length)
            data = strings(length, 1);
        end
    end
end
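For `StringType`, `strings(length, 1)` returns a string column whose elements are all empty strings (`""`), and `strings(0, 1)` is still a 0x1 string array. A sketch with a stand-in chunk (an assumption):

```matlab
% Pre-allocate a 3x1 string column (as StringType does); every element is "".
data = strings(3, 1);

% Stand-in (assumption): converted values of one two-element chunk.
data(1:2) = ["A"; "B"];

% Zero-length case keeps the string class and column shape.
emptyData = strings(0, 1);
```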

7 changes: 7 additions & 0 deletions matlab/src/matlab/+arrow/+type/TimeType.m
@@ -42,4 +42,11 @@
        end
    end

    methods(Hidden)
        function data = preallocateMATLABArray(~, length)
            data = NaN([length 1]);
            data = seconds(data);
        end
    end

end
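`TimeType` builds its buffer in two steps: a `NaN` column of doubles, which `seconds` then turns into a `duration` array. The sketch assumes the converted time chunks arrive as `duration` values; the chunk values themselves are hypothetical.

```matlab
% Pre-allocate as TimeType does: a 3x1 NaN column converted to duration.
data = NaN([3 1]);
data = seconds(data);

% Stand-in (assumption): duration values from one two-element chunk.
data(1:2) = seconds([1.5; 2]);
```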
6 changes: 6 additions & 0 deletions matlab/src/matlab/+arrow/+type/TimestampType.m
@@ -47,4 +47,10 @@
            groups = matlab.mixin.util.PropertyGroup(targets);
        end
    end

    methods(Hidden)
        function data = preallocateMATLABArray(obj, length)
            data = NaT([length, 1], TimeZone=obj.TimeZone);
        end
    end
end
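`TimestampType` passes its `TimeZone` to `NaT`, so the pre-allocated buffer carries the same time zone as the converted chunks; MATLAB does not allow datetime values with and without a time zone to be combined in one array. The sketch assumes a hypothetical `'UTC'` time zone and uses the positional `'TimeZone', tz` form, which is equivalent to the `TimeZone=...` syntax used above.

```matlab
tz = 'UTC';   % hypothetical value of TimestampType.TimeZone

% Pre-allocate a 3x1 NaT column in the same time zone (as TimestampType does).
data = NaT([3 1], 'TimeZone', tz);

% Stand-in (assumption): zoned datetime values from one two-element chunk.
chunkValues = datetime(2023, 9, 7, 'TimeZone', tz) + hours([0; 1]);
data(1:2) = chunkValues;
```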
4 changes: 4 additions & 0 deletions matlab/src/matlab/+arrow/+type/Type.m
@@ -93,6 +93,10 @@ function displayScalarHandleToDeletedObject(obj)
        end
    end

    methods(Abstract, Hidden)
        data = preallocateMATLABArray(obj, length)
    end

    methods (Sealed)
        function tf = isequal(obj, varargin)

15 changes: 4 additions & 11 deletions matlab/src/matlab/+arrow/+type/UInt16Type.m
@@ -1,3 +1,5 @@
%UINT16TYPE Type class for uint16 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,24 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef UInt16Type < arrow.type.FixedWidthType
%UINT16TYPE Type class for uint16 data.
classdef UInt16Type < arrow.type.NumericType

methods
function obj = UInt16Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.UInt16Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end


methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end
15 changes: 4 additions & 11 deletions matlab/src/matlab/+arrow/+type/UInt32Type.m
@@ -1,3 +1,5 @@
%UINT32TYPE Type class for uint32 data.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
@@ -13,24 +15,15 @@
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef UInt32Type < arrow.type.FixedWidthType
%UINT32TYPE Type class for uint32 data.
classdef UInt32Type < arrow.type.NumericType

methods
function obj = UInt32Type(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.type.proxy.UInt32Type")}
end
import arrow.internal.proxy.validate
obj@arrow.type.FixedWidthType(proxy);
end
end


methods (Access=protected)
function groups = getDisplayPropertyGroups(~)
targets = "ID";
groups = matlab.mixin.util.PropertyGroup(targets);
obj@arrow.type.NumericType(proxy);
end
end
end