Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-ndarray computations, cache unit calls, and add slots to DataArray #47

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

stubbiali
Copy link

A few disparate minor changes.

  • Improve integration with packages implementing NumPy's ndarray API (in the specific case: GT4Py) by (i) avoiding axes transposition unless necessary, (ii) accessing the data rather than the values attribute of DataArray, and (iii) avoiding explicit coercion.
  • Cache calls to UnitRegistry to improve performance.
  • Add __slots__ class attribute to DataArray to suppress warning.

for name, value in array_dict.items():
if not isinstance(value, np.ndarray):
array_dict[name] = np.asarray(value)
pass
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a temporary and dirty solution. We could think of a mechanism to control whether arrays must be coerced or not.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't merge this change as-is, what problem is being solved here and what other solutions are available for it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asarray function of Numpy seeks to coerce the input array-like storage value into a ndarray. This operation could break e.g. the data layout and the memory alignment of value. In the specific case of Tasmania, value could be a GT4Py storage, whose low-level details and features are fitted to the target computing architecture and thus must be preserved. The problem can be circumvented by monkey-patching Numpy via the function gt4py.storage.prepare_numpy(), but this is much GT4Py-specific. We could think of a more organic solution, or just pass value to DataArray as it is and let DataArray perform all type checks (and eventually throw exceptions).

@@ -4,6 +4,7 @@


class DataArray(xr.DataArray):
__slots__ = []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the implications of setting this? What warning is it suppressing, and what behavior does it cause when you set this to an empty list?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is aimed to suppress FutureWarning: xarray subclass DataArray should explicitly define __slots__. Here is a nice explanation of how __slots__ work.

@@ -43,13 +43,16 @@ def get_numpy_array(data_array, out_dims, dim_lengths):
dict of dim_lengths that will give the length of any missing dims in the
data_array.
"""
if len(data_array.values.shape) == 0 and len(out_dims) == 0:
return data_array.values # special case, 0-dimensional scalar array
if len(data_array.data.shape) == 0 and len(out_dims) == 0:
Copy link
Owner

@mcgibbon mcgibbon Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change alters the behavior of this function, which is OK, but the variable names, function name, file name, and docstring need to be updated. For example, I would suggest naming the function something like get_underlying_data.

The tests in test_get_restore_numpy_array.py should also be updated to cover cases where data is not a numpy array (and have similar re-namings).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I will rename the function and update the tests.

for name, value in array_dict.items():
if not isinstance(value, np.ndarray):
array_dict[name] = np.asarray(value)
pass
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't merge this change as-is, what problem is being solved here and what other solutions are available for it?

@mcgibbon
Copy link
Owner

Could you also please update the PR name with a brief description of what these changes do? e.g. "support non-ndarray computations, cache unit calls, add slots to DataArray"? If the name is too long, these can be put into separate PRs. The unit caching for example is mergeable as-is.

@stubbiali stubbiali changed the title Miscellaneous Support non-ndarray computations, cache unit calls, and add slots to DataArray Feb 2, 2022
@stubbiali
Copy link
Author

I think the new name is not too long ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants