dtype=object array problems in function-on-grid evaluation #1656

leftaroundabout · 2024-08-29T16:58:45Z

This is a cleaned-up version of parts of #1649. See that PR for discussion.

In old versions of NumPy, ODL relied on its capability to represent ragged arrays automatically as arrays of arrays (i.e., of objects). This was in particular used for meshgrids, which are a kind of discretization supported by the interpolation classes in odl.discr.

Current NumPy does not automatically convert to dtype=object anymore, and for good reasons: it is error-prone (shapes become ambiguous, whether to consider the nested array or just its outer structure) and performance / memory locality suffers. In #1633, this was addressed by explicitly generating an object-array specifically for the meshgrid-specifying inputs, but further testing (#1648) revealed that this was not sufficient: the dtype=object property would percolate into the calculations for interpolation and grid generation, and there cause new failures due to required implicit conversion (as well as performance degradation).

This PR goes into the details of the grid generation routines and ensures linear arrays are stored with primitive dtype. It fixes the discretization tests up to NumPy-1.26.

…er of arguments. The ODL tests checked against the previously raised `ValueError`, but in newer versions `TypeError` is raised instead, which these tests could not handle.

…cation. The partition builders are quite liberal (arguably, unnecessarily) in the format of the boolean lists specifying along which dimensions the boundary should be included in the grid and along which not. Previously, these free-form lists were passed through NumPy as a first step of analysing their form, but NumPy itself is now more strict in what can be in an array (unless explicitly asked with `dtype=object`, in which case however a plain list could be used just as well). These checks can also be done in a more direct fashion, presuming that the caller actually follows the specification. Amended due to feedback by Jevgenija in odlgroup#1649 (review).

pep8speaks · 2024-08-29T16:58:53Z

Checking updated PR...

In the file odl/discr/discr_utils.py:

Line 1360:46: E126 continuation line over-indented for hanging indent
Line 1022:36: E126 continuation line over-indented for hanging indent
Line 943:18: E127 continuation line over-indented for visual indent

In the file odl/test/discr/discr_space_test.py:

Line 910:56: E225 missing whitespace around operator
Line 869:11: E127 continuation line over-indented for visual indent

In the file odl/test/space/tensors_test.py:

Line 1545:56: E225 missing whitespace around operator
Line 1505:11: E127 continuation line over-indented for visual indent

Comment last updated at 2024-08-30 14:41:17 UTC

…ns case. Previously, this was more or less automatically done in NumPy, but not anymore. Normally, this is more likely to be user mistake. However, ODL actually relies somewhat on such broadcasts when defining functions on "sparse" mesh grids, so I added this functionality back be recursively transversing lists of different-shape arrays (like in the old version NumPy did, manually generating ragged dtype=object arrays).

JevgenijaAksjonova · 2024-08-30T07:43:23Z

odl/discr/discr_utils.py

-                            np.broadcast_to(res, scalar_out_shape))
+                try:
+                    out_arr = np.asarray(out)
+                except ValueError:


Previously there was a check if scalar_in ==True. Now it is gone. I can't see why it was necessary....

Honestly I can't see why it was necessary in the old version either, but the new check np.isscalar(arr_lists) in line 930 certainly makes sure that scalars are handled correctly.

leftaroundabout · 2024-08-30T12:37:46Z

I seem to have made a mistake while rebasing the changes, some tests are failing. I'll figure it out.

There is a check that the result must have the correct dtype, even if it does not need to have the correct shape.

leftaroundabout · 2024-08-30T14:42:04Z

There was just one more commit missing.

leftaroundabout added 2 commits August 29, 2024 18:44

NumPy changed what exception is raised for ufunc-call with wrong numb…

d664f5f

…er of arguments. The ODL tests checked against the previously raised `ValueError`, but in newer versions `TypeError` is raised instead, which these tests could not handle.

leftaroundabout force-pushed the compat/numpy-noimplicitobjectarray/discrops-grid branch from f17aab5 to 56280be Compare August 29, 2024 17:02

leftaroundabout mentioned this pull request Aug 29, 2024

Fixing issues related to dtype=object arrays in interpolation routines #1649

Closed

JevgenijaAksjonova reviewed Aug 30, 2024

View reviewed changes

Change doc test to have floating zero.

6ead95e

There is a check that the result must have the correct dtype, even if it does not need to have the correct shape.

JevgenijaAksjonova approved these changes Sep 4, 2024

View reviewed changes

JevgenijaAksjonova merged commit dd28ae9 into odlgroup:master Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dtype=object array problems in function-on-grid evaluation #1656

dtype=object array problems in function-on-grid evaluation #1656

leftaroundabout commented Aug 29, 2024

pep8speaks commented Aug 29, 2024 •

edited

Loading

JevgenijaAksjonova Aug 30, 2024

leftaroundabout Aug 30, 2024

leftaroundabout commented Aug 30, 2024

leftaroundabout commented Aug 30, 2024

dtype=object array problems in function-on-grid evaluation #1656

dtype=object array problems in function-on-grid evaluation #1656

Conversation

leftaroundabout commented Aug 29, 2024

pep8speaks commented Aug 29, 2024 • edited Loading

Comment last updated at 2024-08-30 14:41:17 UTC

JevgenijaAksjonova Aug 30, 2024

Choose a reason for hiding this comment

leftaroundabout Aug 30, 2024

Choose a reason for hiding this comment

leftaroundabout commented Aug 30, 2024

leftaroundabout commented Aug 30, 2024

pep8speaks commented Aug 29, 2024 •

edited

Loading