Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing issues related to dtype=object arrays in interpolation routines #1649

Closed
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
2b3edef
Ensure the distances to computed linear-interpolation weights from ar…
leftaroundabout Aug 8, 2024
80b1d8e
Failure to convert to array implies it is not a suitable input array.
leftaroundabout Aug 8, 2024
30626fe
Disable a test that currently fails and is probably not worth support…
leftaroundabout Aug 8, 2024
380ee09
Ensure interpolation weight is computed with the correct array type.
leftaroundabout Aug 8, 2024
c90044a
Ensure mesh coordinate calculations are not carried out with dtype=ob…
leftaroundabout Aug 8, 2024
bd73b42
Linter whitespace rules.
leftaroundabout Aug 8, 2024
600fecd
For integral data, interpolating on integral points may not be approp…
leftaroundabout Aug 9, 2024
3e3ea87
NumPy changed what exception is raised for ufunc-call with wrong numb…
leftaroundabout Aug 12, 2024
16474de
Don't rely on obsolete NumPy inhomogenous arrays for boundary specifi…
leftaroundabout Aug 13, 2024
b6a912a
Explicitly go through elements that may need to be broadcasted in fn.…
leftaroundabout Aug 13, 2024
2828057
Coding style details criticised by pep8speaks.
leftaroundabout Aug 14, 2024
53c7760
Add warning when falling back to `float` for interpolation coefficients.
leftaroundabout Aug 27, 2024
77e17f3
Manual broadcasting for the general tensor-with-inconsistent-dimensio…
leftaroundabout Aug 29, 2024
e80c43d
Revert "Disable a test that currently fails and is probably not worth…
leftaroundabout Aug 29, 2024
3b38e7f
Change doc test to have floating zero.
leftaroundabout Aug 29, 2024
7c5e31f
Omit unnecessary type coercion.
leftaroundabout Aug 29, 2024
9b3f4f4
Only perform manual broadcasting in case NumPy fails to broadcast hom…
leftaroundabout Aug 29, 2024
de6bc52
Explicitly pass domain dimension as an argument to inhomogeneous-broa…
leftaroundabout Aug 29, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion odl/discr/discr_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -621,6 +621,11 @@ def _find_indices(self, x):

# iterate through dimensions
for xi, cvec in zip(x, self.coord_vecs):
try:
xi = np.asarray(xi).astype(self.values.dtype, casting='safe')
except TypeError:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a warning so the user knows the values are cast as floats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 53c7760.

xi = np.asarray(xi, dtype=float)

idcs = np.searchsorted(cvec, xi) - 1

idcs[idcs < 0] = 0
Expand Down Expand Up @@ -706,6 +711,8 @@ def _compute_nearest_weights_edge(idcs, ndist):

def _compute_linear_weights_edge(idcs, ndist):
"""Helper for linear interpolation."""
ndist = np.asarray(ndist)

# Get out-of-bounds indices from the norm_distances. Negative
# means "too low", larger than or equal to 1 means "too high"
lo = np.where(ndist < 0)
Expand Down Expand Up @@ -799,7 +806,7 @@ def _evaluate(self, indices, norm_distances, out=None):
# axis, resulting in a loop of length 2**ndim
for lo_hi, edge in zip(product(*([['l', 'h']] * len(indices))),
product(*edge_indices)):
weight = 1.0
weight = np.array([1.0], dtype=self.values.dtype)
# TODO(kohr-h): determine best summation order from array strides
for lh, w_lo, w_hi in zip(lo_hi, low_weights, high_weights):

Expand Down
4 changes: 2 additions & 2 deletions odl/discr/grid.py
Original file line number Diff line number Diff line change
Expand Up @@ -1111,8 +1111,8 @@ def uniform_grid_fromintv(intv_prod, shape, nodes_on_bdry=True):

shape = normalized_scalar_param_list(shape, intv_prod.ndim, safe_int_conv)

if np.shape(nodes_on_bdry) == ():
nodes_on_bdry = ([(bool(nodes_on_bdry), bool(nodes_on_bdry))] *
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a type hinting for the nodes_on_bdry argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but am unsure how best to do it – as I often am in Python, where there is no unambiguous "correct type".

A direct version would be

def uniform_grid(min_pt, max_pt, shape,
        nodes_on_bdry: Union[bool, List[Union[bool, Tuple[bool,bool]]]] =True):
    ...

That expresses fairly well the different ways the function can be called, but I reckon it looks daunting to the user. An alternative could be to define this union type separately:

BoundarySpecification = Union[bool, List[Union[bool, Tuple[bool,bool]]]]

def uniform_grid(min_pt, max_pt, shape, nodes_on_bdry: BoundarySpecification =True):
    ...

This has the advantage that the type signature remains short, though it may also be less useful. And arguably, if we make a definition at all then it would be consequent to make it a dedicated class, though that seems overkill here.

It is also debatable whether List is the right thing. The documentation says "sequence". I a way tuples would be more natural (both because they're also used for array shapes, and because the type of each element can be different and lists are more commonly understood as homogeneous containers), but I find

def uniform_grid(min_pt, max_pt, shape,
        nodes_on_bdry: Union[bool, Tuple[Union[bool, Tuple[bool,bool]], ...]] =True):
    ...

more confusing than the version with List. I'm also not a fan of Iterable[...], that seems to suggest passing a lazily-generated sequence.

To be fair, the Sphinx documentation is actually pretty clear about what the nodes_on_bdry does and how it can be specified. Maybe a type hint causes more confusion than it clears up, and I should rather add some comments explaining what happens in the if isinstance(nodes_on_bdry, bool): and following lines?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is how I understand the nodes_on_bdry argument: nodes_on_bdry : Union[bool, Tuple[bool]]. If a bool is provided, it sets all boundaries. If a tuple is provided it functions as follows:

  1. check that the dimension of the tuple matches the dimension of the array: assert len(nodes_on_bdry) == array.dim
  2. We then have an array describing:
  • nodes_on_bdry = (node_on_bdry_left,node_on_bdry_right) in dimension 1
  • nodes_on_bdry = (node_on_bdry_left,node_on_bdry_right, node_on_bdry_top, node_on_bdry_bottom) in dimension 2
  • nodes_on_bdry = (node_on_bdry_left,node_on_bdry_right, node_on_bdry_top, node_on_bdry_bottom, node_on_bdry_front, node_on_bdry_back) in dimension 3

What do you think about that? :)

Copy link
Contributor Author

@leftaroundabout leftaroundabout Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Emvlt that's not how it currently works. Your examples should actually be written

  • nodes_on_bdry = ((node_on_bdry_left,node_on_bdry_right),) in dimension 1
  • nodes_on_bdry = ((node_on_bdry_left,node_on_bdry_right), (node_on_bdry_top, node_on_bdry_bottom)) in dimension 2
  • nodes_on_bdry = ((node_on_bdry_left,node_on_bdry_right), (node_on_bdry_top, node_on_bdry_bottom), (node_on_bdry_front, node_on_bdry_back))

So for example nodes_on_bdry = ((True, False), (True, True)) for a 2D domain that includes the boundary everywhere except in the right boundary. If that were all, the type hint could be written as

Tuple[Tuple[Bool,Bool], ...]

or alternatively and IMO clearer

List[Tuple[Bool,Bool]]

However, the specification can also be shortened to ((True, False), True). Or it could be (True, (False, True)) which means something different: all boundaries included except the top one. This makes it all more concise but leads to much messier types with a union nested inside a list.

I'm not sure if anybody really need this flexibility just to make their user code a few characters shorter. We could use a simplified type hint and then just remark in the docs that it can also be shortened by collapsing (False, False) to False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we will come to a good decision here and then, let's leave the interface as it is for now. But I opened an issue to discuss the use of type hints for the future.

if isinstance(nodes_on_bdry, bool):
nodes_on_bdry = ([(nodes_on_bdry, nodes_on_bdry)] *
intv_prod.ndim)
elif intv_prod.ndim == 1 and len(nodes_on_bdry) == 2:
nodes_on_bdry = [nodes_on_bdry]
Expand Down
6 changes: 4 additions & 2 deletions odl/test/discr/discr_space_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -897,7 +897,9 @@ def test_ufunc_corner_cases(odl_tspace_impl):

# --- UFuncs with nin = 1, nout = 1 --- #

with pytest.raises(ValueError):
wrong_argcount_error = ValueError if np.__version__<"1.21" else TypeError

with pytest.raises(wrong_argcount_error):
# Too many arguments
x.__array_ufunc__(np.sin, '__call__', x, np.ones((2, 3)))

Expand Down Expand Up @@ -928,7 +930,7 @@ def test_ufunc_corner_cases(odl_tspace_impl):

# --- UFuncs with nin = 2, nout = 1 --- #

with pytest.raises(ValueError):
with pytest.raises(wrong_argcount_error):
# Too few arguments
x.__array_ufunc__(np.add, '__call__', x)

Expand Down
5 changes: 3 additions & 2 deletions odl/test/discr/discr_utils_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -355,8 +355,9 @@ def func_tens_dual(x, out=None):


func_tens_params = [(func_tens_ref, f)
for f in [func_tens_oop, func_tens_oop_seq,
func_tens_ip, func_tens_ip_seq]]
for f in [ # func_tens_oop,
func_tens_oop_seq,
func_tens_ip, func_tens_ip_seq]]
func_tens = simple_fixture('func_tens', func_tens_params,
fmt=' {name} = {value[1].__name__} ')

Expand Down
6 changes: 4 additions & 2 deletions odl/test/space/tensors_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -1530,7 +1530,9 @@ def test_ufunc_corner_cases(odl_tspace_impl):

# --- Ufuncs with nin = 1, nout = 1 --- #

with pytest.raises(ValueError):
wrong_argcount_error = ValueError if np.__version__<"1.21" else TypeError

with pytest.raises(wrong_argcount_error):
# Too many arguments
x.__array_ufunc__(np.sin, '__call__', x, np.ones((2, 3)))

Expand Down Expand Up @@ -1561,7 +1563,7 @@ def test_ufunc_corner_cases(odl_tspace_impl):

# --- Ufuncs with nin = 2, nout = 1 --- #

with pytest.raises(ValueError):
with pytest.raises(wrong_argcount_error):
# Too few arguments
x.__array_ufunc__(np.add, '__call__', x)

Expand Down
10 changes: 5 additions & 5 deletions odl/util/normalize.py
Original file line number Diff line number Diff line change
Expand Up @@ -278,11 +278,11 @@ def normalized_nodes_on_bdry(nodes_on_bdry, length):
>>> normalized_nodes_on_bdry([[True, False], False, True], length=3)
[(True, False), (False, False), (True, True)]
"""
shape = np.shape(nodes_on_bdry)
if shape == ():
out_list = [(bool(nodes_on_bdry), bool(nodes_on_bdry))] * length
elif length == 1 and shape == (2,):
out_list = [(bool(nodes_on_bdry[0]), bool(nodes_on_bdry[1]))]
if isinstance(nodes_on_bdry, bool):
return [(bool(nodes_on_bdry), bool(nodes_on_bdry))] * length
elif (length == 1 and len(nodes_on_bdry) == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool(nodes_on_bdry) can be replaced by nodes_on_bdry

and all(isinstance(d, bool) for d in nodes_on_bdry)):
return [nodes_on_bdry[0], nodes_on_bdry[1]]
elif len(nodes_on_bdry) == length:
out_list = []

Expand Down
5 changes: 4 additions & 1 deletion odl/util/vectorization.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,10 @@

def is_valid_input_array(x, ndim=None):
"""Test if ``x`` is a correctly shaped point array in R^d."""
x = np.asarray(x)
try:
x = np.asarray(x)
except ValueError:
return False

if ndim is None or ndim == 1:
return x.ndim == 1 and x.size > 1 or x.ndim == 2 and x.shape[0] == 1
Expand Down