`floor_divide` returns different result for arrays of floating dtype on GPU and CPU devices #1652

antonwolfy · 2024-04-23T12:43:08Z

In below example the behavior is different between CPU and GPU devices:

import numpy, dpctl, dpctl.tensor as dpt

dpctl.__version__
# Out: '0.17.0dev0+300.g7757857466'

a = dpt.arange(1, 10, dtype='f', device='gpu')
b = dpt.arange(1, 10, dtype='f', device='gpu')
dpt.floor_divide(a, b)
# Out: usm_ndarray([1., 1., 1., 1., 1., 1., 0., 1., 1.], dtype=float32)

a = dpt.arange(1, 10, dtype='f', device='cpu')
b = dpt.arange(1, 10, dtype='f', device='cpu')
dpt.floor_divide(a, b)
# Out: usm_ndarray([1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)

na = numpy.arange(1, 10, dtype='f')
nb = numpy.arange(1, 10, dtype='f')
numpy.floor_divide(na, nb)
# Out: array([1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)

So we have 0 as 7th element of the result array for GPU device and 1 on CPU and in numpy.

If we look into divide function output:

a = dpt.arange(1, 10, dtype='f')
b = dpt.arange(1, 10, dtype='f')

dpt.divide(a, b)
# Out:
# usm_ndarray([1.        , 1.        , 1.        , 1.        , 1.        ,
#             1.        , 0.99999994, 1.        , 1.        ], dtype=float32)

there will be the value 0.99999994 < 1. for GPU device.
Based on the code:

auto div = in1 / in2;
return (div == resT(0)) ? div : resT(sycl::floor(div));

dpctl uses sycl::floor() function, which is intended to return

The value x rounded to an integral value using the round to negative infinity rounding mode

And I guess this is the reason why 0.99999994 rounds to 0 here.
While in Python array API it states that:

Rounds the result of dividing each element x1_i of the input array x1 by the respective element x2_i of the input array x2 to the greatest (i.e., closest to +infinity) integer-value number that is not greater than the division result.

Thus I wonder if it is expected dpctl behavior or an issue.

The text was updated successfully, but these errors were encountered:

ndgrigorian · 2024-04-23T16:41:27Z

The rounding mode is not exactly at fault here. Per array API

Rounds the result of dividing each element x1_i of the input array x1 by the respective element x2_i of the input array x2 to the greatest (i.e., closest to +infinity) integer-value number that is not greater than the division result

In this case, 1.0 > 0.99999994, so 0.0 is the appropriate result. So the behavior checks out per array API. The surprising result is caused by the division itself being inaccurate, possibly due to lower precision on GPU devices.

antonwolfy · 2024-04-23T17:05:58Z

@ndgrigorian, now I see, thank you for the clarification.
Would it be worst then to have a special handling in the code? something like

if (sycl::fmod(in1, in2) == 0) {
    return resT(std::rint(in1/in2);
}

antonwolfy mentioned this issue Apr 26, 2024

Update skipped tests IntelPython/dpnp#1801

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`floor_divide` returns different result for arrays of floating dtype on GPU and CPU devices #1652

`floor_divide` returns different result for arrays of floating dtype on GPU and CPU devices #1652

antonwolfy commented Apr 23, 2024

ndgrigorian commented Apr 23, 2024

antonwolfy commented Apr 23, 2024 •

edited

Loading

floor_divide returns different result for arrays of floating dtype on GPU and CPU devices #1652

floor_divide returns different result for arrays of floating dtype on GPU and CPU devices #1652

Comments

antonwolfy commented Apr 23, 2024

ndgrigorian commented Apr 23, 2024

antonwolfy commented Apr 23, 2024 • edited Loading

`floor_divide` returns different result for arrays of floating dtype on GPU and CPU devices #1652

`floor_divide` returns different result for arrays of floating dtype on GPU and CPU devices #1652

antonwolfy commented Apr 23, 2024 •

edited

Loading