
Indexing performance #1249

Open
npolina4 opened this issue Jun 14, 2023 · 2 comments
Labels
performance Code performance

Comments

@npolina4
Collaborator

npolina4 commented Jun 14, 2023

```
import dpctl.tensor as dpt
a = dpt.ones((8192, 8192), device='cpu', dtype='f4')
b = dpt.ones((8192, 8192), device='cpu', dtype=bool)
%timeit a[b]
# 211 ms ± 6.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

import numpy
a_np = numpy.ones((8192, 8192), dtype='f4')
b_np = numpy.ones((8192, 8192), dtype=bool)
%timeit a_np[b_np]
# 87.1 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
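For context on what such a masked extraction computes, here is a minimal NumPy sketch (not the dpctl implementation) of the scan-based strategy a parallel backend typically uses: an exclusive prefix sum over the flattened mask assigns each selected element its destination offset in the output. The helper name `mask_select` is illustrative only.

```
import numpy as np

def mask_select(a, mask):
    """Compute a[mask] via an explicit prefix sum over the flattened mask.

    The cumulative sum of the mask gives each True position its
    destination offset in the compacted output array.
    """
    flat_a = a.ravel()
    flat_m = mask.ravel()
    # Exclusive prefix sum: offsets[i] = number of True entries before i.
    offsets = np.cumsum(flat_m) - flat_m
    out = np.empty(int(flat_m.sum()), dtype=a.dtype)
    out[offsets[flat_m]] = flat_a[flat_m]
    return out

a = np.arange(16, dtype=np.float32).reshape(4, 4)
m = a % 2 == 0
assert np.array_equal(mask_select(a, m), a[m])
```

The prefix sum is the dominant cost for large all-True masks like the benchmark above, which is why the scan implementation is the focus of the optimization work below.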
@oleksandr-pavlyk
Collaborator

This should be improved by changes in gh-1300. @npolina4 could you please post timeit results on the same machine you used to obtain reported numbers in the original comment?

@npolina4
Collaborator Author

Results with the changes from #1300:
Size: 8192, 8192
numpy: 105 ms
cpu: 205 ms
gpu: 115 ms

Size: 4096, 4096
numpy: 24.5 ms
cpu: 45~80 ms
gpu: 21.4 ms
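The per-size timings above can be reproduced outside IPython with the standard-library `timeit` module; a small sketch for the NumPy baseline follows (swap the setup for `dpctl.tensor` with `device='cpu'` or `device='gpu'` to get the other rows). The helper name `bench_ms` is illustrative only.

```
import timeit

def bench_ms(stmt, setup, repeat=7, number=1):
    """Best-of-N wall time for `stmt`, in milliseconds."""
    times = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(times) / number * 1e3

setup = (
    "import numpy as np; "
    "a = np.ones((4096, 4096), dtype='f4'); "
    "b = np.ones((4096, 4096), dtype=bool)"
)
print(f"numpy a[b]: {bench_ms('a[b]', setup):.1f} ms")
```

Taking the minimum over repeats gives a lower-noise figure than the mean when other processes share the machine.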

@oleksandr-pavlyk oleksandr-pavlyk added the performance Code performance label Aug 15, 2023
oleksandr-pavlyk added a commit that referenced this issue Dec 15, 2023
Changed hyperparameter choices to be different for CPU and GPU, resulting
in a 20% performance gain on GPU.

The non-recursive implementation avoids repeated USM allocations,
resulting in performance gains for large arrays.

Furthermore, corrected the base step kernel to accumulate in outputT rather
than in size_t, which realizes additional savings when int32 is used as the
accumulator type.

Using the example from gh-1249, previously, on my Iris Xe laptop:

```
In [1]: import dpctl.tensor as dpt
   ...: ag = dpt.ones((8192, 8192), device='gpu', dtype='f4')
   ...: bg = dpt.ones((8192, 8192), device='gpu', dtype=bool)

In [2]: cg = ag[bg]

In [3]: dpt.all(cg == dpt.reshape(ag, -1))
Out[3]: usm_ndarray(True)

In [4]: %timeit -n 10 -r 3 cg = ag[bg]
212 ms ± 56 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
```

while with this change:

```
In [4]: %timeit -n 10 -r 3 cg = ag[bg]
178 ms ± 24.2 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)
```
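The "accumulate in outputT rather than in size_t" point above concerns the element type of the scan's running sum: a 32-bit accumulator halves the memory traffic of the prefix-sum pass compared to a platform-sized integer. A hypothetical NumPy analogue of that choice is the `dtype` argument to `cumsum`:

```
import numpy as np

mask = np.ones(10, dtype=bool)

# Default accumulator: a platform-sized integer (analogous to size_t).
offsets_default = np.cumsum(mask)
# Explicit 32-bit accumulator, as in the corrected kernel.
offsets_i32 = np.cumsum(mask, dtype=np.int32)

# Same values, but the narrower accumulator moves less data per element.
assert np.array_equal(offsets_default, offsets_i32)
assert offsets_i32.dtype.itemsize <= offsets_default.dtype.itemsize
```

The narrower type is safe only while the number of selected elements fits in int32, which is why the kernel ties the accumulator to the output type rather than fixing it.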
oleksandr-pavlyk added a commit that referenced this issue Dec 19, 2023
oleksandr-pavlyk added a commit that referenced this issue Jan 8, 2024