mkl_umath does not bring performance benefits relative to vanilla numpy #1

Open
samaid opened this issue Mar 3, 2023 · 0 comments
samaid commented Mar 3, 2023

https://gist.github.com/samaid/bb680421ee29926cc7b8e536ee9a931c

The test was run on a TGL node in Intel DevCloud in two setups:

  1. STOCK: Clean environment with numpy installed from -c conda-forge
  2. INTEL: Clean environment with numpy installed from -c intel
(intel) u184071@s019-n016:~/repos/dpnp-umath$ python test.py
NP: [0.18639128 0.10316299 0.25168699 ... 0.11474663 0.59490342 0.68693815]
Buffer size: 8192
0.3318898677825928
NP: [0.18639128 0.10316299 0.25168699 ... 0.11474663 0.59490342 0.68693815]
Buffer size: 1600000
0.3113992214202881
UM: [0.18639128 0.10316299 0.25168699 ... 0.11474663 0.59490342 0.68693815]
0.30924224853515625 

(condaforge) u184071@s019-n016:~/repos/dpnp-umath$ python test.py
NP: [0.71962608 0.53769131 0.39456384 ... 0.20209085 0.19296594 0.17458681]
Buffer size: 8192
0.3226659297943115
NP: [0.71962608 0.53769131 0.39456384 ... 0.20209085 0.19296594 0.17458681]
Buffer size: 1600000
0.32870054244995117
No mkl_umath found. Skipping test...

On the TGL node, no NumPy performance difference between the stock and Intel builds is observed at the default buffer size (8192), and the Intel build is only marginally faster when the buffer size is raised to 16*10^5 (1600000) with numpy.setbufsize().
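For reference, the buffer-size comparison can be sketched as below. This is a minimal illustration, not the gist itself: the array size, the choice of np.sin, and the helper name best_time are assumptions. In stock NumPy the ufunc buffer mainly matters when inputs have to be cast or copied, so a contiguous float64 array may show no effect at all.

```python
import time
import numpy as np

def best_time(func, arg, repeats=3):
    """Best wall-clock time of func(arg) over a few runs (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best

# Assumed workload: 1M random float64 values; the gist may use a different size.
x = np.random.default_rng(0).random(1_000_000)

original_bufsize = np.getbufsize()
timings = {}
try:
    for bufsize in (8192, 1_600_000):
        np.setbufsize(bufsize)  # size must be a multiple of 16
        timings[bufsize] = best_time(np.sin, x)
finally:
    # Always restore the previous setting so later code is unaffected.
    np.setbufsize(original_bufsize)

for bufsize, seconds in timings.items():
    print(f"Buffer size: {bufsize}  best of 3: {seconds:.6f} s")
```

Taking the best of several runs reduces noise from a shared DevCloud node; a single run per setting, as in the logs, can easily vary by a few percent.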

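The TGL behavior is not reproduced on an SPR node in Intel DevCloud, where the Intel build is roughly 8x faster than the stock build once the buffer size is raised: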

(intel) u184071@s018-n003:~/repos/dpnp-umath$ python test.py
NP: [0.71095155 0.23050819 0.1467021  ... 0.26945045 0.18541328 0.83865669]
Buffer size: 8192
0.4312753677368164
NP: [0.71095155 0.23050819 0.1467021  ... 0.26945045 0.18541328 0.83865669]
Buffer size: 1600000
0.04172515869140625
UM: [0.71095155 0.23050819 0.1467021  ... 0.26945045 0.18541328 0.83865669]
0.03204202651977539

(condaforge) u184071@s018-n003:~/repos/dpnp-umath$ python test.py
NP: [0.74352341 0.67897181 0.80952154 ... 0.02458932 0.78159    0.10357044]
Buffer size: 8192
0.34731459617614746
NP: [0.74352341 0.67897181 0.80952154 ... 0.02458932 0.78159    0.10357044]
Buffer size: 1600000
0.3502378463745117
No mkl_umath found. Skipping test..
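To make the comparison explicit, the stock-to-Intel speedups can be computed from the timings in the logs above. The numbers are copied from those logs; the dictionary layout is only an illustration.

```python
# Timings in seconds, copied from the logs above: (node, env, buffer size) -> seconds
timings = {
    ("TGL", "intel", 8192): 0.3318898677825928,
    ("TGL", "intel", 1_600_000): 0.3113992214202881,
    ("TGL", "condaforge", 8192): 0.3226659297943115,
    ("TGL", "condaforge", 1_600_000): 0.32870054244995117,
    ("SPR", "intel", 8192): 0.4312753677368164,
    ("SPR", "intel", 1_600_000): 0.04172515869140625,
    ("SPR", "condaforge", 8192): 0.34731459617614746,
    ("SPR", "condaforge", 1_600_000): 0.3502378463745117,
}

# Speedup of the Intel build over the stock build (values above 1 mean Intel is faster).
ratios = {}
for node in ("TGL", "SPR"):
    for bufsize in (8192, 1_600_000):
        stock = timings[(node, "condaforge", bufsize)]
        intel = timings[(node, "intel", bufsize)]
        ratios[(node, bufsize)] = stock / intel
        print(f"{node} bufsize={bufsize}: {ratios[(node, bufsize)]:.2f}x")
```

Only the SPR run with the 1600000 buffer shows a large gain; the other three ratios stay close to 1, and at the default buffer size the Intel build is slightly slower on both nodes.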

First, it looks like no multithreading is exercised on the TGL system. Second, the default buffer size is too small to get any benefit from multithreading. According to this chart, multithreading is beneficial for buffer sizes greater than 10K, and the performance is materially different for sizes of 100K-1M:
https://www.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/trigonometric/sin.html
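One way to start checking the threading hypothesis on each node is to print how many CPUs the process can use and whether the usual thread-count variables are set. This is a sketch only: MKL_NUM_THREADS, OMP_NUM_THREADS and MKL_DYNAMIC are standard oneMKL/OpenMP controls, but whether any of them explains the TGL result is an assumption, and thread_report is a hypothetical helper.

```python
import os

# Environment variables commonly used to control oneMKL/OpenMP threading.
THREAD_ENV_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "MKL_DYNAMIC")

def thread_report():
    """Collect CPU availability and thread-related environment settings."""
    report = {"cpu_count": os.cpu_count()}
    # sched_getaffinity is not available on every platform (e.g. not on macOS).
    if hasattr(os, "sched_getaffinity"):
        report["usable_cpus"] = len(os.sched_getaffinity(0))
    for name in THREAD_ENV_VARS:
        report[name] = os.environ.get(name, "<unset>")
    return report

for key, value in thread_report().items():
    print(f"{key}: {value}")
```

If usable_cpus is 1 on the TGL node, or a thread-count variable is pinned to 1, that would be consistent with the flat timings there.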
