pykdtree requires OMP_NUM_THREADS flag on Linux #49
Another breadcrumb to chase up: while individual KNN queries with
I think multiprocessing can misbehave if there's more than one layer of concurrency. It depends on how everything is implemented, and it can be platform-dependent too because the default process start method (`fork` vs. `spawn`) differs between OSs. For example, if different multiprocessing processes share an object that has its own locks, they end up sharing that lock too unless due care is taken.
Good point! Confusingly, I managed to get

On a related note: I've been working on another NBLAST variant that mostly uses matrix operations via numpy, and I observed similar behaviour. Beyond a certain point, throwing more cores at this NBLAST makes it slower rather than faster, presumably because numpy already employs some level of concurrency.
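Here is a minimal sketch of one way to avoid stacking native thread pools on top of a process pool: give each worker process an even share of the cores through environment variables. The helper names are hypothetical. The BLAS variables only matter if numpy is linked against OpenBLAS or MKL, which is an assumption about the install.

```python
import os

# Env vars read by common native thread pools. OMP_NUM_THREADS is the one from
# this issue; the BLAS ones assume numpy is linked against OpenBLAS or MKL.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def threads_per_worker(n_workers, n_cores=None):
    """Split the available cores evenly across worker processes (at least 1 each)."""
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return max(1, n_cores // n_workers)


def worker_thread_env(n_workers, n_cores=None):
    """Hypothetical helper: env settings that cap each worker's inner thread pool."""
    n = str(threads_per_worker(n_workers, n_cores))
    return {name: n for name in THREAD_ENV_VARS}
```

To use it, call `os.environ.update(worker_thread_env(n_workers))` before importing numpy or pykdtree and before starting the process pool. Child processes inherit the parent's environment, so every worker starts with the capped thread count.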
@jefferis had a very good explanation for this behaviour: Apple's clang compiler doesn't support OpenMP, which is why that flag might not need to be set on OSX.
NB: if you have Homebrew clang, that may support OpenMP.
Just to leave a note that `threadpoolctl` might be handy here to avoid multiple layers of concurrency. Will see if it does the trick.
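If `threadpoolctl` works out, usage could look roughly like the sketch below. `threadpool_limits` is threadpoolctl's context manager. The `call_with_thread_limit` wrapper is hypothetical, and it falls back to running without a limit when threadpoolctl isn't installed.

```python
import contextlib

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # treat threadpoolctl as an optional dependency
    threadpool_limits = None


def call_with_thread_limit(func, *args, limit=1, **kwargs):
    """Hypothetical wrapper: run func with native thread pools (OpenMP, BLAS)
    capped at `limit` threads, or uncapped if threadpoolctl is unavailable."""
    if threadpool_limits is None:
        ctx = contextlib.nullcontext()
    else:
        ctx = threadpool_limits(limits=limit)
    with ctx:
        return func(*args, **kwargs)
```

As far as I understand, threadpoolctl only acts on native libraries that are already loaded. Import numpy or pykdtree before entering the context.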
Leaving another breadcrumb: on ARM Macs, Also: I tried
Just to leave a note that since version
```python
# Increase limit to 2
navis.nbl.nblast_funcs.OMP_NUM_THREADS_LIMIT = 2
```
Just to leave a paper trail for this observation:

I was setting up a large NBLAST on a Linux (Ubuntu 16.04) server and noticed that it was running suspiciously slowly. Some digging flagged `pykdtree` as the culprit: it was much slower than `scipy`'s KDTree solution (on the order of 5-10 fold). This is odd because on my OSX laptop it's the other way around.

A quick glance at the `pykdtree` GitHub page brought up the `OMP_NUM_THREADS` environment variable, which I never bothered with on OSX. Setting `OMP_NUM_THREADS=4` indeed brings `pykdtree` up to the expected speed.

FYI: @clbarnes @sdorkenw
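Here is a minimal sketch of the workaround done from Python: set `OMP_NUM_THREADS` before pykdtree is imported, since the OpenMP runtime generally reads it when it initializes. `ensure_omp_threads` is a hypothetical helper. It leaves an existing user setting untouched. Otherwise it defaults to `os.cpu_count()`, which is my assumption and not something pykdtree itself does.

```python
import os


def ensure_omp_threads(n_threads=None):
    """Hypothetical helper: set OMP_NUM_THREADS unless the user already did.

    Must run before pykdtree (or anything else using OpenMP) is imported.
    Defaulting to os.cpu_count() is an assumption, not pykdtree behaviour.
    """
    if "OMP_NUM_THREADS" in os.environ:
        return os.environ["OMP_NUM_THREADS"]  # respect an explicit setting
    if n_threads is None:
        n_threads = os.cpu_count() or 1
    os.environ["OMP_NUM_THREADS"] = str(int(n_threads))
    return os.environ["OMP_NUM_THREADS"]
```

For example, call `ensure_omp_threads(4)` at the top of the script and only then run `from pykdtree.kdtree import KDTree`. Exporting `OMP_NUM_THREADS=4` in the shell before launching Python has the same effect.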