pykdtree requires OMP_NUM_THREADS flag on Linux #49

schlegelp · 2021-06-30T08:51:22Z

Just to leave a paper trail for this observation:

I was setting up a large NBLAST on a Linux (Ubuntu 16.04) server and noticed that it was running suspiciously slowly. Some digging flagged pykdtree as the culprit: it was much slower than scipy's KDTree (on the order of 5-10 fold). This is odd because on my OSX laptop it's the other way around.

A quick glance at the pykdtree GitHub page brought up the OMP_NUM_THREADS environment variable (which I never bothered with on OSX). Setting OMP_NUM_THREADS=4 does indeed bring pykdtree up to the expected speed.
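
For anyone landing here, a minimal sketch of setting the variable from within Python rather than the shell. It assumes the variable has to be in place before pykdtree (and its OpenMP runtime) is first loaded, so it is set before the import. `time_knn_query` is a hypothetical helper for illustration, not part of pykdtree or navis, and it returns `None` if numpy or pykdtree is not installed.

```python
import os
import time

# Set before pykdtree is imported so the OpenMP runtime can pick it up.
# "4" is the value from the report above, not a general recommendation.
os.environ["OMP_NUM_THREADS"] = "4"


def time_knn_query(n_points=100_000, k=5):
    """Return the seconds taken by one kNN query, or None if deps are missing."""
    try:
        import numpy as np
        from pykdtree.kdtree import KDTree
    except ImportError:
        return None

    rng = np.random.default_rng(0)
    data = rng.random((n_points, 3))
    queries = rng.random((n_points, 3))

    tree = KDTree(data)
    start = time.perf_counter()
    tree.query(queries, k=k)  # returns (distances, indices)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
    print("query time [s]:", time_knn_query())
```

Running this once with the variable set and once with the assignment removed should show whether a given machine is affected.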

FYI: @clbarnes @sdorkenw


schlegelp · 2021-06-30T12:58:25Z

Another breadcrumb to chase up: while individual KNN queries with pykdtree are faster, it looks like there is some blocking when used with multiprocessing that makes it slower than scipy.

clbarnes · 2021-07-02T10:07:04Z

I think multiprocessing can misbehave when there is more than one layer of concurrency. It depends on how everything is implemented, and it can be platform-dependent too, because the default process start method (fork vs. spawn) differs between OSs. For example, if different multiprocessing processes share an object that has its own locks, they end up sharing those locks too, unless due care is taken.
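
To make the platform difference concrete, here is a small standard-library sketch that reports the default start method and requests an explicit one. Picking `"spawn"` is only an illustration of how to make the behaviour identical across platforms. Whether it resolves the blocking seen with pykdtree is an assumption that has not been tested here. `describe_start_methods` is a hypothetical helper name.

```python
import multiprocessing as mp


def describe_start_methods():
    """Report the default and the available start methods on this platform."""
    return {
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


# An explicit context does not change the global default. Child processes
# created from it start a fresh interpreter instead of inheriting the
# parent's memory (including any locks or thread pools) via fork.
spawn_ctx = mp.get_context("spawn")

if __name__ == "__main__":
    print(describe_start_methods())
    print("explicit context uses:", spawn_ctx.get_start_method())
```

Pools created via `spawn_ctx.Pool(...)` would then behave the same on Linux and macOS, at the cost of slower worker start-up.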

schlegelp · 2021-07-05T08:59:50Z

Good point!

Confusingly, I managed to get pykdtree to behave similarly to how it does on my Mac, in both concurrent and non-concurrent contexts, by setting OMP_NUM_THREADS=1.

On a related note: I've been working on another NBLAST variant that uses mostly matrix operations via numpy and I observed similar behaviour. At a certain point, throwing more cores at this NBLAST makes it slower rather than faster, presumably because numpy already employs some level of concurrency.
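
The slowdown is consistent with oversubscription: if each of N worker processes starts its own pool of T threads, the machine ends up juggling N x T threads. A rough sketch of the arithmetic, using a hypothetical helper `threads_per_worker` (not part of navis or numpy) that splits the available cores evenly:

```python
import os


def threads_per_worker(n_workers, n_cores=None):
    """Threads each worker may use so that workers * threads <= cores."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    # Never go below one thread, even with more workers than cores.
    return max(1, n_cores // n_workers)


if __name__ == "__main__":
    for workers in (1, 2, 4, 8, 16):
        print(workers, "workers ->", threads_per_worker(workers, n_cores=8), "threads each")
```

With 8 cores, 4 workers get 2 threads each, and anything beyond 8 workers is capped at 1 thread per worker.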

schlegelp · 2021-07-14T17:18:04Z

@jefferis had a very good explanation for this behaviour: Apple's clang compiler doesn't support OpenMP, which is why that flag might not need to be set on OSX.

jefferis · 2021-07-14T17:22:04Z

NB if you have homebrew clang, that may support openmp.

schlegelp · 2022-12-12T08:45:49Z

Just to leave a note that threadpoolctl might be handy here to avoid multiple layers of concurrency. Will see if it does the trick.
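
A sketch of what that could look like, assuming threadpoolctl is installed. `threadpool_limits` is threadpoolctl's own context manager. `run_single_threaded` is a hypothetical wrapper, and whether the limit actually reaches the OpenMP runtime that pykdtree links against is not something this snippet establishes.

```python
try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl is an optional dependency here
    threadpool_limits = None


def run_single_threaded(func, *args, **kwargs):
    """Call func with native thread pools limited to one thread, if possible."""
    if threadpool_limits is None:
        # Fall back to a plain call when threadpoolctl is not available.
        return func(*args, **kwargs)
    with threadpool_limits(limits=1):
        return func(*args, **kwargs)


if __name__ == "__main__":
    print(run_single_threaded(sum, [1, 2, 3]))
```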

schlegelp · 2024-06-08T10:02:08Z

Leaving another breadcrumb: on ARM Macs, pykdtree seems to be compiled with openmp support.

Also: I tried threadpoolctl in the past and it didn't work 🤷

schlegelp · 2024-09-05T09:32:27Z

Just to leave a note that since version 1.5.0 (July 2023), navis automatically sets OMP_NUM_THREADS before spawning child processes. By default it is set to 1, but that can be changed:

# Increase limit to 2
navis.nbl.nblast_funcs.OMP_NUM_THREADS_LIMIT = 2
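
For code outside navis, the same general idea (set the variable in the parent so that children inherit it) can be sketched with the standard library alone. This is not navis's implementation. `omp_num_threads` and `child_omp_setting` are hypothetical names, and a plain `subprocess` child stands in for a worker process.

```python
import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def omp_num_threads(limit):
    """Temporarily set OMP_NUM_THREADS in this process's environment."""
    previous = os.environ.get("OMP_NUM_THREADS")
    os.environ["OMP_NUM_THREADS"] = str(limit)
    try:
        yield
    finally:
        # Restore whatever was there before (or remove the variable again).
        if previous is None:
            os.environ.pop("OMP_NUM_THREADS", None)
        else:
            os.environ["OMP_NUM_THREADS"] = previous


def child_omp_setting():
    """Start a child Python process and return the value it sees."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('OMP_NUM_THREADS'))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with omp_num_threads(2):
        print("child sees:", child_omp_setting())
```

Children started inside the `with` block inherit the limit, and the parent's environment is restored afterwards.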
