use send rather than ssend to avoid lockup #52

Merged 1 commit on Aug 17, 2024

dstndstn (Contributor) commented May 9, 2024

Hi,

I'm using Ubuntu 20.04, with the OS openmpi package, mpi4py 3.1.5, and schwimmbad 0.3.2. This is on the "symmetry" cluster at Perimeter Institute.

The behavior I'm seeing is that, when creating an MPIPool(), each worker gets one task, finishes it, and sends the result back. The boss receives the result, but the workers never proceed to the next task.

Via some sophisticated printf debugging, I found that the workers were never returning from the self.comm.ssend() call. My wise colleague suggested changing that to self.comm.send(), and then it works perfectly!

I don't think you need any of the synchronization implied by ssend, so this should be fine?
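To illustrate the difference being discussed, here is a minimal sketch of a worker loop, not schwimmbad's actual code. The names `worker_loop`, `run_with_mpi`, and the `None` stop sentinel are hypothetical; only `comm.recv`, `comm.send`, and `comm.ssend` are real mpi4py calls. A synchronous send (`ssend`) completes only once the matching receive has started on the other side, whereas a standard `send` may complete as soon as the message has been buffered.

```python
def worker_loop(comm, func, master=0, synchronous=False):
    """Hypothetical worker loop: receive a task, run func, send the result back.

    With synchronous=True the result goes out via comm.ssend, which completes
    only after the master has started the matching receive. With the default,
    comm.send is used, which may complete once the message is buffered.
    """
    send = comm.ssend if synchronous else comm.send
    n_done = 0
    while True:
        task = comm.recv(source=master)
        if task is None:  # assumed sentinel meaning "no more work"
            break
        send(func(task), dest=master)
        n_done += 1
    return n_done


def run_with_mpi():
    """Tiny driver; call this from a script launched under mpiexec."""
    from mpi4py import MPI  # needs a working MPI installation

    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        workers = range(1, comm.Get_size())
        for w in workers:
            comm.send(w * 10, dest=w)        # one task per worker
        results = [comm.recv(source=w) for w in workers]
        for w in workers:
            comm.send(None, dest=w)          # tell each worker to stop
        print(results)
    else:
        worker_loop(comm, lambda x: x + 1)
```

A script that ends with a call to `run_with_mpi()` could be launched with something like `mpiexec -n 4 python script.py`. Switching `synchronous` between `True` and `False` is one way to check whether the hang follows the `ssend` call on a given system.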

My system details:

$ mpiexec --version
mpiexec (OpenRTE) 4.0.3
$ ls -l $(which mpiexec)
lrwxrwxrwx 1 root root 25 Aug 15 2023 /usr/bin/mpiexec -> /etc/alternatives/mpiexec
$ ls -l /etc/alternatives/mpiexec
lrwxrwxrwx 1 root root 24 Aug 15 2023 /etc/alternatives/mpiexec -> /usr/bin/mpiexec.openmpi

adrn (Owner) commented Jun 3, 2024

Hey! I'm just getting back to work from parental leave, but I'll take a look at this within the next few weeks. Thanks for this!

adrn (Owner) commented Aug 17, 2024

Whoops, where did those months go? Thanks for the patience -- I haven't seen the issue you described, but I also don't know why this was using ssend to begin with (it probably traces back to ye olde MPIPool implementation in emcee, where some of this all started...). So I'm fine with changing it to the more standard send! Thanks for catching this.

adrn merged commit 0802aae into adrn:main on Aug 17, 2024
12 checks passed