**Is your feature request related to a problem? Please describe.**
Trying to use `mpirun` with an MPI version older than 3 with, for example, `lagomorph lddmm atlas` currently results in an error, because we can't properly determine the local rank.
**Describe the solution you'd like**
Local rank should be determined in a uniform way regardless of MPI version. We should try the method used now, which is what horovod uses, but fall back to a naive hostname-based method if the import fails.
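A rough sketch of that logic (the helper names `local_rank_mpi3` and `local_rank_hostname` are hypothetical placeholders for the current method and the fallback, not existing lagomorph functions):

```python
def local_rank():
    try:
        # Current approach (requires MPI >= 3); the import fails on older MPI
        return local_rank_mpi3()
    except ImportError:
        # Naive hostname-based fallback outlined under "Additional context"
        return local_rank_hostname()
```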
**Describe alternatives you've considered**
Previously we did not need to compute the local rank because we accepted it as a command-line argument. That is a bit cumbersome, however, and computing it instead means the calling convention for lagomorph (which uses `pytorch.distributed`) will match horovod's.
**Additional context**
The following Stack Overflow answer outlines the basic method we need to fall back to: https://stackoverflow.com/a/31792540
The steps required are:

- On each rank, compute the processor name or hostname
- Perform an allgather to grab all of the node names
- Sort the unique node names alphabetically
- Find the integer index of this rank's hostname in the sorted list
- Use `mpi_comm_split` with the integer index found in the last step as the "color"

This can all be done inside `lagomorph.utils.mpi_local_comm`.
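A minimal sketch of what that could look like with mpi4py; the function name follows the issue, but the body is illustrative rather than the actual lagomorph code:

```python
from mpi4py import MPI

def mpi_local_comm(comm=MPI.COMM_WORLD):
    """Split ``comm`` into node-local communicators using hostnames."""
    # On each rank, get the processor name (hostname)
    name = MPI.Get_processor_name()
    # Allgather so every rank sees every node's name
    names = comm.allgather(name)
    # Sort the unique node names alphabetically for a consistent ordering
    unique_names = sorted(set(names))
    # The index of this rank's hostname is the "color" for the split
    color = unique_names.index(name)
    # Ranks with the same color (same node) share the resulting communicator
    return comm.Split(color, comm.Get_rank())

# The local rank is then just the rank within the node-local communicator:
# local_rank = mpi_local_comm().Get_rank()
```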
This unifies our approach to parallelism. Any command line tool will
parse the command line and MPI environment in a uniform way. On Summit,
this corresponds to calling `jsrun -n<N> -a6 -g6`, just as horovod
expects. We need MPI>=3 in order to find local rank using
the method implemented here. In the future, we need a fallback to remove
this requirement (see Issue #17).
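For reference, a minimal sketch of the MPI>=3 local-rank method this refers to, assuming it is based on `MPI_Comm_split_type` via mpi4py (the actual implementation in lagomorph may differ):

```python
from mpi4py import MPI

def local_rank_mpi3(comm=MPI.COMM_WORLD):
    # MPI_Comm_split_type with COMM_TYPE_SHARED groups ranks that share
    # memory, i.e. ranks on the same node; this call requires MPI >= 3.
    local_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)
    return local_comm.Get_rank()
```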