Using MPICH on Summit@OLCF

Yanfei Guo edited this page May 13, 2022 · 11 revisions

This page describes how to build and use MPICH on the 'Summit' machine at Oak Ridge. Summit is a POWER9-based machine with an InfiniBand interconnect. The 'UCX' device works best here. Performance is on par with IBM Spectrum MPI (based on Open MPI).

Prerequisites

  • MPICH 4.0
  • UCX 1.11.0

Note: The UCX embedded in MPICH, as well as UCX v1.12.0 and v1.12.1, is currently broken when built with gdrcopy on Summit.

Building MPICH on Summit with GPU support requires the following tools (the default versions on Summit as of 05/13/2022 are shown in parentheses).

  • gcc (gcc/8.3.1)
  • cuda (v11.0.3)
  • gdrcopy (v2.3)
  • spectrum-mpi (v10.4.0.3-20210112), optional; needed only when building for use with jsrun

Build CUDA-enabled UCX

By default, MPICH uses the system UCX, which is not CUDA-aware. We recommend installing your own UCX library. The UCX 1.11.0 release has been tested with the MPICH main branch (commit 44fa0430f9ab).

module load gcc cuda gdrcopy
./configure CC=gcc  CXX=g++ --build=powerpc64le-redhat-linux-gnu --host=powerpc64le-redhat-linux-gnu \
  --with-cuda=$CUDA_DIR --with-gdrcopy=$OLCF_GDRCOPY_ROOT \
  --disable-logging --disable-debug --disable-assertions \
  --prefix=$ucxdir
make -j 8
make install

# $CUDA_DIR is set by cuda module and $OLCF_GDRCOPY_ROOT is set by gdrcopy module
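
Because the configure line above depends on variables exported by the modules, it can help to confirm they are set before building. The following is a minimal sketch, not part of the UCX or MPICH tooling; the function name `require_vars` and the example install prefix are hypothetical.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns non-zero if at least one variable is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name
        # (POSIX-compatible indirection; pass trusted names only).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, after "module load gcc cuda gdrcopy":
#   ucxdir=$HOME/install/ucx    # assumed prefix; any writable directory works
#   require_vars CUDA_DIR OLCF_GDRCOPY_ROOT ucxdir && echo "ready to configure"
```

If the function prints a `missing:` line, load the corresponding module (or set `ucxdir`) before running configure.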

A correctly configured build should display something similar to the following in the configure summary.

configure: =========================================================
configure: UCX build configuration:
configure:         Build prefix:   /ccs/home/yguo/summit-proj/install/ucx/1.11.0-cuda-dbg
configure:    Configuration dir:   ${prefix}/etc/ucx
configure:   Preprocessor flags:   -DCPU_FLAGS="" -I${abs_top_srcdir}/src -I${abs_top_builddir} -I${abs_top_builddir}/src
configure:           C compiler:   gcc -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement
configure:         C++ compiler:   g++ -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch
configure:         Multi-thread:   disabled
configure:         NUMA support:   enabled
configure:            MPI tests:   disabled
configure:          VFS support:   no
configure:        Devel headers:   no
configure: io_demo CUDA support:   no
configure:             Bindings:   < >
configure:          UCS modules:   < >
configure:          UCT modules:   < cuda ib rdmacm cma knem >
configure:         CUDA modules:   < gdrcopy >
configure:         ROCM modules:   < >
configure:           IB modules:   < >
configure:          UCM modules:   < cuda >
configure:         Perf modules:   < cuda >
configure: =========================================================

The important details are that cuda appears in the UCT modules and UCM modules, and that gdrcopy appears in the CUDA modules.
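
The three module entries can also be checked mechanically. The sketch below assumes the configure output was saved to a file (for example with `./configure ... 2>&1 | tee configure.out`); the function name `check_ucx_summary` and the file name are hypothetical, and the matching relies only on the summary layout shown above.

```shell
# Hypothetical helper: read a UCX configure summary on stdin and verify that
# cuda is listed under UCT and UCM modules and gdrcopy under CUDA modules.
check_ucx_summary() {
    summary=$(cat)
    status=0
    for pair in "UCT modules:cuda" "UCM modules:cuda" "CUDA modules:gdrcopy"; do
        field=${pair%%:*}    # text before the colon, e.g. "UCT modules"
        module=${pair#*:}    # text after the colon, e.g. "cuda"
        # Match a line such as "configure:  UCT modules:   < cuda ib ... >"
        if printf '%s\n' "$summary" |
            grep -Eq "${field}:[[:space:]]*<.*[[:space:]]${module}[[:space:]].*>"; then
            echo "ok: ${module} in ${field}"
        else
            echo "MISSING: ${module} in ${field}"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_ucx_summary < configure.out
```

A non-zero exit status means at least one of the three entries is absent, which usually indicates that the cuda or gdrcopy module was not loaded when configure ran.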

Build MPICH

Build CUDA-enabled MPICH with jsrun

module load gcc spectrum-mpi cuda
./configure --with-device=ch4:ucx --with-ucx=$ucxdir --with-pm=none --with-pmix=$MPI_ROOT \
  --with-cuda=$CUDA_DIR --with-hwloc=embedded CFLAGS=-std=gnu11
make -j 8
make install

# $CUDA_DIR is set by the cuda module. $MPI_ROOT is set by the spectrum-mpi module; Summit-customized pmix is available here.

Build CUDA-enabled MPICH with hydra

module load gcc cuda
./configure --with-device=ch4:ucx --with-ucx=$ucxdir --with-cuda=$CUDA_DIR \
  --with-hwloc=embedded CFLAGS=-std=gnu11
make -j 8
make install

# $CUDA_DIR is set by the cuda module.

A correctly configured MPICH build should print the following message in the configure output.

*****************************************************
***
*** device      : ch4:ucx
*** shm feature : auto
*** gpu support : CUDA
***
*****************************************************
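
As a quick check of the banner above, the following sketch tests saved configure output for the two lines that matter. It assumes the output was captured to a file; the function name `check_mpich_summary` is hypothetical.

```shell
# Hypothetical helper: succeed only if the MPICH configure output on stdin
# reports the ch4:ucx device and CUDA GPU support.
check_mpich_summary() {
    out=$(cat)
    printf '%s\n' "$out" |
        grep -Eq '^\*\*\* device[[:space:]]*:[[:space:]]*ch4:ucx[[:space:]]*$' &&
    printf '%s\n' "$out" |
        grep -Eq '^\*\*\* gpu support[[:space:]]*:[[:space:]]*CUDA[[:space:]]*$'
}

# Example:
#   check_mpich_summary < configure.out || echo "MPICH was not configured with ch4:ucx and CUDA"
```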

Running MPI Application

A note about Darshan

The default darshan module loaded on Summit is compiled against the Open MPI-based spectrum-mpi. Unload the darshan-runtime module before running your own MPICH executables. If you want to use Darshan to collect I/O statistics, building your own is straightforward once you have built MPICH. All of the following examples unload the darshan-runtime module.
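
A job script can guard against a forgotten unload. The sketch below assumes the module system exports the colon-separated `LOADEDMODULES` variable, as Environment Modules and Lmod normally do; the function name `darshan_loaded` is hypothetical.

```shell
# Hypothetical helper: succeed if a darshan-runtime module appears in a
# colon-separated module list such as $LOADEDMODULES.
darshan_loaded() {
    case ":$1:" in
        *:darshan-runtime:* | *:darshan-runtime/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   if darshan_loaded "${LOADEDMODULES:-}"; then
#       module unload darshan-runtime
#   fi
```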

Running MPI with jsrun

module load cuda gdrcopy spectrum-mpi
module unload darshan-runtime
# Launch two ranks each on a separate node and a separate GPU
jsrun -n 2 -r 1 -g 1 --smpiargs="-disable_gpu_hooks" \
    -E UCX_NET_DEVICES=mlx5_0:1 \
    ./test/mpi/pt2pt/pingping \
    -type=MPI_INT -sendcnt=512 -recvcnt=1024 -seed=78 -testsize=4  -sendmem=device -recvmem=device

For more jsrun options, please check Summit User Guide - Job Launcher

Running MPI with hydra

module load cuda gdrcopy spectrum-mpi
module unload darshan-runtime

# Adjust -n for different number of nodes. Example gets hostname of two nodes.
jsrun -n 2 -r 1 hostname > ~/hostfile

export LD_LIBRARY_PATH=$(jsrun -n 1 -r 1 echo $LD_LIBRARY_PATH)
mpiexec -np 2 -ppn 2 --launcher ssh -f ~/hostfile -gpus-per-proc=1 \
    -genv UCX_NET_DEVICES=mlx5_0:1 \
    ./test/mpi/pt2pt/pingping \
    -type=MPI_INT -sendcnt=512 -recvcnt=1024 -seed=78 -testsize=4  -sendmem=device -recvmem=device
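
When the number of nodes changes, the host file written by jsrun can be used to derive the node count instead of hard-coding it. This is a minimal sketch; the function name `count_hosts` is hypothetical, and it assumes the host file contains one host name per line as produced by the `jsrun ... hostname` command above.

```shell
# Hypothetical helper: print the number of distinct, non-empty host names
# in the given host file.
count_hosts() {
    grep -v '^[[:space:]]*$' "$1" | sort -u | wc -l | tr -d '[:space:]'
}

# Example:
#   nnodes=$(count_hosts ~/hostfile)
#   echo "hostfile lists $nnodes node(s)"
```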

Performance Tweaks

TODO: Add notes regarding UCX_RNDV_THRESH setting

Common Issues

  1. Watch out for system UCX vs. MPICH built-in UCX. I got undefined symbols in UCX routines because MPICH was configured to use its own UCX but was picking up the system UCX (because spack had set LD_LIBRARY_PATH).

  2. "Invalid communicator" error at PMPI_Comm_size caused by darshan-runtime (see error below). Run "module unload darshan-runtime" before execution.

Abort(671694341) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Comm_size: Invalid communicator, error stack:
PMPI_Comm_size(100): MPI_Comm_size(comm=0xca12a0, size=0x2000000b049c) failed
PMPI_Comm_size(67).: Invalid communicator

  3. CUDA hook error reported when launching with jsrun and using GPU buffers in MPI communication calls (see error below). Add '--smpiargs="-disable_gpu_hooks"' to the jsrun command.

CUDA Hook Library: Failed to find symbol mem_find_dreg_entries, /autofs/nccs-svm1_home1/minsi/git/mpich.git.main/build-ucx-g-cuda10.1.243/test/mpi/pt2pt/pingping: undefined symbol: __PAMI_Invalidate_region

  4. "Abort at src/util/mpir_pmi.c line 1105" when launching with jsrun. This is an MPICH/PMIx bug. Please watch here for a temporary workaround and the final fix.

  5. "hydra_pmi_proxy: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory" when launching with mpiexec. This happens because the CUDA library path is not set on the compute nodes by default; it is set by jsrun. The workaround is to set the CUDA library path manually in ~/.bashrc. Note that passing LD_LIBRARY_PATH to mpiexec does not solve this issue, because hydra_pmi_proxy is launched before the environment variables are transferred.

# Add to ~/.bashrc; this example is for cuda/10.1.243
export LD_LIBRARY_PATH=/sw/summit/cuda/10.1.243/lib64:$LD_LIBRARY_PATH

  6. No library linked to the executable can be found when launching with mpiexec. Manually transfer LD_LIBRARY_PATH to the compute node by setting LD_LIBRARY_PATH=$(jsrun -n 1 -r 1 echo $LD_LIBRARY_PATH) before mpiexec.

  7. Yaksa error "Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2000fbe00008)" when launching with mpiexec and using GPU buffers in MPI communication. You may also see the warning from Yaksa (see below). The reason is that Summit by default sets the "EXCLUSIVE_PROCESS" compute mode (see here) but does not set "CUDA_VISIBLE_DEVICES". The solution is to use the mpiexec GPU binding option "-gpus-per-proc" (e.g., "-gpus-per-proc=1" binds one GPU to each rank).

[yaksa] ====> Disabling CUDA support <====
[yaksa] CUDA is setup in exclusive compute mode, but CUDA_VISIBLE_DEVICES is not set
[yaksa] You can silence this warning by setting CUDA_VISIBLE_DEVICES

  8. UCX error "Invalid parameter" reported at MPI init (see error below). This error is reported when using MPICH/main with UCX 1.9.0. UCX 1.10.0 or an older UCX version has been confirmed not to cause this issue.

Abort(271159951) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
PMPI_Init_thread(103).......: MPI_Init_thread(argc=0x7ffff9c22640, argv=0x7ffff9c22648, required=0, provided=0x7ffff9c2243c) failed
MPII_Init_thread(196).......:
MPID_Init(475)..............:
MPID_Init_world(626)........:
MPIDI_UCX_mpi_init_hook(275):
init_worker(71).............:  ucx function returned with failed status(ucx_init.c 71 init_worker Invalid parameter)
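
For the system UCX vs. built-in UCX problem in the first issue above, it helps to see which UCX library an executable actually resolves at run time. The sketch below parses standard `ldd` output for `libucp.so`, UCX's protocol library; the function name `ucx_lib_path` is hypothetical.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the path that
# libucp.so resolves to ("not found" if unresolved, nothing if not listed).
ucx_lib_path() {
    awk '$1 ~ /^libucp\.so/ && $2 == "=>" {
        if ($3 == "not") print "not found"; else print $3
        exit
    }'
}

# Example:
#   ldd ./test/mpi/pt2pt/pingping | ucx_lib_path
```

If the printed path does not lie under the UCX installation MPICH was configured with (`$ucxdir` in the build steps), check LD_LIBRARY_PATH for entries that place another UCX first.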