Frontier
Currently, OLCF has the same software stack on Frontier and Crusher, because Crusher is a development version of Frontier. So, the same configuration works for both machines. Note, however, that account numbers on Crusher have `_crusher` appended.
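Only the account name differs between the two machines, so a script can derive it from the project ID. Below is a minimal sketch that assumes the machine name is passed in explicitly. The function name `slate_account` is hypothetical.

```shell
# Hypothetical helper: print the Slurm account name for a project on a machine.
# On Crusher the project ID gets "_crusher" appended; elsewhere it is unchanged.
slate_account() {
    local project="$1" machine="$2"
    case "$machine" in
        crusher) echo "${project}_crusher" ;;
        *)       echo "${project}" ;;
    esac
}
```

For example, `ACCOUNT_NUM=$(slate_account abc123 crusher)` sets the account to `abc123_crusher`. Here `abc123` is a placeholder project ID.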
The SLATE repo can be cloned as

```shell
git clone --recursive https://github.com/icl-utk-edu/slate.git
```
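Currently, the GCC stack (`PrgEnv-gnu`) appears to be the most robust. It can be configured as:

```shell
# Start from a clean environment, then target the MI250X GPUs (gfx90a)
# and the Trento CPUs.
module purge
module load craype-accel-amd-gfx90a craype-x86-trento
module load PrgEnv-gnu rocm
# Enable GPU-aware Cray MPICH.
export MPICH_GPU_SUPPORT_ENABLED=1
# Make the ROCm headers and libraries visible to the compiler wrappers.
export CPATH=${ROCM_PATH}/include:${CPATH}
export LIBRARY_PATH=${ROCM_PATH}/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}"
# Write make.inc; ROCM_PATH is expanded when the file is written.
cat > make.inc << END
CXX=CC
FC=ftn
CXXFLAGS+=-I${ROCM_PATH}/include -craype-verbose -g
LDFLAGS+=-L${ROCM_PATH}/lib -craype-verbose
blas=libsci
gpu_backend=hip
hip_arch=gfx90a
mpi=cray
END
```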
Then, the tester can be compiled with `nice make -j 8 tester`.
SLATE can use GPU-aware MPI when the `SLATE_GPU_AWARE_MPI=1` environment variable is set.
However, a known bug in GPU-aware Cray MPICH can cause hangs. So, `listBcastMT` should not be enabled (the configuration above leaves it disabled), and lookahead must be disabled for some routines (including `geqrf` and `getrf_nopiv`).
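When GPU-aware MPI is on, the lookahead rule above can be applied automatically when building tester command lines. Below is a minimal sketch. The helper name `slate_lookahead_flags` is hypothetical. It also assumes that the tester accepts `--lookahead <n>` with `0` meaning no lookahead; confirm this with `test/tester --help`. Only the two routines named above are listed, although other routines may also need lookahead disabled.

```shell
# Hypothetical helper: print extra tester flags that disable lookahead for
# routines affected by the GPU-aware Cray MPICH hang.
# ASSUMPTION: the tester accepts "--lookahead 0" (check "test/tester --help").
# Only geqrf and getrf_nopiv are listed; other routines may be affected too.
slate_lookahead_flags() {
    local routine="$1"
    # Nothing to do unless GPU-aware MPI has been requested.
    [ "${SLATE_GPU_AWARE_MPI:-0}" = "1" ] || return 0
    case "$routine" in
        geqrf|getrf_nopiv) echo "--lookahead 0" ;;
    esac
}
```

The output is meant to be word-split into the command line, for example `test/tester --type d --target d --nb 512 $(slate_lookahead_flags geqrf) geqrf`.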
There is an alternative workaround: setting the `FI_MR_CACHE_MONITOR=kdreg2` environment variable. However, HPE is still testing this setting, so it should not be used for long-running production jobs. With this setting, lookahead appears to work as normal, and `listBcastMT` can be enabled by adding `-DSLATE_HAVE_MT_BCAST` to `CXXFLAGS` during compilation. If this setting causes problems, inform Neil so that the issue can be passed on to HPE.
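The two build variants can share one script that writes `make.inc` with the same contents as the configuration above. It adds `-DSLATE_HAVE_MT_BCAST` only when the `kdreg2` workaround is set in the environment. Below is a minimal sketch. The function name `write_make_inc` is hypothetical. The script also refuses to run when `ROCM_PATH` is empty, because the here-document expands it at write time.

```shell
# Hypothetical helper: write make.inc as in the configuration above, enabling
# listBcastMT (-DSLATE_HAVE_MT_BCAST) only when FI_MR_CACHE_MONITOR=kdreg2.
write_make_inc() {
    local out="${1:-make.inc}"
    local extra=""
    # ROCM_PATH is expanded when the file is written, so it must be set
    # (e.g., by "module load rocm") before this runs.
    if [ -z "${ROCM_PATH:-}" ]; then
        echo "write_make_inc: ROCM_PATH is not set; load the rocm module" >&2
        return 1
    fi
    if [ "${FI_MR_CACHE_MONITOR:-}" = "kdreg2" ]; then
        extra=" -DSLATE_HAVE_MT_BCAST"
    fi
    cat > "$out" << END
CXX=CC
FC=ftn
CXXFLAGS+=-I${ROCM_PATH}/include -craype-verbose -g${extra}
LDFLAGS+=-L${ROCM_PATH}/lib -craype-verbose
blas=libsci
gpu_backend=hip
hip_arch=gfx90a
mpi=cray
END
}
```

Because `FI_MR_CACHE_MONITOR` also has to be set when the tester runs, exporting it once before both building and running keeps the two steps consistent.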
The SLATE tests can be run as

```shell
srun -A $ACCOUNT_NUM -t 5:00 -J slate_example -N 1 -n 8 -c 7 \
    --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \
    --gpu-bind=closest \
    test/tester --type d --target d --nb 512 posv
```
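The layout above places one rank on each of a node's 8 GPUs, with 7 cores per rank. To scale to more nodes, keep that per-node layout and multiply the rank count. Below is a minimal sketch. The helper name `slate_srun_cmd` is hypothetical. It prints the command instead of running it, so the command can be checked first.

```shell
# Hypothetical helper: print an srun command for the tester on NODES nodes,
# keeping the single-node layout above (8 ranks per node, one GPU and
# 7 cores per rank).
slate_srun_cmd() {
    local nodes="$1" routine="$2"
    local ranks=$(( nodes * 8 ))
    echo "srun -A ${ACCOUNT_NUM} -t 5:00 -J slate_example -N $nodes -n $ranks -c 7" \
         "--gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1" \
         "--gpu-bind=closest" \
         "test/tester --type d --target d --nb 512 $routine"
}
```

For example, `slate_srun_cmd 2 posv` prints a two-node command with 16 ranks.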
Alternatively, the job can be submitted as a batch script with `sbatch`. Slurm does not expand shell variables in `#SBATCH` lines, so replace `$ACCOUNT_NUM` with the actual project ID.

```shell
#!/bin/bash
#SBATCH -A $ACCOUNT_NUM
#SBATCH -t 5:00
#SBATCH -J slate_example
#SBATCH -o %x-%j.out
#SBATCH -N 1
srun -n 8 -c 7 --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \
    --gpu-bind=closest \
    test/tester --target d --type d --nb 512 posv
```
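`#SBATCH` lines are not shell-expanded, so one way to avoid editing the account by hand is to generate the script with the value already filled in. Below is a minimal sketch. The function name `write_slate_batch` and the default file name `slate_example.sh` are hypothetical. The escaped backslashes (`\\`) keep the line continuations in the generated file.

```shell
# Hypothetical helper: write the batch script above with the account filled in.
# Slurm does not expand shell variables in #SBATCH lines, so the account is
# substituted here, when the file is generated.
write_slate_batch() {
    local account="$1" out="${2:-slate_example.sh}"
    cat > "$out" << END
#!/bin/bash
#SBATCH -A ${account}
#SBATCH -t 5:00
#SBATCH -J slate_example
#SBATCH -o %x-%j.out
#SBATCH -N 1
srun -n 8 -c 7 --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \\
    --gpu-bind=closest \\
    test/tester --target d --type d --nb 512 posv
END
}
```

The generated file can then be submitted with `sbatch slate_example.sh`.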
For `getrf`, Cray's OpenMP does not support the multi-threaded panel factorization, so the panel thread count should be set to 1 (`--panel-threads` for the tester, or `slate::Option::MaxPanelThreads` for the routine).
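For the tester, this rule can be folded into whatever builds the command line. Below is a minimal sketch using the tester options shown earlier. The helper name `slate_tester_cmd` is hypothetical. It adds `--panel-threads 1` only for `getrf`, the routine named above.

```shell
# Hypothetical helper: print the tester command for a routine, adding
# "--panel-threads 1" for getrf because Cray's OpenMP does not support its
# multi-threaded panel factorization.
slate_tester_cmd() {
    local routine="$1" extra=""
    if [ "$routine" = "getrf" ]; then
        extra="--panel-threads 1 "
    fi
    echo "test/tester --type d --target d --nb 512 ${extra}${routine}"
}
```

Inside a job, the output can be appended to the `srun` options, for example `srun -n 8 -c 7 ... $(slate_tester_cmd getrf)`.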