spock netlibscalapack

Mark Gates edited this page Jul 13, 2023 · 1 revision

Spock at Oak Ridge National Laboratory

Installation

Modules

Use the latest ROCm module for the best performance. As of this writing, rocm/4.1.0 is the default, but its batched GEMM performance is worse than that of rocm/4.2.0.

$ module load rocm/4.2.0
$ module -t list
cce/11.0.4
craype/2.7.6
craype-x86-rome
libfabric/1.11.0.3.74
craype-network-ofi
cray-dsmml/0.1.4
perftools-base/21.02.0
xpmem/2.2.40-2.1_2.7__g3cf3325.shasta
cray-mpich/8.1.4
cray-libsci/21.04.1.1
cray-pmi/6.0.10
cray-pmi-lib/6.0.10
DefApps/default
PrgEnv-cray/8.0.0
rocm/4.2.0
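
Because the default ROCm module is not the one wanted here, it can be worth checking the loaded modules before building. The following is a minimal sketch, not part of the original instructions: `has_module` is a hypothetical helper, and the `2>&1` assumes that `module -t list` writes its listing to stderr, as module systems commonly do.

```shell
# has_module LIST NAME: succeed if NAME is one whole line of LIST.
# (Hypothetical helper for illustration; not part of the module system.)
has_module() {
    printf '%s\n' "$1" | grep -qxF -- "$2"
}

# Only query the module system where it exists.
if command -v module >/dev/null 2>&1; then
    # Assumption: the terse listing goes to stderr, so merge it into stdout.
    loaded=$(module -t list 2>&1)
    if has_module "$loaded" rocm/4.2.0; then
        echo "rocm/4.2.0 is loaded"
    else
        echo "rocm/4.2.0 is not loaded; run: module load rocm/4.2.0" >&2
    fi
fi
```

The match is on the whole line, so a different version such as rocm/4.1.0 does not count as a hit.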

LAPACK

LibSci v21.06 (check with CC --cray-print-opts) provides LAPACK 3.5.0, so routines added in later LAPACK releases (e.g., tpmlqt) do not exist in LibSci. NETLIB LAPACK is therefore built from source and used instead.

git clone https://github.com/Reference-LAPACK/lapack.git
cd lapack
mkdir build && cd build
CC=cc CXX=CC FC=ftn cmake .. -DBUILD_SHARED_LIBS=ON -DLAPACKE_WITH_TMG=ON -DCBLAS=OFF -DUSE_OPTIMIZED_BLAS=ON
make -j 20
export LAPACK_PATH=$PWD/lib
cd ../..
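
To confirm that the freshly built library really provides the routines missing from LibSci, the exported symbols can be inspected with `nm`. This is a rough sketch under stated assumptions: the shared library is assumed to be `$LAPACK_PATH/liblapack.so`, the Fortran symbols are assumed to be lowercase with a trailing underscore (e.g. `dtpmlqt_`), and `defines_symbol` is a hypothetical helper.

```shell
# defines_symbol NAME: read `nm` output on stdin and succeed if NAME
# is listed as a defined text (T) or weak (W) symbol.
# (Hypothetical helper for illustration.)
defines_symbol() {
    awk -v sym="$1" '
        $NF == sym && $(NF - 1) ~ /^[TtWw]$/ { found = 1 }
        END { exit !found }
    '
}

# Assumption: the build above produced liblapack.so in $LAPACK_PATH.
lib="${LAPACK_PATH:-}/liblapack.so"
if [ -f "$lib" ]; then
    # Assumption: symbol names are lowercase with a trailing underscore.
    for routine in dtpmlqt_ ztpmlqt_; do
        if nm -D --defined-only "$lib" | defines_symbol "$routine"; then
            echo "found $routine"
        else
            echo "missing $routine" >&2
        fi
    done
fi
```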

ScaLAPACK

git clone https://github.com/Reference-ScaLAPACK/scalapack.git
cd scalapack

Copy SLmake.inc.example to SLmake.inc. In SLmake.inc, use the Cray compiler wrappers and the -fPIC flag as follows:

FC            = ftn
CC            = cc
FCFLAGS       = -O3 -fPIC
CCFLAGS       = -O3 -fPIC

In SLmake.inc, leave the following variables empty:

BLASLIB       =
LAPACKLIB     =
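
The SLmake.inc edits above can also be scripted, which helps when the build is repeated. This is only a sketch: `set_make_var` is a hypothetical helper, and it assumes each variable appears in the file as `NAME = value` with spaces around the equals sign.

```shell
# set_make_var FILE NAME VALUE: rewrite the line "NAME = ..." in FILE
# so that it reads "NAME = VALUE". Other lines are copied unchanged.
# (Hypothetical helper; assumes spaces around "=".)
set_make_var() {
    awk -v name="$2" -v value="$3" '
        $1 == name && $2 == "=" { printf "%-13s = %s\n", name, value; next }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Run from the scalapack directory; skipped if the example file is absent.
if [ -f SLmake.inc.example ]; then
    cp SLmake.inc.example SLmake.inc
    set_make_var SLmake.inc FC        ftn
    set_make_var SLmake.inc CC        cc
    set_make_var SLmake.inc FCFLAGS   "-O3 -fPIC"
    set_make_var SLmake.inc CCFLAGS   "-O3 -fPIC"
    set_make_var SLmake.inc BLASLIB   ""
    set_make_var SLmake.inc LAPACKLIB ""
fi
```

Comparing SLmake.inc against SLmake.inc.example afterwards (for example with `diff`) shows whether every variable was actually matched.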

Compile:

make
export SCALAPACK_PATH=$PWD
cd ..

SLATE

The installation steps here were tested with commit 859efbd of SLATE.

git clone --recursive https://bitbucket.org/icl/slate.git
cd slate

Add the following lines to GNUmakefile after line 290:

# if LibSci
else ifeq ($(blas),libsci)
    FLAGS += -DSLATE_WITH_LIBSCI
    # no LIBS to add

Set the following environment variables:

export CPATH=${ROCM_PATH}/include
export LD_LIBRARY_PATH=${LAPACK_PATH}:$LD_LIBRARY_PATH

make.inc file for SLATE:

CXX=CC
FC=ftn
CXXFLAGS=-I${ROCM_PATH}/include
LDFLAGS=-L${ROCM_PATH}/lib -L${LAPACK_PATH} -llapack -llapacke
LIBRARY_PATH=${ROCM_PATH}/lib:${SCALAPACK_PATH}:${LAPACK_PATH}
blas=libsci
gpu_backend=hip
mpi=1

Run make -j; this configures the submodules. After the configuration finishes, change the LAPACK version in lapackpp/include/lapack/defines.h as follows:

#define LAPACK_VERSION 30700
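
The version change above can be applied non-interactively. The sketch below assumes the generated header already contains a `#define LAPACK_VERSION` line and that the value encodes major*10000 + minor*100 + patch (so 30700 corresponds to 3.7.0). Both helper functions are hypothetical.

```shell
# lapack_version_code MAJOR MINOR PATCH: print MAJOR*10000 + MINOR*100 + PATCH.
# (Assumed encoding, inferred from the value 30700.)
lapack_version_code() {
    echo $(( $1 * 10000 + $2 * 100 + $3 ))
}

# set_lapack_version FILE CODE: rewrite the "#define LAPACK_VERSION" line.
# Fails without touching FILE if no such line exists.
set_lapack_version() {
    grep -q '^#define LAPACK_VERSION ' "$1" || return 1
    sed "s/^#define LAPACK_VERSION .*/#define LAPACK_VERSION $2/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Run from the slate directory; skipped if the header has not been generated.
defines=lapackpp/include/lapack/defines.h
if [ -f "$defines" ]; then
    set_lapack_version "$defines" "$(lapack_version_code 3 7 0)" ||
        echo "no LAPACK_VERSION line in $defines" >&2
fi
```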

Add the following include path to CXXFLAGS in lapackpp/make.inc:

-I${LAPACK_PATH}/../include

Set LIBS in lapackpp/make.inc as follows:

LIBS     = -L${LAPACK_PATH} -llapack -llapacke

Run make clean in the lapackpp folder.

Run make -j 20 in the slate folder.

The following command runs DGEMM on one MI100 GPU. The performance should be around 6 TFLOP/s.

export OMP_NUM_THREADS=1 && srun -A CSC391 -p ecp -t 2:0:0 -N 1 -n 1 \
    --ntasks-per-node=1 --cpus-per-task=1 --threads-per-core=1 --gpus-per-task=1 \
    -J testjob -o %x-%j.out \
    ./test/tester --type d --nb 2048 --dim 36864 --grid 1x1 --check n --ref n \
    --origin h --target d --repeat 3 gemm
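
As a sanity check on the expected rate, the run time of one GEMM can be estimated from the usual 2·n³ flop count for an n × n × n matrix multiply. The helper below is a hypothetical illustration, not part of the SLATE tester.

```shell
# gemm_seconds N TFLOPS: print the time in seconds that an N x N x N GEMM
# (2*N^3 floating-point operations) takes at a sustained TFLOPS Tflop/s.
# (Hypothetical helper for illustration.)
gemm_seconds() {
    awk -v n="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", 2 * n * n * n / (rate * 1e12) }'
}

# Estimate for --dim 36864 at the expected 6 Tflop/s.
gemm_seconds 36864 6
```

For the dimensions used above this comes to roughly 17 seconds per repetition, so a much longer time per GEMM would suggest that the run is not reaching the expected rate.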