spock netlibscalapack
Use the latest ROCm module to get the best performance. At the time of writing, rocm/4.1.0 is the default module, but its batch GEMM performance is worse than that of rocm/4.2.0.
module load rocm/4.2.0
$ module -t list
cce/11.0.4
craype/2.7.6
craype-x86-rome
libfabric/1.11.0.3.74
craype-network-ofi
cray-dsmml/0.1.4
perftools-base/21.02.0
xpmem/2.2.40-2.1_2.7__g3cf3325.shasta
cray-mpich/8.1.4
cray-libsci/21.04.1.1
cray-pmi/6.0.10
cray-pmi-lib/6.0.10
DefApps/default
PrgEnv-cray/8.0.0
rocm/4.2.0
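Because the default ROCm module is slower, it can help to confirm which version is actually loaded before building. The helper below is a minimal sketch: `loaded_rocm_version` is a hypothetical name, and it assumes the terse module listing has one `name/version` entry per line, as in the output above.

```shell
# Print the version of the loaded rocm module, reading a terse module
# listing (one "name/version" entry per line) from standard input.
# Prints nothing and returns 1 if no rocm module is listed.
loaded_rocm_version() {
    awk -F/ '$1 == "rocm" { print $2; found = 1 } END { exit !found }'
}

# Usage on the system (the module command often writes its listing to
# stderr, hence the redirection; adjust if your setup differs):
#   module -t list 2>&1 | loaded_rocm_version
```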
LibSci v21.06 (check with CC --cray-print-opts) supports LAPACK 3.5.0, so some kernels (e.g., tpmlqt) do not exist in LibSci. Hence, NETLIB LAPACK is used instead.
git clone https://github.com/Reference-LAPACK/lapack.git
cd lapack
mkdir build && cd build
CC=cc CXX=CC FC=ftn cmake .. -DBUILD_SHARED_LIBS=ON -DLAPACKE_WITH_TMG=ON -DCBLAS=OFF -DUSE_OPTIMIZED_BLAS=ON
make -j 20
export LAPACK_PATH=$PWD/lib
cd ../..
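To check that the freshly built NETLIB LAPACK provides a kernel that LibSci lacks, the symbol table of the shared library can be inspected. This is a sketch under assumptions: `has_lapack_symbol` is a hypothetical helper, the library is assumed to be named `liblapack.so`, and Fortran symbols are assumed to carry a precision prefix (s, d, c, z) and a trailing underscore.

```shell
# Read `nm` output from standard input and succeed if a defined symbol
# for the given LAPACK routine exists in any precision (s, d, c, z).
# Assumes Fortran-style names with a trailing underscore, e.g. dtpmlqt_.
has_lapack_symbol() {
    grep -Eiq "[[:space:]][TW][[:space:]]+[sdcz]${1}_\$"
}

# Usage (assumes GNU nm and that the library is named liblapack.so):
#   nm -D --defined-only "$LAPACK_PATH/liblapack.so" | has_lapack_symbol tpmlqt \
#       && echo "tpmlqt found"
```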
git clone https://github.com/Reference-ScaLAPACK/scalapack.git
cd scalapack
Copy SLmake.inc.example to SLmake.inc. Use the Cray compiler wrappers and the -fPIC flag as follows:
FC = ftn
CC = cc
FCFLAGS = -O3 -fPIC
CCFLAGS = -O3 -fPIC
In SLmake.inc, leave the following variables empty:
BLASLIB =
LAPACKLIB =
Compile:
make
export SCALAPACK_PATH=$PWD
cd ..
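The SLmake.inc edits described above can also be scripted. The function below is a sketch, not part of ScaLAPACK: `patch_slmake` is a hypothetical helper, and it assumes each variable is defined on a single line of the form `NAME = value`.

```shell
# Rewrite the compiler, flag, and library variables in an SLmake.inc-style
# file. Assumes every variable sits on one line of the form "NAME = value".
patch_slmake() {
    file=$1
    tmp=$(mktemp) || return 1
    # Each pattern requires "=" right after the name (plus optional spaces),
    # so the FC rule does not touch FCFLAGS and the CC rule skips CCFLAGS.
    sed -E \
        -e 's/^FC[[:space:]]*=.*/FC            = ftn/' \
        -e 's/^CC[[:space:]]*=.*/CC            = cc/' \
        -e 's/^FCFLAGS[[:space:]]*=.*/FCFLAGS       = -O3 -fPIC/' \
        -e 's/^CCFLAGS[[:space:]]*=.*/CCFLAGS       = -O3 -fPIC/' \
        -e 's/^BLASLIB[[:space:]]*=.*/BLASLIB       =/' \
        -e 's/^LAPACKLIB[[:space:]]*=.*/LAPACKLIB     =/' \
        "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, run inside the scalapack directory before `make`:
#   cp SLmake.inc.example SLmake.inc && patch_slmake SLmake.inc
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between implementations.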
The installation steps here were tested with commit 859efbd of SLATE.
git clone --recursive https://bitbucket.org/icl/slate.git
cd slate
Add the following lines to GNUmakefile after line 290:
# if LibSci
else ifeq ($(blas),libsci)
FLAGS += -DSLATE_WITH_LIBSCI
# no LIBS to add
Then set the include and library search paths in the shell:
export CPATH=${ROCM_PATH}/include
export LD_LIBRARY_PATH=${LAPACK_PATH}:$LD_LIBRARY_PATH
Create a make.inc file for SLATE with the following contents:
CXX=CC
FC=ftn
CXXFLAGS=-I${ROCM_PATH}/include
LDFLAGS=-L${ROCM_PATH}/lib -L${LAPACK_PATH} -llapack -llapacke
LIBRARY_PATH=${ROCM_PATH}/lib:${SCALAPACK_PATH}:${LAPACK_PATH}
blas=libsci
gpu_backend=hip
mpi=1
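The make.inc above depends on ROCM_PATH, LAPACK_PATH, and SCALAPACK_PATH being set in the environment. A small guard can catch a missing variable before a long build starts. This is a sketch; `require_vars` is a hypothetical helper name.

```shell
# Return 0 if every named variable is set to a non-empty value;
# otherwise report the missing names on stderr and return 1.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, run in the slate directory:
#   require_vars ROCM_PATH LAPACK_PATH SCALAPACK_PATH && make -j
```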
Run make -j. The submodules will be configured. After the configuration, change the LAPACK version in lapackpp/include/lapack/defines.h as follows:
#define LAPACK_VERSION 30700
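The value 30700 appears to correspond to LAPACK 3.7.0. Assuming the macro encodes the version as major*10000 + minor*100 + patch (an assumption inferred from the value above, not confirmed here), the number for another release can be derived as follows. `lapack_version_code` is a hypothetical helper.

```shell
# Convert a "major.minor.patch" version string into a single integer,
# assuming the encoding major*10000 + minor*100 + patch.
lapack_version_code() {
    echo "$1" | awk -F. '{ printf "%d\n", $1 * 10000 + $2 * 100 + $3 }'
}

# Usage:
#   lapack_version_code 3.7.0
```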
Add the following include path to CXXFLAGS in lapackpp/make.inc:
-I${LAPACK_PATH}/../include
Set LIBS in lapackpp/make.inc as follows:
LIBS = -L${LAPACK_PATH} -llapack -llapacke
Run make clean in the lapackpp folder.
Run make -j 20 in the slate folder.
The following command runs DGEMM on one MI100 GPU. The performance should be around 6 TFLOP/s.
export OMP_NUM_THREADS=1 && srun -A CSC391 -p ecp -t 2:0:0 \
    -N 1 -n 1 --ntasks-per-node=1 --cpus-per-task=1 --threads-per-core=1 \
    --gpus-per-task=1 -J testjob -o %x-%j.out \
    ./test/tester --type d --nb 2048 --dim 36864 --grid 1x1 \
    --check n --ref n --origin h --target d --repeat 3 gemm
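As a sanity check on the expected rate, a measured run time can be converted to TFLOP/s by hand. The sketch below assumes the standard 2*n^3 operation count for a square n-by-n GEMM and takes the elapsed time in seconds as an argument. `gemm_tflops` is a hypothetical helper, not part of the SLATE tester.

```shell
# Estimate the TFLOP/s of a square n-by-n GEMM from its run time in seconds,
# using the standard 2*n^3 floating-point operation count.
gemm_tflops() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 2 * n * n * n / t / 1e12 }'
}

# Usage: first argument is the matrix dimension, second the time in seconds.
#   gemm_tflops 36864 16.7
```

With --dim 36864, a rate of about 6 TFLOP/s corresponds to roughly 16.7 seconds per multiplication.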