Howto
- caldgemm: DGEMM Library
- hpl: Our version of HPL, called HPL-GPU; based on HPL 2.0 and customized
- adl3: Tool to set AMD powertune
- lib: Special OpenCL library from AMD for HPL
- tmp/amd-driver-installer-14.50.2.-1006: Special AMD Driver
- memtest: My DMA bandwidth test
- .bashrc: Setup file for environment
In general, CALDGEMM and HPL-GPU should work on any Linux distribution. For this Howto, we assume an OpenSuSE 13.2 setup with minimal installation as baseline.
During this howto, we will need certain software from the standard OpenSuSE repository. Not all of these packages will be required; it depends on which path you follow (which BLAS library, which GPU backend, and which optional steps you need). The requirements are:
- gcc-c++ (For compilation)
- rpm-build (For building AMD driver package)
- python (For adl3/atitweak utility)
- gcc-fortran, mpc-devel, mpfr-devel, gmp-devel (For AMD ACML)
- xdm (Display manager for headless X-Server)
- nano (Text editor)
To install all these packages at once, so you do not need to bother with them during the howto, run as root:
zypper install gcc-c++ gcc-fortran rpm-build python xdm nano mpc-devel mpfr-devel gmp-devel
We will install HPL for the user called `hpluser`. Make sure that this user exists. We assume its home directory is `/home/hpluser` and that the `$HOME` environment variable points to it.
As a next step, we have to set up the environment: environment variables pointing to all the software packages (these are needed by the build process), some environment variables related to GPUs and to the X server, and certain `ulimit` values that allow large memory allocations.
- First, we need to create a private library path, which we use to preload libraries by putting it first in `$LD_LIBRARY_PATH`. We also need a temp directory for downloads, etc.:
mkdir $HOME/lib $HOME/tmp
- To allow large memory allocations, we have to raise the `ulimit` limits for the non-root user `hpluser`. For this, we edit `/etc/security/limits.conf` and add the following line (replace `hpluser` with your username if it differs):
hpluser - memlock unlimited
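After re-logging in as the user, you can check whether the new memlock limit is actually in effect. The following is a minimal sketch; `check_memlock` is a hypothetical helper name, not part of CALDGEMM or HPL-GPU:

```shell
# Sketch: report whether the memlock limit allows large pinned allocations.
# check_memlock is a hypothetical helper, not part of CALDGEMM or HPL-GPU.
check_memlock() {
  lim=$(ulimit -l)
  if [ "$lim" = "unlimited" ]; then
    echo "memlock: unlimited - large pinned allocations are possible"
  else
    echo "memlock limited to $lim kB - check /etc/security/limits.conf"
  fi
}
check_memlock
```

Note that `limits.conf` changes only apply to new login sessions, so the check must run in a fresh shell.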
- Next, we download the HPL-GPU and CALDGEMM software (they contain some scripts required here). You can either install the latest release (by cloning the respective tag from the git repository, or by downloading the release file), or you can check out the current `master` branch for the very newest version. The `master` branch in the repository should usually be stable, while the `test` branch is used for development.
  - Downloading the files of the latest release: in this case, please unpack the files to `$HOME` and create symlinks `$HOME/caldgemm` and `$HOME/hpl-gpu` pointing to the versioned directories.
  - For cloning the master branch, do:
cd $HOME
git clone https://github.com/davidrohr/caldgemm.git
git clone https://github.com/davidrohr/hpl-gpu.git
- CALDGEMM comes with an example script for the environment: `caldgemm_setenv.sh.sample`. If you leave all directories as they are in this howto, you can use this script as is. Otherwise, please change the script accordingly. To bring the script in place:
cp $HOME/caldgemm/environment/caldgemm_setenv.sh.sample $HOME/caldgemm_setenv.sh
- Now, you can set the environment via:
source $HOME/caldgemm_setenv.sh
* If one of the `ulimit` commands in the script fails, you have probably not set up `/etc/security/limits.conf` properly as explained above.
* If you want to have the proper environment available upon login as `hpluser`, add `source $HOME/caldgemm_setenv.sh` to `/home/hpluser/.bashrc`.
* This script will modify your `DISPLAY` variable to enable a headless X setup. This will break SSH X-forwarding. Please refer to [[Headless System with X Server]] for details.
In case of an AMD GPU, download the GPU driver from http://support.amd.com/de-de/download. For FirePro GPU cards, you need driver version 14.502.x or newer. Download the driver to `/home/hpluser/tmp`. In the example, the driver file is called `amd-driver-installer-14.502-150406a-182396E-Retail_End_User-x86.x86_64.run`.
- Build the proper RPM package for the SuSE distribution. You can get a list of all packages with the `--listpkg` option:
cd $HOME/tmp
./amd-driver-installer-14.502-150406a-182396E-Retail_End_User-x86.x86_64.run --listpkg
- From the listed packages, we select `SuSE/SUSE132-AMD64` for OpenSuSE 13.2 and build the RPMs:
./amd-driver-installer-14.502-150406a-182396E-Retail_End_User-x86.x86_64.run --buildpkg SuSE/SUSE132-AMD64
- Now, we can install the driver:
zypper install --force fglrx*.rpm
- For AMD GPUs, we have to create a proper X-config and set a variable to allow large OpenCL buffers:
aticonfig --initial --adapter=ALL
aticonfig --set-pcs-u32=MCIL,VmClientAddressSpaceGB,512
The value in the last call should match the memory of your machine in GB, i.e. the above line is for a server with 512 GB of RAM.
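The matching value can be derived from the installed RAM, for example as in this sketch (it reads `MemTotal` from `/proc/meminfo` and rounds up to whole GB; the printed command mirrors the `aticonfig` call above):

```shell
# Sketch: derive the VmClientAddressSpaceGB value from the installed RAM.
# MemTotal in /proc/meminfo is given in kB; round up to whole GB.
MEM_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
MEM_GB=$(( (MEM_KB + 1048575) / 1048576 ))
echo "aticonfig --set-pcs-u32=MCIL,VmClientAddressSpaceGB,$MEM_GB"
```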
- Now, we load the kernel module, and check whether it detected the GPUs:
modprobe fglrx
dmesg | grep fglrx
The last command should show something like:
[1437512.156586] <6>[fglrx] module loaded - fglrx 14.50.2 [Apr 6 2015] with 8 minors
This step is required for the following cases:
- If you want to use CAL as GPU backend.
- If you want to use the adl3 / atitweak utility to set the powertune feature of AMD GPUs.
- If you want to use the aticonfig utility to monitor GPU clocks and temperatures on AMD GPUs.
A headless X setup is a setup where the server runs an X server with one screen per GPU, but the user does not log into the X server; instead, they log in remotely via SSH. Sometimes, the X server handles certain GPU features, and in that case a running X server is needed even though X itself is not used. Details can be found in the [[Headless System with X Server]] entry.
For a headless X Setup, we need to perform the following actions:
- Create a proper X config. (We already did this while installing the GPU driver in the previous part via `aticonfig --initial --adapter=ALL`.)
- Set `xdm` as display manager and disable some security features such that a user who logs in remotely can access the X server:
  - Edit `/etc/X11/xdm/xdm-config` and change:
DisplayManager._0.authorize: false
  - Edit `/etc/sysconfig/displaymanager` and change:
DISPLAYMANAGER="xdm"
DISPLAYMANAGER_REMOTE_ACCESS="yes"
DISPLAYMANAGER_ROOT_LOGIN_REMOTE="yes"
DISPLAYMANAGER_XSERVER_TCP_PORT_6000_OPEN="yes"
- Now, you can start the X server via `rcxdm start` and stop it with `rcxdm stop`.
- After the startup, you will have to wait some time for the X server to come up. You can trace the X log via `tail -f /var/log/Xorg.0.log` until you see the following lines, which indicate that the GPUs are ready (one line per GPU in the server):
[1437157.875] (II) fglrx(0): Restoring Recent Mode via PCS is not supported in RANDR 1.2 capable environments
[1437157.875] (II) fglrx(1): Restoring Recent Mode via PCS is not supported in RANDR 1.2 capable environments
...
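Instead of watching the log manually, the wait can be scripted. This is a sketch: `wait_for_gpus` is a hypothetical helper, the GPU count is an assumption you must adjust, and the grep pattern matches the log lines shown above:

```shell
# Sketch: block until every GPU has printed its ready line in the X log.
# wait_for_gpus is a hypothetical helper; pass the log path and GPU count.
wait_for_gpus() {
  log=$1
  num=$2
  while :; do
    # grep -c counts matching lines; a missing file yields an empty count
    ready=$(grep -c 'Restoring Recent Mode' "$log" 2>/dev/null)
    [ "${ready:-0}" -ge "$num" ] && break
    sleep 1
  done
  echo "X server ready: $num GPUs reported"
}
# Example usage for a server with 4 GPUs (assumed count):
# wait_for_gpus /var/log/Xorg.0.log 4
```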
This utility can be used to set the powertune level of AMD GPUs. It requires python and an X server. You can get it from github: https://github.com/mjmvisser/adl3:
cd $HOME
git clone https://github.com/mjmvisser/adl3.git
To verify that atitweak works, execute the following with a running X server:
$HOME/adl3/atitweak -k
In order to obtain the best performance in DGEMM (and hence in HPL) on AMD GPUs, you have to set powertune to raise the GPUs' TDP (see HPL Tuning). Please be aware that this could damage your hardware, because it might raise the TDP beyond the specifications of the hardware. AMD GPUs offer a certain range in which powertune can be set, usually -20% to 20% or -50% to 50%. To set powertune to 50%, please run:
$HOME/adl3/atitweak -p 50
Please be aware that similar TDP limitations hold true for NVIDIA GPUs.
Login as drohr:
drohr@linux-zsrp:~> cd caldgemm/
drohr@linux-zsrp:~/caldgemm> cd amd_dgemm_hawai
drohr@linux-zsrp:~/caldgemm/amd_dgemm_hawai> make    (build AMD DGEMM kernel)
drohr@linux-zsrp:~/caldgemm/amd_dgemm_hawai> cd ..
drohr@linux-zsrp:~/caldgemm> cp config_options.sample config_options.mak
drohr@linux-zsrp:~/caldgemm> cp caldgemm_config.sample caldgemm_config.h
- Edit `config_options.mak`: enable OpenCL, disable CAL and CUDA, and set CONFIGURED=1
- Build CALDGEMM:
drohr@linux-zsrp:~/caldgemm> make
drohr@linux-zsrp:~/caldgemm> cd ..
drohr@linux-zsrp:~> cd hpl
drohr@linux-zsrp:~/hpl> ln -s ../caldgemm
drohr@linux-zsrp:~/hpl> cp setup/Make.Generic* .
- Edit Make.Generic.Options (HPL configuration file with tuning parameters)
- Set HPL_CONFIG_MPI=0 in Make.Generic.Options to disable MPI
drohr@linux-zsrp:~/hpl> ./build.sh
HPL Runtime Configuration is in hpl/bin/Generic/HPL.dat
- In `HPL.dat` set:
  - NBs: set to 1920
  - Ns: (Ns * Ns * 8) must be smaller than the system memory in bytes, and Ns should be a multiple of NBs
    - Example: 60 * 1920 = 115200; 115200 * 115200 * 8 = 106,168,320,000 bytes, i.e. about 106 GB, which is OK for 128 GB of system memory, so use 115200
  - LOOKAHEADs: set to 2
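The Ns sizing rule above can be checked with plain shell arithmetic. This sketch reproduces the 128 GB example from the howto (the candidate of 60 blocks is the example value, not a general rule):

```shell
# Sketch: verify that a candidate Ns (a multiple of NBs) fits into memory.
NB=1920
MEM_GB=128
NS=$((60 * NB))          # candidate from the example: 60 * 1920 = 115200
BYTES=$((NS * NS * 8))   # the HPL matrix needs Ns * Ns * 8 bytes
echo "Ns=$NS needs $((BYTES / 1000000000)) GB"
if [ "$BYTES" -lt $((MEM_GB * 1000000000)) ]; then
  echo "fits into $MEM_GB GB of system memory"
fi
```

For other memory sizes, increase or decrease the number of NB blocks until the matrix size stays safely below the installed RAM.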
Login as root:
- Start the X server:
rcxdm start
- Wait for 10 seconds, then set powertune:
/home/drohr/adl3/atitweak -p 30
(This will increase the TDP by 30%; values between 0 and 50 can be set.)
- Stop the X server again:
rcxdm stop
(You cannot start monitoring while HPL is running!) Login as root:
rcxdm start
while true; do clear && aticonfig --odgc --odgt --adapter=ALL && sleep 2; done
Login as drohr:
drohr@linux-zsrp:~> cd hpl/bin/Generic/
drohr@linux-zsrp:~/hpl/bin/Generic> ./xhpl
- You can see the current performance during the run in the lines that end with "System Gflops *******", where ******* is the performance.
- The final performance result is in the line: WC26L2C64 ................................ *******
- Verification: it must print "PASSED"; otherwise there was a computational error.
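The result check can be scripted if the xhpl output is captured to a file. This is a sketch: `check_hpl_output` and the log file name are assumptions (run e.g. `./xhpl | tee hpl.log` first):

```shell
# Sketch: inspect a captured xhpl log for the performance and verification
# lines. check_hpl_output is a hypothetical helper, not part of HPL-GPU.
check_hpl_output() {
  log=$1
  grep 'System Gflops' "$log"      # show intermediate performance lines
  if grep -q 'PASSED' "$log"; then
    echo "verification PASSED"
  else
    echo "verification FAILED - computational error" >&2
    return 1
  fi
}
# Example usage:
# check_hpl_output hpl.log
```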
- Add user drohr on the new system
- Copy /home/drohr to the new system
- If the username is not drohr: change the variable paths in .bashrc (from /home/drohr to /home/[new_user])
- DMA and memory bandwidth
- CALDGEMM Performance Optimization Guide (CAL OpenCL without GPU_C)
- CALDGEMM Performance Optimization Guide (OpenCL CUDA)
- Thread to core pinning in HPL and CALDGEMM
- Important HPL GPU / CALDGEMM options
Tools / Information
- Analysis Plots of HPL GPU Runs
- Headless System with X Server
- Heterogeneous cluster with different node types
- HPL Compile Time Options
- Catalyst Driver Patch
Reference