
FAQ

SHOC's configure script didn't detect CUDA/OpenCL/MPI properly, what's going on?

SHOC's configure script may not automatically find every package it can use on your system. You might notice this during the configure step itself, or only later, when you're left wondering why some versions of the benchmark programs (e.g., the OpenCL or MPI versions) were not built.

First, check the output from running the SHOC configure script. There are two places to look: the console output produced while configure ran, and the config.log file that the configure script wrote. If the console output suggests SHOC's configure script couldn't find a usable installation of a package such as CUDA, OpenCL, or MPI, look in config.log for any error messages generated when the configure script tested for that package.
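One quick way to surface those messages, assuming configure was run from the current directory, is to search config.log for errors:

$ grep -i -A 2 "error" config.log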

SHOC's configure script expects to find some things in your PATH. For instance, if your system includes CUDA, it expects to find the nvcc program in your PATH. Since OpenCL uses a library-based approach, there is no OpenCL executable to test for, and so SHOC's configuration script has to be told where to find OpenCL headers and possibly also libraries using the CPPFLAGS and LDFLAGS variables when the configuration script is run. See examples of how to do this type of configuration in the shell scripts in the config directory of the SHOC distribution, such as the conf-linux-openmpi.sh file.
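For example, a hypothetical invocation for an OpenCL installation whose headers and libraries live under /usr/local/cuda (the paths here are only illustrative; substitute the ones for your system) might look like:

$ ./configure CPPFLAGS="-I/usr/local/cuda/include" LDFLAGS="-L/usr/local/cuda/lib64"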

Another common configuration problem is that SHOC was unable to detect a working MPI installation. SHOC's current configure script needs to know the flags needed to compile and link MPI programs. There are two scripts in the config directory of the SHOC distribution that show examples of how to configure SHOC with MPICH2/MVAPICH2 (conf-linux-mpich2.sh) and OpenMPI (conf-linux-openmpi.sh). Currently, SHOC's configure script is not smart enough to detect all the MPI information it needs if you just type ./configure. In the future, we plan to improve the SHOC configure script to do more of the MPI detection automatically.
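As a rough sketch, assuming an OpenMPI installation whose mpicxx wrapper is on your PATH, you can ask the wrapper which flags it would use and then supply them to configure in the same way the conf-linux-*.sh scripts do (MPICH2 users can run mpicxx -show instead):

$ mpicxx --showme:compile
$ mpicxx --showme:link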

If all else fails and you still can't diagnose a configuration problem, send a question to [email protected]. Include the version numbers of relevant software such as SHOC, your compiler, CUDA and/or OpenCL, MPI (if any), and your operating system. Also, include the config.log file.

How can I use SHOC to do scaling studies on my GPU cluster?

SHOC uses MPI to run across a cluster. Here's a graph showing how Stencil2D scales on Keeneland, a cluster with 3 Tesla M2070 GPUs per node.

Stencil2D Weak Scaling on Keeneland

For this graph, I used weak scaling (the amount of work grows with the number of GPUs) and a large problem size, so ideal scaling would be a flat horizontal line. For the most part it looks pretty good, except for the abrupt jump at the beginning, when moving from one GPU to more than one.

To generate the data for this graph, I wrote a small Perl script. Here's what it looks like:

#!/usr/bin/perl
# Simple script to run the Stencil2D benchmark at various scales.
# Each run's output is redirected to a file under ./stencil, so make
# sure that directory exists before running.
@rawOutput = `mpirun -np 1 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 1,1 >./stencil/1.out`;
@rawOutput = `mpirun -np 3 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 1,3 >./stencil/3.out`;
@rawOutput = `mpirun -np 6 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 2,3 >./stencil/6.out`;
@rawOutput = `mpirun -np 9 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 3,3 >./stencil/9.out`;
# and so on ...

What do all these parameters mean? Here's a quick list:

  • -np - the number of MPI processes mpirun should start
  • -s 4 - the SHOC problem size class (4 is the largest)
  • --num-iters - the number of iterations of the stencil kernel to execute
  • --msize - the topology of the MPI processes (Stencil2D operates on one large 2D grid of data; this specifies how that grid is divided among the MPI processes)

After I ran the script, I found the name of the result I wanted. In this case, it's DP_Sten2D(median), the median execution time for the double precision version.

I did a little grepping, and ended up with a nice tab-separated value file for Excel.

$ cd stencil
$ grep "DP_Sten2D(median)" ./* >results.tsv

Where can I find typical performance results?

We're currently evaluating a new web frontend for SHOC results using Google's Public Data Explorer. See it here. Let us know what you think.

Also, if you would like to contribute some results, please send a tarball with your results.csv and Logs folders to [email protected].

I have a dual I/O hub platform. How can I get the driver to use numactl for pinning?

We'll be adding pinning to the driver script in a future release. In the meantime, you can get the same benefit by editing the driver script yourself. Here's an example for the HP SL390, where the desired NUMA mapping is (CPU 0 -> GPU 0, CPU 1 -> GPU 1, CPU 1 -> GPU 2).
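If you're not sure which NUMA node each CPU belongs to on your machine, the numactl package can report the layout:

$ numactl --hardware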

Go into the driver.pl script, and find the buildCommand subroutine.

Keep everything up through the my $str declaration, and replace the rest of the subroutine with:

my $numa;
# $_[1] is the device number: map GPU 0 to NUMA node 0,
# and GPUs 1 and 2 to NUMA node 1.
if ($_[1] == 0) {
    $numa = 0;
}
else {
    $numa = 1;
}

if (getArg("-read-only")) {
    $str = "echo " . $_[0];
}
else {
    # Prefix the benchmark command with numactl so memory and CPU
    # are bound to the NUMA node closest to the selected GPU.
    $str = "numactl --membind=$numa --cpubind=$numa "
         . $bindir . "/Serial/"
         . $platformString
         . $_[0]
         . " -s $sizeClass -d "
         . $_[1] . " >"
         . buildFileName( $_[0], $_[1] ) . " 2>"
         . buildFileName( $_[0], $_[1] ) . ".err";
}
# print "Built command: $str \n";
return $str;
How do I use SHOC on a Cray XK6 like the Titan system at Oak Ridge National Laboratory?

A Cray XK6 is similar to a stock Linux cluster with GPUs in some ways, but the differences mean that slight adaptations are needed to build and run SHOC on an XK6.

For building, we recommend using the PrgEnv-gnu programming environment. This environment provides the traditional 'cc' and 'CC' compiler driver scripts for C and C++, respectively, but these scripts use the GNU compilers instead of the PGI or Intel compilers. You may need to 'module swap' your currently loaded PrgEnv module for the GNU version.
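For example, if the PGI environment happens to be the one currently loaded (check with module list), the swap would look like:

$ module swap PrgEnv-pgi PrgEnv-gnu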

Configure SHOC using the configure driver script in $SHOC_HOME/config/conf-titan.sh. Check the contents of this script first to make sure it will configure SHOC the way you want it. Assuming you have the cuda module loaded when you compile, this script should configure SHOC to build both the OpenCL and CUDA versions of the benchmarks, in both serial and MPI modes.
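A minimal sketch of the configure step, assuming you run the script from the top of the SHOC source tree and have the cuda module loaded:

$ cd $SHOC_HOME
$ module load cuda
$ sh ./config/conf-titan.sh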

Unlike on a traditional Linux cluster managed by batch queue software, a job's script on an XK6 runs on a service node, not on one of the compute nodes. These service nodes have no attached GPUs, so you cannot simply run one of the serial-mode SHOC benchmark programs with something like:

$ ./BusSpeedDownload

Instead, even for serial programs, you must use the aprun command, e.g.:

$ cd bin/Serial/CUDA
$ aprun $PWD/BusSpeedDownload

SHOC's Perl driver script does work on Titan. Run it with something like:

$ cd tools
$ aprun perl $PWD/driver.pl -cuda -s3

For the MPI-mode parallel programs, you will want to start one process per node (since that one process controls the single GPU attached to each node). Assuming you have allocated eight nodes (e.g., with -l size=128 in your PBS options, since size is given in cores and each XK6 compute node has 16):

$ cd bin/TP/CUDA
$ aprun -n 8 -N 1 $PWD/Stencil2D --msize 4,2 -s 3