BabelStream

Measure memory transfer rates to/from global device memory on GPUs. This benchmark is similar in spirit to, and based on, the STREAM benchmark [1] for CPUs.

Unlike other GPU memory bandwidth benchmarks, this one does not include the PCIe transfer time.

There are multiple implementations of this benchmark in a variety of programming models. Currently implemented are:

OpenCL
CUDA
OpenACC
OpenMP 3 and 4.5
Kokkos
RAJA
SYCL

This code was previously called GPU-STREAM.

Website

uob-hpc.github.io/BabelStream/

Usage

The drivers, compiler and software for whichever implementation you would like to build are required.

We have supplied a series of Makefiles, one for each programming model, to assist with building. The Makefiles contain common build options, and should be simple to customise for your needs too.

General usage is make -f <Model>.make. Common compiler flags and names can be set by passing a COMPILER option to Make, e.g. make COMPILER=GNU. Some models allow specifying a CPU or GPU style target, and this can be set by passing a TARGET option to Make, e.g. make TARGET=GPU.

Pass in extra flags via the EXTRA_FLAGS option.

The binaries are named in the form <model>-stream.

Building Kokkos

We use the following command to build Kokkos using the Intel Compiler, specifying the arch appropriately, e.g. KNL.

../generate_makefile.bash --prefix=<prefix> --with-openmp --with-pthread --arch=<arch> --compiler=icpc --cxxflags=-DKOKKOS_MEMORY_ALIGNMENT=2097152

For building with CUDA support, we use the following command, specifying the arch appropriately, e.g. Kepler35.

../generate_makefile.bash --prefix=<prefix> --with-cuda --with-openmp --with-pthread --arch=<arch> --with-cuda-options=enable_lambda --compiler=<path_to_kokkos_src>/bin/nvcc_wrapper

Building RAJA

We use the following command to build RAJA using the Intel Compiler.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DCMAKE_BUILD_TYPE=ICCBuild -DRAJA_ENABLE_TESTS=Off

For building with CUDA support, we use the following command.

cmake .. -DCMAKE_INSTALL_PREFIX=<prefix> -DRAJA_PTR="RAJA_USE_RESTRICT_ALIGNED_PTR" -DRAJA_ENABLE_CUDA=1 -DRAJA_ENABLE_TESTS=Off

Results

Sample results can be found in the results subdirectory. If you would like to submit updated results, please submit a Pull Request.

Citing

Please cite BabelStream via this reference:

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM v2.0: Benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. 2016. Paper presented at P^3MA Workshop at ISC High Performance, Frankfurt, Germany.

Other BabelStream publications:

Deakin T, McIntosh-Smith S. GPU-STREAM: Benchmarking the achievable memory bandwidth of Graphics Processing Units. 2015. Poster session presented at IEEE/ACM SuperComputing, Austin, United States. You can view the Poster and Extended Abstract.

Deakin T, Price J, Martineau M, McIntosh-Smith S. GPU-STREAM: Now in 2D!. 2016. Poster session presented at IEEE/ACM SuperComputing, Salt Lake City, United States. You can view the Poster and Extended Abstract.

Raman K, Deakin T, Price J, McIntosh-Smith S. Improving achieved memory bandwidth from C++ codes on Intel Xeon Phi Processor (Knights Landing). IXPUG Spring Meeting, Cambridge, UK, 2017.

Deakin T, Price J, Martineau M, McIntosh-Smith S. Evaluating attainable memory bandwidth of parallel programming models via BabelStream. International Journal of Computational Science and Engineering. Special issue (in press). 2017.

Deakin T, Price J, McIntosh-Smith S. Portable methods for measuring cache hierarchy performance. 2017. Poster session presented at IEEE/ACM SuperComputing, Denver, United States. You can view the Poster and Extended Abstract.

[1]: McCalpin, John D., 1995: "Memory Bandwidth and Machine Balance in Current High Performance Computers", IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995.

