This is an implementation of BabelStream in Java 8. It contains the following implementations:
jdk-plain
- Single-threaded for loop
jdk-stream
- Threaded implementation using JDK 8's parallel stream API
tornadovm
- A TornadoVM implementation for PTX/OpenCL
aparapi
- An Aparapi implementation for OpenCL
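To make the difference between the first two variants concrete, here is a minimal sketch of a Triad kernel written both ways. It is illustrative only and is not the repository's actual code: the class name `TriadSketch`, the method names, and the input values are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative only: a hypothetical class, not one of java-stream's kernels.
final class TriadSketch {

    // jdk-plain style: a single thread walks the arrays with an indexed loop.
    static void triadPlain(double[] a, double[] b, double[] c, double scalar) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] + scalar * c[i];
        }
    }

    // jdk-stream style: the index range is split across the common fork/join pool.
    static void triadStream(double[] a, double[] b, double[] c, double scalar) {
        IntStream.range(0, a.length)
                .parallel()
                .forEach(i -> a[i] = b[i] + scalar * c[i]);
    }

    public static void main(String[] args) {
        int n = 1 << 20; // arbitrary size chosen for the sketch
        double[] b = new double[n];
        double[] c = new double[n];
        Arrays.fill(b, 0.2);
        Arrays.fill(c, 0.1);

        double[] plain = new double[n];
        double[] stream = new double[n];
        triadPlain(plain, b, c, 0.4);
        triadStream(stream, b, c, 0.4);

        // Each element is computed independently, so both variants agree.
        System.out.println("results match: " + Arrays.equals(plain, stream));
    }
}
```

Each index is written exactly once and no element depends on another, which is why the parallel version needs no synchronisation.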
Prerequisites
- JDK >= 8
To run the benchmark, first create a binary:
> cd java-stream
> ./mvnw clean package
The binary will be located at ./target/java-stream.jar. Run it with:
> java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment GraalVM CE 21.1.0 (build 11.0.11+8-jvmci-21.1-b05)
OpenJDK 64-Bit Server VM GraalVM CE 21.1.0 (build 11.0.11+8-jvmci-21.1-b05, mixed mode)
> java -jar target/java-stream.jar --help
For best results, benchmark with the following JVM flags:
-XX:-UseOnStackReplacement # disable OSR, not useful for this benchmark as we are measuring peak performance
-XX:-TieredCompilation # disable C1, go straight to C2
-XX:ReservedCodeCacheSize=512m # don't flush compiled code out of cache at any point
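If you want to confirm that the flags actually reached the JVM, one option is to inspect the JVM's own input arguments through the standard `java.lang.management` API. The sketch below is a hypothetical helper (`JvmFlagCheck` is not part of java-stream) and it only does an exact string comparison, so a flag spelled differently, for example a code cache size given in another unit, would be reported as missing.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of java-stream.
final class JvmFlagCheck {

    // The three recommended flags, exactly as typed on the command line.
    static final List<String> RECOMMENDED = Arrays.asList(
            "-XX:-UseOnStackReplacement",
            "-XX:-TieredCompilation",
            "-XX:ReservedCodeCacheSize=512m");

    // Returns the recommended flags that do not appear verbatim in jvmArgs.
    static List<String> missing(List<String> jvmArgs) {
        List<String> absent = new ArrayList<>();
        for (String flag : RECOMMENDED) {
            if (!jvmArgs.contains(flag)) {
                absent.add(flag);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Options passed to the JVM itself, not the program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> absent = missing(jvmArgs);
        System.out.println(absent.isEmpty()
                ? "All recommended flags are set"
                : "Missing flags: " + absent);
    }
}
```

Running the class with the same flags you pass to the benchmark should report nothing missing; running it with no flags should list all three.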
Worked example:
> java -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:ReservedCodeCacheSize=512m -jar target/java-stream.jar
BabelStream
Version: 3.4
Implementation: jdk-stream; (Java 11.0.11;Red Hat, Inc.; home=/usr/lib/jvm/java-11-openjdk-11.0.11.0.9-4.fc33.x86_64)
Running all 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function MBytes/sec Min (sec) Max Average
Copy 17145.538 0.03131 0.04779 0.03413
Mul 16759.092 0.03203 0.04752 0.03579
Add 19431.954 0.04144 0.05866 0.04503
Triad 19763.970 0.04075 0.05388 0.04510
Dot 26646.894 0.02015 0.03013 0.02259
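The MBytes/sec column in the sample output is consistent with dividing the bytes moved by each kernel by its fastest (Min) time, where Copy, Mul and Dot touch two arrays and Add and Triad touch three. The sketch below reproduces that arithmetic. The class and method names are hypothetical, and the element count of 2^25 is an assumption inferred from the reported 268.4 MB array size in double precision.

```java
// Hypothetical helper that reproduces the bandwidth arithmetic; not part of java-stream.
final class BandwidthSketch {

    // MBytes/sec = (arrays touched * elements * bytes per element) / 1e6 / fastest time.
    static double mbytesPerSec(int arraysTouched, long elements,
                               int bytesPerElement, double minSeconds) {
        double bytes = (double) arraysTouched * elements * bytesPerElement;
        return bytes / 1.0e6 / minSeconds;
    }

    public static void main(String[] args) {
        // Assumption: 2^25 doubles per array, which is 268.4 MB as in the sample run.
        long n = 1L << 25;

        // Min times are copied from the sample output above.
        System.out.printf("Copy  %.3f%n", mbytesPerSec(2, n, Double.BYTES, 0.03131));
        System.out.printf("Triad %.3f%n", mbytesPerSec(3, n, Double.BYTES, 0.04075));
        System.out.printf("Dot   %.3f%n", mbytesPerSec(2, n, Double.BYTES, 0.02015));
    }
}
```

The results land close to, but not exactly on, the reported figures because the Min times in the table are rounded to five decimal places.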
If your OpenCL/CUDA installation is not at the default location, TornadoVM and Aparapi may fail to detect your devices. In those cases, you may specify the library directly, for example:
> LD_PRELOAD=/opt/rocm-4.0.0/opencl/lib/libOpenCL.so.1.2 java -jar target/java-stream.jar ...
The TornadoVM implementation requires you to run the binary with a patched JVM. Follow the official instructions or use the following simplified instructions:
Prerequisites
- CMake >= 3.6
- GCC or clang/LLVM (GCC >= 5.5)
- Python >= 2.7
- Maven >= 3.6.3
- OpenCL headers >= 1.2 and/or CUDA SDK >= 9.0
First, get a copy of the TornadoVM source:
> cd
> git clone https://github.com/beehive-lab/TornadoVM tornadovm
Take note of the required GraalVM version in tornadovm/assembly/src/docs/10_INSTALL_WITH_GRAALVM.md. We'll use 21.1.0 in this example.
Now, obtain a copy of GraalVM and make sure the version matches the one required by TornadoVM:
> wget https://github.com/graalvm/graalvm-ce-builds/releases/download/vm-21.1.0/graalvm-ce-java11-linux-amd64-21.1.0.tar.gz
> tar -xf graalvm-ce-java11-linux-amd64-21.1.0.tar.gz
Next, create ~/tornadovm/etc/sources.env and populate the file with the following:
#!/bin/bash
export JAVA_HOME=<path to GraalVM 21.1.0 jdk>
export PATH=$PWD/bin/bin:$PATH
export TORNADO_SDK=$PWD/bin/sdk
export CMAKE_ROOT=/usr # path to CMake binary
Proceed to compile TornadoVM:
> cd ~/tornadovm
> . etc/sources.env
> make graal-jdk-11-plus BACKEND={ptx,opencl}
To test your build, source the environment file and list the devices TornadoVM can see:
> source ~/tornadovm/etc/sources.env
> LD_PRELOAD=/opt/rocm-4.0.0/opencl/lib/libOpenCL.so.1.2 tornado --devices
Number of Tornado drivers: 1
Total number of OpenCL devices : 3
Tornado device=0:0
AMD Accelerated Parallel Processing -- gfx1012
Global Memory Size: 4.0 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Max WorkGroup Configuration: [1024, 1024, 1024]
Device OpenCL C version: OpenCL C 2.0
Tornado device=0:1
Portable Computing Language -- pthread-AMD Ryzen 9 3900X 12-Core Processor
Global Memory Size: 60.7 GB
Local Memory Size: 8.0 MB
Workgroup Dimensions: 3
Max WorkGroup Configuration: [4096, 4096, 4096]
Device OpenCL C version: OpenCL C 1.2 pocl
Tornado device=0:2
NVIDIA CUDA -- NVIDIA GeForce GT 710
Global Memory Size: 981.3 MB
Local Memory Size: 48.0 KB
Workgroup Dimensions: 3
Max WorkGroup Configuration: [1024, 1024, 64]
Device OpenCL C version: OpenCL C 1.2
You can now use TornadoVM to run java-stream:
> tornado -jar ~/java-stream/target/java-stream.jar --impl tornadovm --arraysize 65536
BabelStream
Version: 3.4
Implementation: tornadovm; (Java 11.0.11;GraalVM Community; home=~/graalvm-ce-java11-21.1.0)
Running all 100 times
Precision: double
Array size: 0.5 MB (=0.0 GB)
Total size: 1.6 MB (=0.0 GB)
Using TornadoVM device:
- Name : NVIDIA GeForce GT 710 CL_DEVICE_TYPE_GPU (available)
- Id : opencl-0-0
- Platform : NVIDIA CUDA
- Backend : OpenCL
Function MBytes/sec Min (sec) Max Average
Copy 8791.100 0.00012 0.00079 0.00015
Mul 8774.107 0.00012 0.00061 0.00014
Add 9903.313 0.00016 0.00030 0.00018
Triad 9861.031 0.00016 0.00030 0.00018
Dot 2799.465 0.00037 0.00056 0.00041
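The reported sizes in this run follow from the --arraysize value: 65536 double-precision elements occupy 524288 bytes, or roughly 0.5 MB per array, and the total appears to count three such arrays. A minimal sketch of that arithmetic, using a hypothetical `ArraySizeSketch` class and assuming decimal megabytes (1 MB = 10^6 bytes):

```java
// Hypothetical helper for the size arithmetic; not part of java-stream.
final class ArraySizeSketch {

    // Size of one array in decimal megabytes (1 MB = 1e6 bytes).
    static double arrayMegabytes(long elements, int bytesPerElement) {
        return elements * bytesPerElement / 1.0e6;
    }

    // Assumption: the total covers the three benchmark arrays.
    static double totalMegabytes(long elements, int bytesPerElement) {
        return 3 * arrayMegabytes(elements, bytesPerElement);
    }

    public static void main(String[] args) {
        long elements = 65536; // value passed to --arraysize in the run above
        System.out.printf("Array size: %.1f MB%n", arrayMegabytes(elements, Double.BYTES));
        System.out.printf("Total size: %.1f MB%n", totalMegabytes(elements, Double.BYTES));
    }
}
```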