The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research initiative to provide GPU-centric networking through an OpenSHMEM-like interface. This intra-kernel networking library reduces application code complexity and enables finer-grained communication/computation overlap than traditional host-driven networking. rocSHMEM uses a single symmetric heap (SHEAP) that is allocated in GPU memory.
There are currently three backends for rocSHMEM: IPC, Reverse Offload (RO), and GPU-IB. The backends primarily differ in their implementations of intra-kernel networking. Currently, only the IPC backend is supported. The RO and GPU-IB backends are provided as-is, with no guarantees of support from AMD or AMD Research.
The IPC backend implements communication primitives using load/store operations issued from the GPU.
The Reverse Offload (RO) backend has the GPU runtime forward rocSHMEM networking operations to the host-side runtime, which calls into a traditional MPI or OpenSHMEM implementation. This forwarding of requests is transparent to the programmer, who only sees the GPU-side interface.
The GPU InfiniBand (GPU-IB) backend implements a lightweight InfiniBand verbs interface on the GPU. The GPU itself is responsible for building commands and ringing the doorbell on the NIC to send network commands.
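To give a sense of what intra-kernel networking looks like, the sketch below issues a put to a neighboring PE from inside a HIP kernel. It is only a sketch: the header path, namespace, and per-work-group setup calls shown here are assumptions modeled on the OpenSHMEM-style API, so treat the headers in ./include/rocshmem/ and the code in ./examples/ as the authoritative reference.
// Illustrative sketch only -- verify all names against ./include/rocshmem/.
#include <rocshmem/rocshmem.hpp>   // assumed header path

using namespace rocshmem;          // assumed namespace

// One thread per work-group sends `nbytes` from its local symmetric buffer
// to the next PE in the ring, entirely from within the kernel.
__global__ void put_to_neighbor(int *dst, const int *src, size_t nbytes) {
    rocshmem_wg_init();                                   // per-work-group setup (assumed requirement)

    if (threadIdx.x == 0) {
        int peer = (rocshmem_my_pe() + 1) % rocshmem_n_pes();
        rocshmem_putmem(dst, src, nbytes, peer);          // one-sided put to `peer`
        rocshmem_quiet();                                 // wait for the put to complete
    }

    rocshmem_wg_finalize();                               // per-work-group teardown (assumed)
}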
rocSHMEM base requirements:
- ROCm v6.2.2 onwards
  - May work with other versions, but they have not been tested
- AMD GPUs
  - MI250X
  - MI300X
- ROCm-aware Open MPI and UCX, as described in Building the Dependencies
rocSHMEM only supports HIP applications. There are no plans to port to OpenCL.
rocSHMEM uses the CMake build system. The CMakeLists file contains additional details about library options.
To create an out-of-source build for the IPC backend:
mkdir build
cd build
../scripts/build_configs/ipc_single
The build script passes configuration options to CMake to set up canonical builds. There are other scripts in the ./scripts/build_configs directory, but currently only ipc_single is supported.
By default, the library is installed in ~/rocshmem. You may provide a custom install path by supplying it as an argument. For example:
../scripts/build_configs/ipc_single /path/to/install
rocSHMEM is built as a library that can be statically linked to your application during compilation using hipcc.
During compilation of your application, include the rocSHMEM header files and link against the rocSHMEM library when invoking hipcc. Since rocSHMEM depends on MPI, you will also need to link against an MPI library; because hipcc is used rather than mpicc, the MPI include and link arguments must be added manually.
When using hipcc directly (as opposed to through a build system), we recommend performing the compilation and linking steps separately. Example compile and link commands are provided at the top of the example files (./examples/*):
# Compile
hipcc -c -fgpu-rdc -x hip rocshmem_allreduce_test.cc \
-I/opt/rocm/include \
-I$ROCSHMEM_INSTALL_DIR/include \
-I$OPENMPI_UCX_INSTALL_DIR/include/
# Link
hipcc -fgpu-rdc --hip-link rocshmem_allreduce_test.o -o rocshmem_allreduce_test \
$ROCSHMEM_INSTALL_DIR/lib/librocshmem.a \
$OPENMPI_UCX_INSTALL_DIR/lib/libmpi.so \
-L/opt/rocm/lib -lamdhip64 -lhsa-runtime64
If your project uses CMake, you may find the Using CMake with AMD ROCm page useful.
rocSHMEM recognizes the following environment variable:
ROCSHMEM_HEAP_SIZE (default: 1 GB)
  Defines the size of the rocSHMEM symmetric heap. Note that the heap is allocated in GPU memory.
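Symmetric objects allocated at runtime (for example, via rocshmem_malloc) are expected to come out of this heap, so their combined size must fit within ROCSHMEM_HEAP_SIZE. A minimal host-side sketch, assuming OpenSHMEM-style rocshmem_ allocation calls (verify the exact signatures against ./include/rocshmem/):
// Sketch only: assumes rocshmem_malloc/rocshmem_free exist with
// OpenSHMEM-style signatures and allocate from the symmetric heap.
size_t nbytes = 1 << 20;                          // 1 MiB, well under the 1 GB default heap size
int *buf = (int *) rocshmem_malloc(nbytes);       // symmetric allocation on every PE
// ... use `buf` as a source/destination for puts and gets ...
rocshmem_free(buf);                               // return the space to the heap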
rocSHMEM is similar to OpenSHMEM and should be familiar to programmers who
have experience with OpenSHMEM or other PGAS network programming APIs in the
context of CPUs.
The best way to learn how to use rocSHMEM is to read the function descriptions in the headers in the ./include/rocshmem/ directory, or to look at the provided example code in the ./examples/ directory.
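For orientation, a complete program typically follows the same host-side pattern as the provided examples: initialize the runtime, allocate symmetric buffers, launch a kernel that communicates, and finalize. The skeleton below is a hedged sketch of that flow, reusing the put_to_neighbor kernel sketched in the backend overview above; exact initialization and kernel-launch requirements may differ, so treat the code in ./examples/ as the reference.
// Host-side skeleton (sketch only; names assumed, see ./include/rocshmem/).
#include <rocshmem/rocshmem.hpp>   // assumed header path
#include <hip/hip_runtime.h>
#include <cstdio>

using namespace rocshmem;          // assumed namespace

int main(int argc, char **argv) {
    rocshmem_init();                                       // start the runtime on every PE

    int npes = rocshmem_n_pes();
    int mype = rocshmem_my_pe();

    size_t nelems = 256;
    size_t nbytes = nelems * sizeof(int);
    int *src = (int *) rocshmem_malloc(nbytes);            // symmetric source buffer
    int *dst = (int *) rocshmem_malloc(nbytes);            // symmetric destination buffer

    // Launch the intra-kernel put sketched in the backend overview above.
    put_to_neighbor<<<1, 64>>>(dst, src, nbytes);
    hipDeviceSynchronize();

    printf("PE %d of %d done\n", mype, npes);

    rocshmem_free(src);
    rocshmem_free(dst);
    rocshmem_finalize();                                   // shut down the runtime
    return 0;
}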
The examples can be run like so:
mpirun -np 2 ./build/examples/rocshmem_getmem_test
rocSHMEM ships with functional and unit test suites for the supported rocSHMEM API. They cover Puts, Gets, nonblocking Puts, nonblocking Gets, Quiets, Atomics, Tests, Wait-untils, Broadcasts, Reductions, and more.
To run the tests, you may use the driver scripts provided in the ./scripts/ directory:
# Run Functional Tests
./scripts/functional_tests/driver.sh ./build/tests/functional_tests/rocshmem_example_driver short <log_directory>
# Run Unit Tests
./scripts/unit_tests/driver.sh ./build/tests/unit_tests/rocshmem_unit_tests all
rocSHMEM requires ROCm-aware Open MPI and UCX. Other MPI implementations, such as MPICH, should be compatible with rocSHMEM, but they have not been thoroughly tested.
To build and configure ROCm-aware UCX (v1.17.0 or later):
git clone https://github.com/openucx/ucx.git -b v1.17.x
cd ucx
./autogen.sh
./configure --prefix=<prefix_dir> --with-rocm=<rocm_path> --enable-mt
make -j 8
make -j 8 install
Then, build Open MPI (v5.0.6 or later) with UCX support:
git clone --recursive https://github.com/open-mpi/ompi.git -b v5.0.x
cd ompi
./autogen.pl
./configure --prefix=<prefix_dir> --with-rocm=<rocm_path> --with-ucx=<ucx_path>
make -j 8
make -j 8 install
For more information on Open MPI and UCX support, see: https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html