AMD HPC Training Examples Repo

Welcome to AMD's HPC Training Examples Repo!

Here you will find a variety of examples to showcase the capabilities of AMD's GPU software stack. Please be aware that the repo is continuously updated to keep up with the most recent releases of the AMD software.

Repository Structure

Refer to the table of contents below, sorted by topic, to locate the exercises you are interested in.

HIP
1. HIP Functionality Checks
  1. query_device: checks that hipMemGetInfo works.
2. Basic Examples
  1. Stream_Overlap: this example shows how to split the workload of a GPU offload computation across several overlapping streams. The additional parallelism provided by the overlapping streams reduces the execution time. README.
  2. dgemm: a (d)GEMM application created as an exercise to showcase simple matrix-matrix multiplications on AMD GPUs. README.
  3. basic_examples: a collection of introductory exercises such as device to host data transfer and basic GPU kernel implementation. README.
  4. hip_stream: modification of the STREAM benchmark for HIP. README.
  5. jacobi: distributed Jacobi solver, using GPUs to perform the computation and MPI for halo exchanges. README.
  6. matrix_addition: example of a HIP kernel performing a matrix addition.
  7. saxpy: example of a HIP kernel performing a saxpy operation. README.
  8. stencil_examples: examples of stencil operations with a HIP kernel, including the use of timers and asynchronous copies.
  9. vectorAdd: example of a HIP kernel to perform a vector add. README.
  10. vector_addition_examples: another example of a HIP kernel to perform vector addition, including different versions such as one using shared memory, one with timers, and a CUDA one to try HIPIFY and hipifly tools on. The examples in this directory are not part of the HIP test suite.
3. CUDA to HIP Porting
  1. HIPIFY: example to show how to port CUDA code to HIP with HIPIFY tools. README.
  2. hipifly: example to show how to port CUDA code to HIP with hipifly tools. README.
4. HIP-Optimizations: a daxpy HIP kernel is used to show how an initial version can be optimized to improve performance. README.
5. HIPFort: a gemm example in Fortran using hipfort.
6. HIPStdPar: several examples showing C++ Std Parallelism on AMD GPUs. README.
7. HIP-OpenMP: example of HIP/OpenMP interoperability. README.
MPI-examples
1. Benchmarks: GPU aware benchmarks (collective.cpp and pt2pt.cpp) to assess the performance of the communication libraries. README. NOTE: for more detailed instructions on how to run GPU aware MPI examples, see GPU_aware_MPI.
2. GhostExchange: slimmed down example of an actual physics application where the solution is initialized on a square domain discretized with a Cartesian grid, and then advanced in parallel using MPI communications. NOTE: detailed README files are provided here for the different versions of the GhostExchange_ArrayAssign code, that showcase how to use Omnitrace to profile this application.
ManagedMemory: programming model exercises. Topics covered are the APU programming model, OpenMP, performance portability frameworks (Kokkos and RAJA), and the discrete GPU programming model. README.
MLExamples: a variation of PyTorch's MNIST example code and a smoke test for mpi4py using cupy. Instructions on how to run and test other ML frameworks are in the README.
Occupancy: example on modifying thread occupancy, using several variants of a matrix vector multiplication leveraging shared memory and launch bounds.
OmniperfExamples: several examples showing how to leverage Omniperf to perform kernel level optimization using HIP. NOTE: detailed READMEs are provided in each subdirectory. README. Video of Presentation.
Omniperf-OpenMP: example showing how to leverage Omniperf to perform kernel level optimization using Fortran and OpenMP. README.
Omnitrace
1. Omnitrace on Jacobi: Omnitrace used on the Jacobi solver example. README.
2. Omnitrace by Example: Omnitrace used on several versions of the Ghost Exchange example:
  1. OpenMP Version: READMEs available for each of the different versions of the example code. Video of Presentation.
  2. HIP Version: READMEs available for each of the different versions of the example code.
Pragma_Examples: OpenMP (in Fortran, C, and C++) and OpenACC examples. README.
Speedup_Examples: examples to show the speedup obtained going from a CPU to a GPU implementation. README.
atomics_openmp: examples on atomic operations using OpenMP.
Kokkos: runs the Stream Triad example with a Kokkos implementation. README.
Rocgdb: debugs the HPCTrainingExamples/HIP/saxpy example with Rocgdb. README. Video of Presentation.
Rocprof: uses Rocprof to profile HPCTrainingExamples/HIPIFY/mini-nbody/hip/. README.
Rocprofv3: uses Rocprofv3 to profile HPCTrainingExamples/HIP/jacobi/. README.
GPU_aware_MPI: OSU Mini Benchmarks with GPU aware MPI. README. Video of Presentation.
rocm-blogs-codes: this directory contains accompanying source code examples for select HPC ROCm blogs found at https://rocm.blogs.amd.com. README.
login_info
1. AAC: instructions on how to log in to the AMD Accelerator Cloud (AAC) resource. README.
Doc: directory with LaTeX and PDF documents that contain some of the most relevant README files, formatted for ease of reading. The PDF document is obtained by building the LaTeX document.

Run the Tests

Most of the exercises in this repo can be run as a test suite by doing:

git clone https://github.com/amd/HPCTrainingExamples && \
cd HPCTrainingExamples && \
cd tests && \
./runTests.sh

You can also run a subset of the whole test suite by specifying the subset you are interested in as an input to the runTests.sh script. For instance: ./runTests.sh --pytorch. To see a full list of the possible subsets that can be run: ./runTests.sh --help.
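As a rough sketch of the subset selection described above, the hypothetical helper below (run_subset is not part of the repo) wraps the call to runTests.sh for a single subset. It assumes a local clone already exists and that the subset name matches one of the flags listed by ./runTests.sh --help; only --pytorch is confirmed in this README.

```shell
# Hypothetical helper, not part of the repo: runs one subset of the test suite.
# $1: path to the tests directory of a local clone
# $2: subset name without leading dashes (e.g. pytorch)
run_subset() {
    if [ ! -x "$1/runTests.sh" ]; then
        echo "runTests.sh not found or not executable in $1" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$1" && ./runTests.sh "--$2")
}
```

For example, run_subset HPCTrainingExamples/tests pytorch is equivalent to running ./runTests.sh --pytorch from the tests directory.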

NOTE: tests can also be run manually from their respective directories, provided the necessary modules have been loaded and they have been compiled appropriately.

Feedback

We welcome your feedback and contributions. Feel free to use this repo to bring up any issues or submit pull requests. The software made available here is released under the MIT license; more details can be found in LICENSE.md.
