This repository contains the supporting code for our paper Measuring GPU Utilization one level deeper. We present a comprehensive suite of CUDA benchmarks designed to identify and measure interference across various GPU resources.
The codebase is organized into the following primary directories:
- `gpu_util_bench_lib/`: A shared library containing CUDA kernels and helper functions for kernel launching
- `inter_sm/`: Benchmarks for measuring interference and utilization across Streaming Multiprocessors (SMs) (paper Section 4.1)
- `intra_sm/`: Benchmarks for measuring interference and utilization within SMs (paper Section 4.2)
- `mm_pytorch/`: Example demonstrating interference patterns on production ML kernels (paper Section 4.3)
- `pitfalls/`: Examples illustrating common limitations in current interference prediction approaches (paper Section 3)
The benchmarks require the following dependencies:
- CMake (version >= 3.22)
- C++17 or later
- CUDA toolkit (validated with CUDA 12.5 and 12.6)
- NVIDIA GPU driver (can be installed alongside CUDA toolkit)
Note: Our benchmarks currently do not support AMD GPUs.
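A quick way to confirm the CMake requirement is met is a version comparison against 3.22. The sketch below is illustrative: the `version_ge` helper is not part of this repository, and it relies on GNU `sort -V` for version ordering; substitute the real output of `cmake --version` for the sample value.

```shell
# version_ge A B: succeeds if version A >= version B (uses GNU sort -V).
version_ge() { [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]; }

# On your machine, substitute the installed version, e.g.:
#   installed="$(cmake --version | head -n 1 | awk '{print $3}')"
installed="3.22.1"
version_ge "$installed" "3.22" && echo "CMake $installed OK"
```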
1. Determine your GPU's Compute Capability using `nvidia-smi`:
   ```shell
   nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1
   ```
2. Update the Compute Capability in `CMakeLists.txt`:
   ```cmake
   set(CMAKE_CUDA_ARCHITECTURES 90) # Modify based on your GPU
   ```
3. Build the repository:
   ```shell
   mkdir build && cd build
   cmake ..
   cmake --build .
   ```
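The `CMAKE_CUDA_ARCHITECTURES` value is simply the Compute Capability with the dot removed (e.g. `9.0` becomes `90`). A small sketch of that conversion, using a hardcoded sample value in place of the `nvidia-smi` query from step 1:

```shell
# Derive the CMake architecture value from the compute capability ("9.0" -> "90").
cap="9.0"   # e.g. "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
arch="$(printf '%s' "$cap" | tr -d '.')"
echo "set(CMAKE_CUDA_ARCHITECTURES ${arch})"
```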
Each directory contains detailed instructions for executing the benchmarks and reproducing paper experiments. The provided scripts are optimized for the H100 GPU. Users with different GPU architectures may need to adjust script parameters accordingly.
Important: Before running experiments, set the `BUILD_DIR` environment variable to match your build directory.
```shell
export BUILD_DIR=$HOME/gpu-util-interference/build # update based on location of your build directory
```
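Before launching a long experiment, it can be worth sanity-checking that `BUILD_DIR` points at an actual CMake build. This check is a sketch of mine, not part of the repository's scripts; it falls back to the default path shown above if the variable is unset.

```shell
# Sanity-check BUILD_DIR before running experiments (illustrative only).
: "${BUILD_DIR:=$HOME/gpu-util-interference/build}"   # default from this README
if [ -f "$BUILD_DIR/CMakeCache.txt" ]; then
  echo "BUILD_DIR looks like a CMake build: $BUILD_DIR"
else
  echo "warning: no CMakeCache.txt under $BUILD_DIR (did you build yet?)" >&2
fi
```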
To gather detailed performance metrics for isolated kernel execution, use the Nsight Compute profiler. When profiling with NCU, specify `mode=0` in the scripts:
```shell
ncu -f -o ncu.ncu-rep --set full <executable>
```
For analyzing kernel co-location scenarios, we recommend collecting CUDA traces using the Nsight Systems Profiler to visualize kernel overlap patterns and verify concurrent execution.
```shell
nsys profile --force-overwrite true -o nsys.nsys-rep --trace cuda <executable>
```
Our paper's results were obtained using the following hardware configurations:
- H100 NVL:
  - CUDA version 12.5
  - GPU driver version 555.42.06
  - Nsight Compute version 2024.2.1.0
  - Nsight Systems version 2024.2.3.38
- GeForce RTX 3090:
  - CUDA version 12.6
  - GPU driver version 560.35.03
  - Nsight Compute version 2024.3.1.0
  - Nsight Systems version 2024.4.2.133
If you use our benchmarks, please cite our paper:
```bibtex
@article{elvinger2025measuring,
  title={Measuring GPU utilization one level deeper},
  author={Elvinger, Paul and Strati, Foteini and Jerger, Natalie Enright and Klimovic, Ana},
  journal={arXiv preprint arXiv:2501.16909},
  year={2025}
}
```