Productivity library for distributed and partitioned memory based on C++ Ranges.
Distributed Ranges is a C++ productivity library for distributed and partitioned memory based on C++ ranges. It offers a collection of data structures, views, and algorithms for building generic abstractions, and provides interoperability with MPI, SHMEM, SYCL, and OpenMP as well as portability across CPUs and GPUs. NUMA-aware allocators and distributed data structures facilitate the development of C++ applications on heterogeneous nodes with multiple devices, achieving excellent performance and parallel scalability by exploiting local compute and data access.
In this model one can:

- create a distributed data structure that works with all our algorithms out of the box
- create an algorithm that works with all our distributed data structures out of the box

Distributed Ranges is the glue that makes this possible, as the sketch below illustrates.
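Here is a minimal sketch of the Multi-Process model, assuming the `dr::mp` entry points shown in the Distributed Ranges Tutorial (`init`, `distributed_vector`, `iota`, `for_each`, `finalize`); exact names and signatures may differ between releases:

```cpp
// Minimal sketch, assuming the dr::mp API from the tutorial; verify
// against the release you build with. Run with e.g.: mpirun -n 2 ./example
#include <dr/mp.hpp>

int main(int argc, char **argv) {
  dr::mp::init();  // initializes MPI and the runtime

  {
    // The vector's storage is partitioned across all MPI ranks...
    dr::mp::distributed_vector<int> v(100);

    // ...and any of the library's algorithms operates on it as a whole.
    dr::mp::iota(v, 0);
    dr::mp::for_each(v, [](auto &&e) { e *= 2; });
  }  // distributed objects go out of scope before finalize(), as in the tutorial

  dr::mp::finalize();
  return 0;
}
```

The same program works unchanged with any distributed data structure the library provides, which is the point of the model.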
- Usage:
- Introductory presentation: Distributed Ranges, why you need it, 2024
- Article: Get Started with Distributed Ranges, 2023
- Tutorial: Distributed Ranges Tutorial
- Design / Implementation:
- Conference paper: Distributed Ranges, A Model for Distributed Data Structures, Algorithms, and Views, 2024
- Talk: CppCon 2023; Benjamin Brock; Distributed Ranges, 2023
- Technical presentation: Intel Innovation'23, 2023
- API specification
- Linux
- CMake >= 3.20
- oneAPI HPC Toolkit installed
Enable oneAPI with:

```shell
source ~/intel/oneapi/setvars.sh
```

... or with:

```shell
source /opt/intel/oneapi/setvars.sh
```

... or from wherever the oneapi/setvars.sh script is installed on your system.
- CUDA
- oneAPI for NVIDIA GPUs plugin

When enabling oneAPI, use the --include-intel-llvm option, i.e. call:

```shell
source ~/intel/oneapi/setvars.sh --include-intel-llvm
```

... instead of plain `source ~/intel/oneapi/setvars.sh`.
All tests and examples can be built with:

```shell
CXX=icpx cmake -B build
cmake --build build -- -j
```
Note: the Distributed Ranges library works in two models:

- Multi-Process (based on SYCL and MPI)
- Single-Process (based on pure SYCL)

On NVIDIA GPUs only the Multi-Process model is currently supported. For comparison with the Multi-Process example above, a Single-Process sketch follows.
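This is a minimal sketch, assuming the `dr::sp` entry points shown in the Distributed Ranges Tutorial (`get_numa_devices`, `init`, `par_unseq`, `for_each`); verify the names against the release you build with:

```cpp
// Minimal sketch, assuming the dr::sp API from the tutorial.
// A single process drives all selected SYCL devices.
#include <dr/sp.hpp>

int main() {
  // Select the devices visible to SYCL, split by NUMA domain.
  dr::sp::init(dr::sp::get_numa_devices(sycl::default_selector_v));

  // The vector's storage is partitioned across the selected devices...
  dr::sp::distributed_vector<int> v(100);

  // ...and algorithms execute on all of them; dr::sp algorithms take an
  // execution policy as their first argument.
  dr::sp::for_each(dr::sp::par_unseq, v, [](auto &&e) { e = 1; });

  return 0;
}
```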
To build multi-process tests, call:

```shell
CXX=icpx cmake -B build -DENABLE_CUDA:BOOL=ON
cmake --build build --target mp-all-tests -- -j
```
Run the multi-process tests:

```shell
ctest --test-dir build --output-on-failure -L MP -j 4
```

Run the single-process tests:

```shell
ctest --test-dir build --output-on-failure -L SP -j 4
```

Run all tests:

```shell
ctest --test-dir build --output-on-failure -L TESTLABEL -j 4
```
Two binaries are built for benchmarks:

- mp-bench - for benchmarking the Multi-Process model
- sp-bench - for benchmarking the Single-Process model

Here are examples of running single benchmarks.
Running the GemvEq_DR strong-scaling benchmark in the Multi-Process model using two GPUs:

```shell
ONEAPI_DEVICE_SELECTOR='level_zero:gpu' I_MPI_OFFLOAD=1 I_MPI_OFFLOAD_CELL_LIST=0-11 \
  mpiexec -n 2 -ppn 2 build/benchmarks/gbench/mp/mp-bench --vector-size 1000000000 --reps 50 \
  --v=3 --benchmark_out=mp_gemv.txt --benchmark_filter=GemvEq_DR/ --sycl
```
Running the Exclusive_Scan_DR weak-scaling benchmark in the Single-Process model using two GPUs:

```shell
ONEAPI_DEVICE_SELECTOR='level_zero:gpu' KMP_AFFINITY=compact \
  build/benchmarks/gbench/sp/sp-bench --vector-size 1000000000 --reps 50 \
  --v=3 --benchmark_out=sp_exclscan.txt --benchmark_filter=Exclusive_Scan_DR/ \
  --weak-scaling --device-memory --num-devices 2
```
Check all options:

```shell
./build/benchmarks/gbench/mp/mp-bench --help    # see Google Benchmark options help
./build/benchmarks/gbench/mp/mp-bench --drhelp  # see DR-specific options
```
See the Distributed Ranges Tutorial for a few well-explained examples.
If your project uses CMake, add the following to your CMakeLists.txt to download the library:

```cmake
find_package(MPI REQUIRED)
include(FetchContent)
FetchContent_Declare(
  dr
  GIT_REPOSITORY https://github.com/oneapi-src/distributed-ranges.git
  GIT_TAG main)
FetchContent_MakeAvailable(dr)
```
The above will define targets that can be included in your project:
```cmake
target_link_libraries(<application> MPI::MPI_CXX DR::mpi)
```

See the Distributed Ranges Tutorial for a live example of a CMake project that imports and uses Distributed Ranges.
Add the code below to your main function to enable logging.

If using the Single-Process model:

```cpp
std::ofstream logfile("dr.log");
dr::drlog.set_file(logfile);
```
If using the Multi-Process model:

```cpp
int my_mpi_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_mpi_rank);
std::ofstream logfile(fmt::format("dr.{}.log", my_mpi_rank));
dr::drlog.set_file(logfile);  // attach the per-rank logfile, as in the single-process case
```
Example of adding a custom log statement to your code:

```cpp
DRLOG("my debug message with varA:{} and varB:{}", a, b);
```
Contact us by opening a new issue. We seek collaboration opportunities and welcome feedback on ways to extend the library to meet developer needs.
- CONTRIBUTING
- Fuzz Testing
- Spec Editing - Editing the API document
- Print Type - Print types at compile time
- Testing - Test system maintenance
- Security - Security policy
- Doxygen