Productivity library for distributed and partitioned memory based on C++ Ranges.
Distributed Ranges is a C++ productivity library for distributed and partitioned memory based on C++ ranges. It offers a collection of data structures, views, and algorithms for building generic abstractions, and provides interoperability with MPI, SHMEM, SYCL, and OpenMP as well as portability across CPUs and GPUs. NUMA-aware allocators and distributed data structures facilitate the development of C++ applications on heterogeneous nodes with multiple devices, achieving excellent performance and parallel scalability by exploiting local compute and data access.
In this model one can:

- create a distributed data structure that works with all our algorithms out of the box
- create an algorithm that works with all our distributed data structures out of the box

Distributed Ranges is the glue that makes this possible, as the sketch below illustrates.
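Here is a minimal sketch of the Multi-Process model, assuming the `dr::mp` entry points shown in the Distributed Ranges Tutorial (`init`, `distributed_vector`, `iota`, `for_each`, `finalize`); exact names and signatures may differ between releases:

```cpp
// Minimal sketch, assuming the dr::mp API from the tutorial; verify
// against the release you build with. Run with e.g.: mpirun -n 2 ./example
#include <dr/mp.hpp>

int main(int argc, char **argv) {
  dr::mp::init();  // initializes MPI and the runtime

  {
    // The vector's storage is partitioned across all MPI ranks...
    dr::mp::distributed_vector<int> v(100);

    // ...and any of the library's algorithms operates on it as a whole.
    dr::mp::iota(v, 0);
    dr::mp::for_each(v, [](auto &&e) { e *= 2; });
  }  // distributed objects go out of scope before finalize(), as in the tutorial

  dr::mp::finalize();
  return 0;
}
```

The same program works unchanged with any distributed data structure the library provides, which is the point of the model.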
- Usage:
- Introductory presentation: Distributed Ranges, why you need it, 2024
- Article: Get Started with Distributed Ranges, 2023
- Tutorial: Distributed Ranges Tutorial
- Design / Implementation:
- Conference paper: Distributed Ranges, A Model for Distributed Data Structures, Algorithms, and Views, 2024
- Talk: CppCon 2023; Benjamin Brock; Distributed Ranges, 2023
- Technical presentation: Intel Innovation'23, 2023
- API specification
- Linux
- CMake >= 3.20
- oneAPI HPC Toolkit installed
Enable oneAPI with:

```shell
source ~/intel/oneapi/setvars.sh
```

... or with:

```shell
source /opt/intel/oneapi/setvars.sh
```

... or from wherever the oneapi/setvars.sh script is installed on your system.
- CUDA
- oneAPI for NVIDIA GPUs plugin

When enabling oneAPI, use the --include-intel-llvm option, i.e. call:

```shell
source ~/intel/oneapi/setvars.sh --include-intel-llvm
```

... instead of plain `source ~/intel/oneapi/setvars.sh`.
All tests and examples can be built with:

```shell
CXX=icpx cmake -B build
cmake --build build -- -j
```
Note: the Distributed Ranges library works in two models:

- Multi-Process (based on SYCL and MPI)
- Single-Process (based on pure SYCL)

On NVIDIA GPUs only the Multi-Process model is currently supported. For comparison with the Multi-Process example above, a Single-Process sketch follows.
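This is a minimal sketch, assuming the `dr::sp` entry points shown in the Distributed Ranges Tutorial (`get_numa_devices`, `init`, `par_unseq`, `for_each`); verify the names against the release you build with:

```cpp
// Minimal sketch, assuming the dr::sp API from the tutorial.
// A single process drives all selected SYCL devices.
#include <dr/sp.hpp>

int main() {
  // Select the devices visible to SYCL, split by NUMA domain.
  dr::sp::init(dr::sp::get_numa_devices(sycl::default_selector_v));

  // The vector's storage is partitioned across the selected devices...
  dr::sp::distributed_vector<int> v(100);

  // ...and algorithms execute on all of them; dr::sp algorithms take an
  // execution policy as their first argument.
  dr::sp::for_each(dr::sp::par_unseq, v, [](auto &&e) { e = 1; });

  return 0;
}
```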
To build multi-process tests, call:

```shell
CXX=icpx cmake -B build -DENABLE_CUDA:BOOL=ON
cmake --build build --target mp-all-tests -- -j
```
Run the multi-process tests:

```shell
ctest --test-dir build --output-on-failure -L MP -j 4
```

Run the single-process tests:

```shell
ctest --test-dir build --output-on-failure -L SP -j 4
```

Run all tests:

```shell
ctest --test-dir build --output-on-failure -L TESTLABEL -j 4
```
Two binaries are built for benchmarks:

- mp-bench - for benchmarking the Multi-Process model
- sp-bench - for benchmarking the Single-Process model

Here are examples of running single benchmarks.
Running the GemvEq_DR strong-scaling benchmark in the Multi-Process model using two GPUs:

```shell
ONEAPI_DEVICE_SELECTOR='level_zero:gpu' I_MPI_OFFLOAD=1 I_MPI_OFFLOAD_CELL_LIST=0-11 \
  mpiexec -n 2 -ppn 2 build/benchmarks/gbench/mp/mp-bench --vector-size 1000000000 --reps 50 \
  --v=3 --benchmark_out=mp_gemv.txt --benchmark_filter=GemvEq_DR/ --sycl
```
Running the Exclusive_Scan_DR weak-scaling benchmark in the Single-Process model using two GPUs:

```shell
ONEAPI_DEVICE_SELECTOR='level_zero:gpu' KMP_AFFINITY=compact \
  build/benchmarks/gbench/sp/sp-bench --vector-size 1000000000 --reps 50 \
  --v=3 --benchmark_out=sp_exclscan.txt --benchmark_filter=Exclusive_Scan_DR/ \
  --weak-scaling --device-memory --num-devices 2
```
Check all options:

```shell
./build/benchmarks/gbench/mp/mp-bench --help    # see Google Benchmark options help
./build/benchmarks/gbench/mp/mp-bench --drhelp  # see DR-specific options
```
See the Distributed Ranges Tutorial for a few well-explained examples.
If your project uses CMake, add the following to your CMakeLists.txt to download the library:

```cmake
find_package(MPI REQUIRED)
include(FetchContent)
FetchContent_Declare(
  dr
  GIT_REPOSITORY https://github.com/oneapi-src/distributed-ranges.git
  GIT_TAG main)
FetchContent_MakeAvailable(dr)
```
The above will define targets that can be included in your project:
```cmake
target_link_libraries(<application> MPI::MPI_CXX DR::mpi)
```

See the Distributed Ranges Tutorial for a live example of a CMake project that imports and uses Distributed Ranges.
Add the code below to your main function to enable logging.

If using the Single-Process model:

```cpp
std::ofstream logfile("dr.log");
dr::drlog.set_file(logfile);
```
If using the Multi-Process model:

```cpp
int my_mpi_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_mpi_rank);
std::ofstream logfile(fmt::format("dr.{}.log", my_mpi_rank));
dr::drlog.set_file(logfile);  // attach the per-rank logfile, as in the single-process case
```
Example of adding a custom log statement to your code:

```cpp
DRLOG("my debug message with varA:{} and varB:{}", a, b);
```
Contact us by opening a new issue. We seek collaboration opportunities and welcome feedback on ways to extend the library to meet developer needs.
- CONTRIBUTING
- Fuzz Testing
- Spec Editing - Editing the API document
- Print Type - Print types at compile time
- Testing - Test system maintenance
- Security - Security policy
- Doxygen