UCC Version 1.1.0 - RC1
Pre-release
1.1.0
Features
API
- Added float128 and complex float32, float64, and float128 data types
- Added Active Sets-based collectives to support dynamic groups as well as point-to-point messaging
Core
- Config file support
- Fixed component search
CL
- Added split-rail allreduce collective implementation
- Enabled hierarchical alltoallv
- Fixed cleanup bugs
TL
- Added SELF TL supporting team size one
UCP
- Added service broadcast
- Added reduce_scatterv ring algorithm
- Added k-nomial-based gather collective implementation
- Added one-sided get-based algorithms
SHARP
- Fixed SHARP OOB
- Added SHARP broadcast
GPU Collectives (CUDA, NCCL TL and RCCL TL)
- Added support for CUDA TL (intranode collectives for NVIDIA GPUs)
- Added multiring allgatherv, alltoall in CUDA TL
- Added NCCL gather, scatter, and their vector variants
- Enabled use of multiple streams for collectives
- Added support for RCCL gather(v), scatter(v), broadcast, allgather(v), barrier, alltoall(v), and allreduce collectives
- Added ROCm memory component
- Adapted all GPU collectives to executor design
Tests
- Added tests for triggered collectives in perftests
- Fixed bugs in multi-threading tests
Utils
- Added CPU model and vendor detection
- Fixed several bugs across all components