6.0.x Feature List
Luke Robison edited this page Apr 26, 2024
Target date: end of CY24.
- Extended Accelerator API:
- CUDA support for IPC
- ZE support for IPC
- Switch over to forked PRRTE, Phase 1:
- Documentation changes
- Remove `prte` binaries
- Remove the `--with-prte` configure option from Open MPI
- Make BTL self accelerator-aware
- Reduction op (and others) offload support (Joseph)
- Collectives:
- Merge XHC if they can commit to supporting it.
- Merge acoll once it passes CI
- smdirect won't be merged; salvage it for parts.
- Propose a JSON format for the tuning file.
- Remove coll/sm (tuned is an OK fallback; XHC/acoll coming soon).
- Performance-test Luke's HAN alltoall PR with UCX.
- Remove:
- GNI BTL
- UDREG rcache
- PVFS2 components (fs/pvfs2 and fbtl/pvfs2)
- Big Count:
- Collective embiggening Phase 1 (everything except `*v` and `*w` collectives)
- Memory Kind support:
- Add memory-kind option
- Return supported memory kinds
- ROMIO Refresh
- Collective embiggening Phase 2 (`*v` and `*w` collectives)
- Remove:
- TKR version of the Fortran `use mpi` module (old NAG compilers)
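The two embiggening phases differ in what an `int`-count entry point has to hand to a large-count implementation. A scalar count can simply be cast, but the `*v`/`*w` collectives take per-rank count arrays, and an `int` array cannot be reinterpreted as an array of 64-bit counts; it has to be widened element by element into temporary storage. The C sketch below illustrates that pattern under stated assumptions: `example_count_t` stands in for `MPI_Count`, and `example_widen_counts`, `example_total_count_c`, and `example_total_count` are hypothetical helpers, not Open MPI internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPI_Count (a 64-bit signed integer here). */
typedef int64_t example_count_t;

/* Widen an int count array (as an MPI_Gatherv-style call receives it)
 * into a newly allocated 64-bit array. Returns NULL on allocation
 * failure; the caller frees the result. */
example_count_t *example_widen_counts(const int *counts, size_t n)
{
    assert(counts != NULL || n == 0);
    example_count_t *wide = malloc((n > 0 ? n : 1) * sizeof *wide);
    if (wide == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        wide[i] = (example_count_t) counts[i];
    }
    return wide;
}

/* Hypothetical large-count "core" routine: sums per-rank counts in
 * 64-bit arithmetic, so the total cannot overflow for int inputs. */
example_count_t example_total_count_c(const example_count_t *counts, size_t n)
{
    example_count_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += counts[i];
    }
    return total;
}

/* int-count wrapper: widen the array, forward to the large-count
 * routine, then release the temporary. Returns 0 on success. */
int example_total_count(const int *counts, size_t n, example_count_t *total)
{
    example_count_t *wide = example_widen_counts(counts, n);
    if (wide == NULL) {
        return -1;
    }
    *total = example_total_count_c(wide, n);
    free(wide);
    return 0;
}
```

The allocation and its failure path are the extra cost that array-taking variants carry compared with a scalar cast.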
- Big count support
- API level functions (in progress 1-2 months)
- Collective embiggening (discussed at F2F; stage in the non-`v`/`w` functions first)
- Changes to datatype engine/combiner support (could be a challenge)
- ROMIO refresh
- PRRTE switch Phase 1
- MPI_T events (probably won't do for 6.0.x).
- extended accelerator API functionality (IPC) and conversion of the last components to use the accelerator API (DONE for ROCm, not yet for CUDA or ZE).
- Level Zero (ZE) accelerator component (DONE: basic support; IPC not implemented; Howard)
- support for the MPI 4.1 memory kinds info object (assuming the PRRTE move is done; about 1 month for basic support)
- reduction op (and others) offload support (Joseph estimates 1-2 months to get in)
- SMSC accelerator (Edgar - not sure yet about this one for 6.0.x)
- Stream-aware datatype engine.
- BTL self issue (doesn't currently support accelerators). (Khawthar is working on this)
- Datatype engine accelerator awareness (e.g. memcpy2d) (George).
What about smart pointers? We probably could not get this into 6.0.x.
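A `memcpy2d`-style primitive copies `height` rows of `width` bytes between buffers whose consecutive rows start `spitch` and `dpitch` bytes apart. That shape matches a vector datatype (fixed-size blocks at a fixed stride, as built by `MPI_Type_vector`), which is presumably why it is cited as an accelerator-friendly path for the datatype engine. The sketch below is host-only C with the same argument order as CUDA's `cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind)` minus `kind`; `example_memcpy2d` and `example_pack_vector` are hypothetical names, and an accelerator-aware engine would dispatch to the device runtime instead of `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a 2D region: `height` rows of `width` bytes. Row r starts at
 * src + r * spitch and is written to dst + r * dpitch. */
void example_memcpy2d(void *dst, size_t dpitch,
                      const void *src, size_t spitch,
                      size_t width, size_t height)
{
    /* Rows must not overlap within either buffer. */
    assert(height <= 1 || (width <= dpitch && width <= spitch));
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t row = 0; row < height; ++row) {
        memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

/* Pack a vector-like layout of doubles (count blocks of blocklen
 * elements, block starts stride elements apart) into a contiguous
 * buffer using a single 2D copy. */
void example_pack_vector(double *packed, const double *src,
                         size_t count, size_t blocklen, size_t stride)
{
    example_memcpy2d(packed, blocklen * sizeof(double),
                     src, stride * sizeof(double),
                     blocklen * sizeof(double), count);
}
```

With `count = 3`, `blocklen = 2`, and `stride = 3` over a source holding 0, 1, 2, ..., the packed buffer receives 0, 1, 3, 4, 6, 7: one 2D copy replaces three separate block copies.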
- implement memory allocation kind info. (see above for accelerator features)
- GNI BTL - no longer have access to systems to support this (Howard)
- UDREG Rcache - no longer have access to systems that can use this (Howard)
- FS/PVFS2 and FBTL/PVFS2 - no longer have access to systems to support this (Edgar)
- coll/sm
- Remove the TKR version of the `use mpi` module (Howard).
  - This was deferred from 4.0.x because in April/May 2018 (and then deferred again from v5.0.x in October 2018), it was discovered that:
    - The RHEL 7.x default gcc (4.8.5) still uses the TKR `mpi` module.
    - The NAG compiler still uses the TKR `mpi` module.
- mca/coll: blocking reduction on accelerator (this is discussed above, Joseph)
- mca/coll: hierarchical MPI_Alltoall(v), MPI_Gatherv, MPI_Scatterv. (various orgs working on this)
- mca/coll: new algorithms (various orgs working on this)
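For the memory allocation kind items above: as we understand MPI 4.1, memory kinds are expressed through the `mpi_memory_alloc_kinds` info key, whose value is a comma-separated list of kinds, each with optional colon-separated restrictors (for example `system,mpi,cuda:device`). The C sketch below shows one way to "return supported memory kinds", assuming the implementation reports back only those requested kinds it supports and matches entries by exact string comparison; both rules are our simplifications, and `example_filter_kinds` is hypothetical. It uses plain C string handling rather than the MPI info API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if comma-separated `list` contains [tok, tok + len) as a whole entry. */
static bool example_list_has(const char *list, const char *tok, size_t len)
{
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n == len && strncmp(p, tok, len) == 0) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return false;
}

/* Write into `out` (capacity `cap`) the requested kinds that also appear
 * in `supported`, keeping request order. Returns the string length
 * written, or (size_t)-1 if `out` is too small. */
size_t example_filter_kinds(const char *requested, const char *supported,
                            char *out, size_t cap)
{
    assert(cap > 0);
    size_t used = 0;
    out[0] = '\0';
    const char *p = requested;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);
        if (n > 0 && example_list_has(supported, p, n)) {
            size_t need = n + (used > 0 ? 1 : 0); /* entry plus separator */
            if (used + need + 1 > cap) {          /* +1 for the terminator */
                return (size_t)-1;
            }
            if (used > 0) {
                out[used++] = ',';
            }
            memcpy(out + used, p, n);
            used += n;
            out[used] = '\0';
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return used;
}
```

For a request of `system,cuda:device,rocm` against a supported list of `system,mpi,cuda:device,cuda:host`, the sketch reports `system,cuda:device`.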
There are quite a few open PRs related to collectives. Can some of these get merged? See the notes from the 2024 F2F meeting.
- Sessions - add support for UCX PML (Howard, 2-3 weeks)
- Sessions - various small fixes (Howard, 1 month)
- Atomics - can we just rely on C11 atomics and remove some of this code? We currently use GCC atomics for performance reasons. Joseph would like a wrapper for atomic types with direct load/store access.
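A minimal sketch of the kind of wrapper described in the atomics item, assuming C11 `<stdatomic.h>` is available. The struct hides the `_Atomic` member so call sites go through explicit accessors, and plain loads and stores are exposed directly instead of being emulated with read-modify-write operations. The type and function names are hypothetical and are not an existing Open MPI interface; relaxed ordering is used throughout for illustration only.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical wrapper around a C11 atomic size_t. */
typedef struct {
    _Atomic size_t value;
} example_atomic_size;

static inline void example_atomic_init(example_atomic_size *a, size_t v)
{
    atomic_init(&a->value, v);
}

/* Direct load: a single atomic read, no read-modify-write. */
static inline size_t example_atomic_load(example_atomic_size *a)
{
    return atomic_load_explicit(&a->value, memory_order_relaxed);
}

/* Direct store: a single atomic write. */
static inline void example_atomic_store(example_atomic_size *a, size_t v)
{
    atomic_store_explicit(&a->value, v, memory_order_relaxed);
}

/* Read-modify-write; returns the value held before the addition. */
static inline size_t example_atomic_fetch_add(example_atomic_size *a, size_t inc)
{
    return atomic_fetch_add_explicit(&a->value, inc, memory_order_relaxed);
}
```

Relaxed ordering is enough for counters and statistics. Code that publishes data to another thread would need release stores paired with acquire loads, so a production wrapper would likely take the memory order as a parameter or provide separate accessors.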