6.0.x Feature List
Wenduo Wang edited this page Aug 13, 2024 · 34 revisions
Target date - end CY24.
When should we plan to cut the 6.0.x branch? As late as possible, unless we are blocking 7.0 changes (ABI).
Strike through means feature is complete and committed to Open MPI main branch.
- Extended Accelerator API:
- CUDA support for IPC
- ZE support for IPC
- Switch over to forked PRRTE Phase 1
- Documentation Change
- Remove prte binaries
- Remove --with-prte configure option from ompi
- Same MCAs
- BTL Self aware of accelerators
- Reduction op (and others) offload support (Joseph)
- Collectives:
- Merge XHC if they can commit to supporting it.
- Merge acoll once it passes CI.
- smdirect won't be merged, salvage for parts.
- Propose JSON format for tuning file.
- Remove coll/sm (tuned is OK fallback, XHC/acoll coming soon).
- Performance testing of Luke's han alltoall PR with UCX.
- Remove:
- GNI BTL
- udreg rcache
- Remove pvfs2 components
- Big Count:
- API-level function generation
- Collective embiggening Phase 1 (everything except `*v` and `*w` collectives)
- Phase 2 PRRTE
- MCA parameters move into ompi namespace.
- prte_info is gone, move those to ompi_info, perhaps a prte-mca option?
- Memory Kind support:
- Add memory-kind option
- Return supported memory kinds
- ROMIO Refresh
- Collective embiggening Phase 2 (`*v` and `*w` collectives)
- Remove:
- Remove the TKR version of the `use mpi` module for Fortran (old NAG)
- If Jacob's ABI work is ready, it might help solidify the standard to have our implementation done.
- Merge ABI work into main, enable it only when requested, and stress in documentation it is experimental.
- Big count support
- API level functions (in progress 1-2 months)
- Collective embiggening (discussed at F2F, stage in the non-v/w functions first)
- Changes to datatype engine/combiner support (could be a challenge)
- ROMIO refresh
- PRRTE switch Phase 1
- MPI_T events (probably won't do for 6.0.x).
- extended accelerator API functionality (IPC) and conversion of the last components to use accelerator API (DONE for ROCM, not CUDA or ZE).
- level zero (ze) accelerator component (DONE basic support, IPC not implemented, Howard)
- support for MPI 4.1 memory kinds info object (assume we have PRRTE move, 1 month for basic support)
- reduction op (and others) offload support (Joseph estimates 1-2 months to get in)
- SMSC accelerator (Edgar - not sure yet about this one for 6.0.x)
- Stream-aware datatype engine.
- BTL self issue (doesn't support accelerators currently). (Khawthar working on this)
- Datatype engine accelerator awareness (e.g. memcpy2d) (George).
What about smart pointers? Probably could not get this into 6.0.x.
- implement memory allocation kind info. (see above for accelerator features)
- GNI BTL - no longer have access to systems to support this (Howard)
- UDREG Rcache - no longer have access to systems that can use this (Howard)
- FS/PVFS2 and FBTL/PVFS2 - no longer have access to systems to support this (Edgar)
- coll/sm
- Remove TKR version of `use mpi` module. (Howard)
  - This was deferred from 4.0.x because in April/May 2018 (and then deferred again from v5.0.x in October 2018), it was discovered that:
    - The RHEL 7.x default gcc (4.8.5) still uses the TKR `mpi` module.
    - The NAG compiler still uses the TKR `mpi` module.
- mca/coll: blocking reduction on accelerator (this is discussed above, Joseph)
- mca/coll: hierarchical MPI_Alltoall(v), MPI_Gatherv, MPI_Scatterv. (various orgs working on this)
- mca/coll: new algorithms (various orgs working on this)
There are quite a few open PRs related to collectives. Can some of these get merged? See notes from 2024 F2F Meeting
- Sessions - add support for UCX PML (Howard, 2-3 weeks)
- Sessions - various small fixes (Howard, 1 month)
- Atomics - can we just rely on C11 and remove some of this code? We are currently using gcc atomics for performance reasons. Joseph would like to have a wrapper for atomic types and direct load/store access.
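
To make the atomics item above more concrete, here is a minimal sketch of what a thin wrapper over C11 `<stdatomic.h>` could look like. It is only an illustration under stated assumptions: the `sketch_atomic_*` names are hypothetical and are not existing Open MPI symbols, and the memory orderings were chosen for the example rather than taken from the current gcc-atomics code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper type (not an Open MPI name): the _Atomic member is
 * hidden inside a struct so every access goes through the helpers below. */
typedef struct {
    _Atomic int32_t value;
} sketch_atomic_int32_t;

/* "Direct" load: relaxed ordering, no extra fence requested. */
static inline int32_t sketch_atomic_load(sketch_atomic_int32_t *a)
{
    return atomic_load_explicit(&a->value, memory_order_relaxed);
}

/* "Direct" store: relaxed ordering, no extra fence requested. */
static inline void sketch_atomic_store(sketch_atomic_int32_t *a, int32_t v)
{
    atomic_store_explicit(&a->value, v, memory_order_relaxed);
}

/* Atomic add; returns the value held before the addition. */
static inline int32_t sketch_atomic_fetch_add(sketch_atomic_int32_t *a,
                                              int32_t delta)
{
    return atomic_fetch_add_explicit(&a->value, delta, memory_order_acq_rel);
}

/* Compare-and-swap; on failure *expected is updated to the current value. */
static inline bool sketch_atomic_cas(sketch_atomic_int32_t *a,
                                     int32_t *expected, int32_t desired)
{
    return atomic_compare_exchange_strong_explicit(
        &a->value, expected, desired,
        memory_order_acq_rel, memory_order_acquire);
}
```

A caller would declare a `sketch_atomic_int32_t`, set it with `sketch_atomic_store`, and update it with `sketch_atomic_fetch_add` or `sketch_atomic_cas`. Whether plain C11 atomics match the performance of the gcc builtins used today is the open question raised above, and this sketch does not answer it.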