
6.0.x Feature List


Timeline

Target date - end CY24.

What to get done by end of Q2

  • Extended Accelerator API:
    • CUDA support for IPC
    • ZE support for IPC
  • Switch over to forked PRRTE, Phase 1
    • Documentation Change
    • Remove prte binaries
    • Remove --with-prte configure option from ompi
  • Make BTL self aware of accelerators
  • Reduction op (and others) offload support (Joseph)
  • Collectives:
    • Merge XHC if they can commit to supporting it.
    • Merge acoll once it passes CI
    • smdirect won't be merged; salvage it for parts.
    • Propose a JSON format for the tuning file
    • Remove coll/sm (tuned is OK fallback, XHC/acoll coming soon)
    • Performance testing of Luke's alltoall PR with UCX.
  • Remove:
    • GNI BTL
    • udreg rcache
    • pvfs2 components

What to get done by end of Q3

  • Memory Kind support:
    • Add memory-kind option
    • Return supported memory kinds
  • Remove:
    • TKR version of the Fortran use mpi module (old NAG)

List of Features planned for the 6.0.x release stream

MPI 4.0 (critical):

  • Big count support (see the sketch after this list)
    • API level functions (in progress 1-2 months)
    • Collective embiggening (discussed at the F2F; stage in the non-v/w functions first)
    • Changes to datatype engine/combiner support (could be a challenge)
    • ROMIO refresh
  • PRRTE switch Phase 1
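
As a point of reference for the big-count item above, here is a minimal user-side sketch of an MPI 4.0 embiggened call, assuming a library that provides the MPI_Count-based "_c" bindings; the payload size is purely illustrative.

```c
/* Minimal big-count sketch: move more than INT_MAX elements in one call
 * using the MPI 4.0 MPI_Count-based "_c" bindings. Payload size is illustrative. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Count count = (MPI_Count)3 * 1024 * 1024 * 1024;   /* > INT_MAX chars */
    char *buf = malloc((size_t)count);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send_c(buf, count, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv_c(buf, count, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```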

MPI 4.0 (tentative):

  • MPI_T events (probably won't do for 6.0.x).

Accelerator support:

  • extended accelerator API functionality (IPC) and conversion of the last components to use the accelerator API (DONE for ROCm; not yet for CUDA or ZE).
  • Level Zero (ze) accelerator component (basic support DONE, IPC not implemented; Howard)
  • support for the MPI 4.1 memory kinds info object (assuming the PRRTE move is done; 1 month for basic support)
  • reduction op (and others) offload support (Joseph estimates 1-2 months to get in)
  • SMSC accelerator (Edgar - not sure yet about this one for 6.0.x)
    • Stream-aware datatype engine.
  • BTL self issue (doesn't currently support accelerators; Khawthar working on this)
  • Datatype engine accelerator awareness (e.g., memcpy2d; George). See the sketch below.
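
For context on the memcpy2d item, the sketch below shows the kind of 2D copy the datatype engine could use to move a strided device-memory region (e.g., an MPI vector type) in a single call instead of one copy per block. It is purely illustrative and is not the actual datatype engine code.

```c
/* Illustrative only: copy a strided block (block_count blocks of block_len bytes,
 * separated by stride bytes) between device buffers with one cudaMemcpy2D call. */
#include <cuda_runtime.h>

int copy_strided_block(void *dst, const void *src,
                       size_t block_len, size_t stride, size_t block_count)
{
    /* dpitch == block_len: the destination is packed contiguously. */
    cudaError_t err = cudaMemcpy2D(dst, block_len,
                                   src, stride,
                                   block_len, block_count,
                                   cudaMemcpyDeviceToDevice);
    return (err == cudaSuccess) ? 0 : -1;
}
```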

What about smart pointers? Probably could not get this into a 6.0.x release.

MPI 4.1:

  • implement the memory allocation kinds info object (see above under accelerator support); a usage sketch follows.
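
A rough usage sketch of the memory allocation kinds feature, assuming an MPI 4.1 library that understands the "mpi_memory_alloc_kinds" info key; the requested kind values are illustrative.

```c
/* Sketch: request memory kinds at session init and query what the library
 * actually supports. Assumes MPI 4.1 semantics for "mpi_memory_alloc_kinds". */
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Info info, used;
    MPI_Session session;
    char kinds[256];
    int buflen = sizeof(kinds), flag;

    MPI_Info_create(&info);
    /* Ask for CUDA device memory in addition to plain system memory. */
    MPI_Info_set(info, "mpi_memory_alloc_kinds", "system,cuda:device");

    MPI_Session_init(info, MPI_ERRORS_ARE_FATAL, &session);

    /* The library reports the kinds it actually supports. */
    MPI_Session_get_info(session, &used);
    MPI_Info_get_string(used, "mpi_memory_alloc_kinds", &buflen, kinds, &flag);
    if (flag) {
        printf("supported memory kinds: %s\n", kinds);
    }

    MPI_Info_free(&used);
    MPI_Info_free(&info);
    MPI_Session_finalize(&session);
    return 0;
}
```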

Things to remove:

  • GNI BTL - no longer have access to systems to support this (Howard)
  • UDREG Rcache - no longer have access to systems that can use this (Howard)
  • FS/PVFS2 and FBTL/PVFS2 - no longer have access to systems to support this (Edgar)
  • coll/sm
  • Remove the TKR version of the use mpi module. (Howard)
    • This was deferred from v4.0.x in April/May 2018 (and again from v5.0.x in October 2018) because it was discovered that:
      1. The RHEL 7.x default gcc (4.8.5) still uses the TKR mpi module
      2. The NAG compiler still uses the TKR mpi module.

Collectives:

  • mca/coll: blocking reduction on accelerator (this is discussed above, Joseph)
  • mca/coll: hierarchical MPI_Alltoall(v), MPI_Gatherv, MPI_Scatterv (various orgs working on this)
  • mca/coll: new algorithms (various orgs working on this)

There are quite a few open PRs related to collectives. Can some of these get merged? See the notes from the 2024 F2F meeting.

Random:

  • Sessions - add support for UCX PML (Howard, 2-3 weeks)
  • Sessions - various small fixes (Howard, 1 month)
  • Atomics - can we just rely on C11 and remove some of this code? We currently use GCC atomics for performance reasons. Joseph would like a wrapper for atomic types with direct load/store access (see the sketch below).
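
A rough sketch of the kind of C11-based wrapper being discussed, using <stdatomic.h>; the type and function names here are illustrative, not the current OPAL atomics API.

```c
/* Sketch of a C11 atomics wrapper with explicit load/store access.
 * Names are illustrative; not the current OPAL API. */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic int32_t v;
} atomic_int32_wrapper_t;

static inline int32_t wrapper_load_32(atomic_int32_wrapper_t *a)
{
    return atomic_load_explicit(&a->v, memory_order_acquire);
}

static inline void wrapper_store_32(atomic_int32_wrapper_t *a, int32_t val)
{
    atomic_store_explicit(&a->v, val, memory_order_release);
}

static inline int32_t wrapper_fetch_add_32(atomic_int32_wrapper_t *a, int32_t inc)
{
    return atomic_fetch_add_explicit(&a->v, inc, memory_order_relaxed);
}
```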