WeeklyTelcon_20191203
Jeff Squyres edited this page Dec 3, 2019
·
2 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Jeff Squyres (Cisco)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Ralph Castain (Intel)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- David Bernholdt (ORNL)
- George Bosilca (UTK)
- Joshua Ladd (Mellanox)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (Mellanox)
- Austen Lauria (IBM)
- Brandon Yates (Intel)
- Brendan Cunningham (Intel)
- Brian Barrett (AWS)
- Charles Shereda (LLNL)
- David Bernholdt (ORNL)
- Edgar Gabriel (UH)
- Erik Zeiske (HPE)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Geoffrey Paulsen (IBM)
- George Bosilca (UTK)
- Jeff Squyres (Cisco)
- Josh Hursey (IBM)
- Joshua Ladd (Mellanox)
- Mark Allen (IBM)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Intel)
- mohan (AWS)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Ralph Castain (Intel)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- William Zhang (AWS)
- Xin Zhao (Mellanox)
Review Milestones v3.0.6
Just released v3.0.5.
Bar is now a bit higher to accept PRs into v3.0.x. We should be targeting master/v4.0.x these days.
Review Milestones v3.1.6
Just released v3.1.5.
Bar is now a bit higher to accept PRs into v3.1.x. We should be targeting master/v4.0.x these days.
Review Milestones v4.0.3
PRs:
- Not a ton happening because of past release, SC, and Thanksgiving.
- 7117: IPv6. Waiting on reply.
- 7151: a small enough enhancement that it was OK to accept into v4.0.x.
- But adding JSM launch support seems like a large enough feature that it should wait for vNEXT.
- "Pepto bismal" label (target v4.1.x): these are parked right now -- they are new enhancements/features on the v4.0.x. These new enhancements/features are hopefully not ever going to be applied to v4.0.x -- but if v5.0.x is delayed, we may need to have a conversation.
- Target late Jan for v4.0.3 -- some fixes that have been found post v4.0.2.
Open question: what do we want to do about COMM_SPAWN problems in v4.0.x? There's a bunch of them (~10 or so) from the mailing list and the issue tracker. E.g., #6962, #7094, #6902, ...
Many of them are hostfile problems with spawn (e.g., "too many resources for the slots you have"), similar info key issues, etc.
Discussion of difficulty testing for COMM_SPAWN. Ralph suggests that -- in the Python MTT -- they spin up PRRTE and then do all their tests under PRRTE (including COMM_SPAWN tests).
- PR 7174: OFI MTL issue: need AWS to think about this and make sure it's ok.
- SC meeting PRRTE vs. ORTE:
- Come to conclusion that removing ORTE and replacing it with PRRTE would be a good thing. Let's move ahead with it.
- Ralph/Gilles started #7202:
- makes PMIx 1st-class citizen,
- PMIx symbols are exported / available for users to call in their application.
- NOTE: This is not a regression -- even with v3.x/v4.x, if you try to run an app with a different version of external PMIx than OMPI is compiled with, kaboom.
- removes ORTE,
- put in embedded PRRTE (i.e., replace mpirun/mpiexec).
- Removed all PMI-1 and PMI-2 support.
- Did leave PMIx framework in OPAL -- it's now static (selecting internal vs. external).
- Aiming for end of Dec / Jan-ish before it's done.
- Several of these items are NEWS-worthy.
- There's currently a problem with this branch and the UCX PML: https://github.com/open-mpi/ompi/issues/6982. Ralph mentions that this will be a problem when we bring in PMIx as a 1st-class citizen.
- Git submodule?
- Github bots (lockbot, etc.)?
- Have no information because Brian is the one driving for these things.
- Need some review on the reachable stuff: PRs 7167 and 7134.
- Want to have a custom collectives tuning file for EFA.
What's the best way to do that?
- Custom file for coll/tuned.
- Just ship it in install etc dir and load it in default MCA param file.
- How to detect EFA NIC and use that config file automatically?
- George wants to think about this.
- George, William, and Jeff discussed this a bit. William will
investigate something along the lines of:
- Change the default value of the MCA param of the coll/tuned decision filename to be some sentinel value (e.g., empty string).
- Move the reading of that MCA param to a later point in time (compared to when it is called today)
- If the value is the sentinel value, call some hooks to other routines to see if they want to supply a decision filename (e.g., if an EFA hook has been registered, it can look to see if an EFA NIC is available, and if so, return an EFA decision filename). A default hook should probably also be available that is always called last / lowest priority / whatever, and that supplies the default decision file if no other file was provided by any other hook.
- Need to think about how to register hooks (e.g., this only applies to coll/tuned -- so it may not be suitable for coll/base...?).
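
The hook idea discussed above could look roughly like the following. This is only a sketch in Python for brevity (a real implementation would be C inside coll/tuned), and every name in it is hypothetical: `DecisionFileHooks`, `make_efa_hook`, `default_hook`, the file paths, and the choice of the empty string as the sentinel are all assumptions made for illustration, not existing Open MPI APIs. EFA detection is stubbed out with a boolean.

```python
from typing import Callable, List, Optional, Tuple

# Assumed sentinel: an empty string means "the user did not set the MCA param".
SENTINEL = ""

# Hypothetical file locations, for illustration only.
EFA_FILE = "/hypothetical/etc/coll-tuned-efa.conf"
DEFAULT_FILE = "/hypothetical/etc/coll-tuned-default.conf"

# A hook returns a decision filename, or None to decline.
Hook = Callable[[], Optional[str]]


class DecisionFileHooks:
    """Priority-ordered hooks that may supply a decision filename."""

    def __init__(self) -> None:
        self._hooks: List[Tuple[int, Hook]] = []

    def register(self, priority: int, hook: Hook) -> None:
        # Keep hooks sorted so that higher priorities are asked first.
        self._hooks.append((priority, hook))
        self._hooks.sort(key=lambda entry: entry[0], reverse=True)

    def resolve(self, param_value: str) -> Optional[str]:
        # An explicit user setting always wins over any hook.
        if param_value != SENTINEL:
            return param_value
        # Otherwise ask each hook in priority order; the first answer wins.
        for _priority, hook in self._hooks:
            filename = hook()
            if filename is not None:
                return filename
        return None


def make_efa_hook(efa_nic_present: bool) -> Hook:
    """Build a hook that offers the EFA file only if an EFA NIC is present.

    Real NIC detection is out of scope here; it is passed in as a flag.
    """
    def hook() -> Optional[str]:
        return EFA_FILE if efa_nic_present else None
    return hook


def default_hook() -> Optional[str]:
    """Lowest-priority hook: always supplies the default decision file."""
    return DEFAULT_FILE
```

In this sketch the default hook would be registered with the lowest priority (e.g., 0) and the EFA hook with a higher one, and `resolve()` would be called at the later point where the MCA param is read. Whether such a registry belongs in coll/tuned or somewhere more general is the open question noted above.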
- It's official! Portland, Oregon, Feb 17, 2020.
- Safe to begin booking travel now.
- Please register on Wiki page, since Jeff has to register you.
- Date looks good. Feb 17th right before MPI Forum
- 2pm Monday, and maybe most of Tuesday.
- Cisco has a Portland facility and is happy to host.
- But willing to step aside if others want to host.
- About a 20-30 min drive from MPI Forum; will probably need a car.