WeeklyTelcon_20200107
Geoffrey Paulsen edited this page Jan 7, 2020
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoffrey Paulsen (IBM)
- Jeff Squyres (Cisco)
- Akshay Venkatesh (NVIDIA)
- Austen Lauria (IBM)
- Charles Shereda (LLNL)
- Josh Hursey (IBM)
- Joshua Ladd (Mellanox)
- Thomas Naughton (ORNL)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- Brian Barrett (AWS)
- Brendan Cunningham (Intel)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Michael Heinz (Intel)
- Noah Evans (Sandia)
- William Zhang (AWS)
- George Bosilca (UTK)
- Artem Polyakov (Mellanox)
- David Bernhold (ORNL)
- Edgar Gabriel (UH)
- Matthew Dosanjh (Sandia)
- Brandon Yates (Intel)
- Erik Zeiske
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Xin Zhao (Mellanox)
- mohan (AWS)
-
Discuss PR 6821
- This would be the first PR with a submodule. Uses hwloc via submodules.
- Name of the hwloc component changed to hwloc2 (not hwloc20x)
- Question: were there some issues with submodule PR testing?
- Initially we had some issues due to bad git docs; they were caught by PR CI.
- CI didn't get through the checkout phase.
- All issues resolved now.
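A checkout problem like the one above can be spotted by inspecting `git submodule status`. The first character of each line is `-` for an uninitialized submodule, `+` when the checked-out commit differs from the recorded one, and `U` for merge conflicts. A minimal sketch, assuming the status text has already been captured; the function name and the sample paths are hypothetical and not part of the Open MPI tooling:

```python
# Meaning of the leading status character printed by `git submodule status`.
STATUS_FLAGS = {
    "-": "not initialized",
    "+": "checked-out commit differs from the recorded one",
    "U": "merge conflicts",
}


def submodules_needing_attention(status_output):
    """Map submodule path -> problem description for every flagged line."""
    problems = {}
    for line in status_output.splitlines():
        if not line.strip():
            continue
        flag = line[0]
        fields = line[1:].split()  # [sha, path, optional "(describe)"]
        if flag in STATUS_FLAGS and len(fields) >= 2:
            problems[fields[1]] = STATUS_FLAGS[flag]
    return problems


if __name__ == "__main__":
    # Hypothetical sample; real paths depend on the repository layout.
    sample = (
        "-1111111111111111111111111111111111111111 path/to/hwloc\n"
        " 2222222222222222222222222222222222222222 path/to/other (v1.0)\n"
    )
    for path, problem in submodules_needing_attention(sample).items():
        print(f"{path}: {problem}")
```

In a CI step the input could come from `subprocess.run(["git", "submodule", "status"], capture_output=True, text=True).stdout`, failing the job early when the returned dictionary is non-empty.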
-
Discuss Probot process 7260
- Delayed until next week.
-
Unprefixed Symbols from December:
- Ton of unprefixed symbols being spit out by MPI.
- Symbols prefixed with OMPI, OPAL, or ORTE are ours.
- Everything that starts with MCA is in there as a public symbol.
- The problem: if another library reuses the MCA system, you hit this.
- Domain frameworks - adding MCA components to a list for autoclosure, but sequencing of closing needs to be very specific.
- Want to strip these out, as they are causing problems.
- Might need this for sessions
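A rough way to list the symbols in question is to filter `nm` output for globally visible, defined symbols whose names do not start with a project prefix. The sketch below assumes the `nm` text has already been captured. The prefix list is an assumption made for illustration, not an agreed project policy, and the function name is hypothetical:

```python
# Assumed set of "ours" prefixes, compared case-insensitively (illustrative only).
ALLOWED_PREFIXES = ("mpi_", "pmpi_", "ompi_", "opal_", "orte_")


def unprefixed_symbols(nm_output, allowed=ALLOWED_PREFIXES):
    """Return defined global symbols whose names lack an allowed prefix."""
    leaked = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        sym_type, name = fields[-2], fields[-1]
        # In nm output, uppercase type letters mark global symbols;
        # "U" marks undefined references that this library does not export.
        if len(sym_type) != 1 or not sym_type.isupper() or sym_type == "U":
            continue
        if not name.lower().startswith(allowed):
            leaked.append(name)
    return leaked


if __name__ == "__main__":
    # Hypothetical nm output, not taken from a real Open MPI build.
    sample = (
        "0000000000001000 T MPI_Init\n"
        "0000000000002000 T mca_base_open\n"
        "                 U malloc\n"
        "0000000000003000 t local_helper\n"
    )
    print(unprefixed_symbols(sample))
```

Here the MCA-prefixed name is reported because `mca_` is deliberately left out of the allowed list, matching the concern raised above about other libraries reusing the MCA system.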
Blockers All Open Blockers
Review v3.0.x Milestones v3.0.4
Review v3.1.x Milestones v3.1.4
- New oops 3.0.x/3.1.x
- Issue 7212 - patcher issue for new compilers
- Jeff reviewed and merged other PRs. The fix is merged in, but there has been no testing yet.
- Two PRs still open.
- No schedule yet for 3.0.6 and 3.0.5; it depends on RM availability.
- Possibly a configure test for pmix warning/error.
Review v4.0.x Milestones v4.0.3
- v4.0.3 in the works.
- Schedule: End of January.
- Need to confirm with George whether we need PR 7149 for 4.0.3.
- There may be a new PMIx v3.1.5 in January, which we could pick up for v4.0.3.
- We'll know next week
- Schedule: April 2020?
- It's official! Portland, Oregon, Feb 17, 2020.
- Please register on Wiki page, since Jeff has to register you.
- Date looks good: Feb 17th, right before the MPI Forum.
- 2pm Monday, and maybe most of Tuesday.
- Cisco has a Portland facility and is happy to host.
- About a 20-30 min drive from the MPI Forum; will probably need a car.
Review Master Master Pull Requests
- PRRTE almost ready to merge, but need help with oshmem
- PR 7202 build logic working correctly.
- Possibly some glitches in PRRTE support, but in general working okay.
- OSHMEM is compiling, but is expecting ORTE to do something.
- oshrun of hello_oshmem.
- Some ORTE call in oshrun, possibly?
- It's the application that's crashing.
- Suspicion that MPI_Init is no longer calling ORTE_Init.
- Question: Will things work "the same" under SLURM with this PR?
- Yes.
- How will startup performance differ?
- Don't expect any difference.
- Mellanox can take a look. Probably something pretty trivial.
- In this PR, mpirun may be a shell script?
- https://github.com/open-mpi/ompi/pull/7202/files#diff-5d429ec6c9d4d2c13ebc1e732eead2cc
- mpirun is a binary; the link is perhaps from an old branch or something.
- PRRTE testing infrastructure
- Josh has been working on this, in IBM's virtual scale cluster.
- Hitting a number of issues in PRRTE.
- CI testing of PRRTE itself (without open-mpi).
- Two PRRTE CI tests:
- PMIx Hello World (passes depending on number of nodes)
- Submit something like 100 jobs in a single job.
- Once this is working might want to use this in Open MPI testing before moving submodule pointer.
- Note: Can also use PMIx client
- PR 7202 looks good for building, but do we want to move from mpirun to PRRTE at the same time, or wait until PRRTE testing is better?
- Mixing submodules and embedded code is harder than simply embedding ORTE (or PRRTE).
- This coordination between repos is hard.
- Especially because PMIx is embedded in Open MPI but PRRTE is a submodule.
- Part of this PR makes PMIx a static component, and calls PMIx directly.
- Could separate the PMIx static component and direct PMIx calls into a separate PR.
- Once this settles down, track release branches instead of master.
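The "submit many jobs" style of PRRTE CI test mentioned above can be approximated with a small launcher that starts the same command many times and counts failures. This is only a sketch under stated assumptions: the function name is hypothetical, the launched command is a stand-in (the real test would use whatever launcher and test program the CI chooses), and nothing here reflects the actual IBM test harness.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_many(cmd, count, max_parallel=8, timeout=60):
    """Launch `cmd` `count` times, at most `max_parallel` at once; count failures."""

    def one(_):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            return result.returncode
        except (subprocess.TimeoutExpired, OSError):
            return -1  # treat timeouts and launch errors as failures

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(one, range(count)))
    failed = sum(1 for code in codes if code != 0)
    return {"launched": count, "failed": failed}


if __name__ == "__main__":
    # Stand-in command: a trivial Python process instead of a real job launch.
    print(run_many([sys.executable, "-c", "pass"], count=10))
```

Keeping the command as a plain argument list makes it easy to swap the stand-in for a real launch line once the PRRTE tests are stable.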