WeeklyTelcon_20171017
Geoffrey Paulsen edited this page Jan 9, 2018
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen (IBM)
- Jeff Squyres
- Edgar Gabriel
- Josh Hursey
- Joshua Ladd
- Mohan
- Ralph
- Thomas Naughton
- Artem Polyakov
- Todd Kordenbrock
- Brian
- David Bernholdt
- Geoffroy Vallee
- George
- Howard
- String pool framework - 4283
- Someone freed the hostname (which was wrong).
- Proposed strdup'ing the hostname, which is already done many times.
- Proposed a string pool framework to reference-count strings.
- PMIx already has the ability to store strings and return a pointer to them, which others can't free.
- In PMIx, if shared memory is used, the data is provided over shmem; if not, each process has one copy.
- We've survived for years without this. Seems to be just a single issue.
- Sounds like it might not be applicable across the rest of the code base.
- PMIx pages could be made read-only.
- Overengineering? Perhaps we should add valgrind tests?
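A minimal sketch of the reference-counting idea in the string pool proposal, assuming a simple linked-list pool. `str_pool_t`, `str_pool_acquire`, and `str_pool_release` are hypothetical names for illustration, not an existing Open MPI or PMIx API. Callers share one copy of a string such as a hostname and drop their reference instead of calling `free()`, so no single caller can free it out from under the others.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string pool (illustrative names only). */
typedef struct pool_entry {
    char *str;               /* pool-owned copy of the string */
    size_t refcount;         /* number of outstanding references */
    struct pool_entry *next; /* singly linked list of entries */
} pool_entry_t;

typedef struct {
    pool_entry_t *head;
} str_pool_t;

/* Return a shared pointer to the pooled copy of s, adding one reference.
 * Returns NULL on allocation failure. */
const char *str_pool_acquire(str_pool_t *pool, const char *s)
{
    for (pool_entry_t *e = pool->head; e != NULL; e = e->next) {
        if (strcmp(e->str, s) == 0) {
            e->refcount++;
            return e->str;
        }
    }
    pool_entry_t *e = malloc(sizeof(*e));
    if (e == NULL) {
        return NULL;
    }
    size_t len = strlen(s) + 1;
    e->str = malloc(len);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->str, s, len);
    e->refcount = 1;
    e->next = pool->head;
    pool->head = e;
    return e->str;
}

/* Drop one reference; the string is freed only when the last one goes.
 * Returns the remaining count, or (size_t)-1 if s is not a pooled pointer. */
size_t str_pool_release(str_pool_t *pool, const char *s)
{
    pool_entry_t **link = &pool->head;
    while (*link != NULL) {
        pool_entry_t *e = *link;
        if (e->str == s) {               /* match by pointer, not contents */
            assert(e->refcount > 0);     /* zero-count entries are unlinked */
            size_t left = --e->refcount;
            if (left == 0) {
                *link = e->next;
                free(e->str);
                free(e);
            }
            return left;
        }
        link = &e->next;
    }
    return (size_t)-1;
}
```

A real framework would also need locking for multithreaded callers and probably a hash table instead of a linear scan, which is part of the overengineering concern raised in the discussion.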
Review All Open Blockers
Review v2.0.x Milestones v2.0.4
- Two issues left:
- Let's make configure fail when building against an external hwloc 2.0.0 or later on OMPI v2.0.x and v2.1.x.
- Bring PMIx v1.2.4 back to v2.0.x? It's basically bug fixes.
- Most of the value is the automake 1.13 patch (helps FreeBSD or something?).
- Issue: DDT is broken on v2.x. Asking whether IBM already resolved this internally, and if so, whether we could get the fix back.
- Schedule: if we get PRs in today, we should aim to get the v2.0.x release out NEXT week.
Review v2.x Milestones v2.1.2
- v2.1.3 (unscheduled, but probably Jan 19, 2018)
- PR4172 - a mix between feature and bugfix.
- Are we going to do anything for v2.x for hwloc 2?
- At least put in a configure error if it detects hwloc v2.x.
- hwloc is about to release v2.0.
- If topology info comes in from outside, which hwloc version was that resource manager using?
- Is the XML annotated with the version of hwloc that generated it?
- It would be nice to fail gracefully, since the failure is otherwise fairly opaque.
- Seems like we'll need a Rosetta Stone for
- hwloc is a static framework.
- Brice is going to get hwloc 2.0 out by Supercomputing, but it might be tight.
- Are we comfortable releasing with an alpha/beta version of hwloc embedded?
- Jeff: at a minimum, we should get a beta-quality version of hwloc to embed.
- OMPI 2.x will not work with hwloc 2.0, because of changed APIs.
- May want some configure errors (not in there yet).
- v3.0 only works with hwloc older than 2.0. In v3.0.x, if it's hwloc 2.0, we error out at configure time.
- In the v3.1 branch, external hwloc may be either hwloc 2.0 or an older hwloc, but the choice must be made at build time.
- Still have to run v3.1 everywhere.
- Do we want to backport the hwloc 2.0 support to v3.0?
- Since we're closing the door on v1.x and v2.x, that might be a good support statement.
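The hwloc support statements discussed here (error out on hwloc 2.x for v2.0.x, v2.1.x, and v3.0.x; accept either API in v3.1 but fix the choice at build time) can be summarized as a small decision function. This is a sketch under assumptions: `relseries_t` and `series_accepts_external_hwloc` are hypothetical names, and Open MPI's real configury is autoconf m4/shell, not C. The only fact taken from hwloc itself is that hwloc 2.0 reports `HWLOC_API_VERSION` as 0x00020000, with every 1.x API version numerically smaller.

```c
#include <assert.h>

/* hwloc 2.0's HWLOC_API_VERSION; all 1.x API versions are smaller. */
#define MIN_HWLOC2_API 0x00020000UL

/* Hypothetical release-series identifiers, for illustration only. */
typedef enum {
    RELSERIES_V2_0_X,
    RELSERIES_V2_1_X,
    RELSERIES_V3_0_X,
    RELSERIES_V3_1_X
} relseries_t;

/* Returns 1 if the series can build against an external hwloc whose
 * HWLOC_API_VERSION is `api`, or 0 if configure should error out. */
int series_accepts_external_hwloc(relseries_t series, unsigned long api)
{
    switch (series) {
    case RELSERIES_V2_0_X:
    case RELSERIES_V2_1_X:
    case RELSERIES_V3_0_X:
        /* These series only know the pre-2.0 hwloc API. */
        return api < MIN_HWLOC2_API;
    case RELSERIES_V3_1_X:
        /* Either API works, but the code path is fixed at build time. */
        return 1;
    default:
        assert(0 && "unknown release series");
        return 0;
    }
}

/* In an actual configure test program, the equivalent compile-time gate
 * could look like:
 *
 *   #include <hwloc.h>
 *   #if HWLOC_API_VERSION >= 0x00020000
 *   #error "hwloc 2.x is not supported by this Open MPI release series"
 *   #endif
 */
```

Checking the API version macro at compile time fails the build before any opaque runtime mismatch, which matches the wish to fail gracefully when an unsupported hwloc is detected.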
Review v3.0.x Milestones v3.0
- v3.0.1 - Opened the branch for bugfixes Sep 18th.
- Still targeting End of October for release of v3.0.1
- Everything ready to push has been pushed.
- A few PRs need review.
- Schedule:
- Originally scheduled for this week, but
- Edgar has two open Issues, both fairly important:
- PR to master - already pending.
- NFS problem reported on the mailing list: the fix is coded but not yet tested, and we're more worried about this one. (4346 4334)
- Issue 3904: the only v3.0.x milestone item filed by Edgar.
- Thought this was merged to v3.0.x anyway.
- Iterating a bit on PR 4249 on this branch, which disables CUDA inside hwloc.
- Issue 4248 - disabling cuda on hwloc
- On all existing release branches, use -cuda=no in the hwloc configury.
- Has been merged into v2.x but not v3.0.x.
Review v3.1.x Milestones v3.1 (https://github.com/open-mpi/ompi/milestone/27)
- v3.1.x currently has hwloc 2.0 alpha.
- Could roll back to hwloc 1.11.7, which has some performance issues on KNL.
- Could delay the v3.1.x release to mid-December.
- Could ship hwloc 1.11.7 by default, but also ship an hwloc 2.0 alpha component that would have to be explicitly requested at configure time.
- Some strong objections to shipping other parties' alpha/unreleased software.
- Could support this with an external component, plus a blurb in the README describing the new feature and how to use the external component.
- This would leave the hwloc 2.0 enhancements in OMPI, but back the hwloc version down to v1.11.7.
- Making a new component in v3.1 and backing down version to v1.11.7 - Brian will own (thanks)
- Schedule: still do a v3.1 drop before Supercomputing.
- v3.1.x snapshots are not getting posted.
- Has to do with cron failures; the failure emails went to ompi_team. The cron job runs on gater (nightly cronjob sync.py).
- Ralph is forwarding to Brian.
- This is causing nightly MTT runs to not be run.
- Brian didn't get the cron failure emails.
- Add v3.1 to MTT tests.
- The database is now active and accepting v3.1 tests.
- Last week the MTT disk filled up.
- PMIx 2.1 should make it in time for v3.1.
- In master, but no PR to OMPI v3.1.x yet, since PMIx 2.1 hasn't been released yet.
- Still intending to ship OMPI v3.1.0 with PMIx 2.1, but the backup plan is PMIx v2.0 (already there now).
Administration
- Revised Bylaws -
- Rewording "group" and "community" to be more explicit, reflecting the involvement level of developers as well as contributors who help in other ways (like providing resources, etc.).
- Helps support those who support us.
- Voting yes via email reply is sufficient.
- After the bylaws pass, we will nominate for formal membership.
Review Master Pull Requests
Review Master MTT testing
- Website - openmpi.org
- Brian is trying to make things more automated, so we can check out the repo, etc. The repo is TOO large.
- The majority of the problem is the tarballs, and we're already storing those in S3.
- Need to see if attributes are MT-safe; IBM will see if we have any tests to audit.
- Asked; need to get an answer back from them.
- Jan / Feb
- Possible locations: San Jose, Portland, Albuquerque, Dallas
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon
- Cisco, ORNL, UTK, NVIDIA