support pmix version 3.1.2 as installed on IBM coral systems #85
Comments
Apparently this older environment is about to get updated to TOSS4 including the newer pmix, and in fact flux-pmix pre-built. So I think this is actually a non-issue or soon will be. Closing. |
We never did get the installed pmix312 package working with flux:
Giving up on that one and just building from scratch. |
flux-pmix needs the following patch so that its unit tests run the configured flux instead of /usr/bin/flux:
diff --git a/t/sharness.d/00-setup.sh.in b/t/sharness.d/00-setup.sh.in
index ebd0546..c1f7fa4 100644
--- a/t/sharness.d/00-setup.sh.in
+++ b/t/sharness.d/00-setup.sh.in
@@ -1,2 +1,2 @@
-PATH=@FLUX_PREFIX@/bin:$PATH
PATH=@OMPI_PREFIX@/bin:$PATH
+PATH=@FLUX_PREFIX@/bin:$PATH
diff --git a/t/t0002-basic.t b/t/t0002-basic.t
index b716715..4e19f14 100755
--- a/t/t0002-basic.t
+++ b/t/t0002-basic.t
I'll propose a PR. |
Here is a recipe for building a working flux with pmix capability in
First
Build and install flux-core
Build and install openpmix
Build, check, install flux-pmix
You can put back the tce environment (remove .notce and log in again). I can't say for certain that it causes any problems with the build, but I tend to leave it out when debugging build problems, since it often complicates things and these packages are intended to be buildable from base system packages alone. |
Problem: the test suite pushed the ompi bin directory in front of the flux bin directory, but when ompi is installed as a system package, this places /usr/bin in front of a possibly side-installed flux-core path. This was noted to be the case on LLNL's lassen system. Solution: place the flux path in front of the ompi path when setting up sharness test paths. This was first noted in flux-framework#85.
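The effect of the ordering can be reproduced with a small sketch. Everything below is hypothetical: the two temporary prefixes and the fake `flux` scripts only stand in for /usr/bin (where a system ompi and a system flux would both live) and a side-installed flux-core prefix. It is not the actual sharness setup file, just an illustration of which directory wins when each one is prepended in turn.

```shell
#!/bin/sh
# Sketch of PATH precedence using two made-up install prefixes.
tmp=$(mktemp -d)
mkdir -p "$tmp/system/bin" "$tmp/side/bin"

# Fake "flux" executables: one stands in for /usr/bin/flux,
# the other for a side-installed flux-core.
printf '#!/bin/sh\necho system\n' > "$tmp/system/bin/flux"
printf '#!/bin/sh\necho side\n' > "$tmp/side/bin/flux"
chmod +x "$tmp/system/bin/flux" "$tmp/side/bin/flux"

BASE_PATH=$PATH
OMPI_PREFIX=$tmp/system   # assumption: ompi is a system package, same prefix as system flux
FLUX_PREFIX=$tmp/side     # assumption: flux-core is installed to a side prefix

# Old order: flux is prepended first, then ompi lands in front of it.
PATH=$FLUX_PREFIX/bin:$BASE_PATH
PATH=$OMPI_PREFIX/bin:$PATH
old_pick=$(flux)

# New order: ompi is prepended first, then flux lands in front of it.
PATH=$OMPI_PREFIX/bin:$BASE_PATH
PATH=$FLUX_PREFIX/bin:$PATH
new_pick=$(flux)

PATH=$BASE_PATH
echo "old order runs: $old_pick"
echo "new order runs: $new_pick"
```

With the old order the system copy is found first; with the new order the side-installed copy is found first, which is what the tests are meant to exercise.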
And note to self, to run a test with the above on lassen:
First get an allocation:
Then launch flux with pmi debug enabled
|
The TOSS 4 updates on LLNL's sierra systems have been postponed indefinitely so we really do need to get this working. |
Dropping the
I suppose what's happening is that jsrun/jsm is trying to force spectrum mpi to use the pmix client it was built with in a rather heavy handed way, so maybe the rpath pmix and the runtime pmix are getting mixed up and confused? Anyway a
Edit: oof! Apparently I didn't give the right jsrun options to get all the gpus and cpus assigned. Edit: in case it's useful, jsm's pmix server version is (from
Edit: and this is the parent process of launched tasks, so most likely contains the pmix server:
|
The missing jsrun option was
That was in the flux coral2 doc so my bad. Presumably we don't have GPUs because we've linked against the wrong hwloc. Hmm, looks like we got the system one built by redhat. That's another roadblock if the one we need isn't packaged for the build farm. |
But the flux-pmix shell plugin's pmix server now can't find its plugins (I assume):
|
Tantalizingly, this error message leaks through the failed path (to an IBM build farm location no doubt) that the pmix server we're using was built with a cuda-aware hwloc. Hmm, I think there may be a way to get the hwloc xml via the pmix client if we want to go that way. |
@grondo suggested building Flux with spack to see how it goes, so I tried this on lassen:
The sched build seems to have failed
and more of the same .... |
Problem: building flux-pmix on coral systems is a pain, but newer versions of flux-core require it to enable flux to bootstrap from LSF.
On our system, there are three versions of pmix provided by IBM:
They are all rooted in /usr/pmix
and they do not provide pkg-config files.
We should reduce the minimum required version from 3.2.3 to 3.1.2 and cover that version in CI.
I'm not sure what to do about the missing pkg-config file. flux-pmix wants to find pmix that way. For my testing I just created a pmix.pc by hand:
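For readers who have not written a pkg-config file by hand, the following is a rough sketch of the general shape such a file can take. It is not the file used in this issue. The prefix, the `lib` subdirectory, the version string, and the output directory are all assumptions made for illustration and would need to match the actual IBM install under /usr/pmix.

```shell
#!/bin/sh
# Sketch: write a minimal pmix.pc by hand and point pkg-config at it.
# ASSUMPTIONS (not taken from the issue): PMIX_PREFIX, PMIX_VERSION,
# the lib/ and include/ layout, and the temporary output directory.
PMIX_PREFIX=${PMIX_PREFIX:-/usr/pmix}
PMIX_VERSION=${PMIX_VERSION:-3.1.2}
PCDIR=${PCDIR:-$(mktemp -d)}

# \${...} is escaped so pkg-config, not the shell, expands those variables.
cat > "$PCDIR/pmix.pc" <<EOF
prefix=$PMIX_PREFIX
exec_prefix=\${prefix}
libdir=\${exec_prefix}/lib
includedir=\${prefix}/include

Name: pmix
Description: Process Management Interface for Exascale (PMIx)
Version: $PMIX_VERSION
Libs: -L\${libdir} -lpmix
Cflags: -I\${includedir}
EOF

# Make the hand-written file visible to pkg-config.
PKG_CONFIG_PATH=$PCDIR${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}
export PKG_CONFIG_PATH

# If pkg-config is available, check that the file parses and that the
# declared version satisfies a 3.1.2 minimum.
if command -v pkg-config >/dev/null 2>&1; then
    pkg-config --modversion pmix
    pkg-config --atleast-version=3.1.2 pmix && echo "meets 3.1.2 minimum"
fi
```

A configure script that looks up pmix through pkg-config should then find it as long as `PKG_CONFIG_PATH` includes the directory holding the file.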