Problem: on Ubuntu 22.04.1 LTS, flux-pmix fails make check when built with an external openpmix-4.2.2 (default configure options) and openmpi-4.1.2-2ubuntu1 is installed:
expecting success:
run_timeout 30 flux mini run -overbose=2 -N1 -n2 \
${MPI_HELLO} >hello_1n2p.out &&
grep "There are 2 tasks" hello_1n2p.out
0.027s: flux-shell[0]: DEBUG: Loading /opt/flux-core-v0.46.1-54/etc/flux/shell/initrc.lua
0.027s: flux-shell[0]: TRACE: Sucessfully loaded flux.shell module
0.027s: flux-shell[0]: TRACE: trying to load /opt/flux-core-v0.46.1-54/etc/flux/shell/initrc.lua
0.027s: flux-shell[0]: TRACE: trying to load /opt/flux-core-v0.46.1-54/etc/flux/shell/lua.d/intel_mpi.lua
0.027s: flux-shell[0]: TRACE: trying to load /opt/flux-core-v0.46.1-54/etc/flux/shell/lua.d/mvapich.lua
0.028s: flux-shell[0]: TRACE: trying to load /opt/flux-core-v0.46.1-54/etc/flux/shell/lua.d/openmpi.lua
0.028s: flux-shell[0]: TRACE: trying to load /home/garlick/proj/flux-pmix/t/etc/rc.lua
0.029s: flux-shell[0]: DEBUG: output: batch timeout = 0.500s
0.030s: flux-shell[0]: DEBUG: pmix: jobid = 13690208256
0.030s: flux-shell[0]: DEBUG: pmix: shell_rank = 0
0.030s: flux-shell[0]: DEBUG: pmix: local_nprocs = 2
0.030s: flux-shell[0]: DEBUG: pmix: total_nprocs = 2
0.030s: flux-shell[0]: DEBUG: pmix: server outsourced to OpenPMIx 4.2.2rc2
0.052s: flux-shell[0]: DEBUG: pmix: local_peers = 0,1
0.052s: flux-shell[0]: DEBUG: pmix: node_map = system76-pc
0.052s: flux-shell[0]: DEBUG: pmix: proc_map = 0,1
0.052s: flux-shell[0]: DEBUG: 0: task_count=2 slot_count=2 cores_per_slot=1 slots_per_node=2
0.052s: flux-shell[0]: DEBUG: 0: tasks [0-1] on cores 0-1
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
PMIx stopped checking at the first component that it did not find.
Host: system76-pc
Framework: psec
Component: munge
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
PMIx stopped checking at the first component that it did not find.
Host: system76-pc
Framework: psec
Component: munge
--------------------------------------------------------------------------
[system76-pc:159601] PMIX ERROR: PACK-MISMATCH in file ../../../src/client/pmix_client.c at line 832
[system76-pc:159601] OPAL ERROR: Pack data mismatch in file ext3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[system76-pc:159601] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
PMIx stopped checking at the first component that it did not find.
Host: system76-pc
Framework: psec
Component: munge
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
PMIx stopped checking at the first component that it did not find.
Host: system76-pc
Framework: psec
Component: munge
--------------------------------------------------------------------------
[system76-pc:159602] PMIX ERROR: PACK-MISMATCH in file ../../../src/client/pmix_client.c at line 832
[system76-pc:159602] OPAL ERROR: Pack data mismatch in file ext3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[system76-pc:159602] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
0.061s: flux-shell[0]: TRACE: pmi: 0: C: pmi EOF
0.061s: flux-shell[0]: DEBUG: task 0 complete status=1
0.061s: flux-shell[0]: TRACE: pmi: 1: C: pmi EOF
0.061s: flux-shell[0]: DEBUG: task 1 complete status=1
0.071s: flux-shell[0]: DEBUG: exit 1
Neither openmpi's built-in libpmix nor the side-installed openpmix-4.2.2 used to build flux-pmix has a psec_munge plugin installed as a separate DSO. However, rebuilding openpmix-4.2.2 with --without-munge does resolve the problem.
Based on the pack error, it appears that the requirement for munge is not negotiated between client and server: it changes the wire protocol, so mismatched configurations cannot interoperate. See also https://bugs.schedmd.com/show_bug.cgi?id=12396
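For anyone hitting the same failure, here is a rough sketch of the workaround and of a quick check for the failure signature in a saved test log. Only the --without-munge option comes from this report. The function names, the source directory, and the install prefix are hypothetical, and the build steps assume a stock autotools source tree.

```shell
# Sketch only: helper names and paths are assumptions, not part of flux-pmix.

# Rebuild an unpacked openpmix source tree with munge support disabled.
# $1 = source directory, $2 = install prefix (both hypothetical).
rebuild_pmix_without_munge() {
    src="$1"
    prefix="$2"
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "$src" || exit 1
        ./configure --prefix="$prefix" --without-munge &&
        make &&
        make install
    )
}

# Return 0 if a log shows the failure signature from this report:
# a PMIx PACK-MISMATCH error together with a missing munge component.
log_shows_munge_mismatch() {
    log="$1"
    grep -q 'PMIX ERROR: PACK-MISMATCH' "$log" &&
    grep -q '^ *Component: munge' "$log"
}
```

A possible use: save the failing test output to a file, confirm that log_shows_munge_mismatch succeeds on it, run rebuild_pmix_without_munge, then (presumably after rebuilding flux-pmix against the new libpmix) re-run make check and confirm the check no longer matches.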