hydra: spawn failing due to empty-string argument #7073

Open
drew-parsons opened this issue Jul 18, 2024 · 1 comment

@drew-parsons

Debian Linux now uses mpich as the default MPI on 32-bit architectures (armel, armhf, i386, hppa, m68k, powerpc).

The corresponding rebuild of mpi4py fails its spawn tests, as reported at mpi4py/mpi4py#514.

Running in an armel (armv8l) chroot with the environment variables HYDRA_IFACE=lo and HYDRA_LAUNCHER=fork produces the following error log and exits with code 255:

(sid_armel-dchroot)$ HYDRA_IFACE=lo HYDRA_LAUNCHER=fork PYTHONPATH=./debian/python3-mpi4py/usr/lib/python3/dist-packages autopkgtest -B -- null
...
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
testPutProcNull (test_rma_nb.TestRMAWorld.testPutProcNull) ... ok
ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... ok
testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... testArgsOnlyAtRoot (test_spawn.TestSpawnSelf.testArgsOnlyAtRoot) ... [proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_STDERR
[proxy:0@amdahl] we don't understand this command, forwarding upstream
[proxy:0@amdahl]     mcmd=spawn
nprocs=1
execname=/usr/bin/python3.12
totspawns=1
spawnssofar=1
argcnt=3
arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py
arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages
arg3=
preput_num=1
preput_key_0=PARENT_ROOT_PORT_NAME
preput_val_0=tag#0$description#amdahl$port#57335$ifname#127.0.0.1$
info_num=0
endcmd
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_PMI
[proxy:0@amdahl] we don't understand this command, forwarding upstream
[proxy:0@amdahl]     mcmd=spawn
nprocs=1
execname=/usr/bin/python3.12
totspawns=1
spawnssofar=1
argcnt=3
arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py
arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages
arg3=
preput_num=1
preput_key_0=PARENT_ROOT_PORT_NAME
preput_val_0=tag#0$description#amdahl$port#36665$ifname#127.0.0.1$
info_num=0
endcmd
[proxy:0@amdahl] Sending upstream hdr.cmd = CMD_PMI
[mpiexec@amdahl] [pgid: 0] got PMI command: mcmd=spawn
nprocs=1
execname=/usr/bin/python3.12
totspawns=1
spawnssofar=1
argcnt=3
arg1=/tmp/autopkgtest.VaRWBv/tree/test/spawn_child.py
arg2=/tmp/autopkgtest.VaRWBv/tree/debian/python3-mpi4py/usr/lib/python3/dist-packages
arg3=
preput_num=1
preput_key_0=PARENT_ROOT_PORT_NAME
preput_val_0=tag#0$description#amdahl$port#57335$ifname#127.0.0.1$
info_num=0
endcmd
[unset]: ERROR: Expecting value after arg3= in parse_v1_mcmd (202)
[unset]: ERROR: PMIU_cmd_parse (310)
[mpiexec@amdahl] handle_pmi_cmd (mpiexec/pmiserv_cb.c:57): unable to parse PMI command
[mpiexec@amdahl] control_cb (mpiexec/pmiserv_cb.c:367): unable to process PMI command
[mpiexec@amdahl] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@amdahl] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:173): error waiting for event
[mpiexec@amdahl] main (mpiexec/mpiexec.c:260): process manager error waiting for completion
autopkgtest [16:29:04]: ERROR: testbed failure: testbed auxverb failed with exit code 255
@hzhou
Contributor

hzhou commented Jul 18, 2024

Thanks @drew-parsons for reporting the issue. I believe this is not related to 32-bit architectures. It is due to an older version of mpi4py passing an empty argument in the spawn command (the arg3 in the log). The current PMI code does not handle empty arguments. I guess we could accept empty arguments, since the arguments are separated by newlines, which means we could even accept spaces, but that would also give users more opportunities to make mistakes without realizing it. We'll discuss this next week.
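
To make the trade-off concrete, here is a minimal Python sketch of a newline-separated key=value parser with a strict mode (rejects `arg3=`) and a lenient mode (accepts it as an empty string). This is only an illustration of the idea, not MPICH's actual `parse_v1_mcmd`: the names `parse_mcmd`, `spawn_args`, `EmptyValueError`, and the shortened sample paths are all hypothetical.

```python
class EmptyValueError(ValueError):
    """Raised in strict mode when a key has nothing after '='."""


def parse_mcmd(text, allow_empty=False):
    """Parse a newline-separated key=value block up to 'endcmd'.

    Hypothetical helper for illustration; not the MPICH implementation.
    Values are kept verbatim, so spaces inside a value survive.
    """
    fields = {}
    for line in text.split("\n"):
        if line == "endcmd":
            break
        if not line:
            continue  # ignore blank lines between entries
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        if value == "" and not allow_empty:
            raise EmptyValueError(f"expecting value after {key}=")
        fields[key] = value
    return fields


def spawn_args(fields):
    """Collect arg1..argN according to argcnt."""
    count = int(fields["argcnt"])
    return [fields[f"arg{i}"] for i in range(1, count + 1)]


# Shaped like the spawn command in the log; paths are shortened placeholders.
SAMPLE = "\n".join([
    "mcmd=spawn",
    "nprocs=1",
    "execname=/usr/bin/python3.12",
    "argcnt=3",
    "arg1=spawn_child.py",
    "arg2=/some/path",
    "arg3=",
    "endcmd",
])

if __name__ == "__main__":
    try:
        parse_mcmd(SAMPLE)
    except EmptyValueError as exc:
        print("strict:", exc)
    print("lenient:", spawn_args(parse_mcmd(SAMPLE, allow_empty=True)))
```

In the lenient mode the empty string reaches the spawned process as a real third argument, which is the behaviour the caller asked for but may not have intended; that is the "mistakes without realizing it" concern.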

@hzhou changed the title from "spawn failing on 32-bit architectures" to "hydra: spawn failing due to empty argument" on Jul 18, 2024
@hzhou changed the title from "hydra: spawn failing due to empty argument" to "hydra: spawn failing due to empty-string argument" on Jul 18, 2024