During the 2n2p pt2pt/osu_latency_mp sharness test on the fluke login node the following was emitted:
expecting success: run_osutest 300 2 2 pt2pt/osu_latency_mp
# OSU MPI Multi-process Latency Test
# Number of forked processes in sender: 2
# Number of forked processes in receiver: 2
# Size Latency (us)
[1633015691.215259] [fluke108:1585004:0] ib_md.c:1140 UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
[1633015691.215263] [fluke108:1585004:0] ib_md.c:1141 UCX WARN IB: data corruption might occur when using registered memory.
[1633015691.230614] [fluke108:1585004:0] ib_md.c:1140 UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
[1633015691.230623] [fluke108:1585004:0] ib_md.c:1141 UCX WARN IB: data corruption might occur when using registered memory.
[1633015691.215259] [fluke108:1585004:0] ib_md.c:1140 UCX WARN IB: ibv_fork_init() was disabled or failed, yet a fork() has been issued.
[1633015691.215263] [fluke108:1585004:0] ib_md.c:1141 UCX WARN IB: data corruption might occur when using registered memory.
not ok 6 - 2n2p pt2pt/osu_latency_mp
I have seen a similar issue with MVAPICH on a Linux platform, and this is the explanation I got from Ben W. back then:
Ben W:
IB control registers or semaphores are being mapped into normal application address space by the userspace part of the IB stack. This is fine and normal, but for some reason it sometimes gets colocated with application data within the same page.
To protect the MPI execution context while still supporting fork()/exec() or system(), the kernel side of the IB software needs to make sure that the physical pages backing the MPI execution context, including the IB control registers or semaphores, stay with the parent process rather than being accessible in the child. However, since the user data (in this case the command-line args to be passed to exec()) is colocated within one of these pages, it is intentionally not part of the child process’s address space, and exec() fails with EFAULT because the syscall parameter is not in the child’s address space.
So my guess is that a control (e.g., an env var) was used to disable fork support, but some part of the code still called fork(), and the MPI runtime is complaining about it?