many OSU benchmark tests time out on fluke login node #36
Comments
some debug output with
index 6c87005..8af0fef 100755
--- a/t/t1001-osu-benchmarks.t
+++ b/t/t1001-osu-benchmarks.t
@@ -65,6 +65,7 @@ test_expect_success 'create rc.lua script' "
cat >rc.lua <<-EOT
plugin.load (\"$PLUGINPATH/pmix.so\")
shell.setenv (\"OMPI_MCA_btl_tcp_if_include\", \"lo\")
+ shell.setenv (\"OMPI_MCA_btl_base_verbose\", \"100\")
EOT
"
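For anyone reproducing this outside the test script, the two settings in the rc.lua above can also be expressed as plain environment variables. This is only a sketch: it assumes the launched MPI job inherits the caller's environment, which the thread does not confirm. The variable names are the ones already used in the diff.

```shell
# Assumption: the MPI processes inherit these variables from the calling shell.
# Restrict the Open MPI TCP BTL to the loopback interface (same as rc.lua).
export OMPI_MCA_btl_tcp_if_include=lo
# Raise BTL framework verbosity so bind/connect activity is logged.
export OMPI_MCA_btl_base_verbose=100
```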
It looks like each rank successfully bound to a tcp port
But there are no connect messages. Usually we see (from the tcp btl) something like this:
Running TEST_LONG=t ./t1001-osu-benchmarks.t, it seems the tests involving two "nodes" take a very long time to run and exceed their timeouts, much like when some of those LONGTEST tests were enabled in CI.
For example, 1n2p pt2pt/osu_latency runs in about a second, but 2n2p pt2pt/osu_latency exceeds its 5m timeout.
Here's the output from the 1n2p one: