Issue detecting a profile on single node system #28
Comments
Hi Tom,
Did you use e4s-cl init before this? What command did you use?
Thanks,
- Sameer
… On Aug 10, 2021, at 6:05 AM, Tom Robinson ***@***.***> wrote:
I am trying to use profile detect to set up my profile. Here is my sample Fortran program that sums the ranks (run with 11 ranks, isum = 55):
PROGRAM hello_world_mpi
include 'mpif.h'
integer process_Rank, size_Of_Cluster, ierror
integer root_rank, isum
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)
root_rank = 0
call MPI_Reduce(process_Rank, isum, 1, MPI_INTEGER, MPI_SUM, root_rank, MPI_COMM_WORLD, ierror)
call MPI_bcast (isum, 1, MPI_INTEGER, root_rank, MPI_COMM_WORLD, ierror)
print *, 'Hello World from process: ', process_Rank, 'of ', size_Of_Cluster, 'sum = ', isum
end program
I compiled it with mpiifort
$ mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 9 for Linux*
Copyright 2003-2020, Intel Corporation.
ifort version 19.1.3.304
This program runs with mpirun
$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2019 Update 9 Build 20200923 (id: abd58e492)
Copyright 2003-2020, Intel Corporation.
$ mpirun -np 11 -hosts lscamd50-d.gfdl.noaa.gov ./test.x
Here is my e4s-cl command
$ e4s-cl profile detect -p am4Run mpirun -np 11 -hosts lscamd50-d.gfdl.noaa.gov ./test.x
Failed to determine necessary libraries.
The advice in the documentation is to specify multiple hosts (https://e4s-project.github.io/e4s-cl/reference/profiles/detect.html#profile-detect), but this is a single node system with 128 cores. How can I get all of the libraries needed to run on my system?
|
I am seeing a similar issue with Intel 21. This is peculiar, and the same library has no issue being detected with C programs.
Can you provide the output of e4s-cl -v profile detect -p am4Run mpirun -np 11 -hosts lscamd50-d.gfdl.noaa.gov ./test.x? I am seeing MPI errors reporting that the program has been killed.
The warning in the documentation is meant to prevent false positives. Some libraries lazy-load dependencies depending on the hosts to run on, and on multi-node systems this can result in incomplete profiles. You do not need to worry about this here.
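To illustrate the lazy-loading point: a library opened at run time only becomes visible once the process has actually loaded it, for example in the process memory map on Linux. The following is a minimal sketch, assuming the usual `/proc/<pid>/maps` line layout; the function name `mapped_libraries` and the sample addresses are hypothetical, and this is not presented as how e4s-cl performs detection.

```python
import re


def mapped_libraries(maps_text):
    """Return the sorted shared-object paths found in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            # Keep names such as libc.so.6 or libfabric.so.1
            if re.search(r"\.so(\.\d+)*$", path):
                libs.add(path)
    return sorted(libs)


# Illustrative sample only; the addresses and inode numbers are made up.
SAMPLE = """\
7f0000000000-7f0000021000 r-xp 00000000 08:01 123 /lib64/libc.so.6
7f0000021000-7f0000022000 rw-p 00021000 08:01 123 /lib64/libc.so.6
7f0000100000-7f0000180000 r-xp 00000000 08:01 456 /lib64/libfabric.so.1
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
"""

if __name__ == "__main__":
    print(mapped_libraries(SAMPLE))
```

On a live system the same function could be fed the contents of `/proc/self/maps`, before and after the MPI library initializes, to see which libraries appeared late.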
|
On further testing, it seems like the issue does not originate in This seems to fail with the binaries created by
I will look into this. I am positive profile detection worked with Fortran binaries compiled with other MPI flavours, so this must be an Intel quirk. A profile created with a C program can usually also be used after adding the Fortran (
|
Sorry for the delay and long post. I couldn't post before for some reason.
$ e4s-cl init
The target launcher /opt/intel/2020_up3/compilers_and_libraries/linux/mpi/intel64/bin/mpirun uses a single host by default, which may tamper with the library discovery. Consider running `e4s-cl profile detect` using mpirun specifying multiple hosts.
$ e4s-cl profile detect -p am4Run mpirun -np 11 -hosts lscamd50-d.gfdl.noaa.gov ./test.x
Failed to determine necessary libraries.
I ran the debug and got this:
$ e4s-cl -v profile detect -p am4Run mpirun -np 11 -hosts lscamd50-d.gfdl.noaa.gov ./test.x
[Debug] Arguments: Namespace(command='profile', options=['detect', '-p', 'am4Run', 'mpirun', '-np', '11', '-hosts', 'lscamd50-d.gfdl.noaa.gov', './test.x'], dry_run=None, slave=None, verbose='DEBUG')
[Debug] Verbosity level: DEBUG
[Debug] e4s-cl profile args: Namespace(subcommand='detect', options=['-p', 'am4Run', 'mpirun', '-np', '11', '-hosts', 'lscamd50-d.gfdl.noaa.gov', './test.x'])
[Debug] e4s-cl profile detect args: Namespace(profile_name='am4Run', cmd=['mpirun', '-np', '11', '-hosts', 'lscamd50-d.gfdl.noaa.gov', './test.x'])
[Debug] Creating subprocess: mpirun -np 11 -hosts lscamd50-d.gfdl.noaa.gov /home/Thomas.Robinson/e4s-cl/bin/e4s-cl --slave profile detect ./test.x
[Debug] Hello World from process: 5 of 11 sum = 55
Hello World from process: 3 of 11 sum = 55
Hello World from process: 0 of 11 sum = 55
Hello World from process: 1 of 11 sum = 55
Hello World from process: 4 of 11 sum = 55
Hello World from process: 6 of 11 sum = 55
Hello World from process: 7 of 11 sum = 55
Hello World from process: 8 of 11 sum = 55
Hello World from process: 2 of 11 sum = 55
Hello World from process: 10 of 11 sum = 55
Hello World from process: 9 of 11 sum = 55
{"files": {"__type": "set", "__list": ["/opt/intel/2020_up3/compilers_and_libraries/linux/mpi/intel64/lib/release/libmpi.so.12", "/etc/libnl/classid", "/opt/intel/2020_up3/compilers_and_libraries/linux/mpi/intel64/etc/tuning_generic_shm-ofi.dat"]}, "libraries": {"__type": "set", "__list": ["/lib64/libpsm2.so.2", "/lib64/libnl-route-3.so.200", "/lib64/libc.so.6", "/lib64/libnl-3.so.200", "/lib64/libfabric.so.1", "/lib64/libgcc_s.so.1", "/lib64/libnuma.so.1", "/lib64/libm.so.6", "/lib64/librt.so.1", "/opt/intel/2020_up3/compilers_and_libraries/linux/mpi/intel64/lib/libmpifort.so.12", "/lib64/libdl.so.2", "/lib64/libpthread.so.0", "/lib64/libibverbs.so.1", "/lib64/librdmacm.so.1", "/lib64/libefa.so.1"]}}
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 2127227 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 2127229 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 2127230 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 2127231 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 5 PID 2127232 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 6 PID 2127233 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 7 PID 2127234 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 8 PID 2127235 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 9 PID 2127236 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 10 PID 2127237 RUNNING AT lscamd50-d.gfdl.noaa.gov
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
[Debug] ['mpirun', '-np', '11', '-hosts', 'lscamd50-d.gfdl.noaa.gov', '/home/Thomas.Robinson/e4s-cl/bin/e4s-cl', '--slave', 'profile', 'detect', './test.x'] returned 255
Failed to determine necessary libraries.
If I change to a C program, do you think it will work then? What libraries do I need to link in? I tried to launch with the default profile created, but I get a different error
$ e4s-cl launch --backend singularity --image am4_2021.03_ubuntu_intel.sif mpirun -n 48 ./2021.03_run.sh
Using selected profile default-137215bba819ae9d045d5b51c339b35e38c270bdafcf5d6a9181ae2e3640502d
2137479 on lscamd50-d.gfdl.noaa.gov: ./2021.03_run.sh: error while loading shared libraries: ./2021.03_run.sh: invalid ELF header
Maybe this is a different issue.
|
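The debug output above shows that the child processes did print a library list (including libmpifort.so.12) before the ranks were killed. That JSON line can be inspected by hand; a minimal sketch follows, assuming the `{"__type": "set", "__list": [...]}` encoding seen in the quoted output. The helper names `decode_sets` and `parse_detect_output` are hypothetical, and the sample is a shortened excerpt of the line above rather than the full output.

```python
import json


def decode_sets(obj):
    """Turn {"__type": "set", "__list": [...]} objects into Python sets."""
    if obj.get("__type") == "set" and "__list" in obj:
        return set(obj["__list"])
    return obj


def parse_detect_output(line):
    """Parse one JSON line of detection output into a dict of sets."""
    # object_hook runs on every decoded JSON object, innermost first.
    return json.loads(line, object_hook=decode_sets)


# Shortened excerpt of the JSON line quoted in the debug output.
SAMPLE = (
    '{"files": {"__type": "set", "__list": ["/etc/libnl/classid"]}, '
    '"libraries": {"__type": "set", "__list": ["/lib64/libfabric.so.1", '
    '"/opt/intel/2020_up3/compilers_and_libraries/linux/mpi/intel64/lib/libmpifort.so.12"]}}'
)

if __name__ == "__main__":
    result = parse_detect_output(SAMPLE)
    for name in sorted(result["libraries"]):
        print(name)
```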
GitHub's servers were down for a little while, so I wasn't able to edit either! I tested the profile detection with C programs and it should work. This is just to detect the libraries; you can run Fortran binaries with the tool and it should work as
This is another issue unfortunately. Here I add all but the last line to a setup script, and pass it to
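The "invalid ELF header" message in the launch attempt names the shell script itself as the object that failed to load, which suggests the script was handed to the dynamic loader as if it were a binary. One quick way to tell a script from an ELF binary is to look at the first four bytes of the file. This is a minimal sketch; `is_elf` and `demo` are hypothetical helper names, and the demo files are created in a temporary directory purely for illustration.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path):
    """Return True if the file at path starts with the ELF magic number."""
    with open(path, "rb") as handle:
        return handle.read(4) == ELF_MAGIC


def demo():
    """Check a shell script and a stub file that carries the ELF magic."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "run.sh")
        with open(script, "wb") as handle:
            handle.write(b"#!/bin/sh\necho hello\n")
        stub = os.path.join(tmp, "fake.bin")
        with open(stub, "wb") as handle:
            handle.write(ELF_MAGIC + b"\x00" * 12)
        return {"run.sh": is_elf(script), "fake.bin": is_elf(stub)}


if __name__ == "__main__":
    print(demo())
```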
|