Submit script for tianhe2 SLURM #93
Comments
@QJohn2017 some questions regarding your setup on tianhe2:
If you could provide an example submit file, that would help a lot. |
@PrometheusPi |
@PrometheusPi And other submit command is |
@PrometheusPi |
When I use the submit commands above, one of the output file is named |
Do you submit these scripts via the following command?
siom003@login3 sbatch my_submit_file.sh
Do you not need any specifications via bash comments, e.g.:
#SBATCH --partition=normal
#SBATCH --time=24:00:00
#SBATCH --job-name=NameOfMyJob
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
#SBATCH --ntasks-per-core=1
#SBATCH -o stdout
#SBATCH -e stderr
What does the help file of |
@QJohn2017 |
Could you try running |
@PrometheusPi |
@PrometheusPi , |
@PrometheusPi , |
The bash interface |
Remark for myself: |
Okay - Sorry - I got confused: Then please excuse my confusion and the use of |
@PrometheusPi , |
@PrometheusPi , |
Okay - then please excuse my confusion. I thought it was the same machine but different nodes. Then let us start with finding a setup for tianhe2 first. I know tianhe2 has a SLURM submit system with the above-mentioned renaming. In |
Could you copy the help of |
@PrometheusPi , When I input the following commands to submit the job, some errors appear as follows
|
@PrometheusPi ,
|
Great, it looks like clara2 was put on the cluster. Could you please run If you add the following line to your
#!/bin/bash
ldd executable
yhrun -N 2 -n 48 executable
the same output should appear. However, I assume that |
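The `ldd` check suggested above can be narrowed down to the libraries that fail to resolve. The following is a minimal sketch, not part of clara2: the function name `missing_libs` is hypothetical, and it assumes `ldd` marks unresolved libraries with the text "not found", as glibc's `ldd` does.

```shell
#!/bin/bash
# Hypothetical helper (not part of clara2): print the names of the
# libraries that ldd reports as unresolved. Reads ldd output on stdin.
missing_libs() {
    # Assumption: unresolved entries look like "libname => not found".
    # Print only the first field (the library name) of those lines.
    awk '/not found/ { print $1 }'
}
```

It could be used as `ldd executable | missing_libs`, both on the compile node and inside the submit script, where `executable` is the placeholder name used in the comment above. An empty result would mean every shared library was resolved in that environment.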
Thank you for the help file. So far, everything looks the same to |
@PrometheusPi
|
Great, and what does |
When the commands in
The output errors become as follows:
|
Okay - it looks as if your bash environment is different between the node you compiled on and the node you ran via Could you please copy the output of
echo $LD_LIBRARY_PATH
as returned from the compile node to this issue. And then additionally post the same result returned by
#!/bin/bash
echo $LD_LIBRARY_PATH
yhrun -N 2 -n 48 executable
There is probably some difference in the |
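Comparing the two `LD_LIBRARY_PATH` values by eye is error-prone when they are long. As a rough sketch, the helper below lists the entries that occur in one colon-separated list but not in the other. The function name `path_entries_missing` is hypothetical and not part of clara2 or of the tianhe2 tooling.

```shell
#!/bin/bash
# Hypothetical helper: print every entry of the colon-separated list $1
# (e.g. LD_LIBRARY_PATH on the compile node) that does not occur in the
# colon-separated list $2 (e.g. LD_LIBRARY_PATH inside the batch job).
path_entries_missing() {
    local reference="$1" candidate="$2" entry
    local IFS=':'                      # split $reference on colons
    for entry in $reference; do
        [ -z "$entry" ] && continue    # skip empty entries such as "::"
        case ":$candidate:" in
            *":$entry:"*) ;;                   # present in both lists
            *) printf '%s\n' "$entry" ;;       # only in the reference list
        esac
    done
}
```

One way to use it, assuming the compile-node value was saved to a file beforehand, would be to call `path_entries_missing "$(cat saved_path.txt)" "$LD_LIBRARY_PATH"` inside the submit script; the file name `saved_path.txt` is only an illustration.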
The command
|
Thank you for posting the
|
@PrometheusPi |
After I load the fftw module, and input the command
|
Thank you for uploading the Okay - then let's come back to this discussion after the support replied. If I have another idea how to avoid the modules, I will post it here. |
@PrometheusPi , |
Hi, @PrometheusPi ,
According to their suggestions, I have changed the submit script file as above, but there are still some errors as follows:
|
@PrometheusPi ,
When I submit the job again, the output results are as follows:
It seems right, but I don't know what to do as the next step. Please respond to me, thank you! |
Hi @QJohn2017, thank you for the update. Please excuse my late reply, it was a busy week at work. Glad to hear that it seems to work! Interesting that sourcing the file
helped with executing the script. Did they tell you what configuration this file changed? I assume that by loading a module for the compiler It looks like the initial 48 trajectories are going to be processed. Was there an error output (commonly a file name containing The In between these lines of code, I redirected the output into files named like this. Were the results with this name My first guess would be that clara2 did not find the input trajectories? Shall we go through all the parameters in |
@TheresaBruemmer reported a similar issue with missing link to @Belfi Are you willing to submit a pull request with your solution for Maxwell or may I cherry-pick your solution to make it available in the main repo? |
@TheresaBruemmer Perhaps sourcing the compiler variables as @QJohn2017 did will help you when using
or you try to locate
and add the path to the EDIT: |
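Adding a library directory to a search path can be done without duplicating entries. The sketch below assumes that the variable meant in the comment above is `LD_LIBRARY_PATH`, the variable discussed earlier in this thread; the comment itself does not name it here. The function name `prepend_ld_library_path` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prepend directory $1 to LD_LIBRARY_PATH unless
# it is already listed. Assumes LD_LIBRARY_PATH is the variable meant.
prepend_ld_library_path() {
    local dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
}
```

Calling it near the top of a submit script, before the executable is launched, keeps the setting local to that job instead of changing the login environment.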
Hi, @PrometheusPi
|
Hello, |
@PrometheusPi , |
@PrometheusPi ,There is no output file named |
@PrometheusPi , Can you help me to go through the main parameters in |
@PrometheusPi ,
|
@QJohn2017 Okay, thank you. That looks good. |
@QJohn2017
but no:
This means that no trajectory was found. In the
if the stream redirect does not work as on our system. Thus we need to adjust the parameters. |
@PrometheusPi |
@QJohn2017 To keep this issue clean and focused only on the technical setup on tianhe2, I moved the discussion on how to set up clara2 to issue #96. Feel free to ask further questions there. If you still encounter issues with the compilation or execution on tianhe2, please comment here in this issue. I will not close this until you have a working version of clara2 on tianhe2. |
@belfhi Thank you for allowing me to cherry-pick your solution. I will try to incorporate your work as soon as possible. |
@TheresaBruemmer Feel free to switch to #96 if you encounter issues with the setup of clara2. |
@QJohn2017 Great to hear in #96 that your simulation ran 🎉 If so, would you be so kind as to provide an example submit file, so that I could add a module setup for tianhe2 similar to the ones for the hypnos and Maxwell clusters? Or you could provide this as your first pull request to clara2 yourself. Decide for yourself what you would prefer. Thank you in advance. Your feedback so far was already very helpful - many thanks. |
@PrometheusPi ,
And the command to submit the job is |
@QJohn2017 Thank you! I will provide a default source file as soon as possible. Do you know what prefix a submit script uses in order to transfer the arguments as |
@PrometheusPi , Sorry, I don't know what the prefix of the submit script is. I guess it may be |
@QJohn2017 Then I would suggest that I write a default submit script based on |
@PrometheusPi , Okay, thank you ! |
As requested by @QJohn2017, a setup script for tianhe2 is needed (see #89). Since tianhe2 uses SLURM for scheduling, either the ./prepare_job script needs to be adjusted to a ./prepare_job_tianhe2.sh script which creates a SLURM submit file, or we go directly with a submit file that focuses on MPI jobs only (since tianhe2 is large and probably has to handle quite a lot of jobs, MPI is probably the better choice for this system).
Additionally, submit scripts for other clusters (taurus, PizDaint, etc.) should be provided.
@QJohn2017 Are you planning to submit with MPI only or are you also considering running SLURM array jobs?
If only an MPI job is planned, a simple submit script should be sufficient.
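To make the MPI-only option concrete, here is a rough sketch of what a generator for such a submit file might look like. It is not the actual `./prepare_job_tianhe2.sh`: the function name `print_submit_file` is hypothetical, the `#SBATCH` lines and the `yhrun -N ... -n ...` call are taken from the comments in this thread, and the thread does not confirm that the batch command on tianhe2 honors `#SBATCH` comment lines.

```shell
#!/bin/bash
# Hypothetical generator (not the actual prepare_job_tianhe2.sh):
# print a submit file for an MPI-only job to stdout.
# Arguments: number of nodes, total number of MPI tasks, executable.
print_submit_file() {
    local nodes="$1" tasks="$2" executable="$3"
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=${nodes}
#SBATCH -o stdout
#SBATCH -e stderr

# yhrun is the launcher used in this thread; -N and -n are assumed to
# mean node count and task count, as for SLURM's srun.
yhrun -N ${nodes} -n ${tasks} ${executable}
EOF
}
```

For example, `print_submit_file 2 48 ./executable > my_submit_file.sh` would write a file matching the two-node, 48-task call used above. Generating the file from a function keeps the node and task counts in one place, so they cannot drift apart between the header and the launch line.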