Update README.md
jhalverson authored Oct 21, 2023
1 parent 2b3aa90 commit 8a7b608
Showing 1 changed file with 1 addition and 234 deletions.
235 changes: 1 addition & 234 deletions 04_first_job/README.md
@@ -1,236 +1,3 @@
# Running Your First Job on the Cluster

This page demonstrates how to transfer files to Adroit and run a job using the Slurm scheduler, with examples for both Python and R. To obtain the example materials, run this command in a terminal on your **local machine**:

```
# on your laptop
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop
```

## Python Script Example

### On Your Local Machine

Examine the files in a terminal on your laptop:

```
$ cd hpc_beginning_workshop/04_first_job/python_example
$ cat matrix_inverse.py
$ cat job.slurm
```

Here are the contents of the Python script:

```python
import numpy as np
N = 3
X = np.random.randn(N, N)
print("X =\n", X)
print("Inverse(X) =\n", np.linalg.inv(X))
```

Below is the Slurm script which (1) prescribes the resource requirements for the job, (2) sets the environment and (3) specifies the work to be carried out (which in this case is to run a Python script):

```bash
#!/bin/bash
#SBATCH --job-name=py-matinv # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # how many instances of your command are run, total, across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin # send email when job begins
#SBATCH --mail-type=end # send email when job ends
#SBATCH --mail-type=fail # send email if job fails
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load anaconda3/2020.11

python matrix_inverse.py
```
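
To make the three parts of the Slurm script concrete, here is a minimal Python sketch that reads the `#SBATCH` directives out of a script and computes the total memory requested. `parse_sbatch` and `total_memory_gb` are hypothetical helpers written for this illustration and are not part of Slurm. The embedded script is a shortened copy of the one above without the mail options.

```python
# Sketch: collect the #SBATCH directives of a Slurm script into a dictionary.
# parse_sbatch and total_memory_gb are hypothetical helpers, not Slurm tools.

JOB_SCRIPT = """\
#!/bin/bash
#SBATCH --job-name=py-matinv     # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --cpus-per-task=1        # cpu-cores per task
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)

module purge
module load anaconda3/2020.11

python matrix_inverse.py
"""


def parse_sbatch(script_text):
    """Return {option: value} for every '#SBATCH --option=value' line."""
    directives = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#SBATCH"):
            continue  # environment setup and commands are not directives
        # Drop the '#SBATCH' prefix and any trailing '# comment'.
        body = line[len("#SBATCH"):].split("#", 1)[0].strip()
        option, _, value = body.partition("=")
        directives[option.lstrip("-")] = value.strip()
    return directives


def total_memory_gb(directives):
    """Total memory: ntasks * cpus-per-task * mem-per-cpu (values like '4G')."""
    per_cpu = directives["mem-per-cpu"]
    if not per_cpu.endswith("G"):
        raise ValueError("this sketch only handles values such as '4G'")
    cores = int(directives["ntasks"]) * int(directives["cpus-per-task"])
    return cores * int(per_cpu[:-1])


if __name__ == "__main__":
    d = parse_sbatch(JOB_SCRIPT)
    print(d)
    print("total memory requested (GB):", total_memory_gb(d))
```

Because a dictionary keeps only one value per key, this sketch would not handle options that are repeated, such as the three `--mail-type` lines.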

Next, while still on your laptop and using a [VPN](https://www.princeton.edu/vpn) if off-campus, run the following `ssh` command to create a directory on Adroit (you need to replace `<YourNetID>` twice):

```
$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/python_test"
```

Note: If you are doing this exercise on Tiger, Della, Stellar or Traverse then replace `/scratch/network/` with `/scratch/gpfs/`.

Transfer the Python and Slurm scripts from your laptop to Adroit using the `scp` (secure copy) command ([VPN](https://www.princeton.edu/vpn) required if off-campus):

```
$ scp matrix_inverse.py job.slurm <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/python_test
```
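
Since `<YourNetID>` has to be substituted in two places, it can help to see how the `ssh` and `scp` commands are assembled. The Python sketch below only builds and prints the command lines and does not run them. The NetID `aturing` and the helper names are hypothetical. The scratch roots come from the note above.

```python
import shlex

# Scratch roots taken from the note above; only these clusters are covered.
SCRATCH_ROOT = {
    "adroit": "/scratch/network",
    "tiger": "/scratch/gpfs",
    "della": "/scratch/gpfs",
    "stellar": "/scratch/gpfs",
    "traverse": "/scratch/gpfs",
}


def remote_dir(netid, subdir, cluster="adroit"):
    """Directory on the cluster's scratch filesystem for this exercise."""
    return f"{SCRATCH_ROOT[cluster]}/{netid}/{subdir}"


def mkdir_command(netid, subdir, host="adroit.princeton.edu", cluster="adroit"):
    """Argument list for the ssh command that creates the remote directory."""
    return ["ssh", f"{netid}@{host}", f"mkdir -p {remote_dir(netid, subdir, cluster)}"]


def scp_command(files, netid, subdir, host="adroit.princeton.edu", cluster="adroit"):
    """Argument list for the scp command that copies files to the cluster."""
    return ["scp", *files, f"{netid}@{host}:{remote_dir(netid, subdir, cluster)}"]


if __name__ == "__main__":
    netid = "aturing"  # hypothetical NetID, replace with your own
    print(shlex.join(mkdir_command(netid, "python_test")))
    print(shlex.join(scp_command(["matrix_inverse.py", "job.slurm"], netid, "python_test")))
```

Note that the NetID appears once in the login (`netid@host`) and once in the scratch path, which is why it must be replaced twice in the commands above.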

Now everything is in place on Adroit. Let's connect to the head node of that cluster and submit the job.

### On Adroit

SSH to Adroit ([VPN](https://www.princeton.edu/vpn) required if off-campus):

```
$ ssh <YourNetID>@adroit.princeton.edu
```

Change the working directory (`$USER` does not need to be replaced because the shell expands it to your username on the cluster):

```
$ cd /scratch/network/$USER/python_test
```

List the files in the current directory:

```
$ ls -l
```

Submit the job by running the following command:

```
# use a text editor to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm
```

This will place your job in the queue. You can monitor its status with `squeue -u <YourNetID>`. If the `ST` field is `PD` (pending), your job is waiting in the queue for resources to become available. If the job no longer appears in the list, it has finished. After the job runs, view the output with `cat slurm-<XXXXXX>.out`, where `<XXXXXX>` is the job ID. If you entered your email address in the Slurm script, you will receive an email when the job finishes.
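
The status check described above can be sketched in Python by reading the `ST` column from `squeue` output. The sample text below is made up for illustration (the job ID, partition name and user are hypothetical), and the helpers are not part of Slurm. Only the header follows the default `squeue` layout.

```python
# Sketch: interpret the ST column of squeue output.
# SAMPLE is invented for illustration; on a cluster you would capture the
# real output of `squeue -u <YourNetID>` instead.

STATE_MEANING = {
    "PD": "pending: waiting in the queue",
    "R": "running",
    "CG": "completing",
}

SAMPLE = """\
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1234567       all py-matin  aturing PD       0:00      1 (Priority)
"""


def job_states(squeue_text):
    """Map each job ID to its state code using the JOBID and ST columns."""
    lines = squeue_text.strip().splitlines()
    header = lines[0].split()
    id_col = header.index("JOBID")
    st_col = header.index("ST")
    states = {}
    for line in lines[1:]:
        fields = line.split()
        states[fields[id_col]] = fields[st_col]
    return states


def describe(job_id, states):
    """Translate a job's state code into a short description."""
    if job_id not in states:
        return "not in the queue: the job has finished or was never submitted"
    code = states[job_id]
    return STATE_MEANING.get(code, f"state code {code}")


if __name__ == "__main__":
    states = job_states(SAMPLE)
    print(describe("1234567", states))
    print(describe("7654321", states))
```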

Here is an example of the expected output:

```
X =
[[-0.70101861 0.20261191 0.10836766]
[ 0.86684552 -0.75347296 -0.52716024]
[-0.02477092 0.21738458 -0.11216934]]
Inverse(X) =
[[-2.01455049 -0.46828701 0.25452735]
[-1.11588991 -0.82273617 2.78852862]
[-1.71771528 -1.49105147 -3.56712226]]
```
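
One way to convince yourself that the output is sensible is to multiply `X` by `Inverse(X)` and check that the product is close to the identity matrix. The sketch below does this in plain Python with the values printed above. The helper names are hypothetical, and the tolerance is an assumption chosen because the printed values are rounded to eight decimal places.

```python
# Sketch: check that X times Inverse(X) is approximately the identity matrix,
# using the values from the example output above.

X = [
    [-0.70101861, 0.20261191, 0.10836766],
    [0.86684552, -0.75347296, -0.52716024],
    [-0.02477092, 0.21738458, -0.11216934],
]

X_INV = [
    [-2.01455049, -0.46828701, 0.25452735],
    [-1.11588991, -0.82273617, 2.78852862],
    [-1.71771528, -1.49105147, -3.56712226],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_identity(m, tol=1e-4):
    """True if m is within tol of the identity matrix, entry by entry."""
    return all(
        abs(m[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(len(m))
        for j in range(len(m[i]))
    )


if __name__ == "__main__":
    product = matmul(X, X_INV)
    for row in product:
        print(["%.6f" % value for value in row])
    print("close to identity:", is_identity(product))
```

Your own run will print different numbers because `X` is random, but the same check applies to any output of the script.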

### Tired of Duo?

You can [suppress Duo](https://github.com/PrincetonUniversity/removing_tedium) in a variety of ways. You should not try to do this during an in-person workshop.

## R Script Example

### On Your Local Machine

In a terminal on your laptop, change the working directory and examine the scripts:

```
$ cd hpc_beginning_workshop/04_first_job/R_example
$ cat data_analysis.R
$ cat job.slurm
$ head cdc.csv
```

Here is the R script:

```R
health = read.csv("cdc.csv")
print(summary(health))
```

Below is the Slurm script:

```bash
#!/bin/bash
#SBATCH --job-name=R-test # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # how many instances of your command are run, total, across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin # send email when job begins
#SBATCH --mail-type=fail # send email if job fails
#SBATCH --mail-type=end # send email when job ends
#SBATCH --mail-user=<YourNetID>@princeton.edu

Rscript data_analysis.R
```

Here are the first few lines of the data file (`cdc.csv`):

```
genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
very good,0,1,0,61,150,130,55,f
very good,1,1,0,64,114,114,55,f
very good,1,1,0,71,194,185,31,m
very good,0,1,0,67,170,160,45,m
good,0,1,1,65,150,130,27,f
good,1,1,0,70,180,170,44,m
...
```
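
To get a feel for what a per-column summary of this file involves, here is a rough Python sketch applied only to the ten rows shown above. It is not the R script and does not reproduce R's `summary` exactly (quartiles are omitted). The `summarize` helper is hypothetical: numeric columns get min, median, mean and max, and text columns get counts per category.

```python
import csv
import io
import statistics
from collections import Counter

# The first ten data rows of cdc.csv, copied from the listing above.
SAMPLE_CSV = """\
genhlth,exerany,hlthplan,smoke100,height,weight,wtdesire,age,gender
good,0,1,0,70,175,175,77,m
good,0,1,1,64,125,115,33,f
good,1,1,1,60,105,105,49,f
good,1,1,0,66,132,124,42,f
very good,0,1,0,61,150,130,55,f
very good,1,1,0,64,114,114,55,f
very good,1,1,0,71,194,185,31,m
very good,0,1,0,67,170,160,45,m
good,0,1,1,65,150,130,27,f
good,1,1,0,70,180,170,44,m
"""


def summarize(csv_text):
    """Summarize each column: statistics if numeric, category counts otherwise."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # Non-numeric column: count how often each category occurs.
            summary[column] = dict(Counter(values))
        else:
            summary[column] = {
                "min": min(numbers),
                "median": statistics.median(numbers),
                "mean": statistics.mean(numbers),
                "max": max(numbers),
            }
    return summary


if __name__ == "__main__":
    for column, stats in summarize(SAMPLE_CSV).items():
        print(column, stats)
```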

Next, while still on your laptop and using a [VPN](https://www.princeton.edu/vpn) if off-campus, run the following `ssh` command to create a directory on Adroit (you need to replace `<YourNetID>` twice):

```
$ ssh <YourNetID>@adroit.princeton.edu "mkdir -p /scratch/network/<YourNetID>/R_test"
```

Note: If you are doing this exercise on Tiger, Della, Stellar or Traverse then replace `/scratch/network/` with `/scratch/gpfs/`.

Transfer the R script, Slurm script and data file from your laptop to Adroit using the `scp` (secure copy) command ([VPN](https://www.princeton.edu/vpn) required if off-campus):

```
$ scp data_analysis.R job.slurm cdc.csv <YourNetID>@adroit.princeton.edu:/scratch/network/<YourNetID>/R_test
```

Now everything is in place on Adroit. Let's connect to the head node of that cluster and submit the job.

### On Adroit

SSH to Adroit ([VPN](https://www.princeton.edu/vpn) required if off-campus):

```
$ ssh <YourNetID>@adroit.princeton.edu
```

Change the working directory (`$USER` does not need to be replaced because the shell expands it to your username on the cluster):

```
$ cd /scratch/network/$USER/R_test
```

List the files in the current directory:

```
$ ls -l
```

Submit the job by running the following command:

```
# use a text editor to replace <YourNetID> in job.slurm with your actual NetID
$ sbatch job.slurm
```

This will place your job in the queue. You can monitor its status with `squeue -u <YourNetID>`. If the `ST` field is `PD` (pending), your job is waiting in the queue for resources to become available. If the job no longer appears in the list, it has finished. After the job runs, view the output with `cat slurm-<XXXXXX>.out`, where `<XXXXXX>` is the job ID. If you entered your email address in the Slurm script, you will receive an email when the job finishes.
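
The name of the output file follows from the job ID that `sbatch` prints at submission time. As a small illustration, the Python sketch below extracts the ID from a line of the form `Submitted batch job <id>` and builds the default file name. The helper names and the example job ID are hypothetical, and the sketch assumes the script does not set its own output file name.

```python
import re


def job_id_from_sbatch(output):
    """Extract the job ID from the text that sbatch prints on submission."""
    match = re.search(r"Submitted batch job (\d+)", output)
    if match is None:
        raise ValueError("no job ID found in sbatch output")
    return match.group(1)


def default_output_file(job_id):
    """Default Slurm output file name when no output option is given."""
    return f"slurm-{job_id}.out"


if __name__ == "__main__":
    sbatch_output = "Submitted batch job 1234567\n"  # hypothetical job ID
    job_id = job_id_from_sbatch(sbatch_output)
    print("view the results with: cat", default_output_file(job_id))
```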

Here is the expected output:

```
      genhlth        exerany          hlthplan         smoke100
 excellent:4657   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 fair     :2019   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000
 good     :5675   Median :1.0000   Median :1.0000   Median :0.0000
 poor     : 677   Mean   :0.7457   Mean   :0.8738   Mean   :0.4721
 very good:6972   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000
                  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
     height          weight         wtdesire          age        gender
 Min.   :48.00   Min.   : 68.0   Min.   : 68.0   Min.   :18.00   f:10431
 1st Qu.:64.00   1st Qu.:140.0   1st Qu.:130.0   1st Qu.:31.00   m: 9569
 Median :67.00   Median :165.0   Median :150.0   Median :43.00
 Mean   :67.18   Mean   :169.7   Mean   :155.1   Mean   :45.07
 3rd Qu.:70.00   3rd Qu.:190.0   3rd Qu.:175.0   3rd Qu.:57.00
 Max.   :93.00   Max.   :500.0   Max.   :680.0   Max.   :99.00
```

### Tired of Duo?

You can [suppress Duo](https://github.com/PrincetonUniversity/removing_tedium) in a variety of ways. You should not try to do this during an in-person workshop.
Please see [here](https://researchcomputing.princeton.edu/get-started/guide-princeton-clusters/3-first-slurm-job).
