This example gives a framework for using a Python script to train a RandomForest on a dataset and write some statistics of the trained model to disk.
We will do this while sweeping over a suite of datasets and parameter settings.
To run this example on MARCC, the following steps must be completed:
git clone https://github.com/neurodata/marcc_examples.git
ml python/3.8
python -m venv ~/env_examples
source ~/env_examples/bin/activate
pip install -r marcc_examples/requirements.txt
This is the workhorse Python script. It takes command line arguments that tell it which dataset and which parameters to use for a given job. It also reads a set of fixed parameters from the config.dat file. It then trains a RandomForest on the dataset specified by the input dataset ID and writes the results to disk.
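For concreteness, a minimal sketch of what a script like skrf.py might contain is shown below; the argument names, config keys, dataset mapping, and output file format are illustrative placeholders, not the repository's exact code.

```
# Hypothetical sketch of a script like skrf.py; argument names, config keys,
# dataset mapping, and output format are placeholders for illustration.
import argparse
import configparser
import json

from sklearn.datasets import load_digits, load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Per-job parameters come from the command line (fed by the Slurm job array).
parser = argparse.ArgumentParser()
parser.add_argument("dataID", type=int, help="which dataset to use")
parser.add_argument("runID", type=int, help="index of this run (also used as the seed)")
parser.add_argument("--config-section", default="default",
                    help="section of config.dat to read (default or dev)")
args = parser.parse_args()

# Fixed parameters shared by all jobs come from config.dat.
config = configparser.ConfigParser()
config.read("config.dat")
n_trees = config.getint(args.config_section, "n_trees")

# Map the dataset ID to an actual dataset (placeholder choices).
loaders = {0: load_iris, 1: load_wine, 2: load_digits}
X, y = loaders[args.dataID](return_X_y=True)

# Train/evaluate the RandomForest for this (dataID, runID) pair.
clf = RandomForestClassifier(n_estimators=n_trees, random_state=args.runID)
scores = cross_val_score(clf, X, y, cv=5)

# Write the statistics for this job to disk.
with open(f"results_{args.dataID}_{args.runID}.json", "w") as f:
    json.dump({"dataID": args.dataID, "runID": args.runID,
               "mean_accuracy": float(scores.mean())}, f)
```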
This is the Slurm script that sets up the individual job parameters, such as the number of nodes per job, CPUs, memory, etc., and calls skrf.py, feeding it the parameters for the given job in the array.
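Such a worker script might look roughly like the sketch below; the #SBATCH directives, the parameter file name (params.dat), and the paths are assumptions for illustration, not the repository's exact contents.

```
#!/bin/bash
# Hypothetical sketch of a worker script like workerScript.scr.
#SBATCH --job-name=skrf
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --time=1:00:00
#SBATCH --mail-type=END
#SBATCH --mail-user=<your username>

ml python/3.8
source ~/env_examples/bin/activate

# Pull the parameters for this array task from line $SLURM_ARRAY_TASK_ID of the
# space-delimited parameter file (placeholder name) and pass them to skrf.py.
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.dat)
python skrf.py $PARAMS
```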
This is a Slurm script that runs the clean-up scripts, which could do anything from moving files around to aggregating the outputs of the previous jobs or plotting.
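A minimal sketch of such a clean-up submission script is shown below; the directives and the clean-up script name (cleanup.py) are placeholders.

```
#!/bin/bash
# Hypothetical sketch of a clean-up job script like cleanUp.scr.
#SBATCH --job-name=cleanUp
#SBATCH --nodes=1
#SBATCH --time=0:15:00
#SBATCH --mail-type=END
#SBATCH --mail-user=<your username>

ml python/3.8
source ~/env_examples/bin/activate

# Aggregate and plot the per-job results once the array has finished.
python cleanup.py
```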
This is an example clean-up script in Python that aggregates and plots the results.
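A clean-up script of this kind might, for example, gather the per-job result files and plot a summary; the file names and result fields below are placeholders that match the skrf.py sketch above rather than the repository's actual code.

```
# Hypothetical sketch of a clean-up/aggregation script; file names and
# result fields are placeholders, not the repository's exact layout.
import glob
import json

import matplotlib
matplotlib.use("Agg")  # no display on the cluster
import matplotlib.pyplot as plt

# Collect the per-job result files written by the worker jobs.
records = []
for path in glob.glob("results_*.json"):
    with open(path) as f:
        records.append(json.load(f))

# Average accuracy per dataset across runs.
data_ids = sorted({r["dataID"] for r in records})
means = [sum(r["mean_accuracy"] for r in records if r["dataID"] == d) /
         max(1, sum(1 for r in records if r["dataID"] == d))
         for d in data_ids]

# Plot and save the summary figure.
plt.bar(range(len(data_ids)), means, tick_label=[str(d) for d in data_ids])
plt.xlabel("dataID")
plt.ylabel("mean accuracy")
plt.savefig("summary.png")
```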
This is a file that specifies other parameters that are fixed across each individual job. There are two sections: default for MARCC and dev for use on my local box. This helps with setting parameters, such as the number of runs or the number of trees, to an acceptable level for testing before release on the cluster.
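For instance, an INI-style config.dat could look like the following; the two-section layout matches the description above, but the key names and values are only illustrative.

```
[default]
n_runs = 10
n_trees = 500

[dev]
n_runs = 2
n_trees = 10
```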
This is a space-delimited file that contains the parameters used for each individual job. The parameters on line i are used for job i in the job array. For this example, each line contains a dataID and a runID.
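As an illustration, a parameter file for three jobs sweeping two datasets could look like this (the values are made up; each line is "dataID runID"):

```
0 1
0 2
1 1
```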
This is the MASTER script. It is in charge of submitting the main job as a job array of the appropriate length and submitting the cleanUp job as a dependency. It limits the job array to running 2 jobs concurrently so as not to overload the queue.
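A sketch of what such a master script might do is shown below; the --array and --dependency usage reflects the description above, while the parameter file name and the exact sbatch options are assumptions.

```
#!/bin/bash
# Hypothetical sketch of a MASTER.scr-style submission script.

# Number of jobs = number of lines in the parameter file (placeholder name).
NJOBS=$(wc -l < params.dat)

# Submit the worker array, allowing at most 2 tasks to run at once (%2),
# and capture the job ID so the clean-up job can depend on it.
ARRAY_ID=$(sbatch --parsable --array=1-${NJOBS}%2 workerScript.scr)

# Run the clean-up job only after every array task has finished successfully.
sbatch --dependency=afterok:${ARRAY_ID} cleanUp.scr
```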
Starting from the marcc_examples/ex001 directory, to run this on your own MARCC account you must update the lines containing --mail-type= and --mail-user= in the files workerScript.scr and cleanUp.scr, setting them to END and your username, respectively.
Once you have a basic grasp of what the files described above are doing, you can run the whole job with the following:
sh MASTER.scr