Getting started

Getting an account

Go to the account request form and select VetMed as the sponsor.

You'll need an SSH public key to create an account. See below for instructions! (If you already have a public key, there's no need to generate a new one.)

Generating an SSH public key to paste into the form

To generate a public key for the first time, on Mac OS X and Linux run:

ssh-keygen

at the shell prompt. Then hit enter three times: once to confirm the default file name, and twice to set an empty passphrase.

Then, you should have a file in your home directory, under

.ssh/id_rsa.pub

Now run cat ~/.ssh/id_rsa.pub and copy/paste the resulting text into the form.
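
If you'd prefer a more modern key type, ssh-keygen can also generate an ed25519 key non-interactively; note that the public key then lands in .ssh/id_ed25519.pub instead:

ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub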

Logging in

To log into the head node, run:

ssh <ucdavis id>@farm.cse.ucdavis.edu

If you used a non-default private key, add -i <path to private key> to the command.
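
For example, if your private key lives at ~/.ssh/farm_key (a hypothetical path; substitute wherever your key actually is), the full command would be:

ssh -i ~/.ssh/farm_key <ucdavis id>@farm.cse.ucdavis.edu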

Warning: do not run big analyses on the 'farm' head node itself - for that, you'll want to use srun or sbatch. See the next section.

(If you do run big analyses on the farm head, you run the risk of getting temporarily banned, and no one wants that!)

Interactively logging in

To request compute node resources so that you avoid running on the head node, use srun. This will request 20 GB of RAM for 24 hours on the high-priority partition:

srun -p high2 -t 24:00:00 --mem=20000 --pty bash
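
Here, -p selects the partition, -t sets the time limit, and --mem requests RAM in MB. If you also need multiple CPUs for an interactive session, you can add -c; for example, this variant of the same command requests 4 cores:

srun -p high2 -t 24:00:00 -c 4 --mem=20000 --pty bash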

(We strongly recommend against doing this regularly; instead, use sbatch to submit scripts that run your analysis. See "Running software via the slurm queuing system" below for more on the suggested approach.)

Installing conda

We recommend managing most of your software installs via conda and bioconda. Please see this tutorial on conda for a more complete introduction!

You'll need to install conda first, however.

Run the following:

echo "source ~/.bashrc" >> ~/.bash_profile
curl -LO https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh

and then answer yes to all the questions!

Log out and log back in again to activate the base conda environment. You should now have a prompt that starts with (base).

Last but not least, set up bioconda (see the bioconda docs for more info):

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Now you can use conda install -y <packagename> to install stuff.
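
For example, to install samtools from bioconda:

conda install -y samtools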

There's a nice conda tutorial for data scientists available!

Conda environments

You can use the default base conda environment for most things, but if you want to create a specialized environment, you can do:

conda create -n somename python==3.6

and then

conda activate somename

Now installs will go into this separate conda environment.

Note that you can do pip installs into a specific conda environment as well, e.g.

conda activate somename
pip install Cython

will install the python package Cython into the conda environment somename.
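
You can also list packages when creating an environment. For example, this creates a (hypothetical) environment named mapping containing bwa and samtools from bioconda:

conda create -n mapping -y bwa samtools
conda activate mapping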

Running Rstudio interactively

First, you need to install Rstudio using conda:

conda create -n rstudio rstudio

Now log in to farm through ssh with X11 forwarding; from Mac OS X or Linux, that looks something like this:
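
ssh -Y <ucdavis id>@farm.cse.ucdavis.edu

(The -Y flag enables trusted X11 forwarding. On Mac OS X you will also need a local X server such as XQuartz installed and running.)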

(If you use Mobaxterm on Windows, you can also get X11 forwarding and display to work; please let us know if you need help doing this!)

Start an interactive job:

srun -t 240 --mem=10g -p high2 --pty bash

Activate the conda environment for Rstudio and launch an instance:

conda activate rstudio
rstudio

An Rstudio interface should now appear on your desktop.

Running software via the slurm queuing system

Briefly, to run big/long-running jobs, you'll need to:

  • create a new shell script with the commands you want to run
  • put slurm SBATCH commands at the top, like so:
#SBATCH -p med2
#SBATCH -J sgc
#SBATCH -t 3-0:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --mem=30gb
  • run sbatch <scriptname>.

A script template I use regularly is something like this:

#! /bin/bash -login
#SBATCH -p med2
#SBATCH -J sgc
#SBATCH -t 3-0:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 4
#SBATCH --mem=30gb

# activate conda in general
. "/home/ctbrown/miniconda3/etc/profile.d/conda.sh"

# activate a specific conda environment, if you so choose
conda activate somename

# go to a particular directory
cd /home/ctbrown/2018-paper-spacegraphcats/pipeline-base

# fail on unset variables and on errors, and echo commands as they run
set -o nounset
set -o errexit
set -x

### run your commands here!

# Print out values of the current jobs SLURM environment variables
env | grep SLURM

You can use squeue -u <your username> to look at the status of your jobs.

The job will output a file slurm-<NUMBER>.out that contains all of the output and errors from your job.
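
If you'd like a more descriptive name for that output file, you can add the -o option to your script header; %j expands to the job number (myjob is a placeholder name):

#SBATCH -o myjob-%j.out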

If sbatch fails with an error like sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified, sbatch: error: CPU count per node can not be satisfied, or sbatch: error: Batch job submission failed: Requested node configuration is not available, then you are probably requesting more resources (e.g. memory or CPUs) than the partition allows and need to use a different one; see Partitions/queues we have available.

Shared storage

For files shared among users (references, databases, etc.), use /group/ctbrowngrp/ to avoid having redundant files.

Using shared resources

Users in ctbrowngrp collectively share resources. Currently, this group has priority access to 1 TB of RAM and 96 CPUs on one machine, and 256 GB of RAM and up to 64 CPUs on another two machines. The big-memory machine is accessed using the bm* partitions, while the smaller-memory machines are accessed via high2/med2/low2.

As of February 2020, there are 31 researchers who share these resources. To manage and share these resources equitably, we have created a set of rules for resource usage. When submitting jobs to these partitions, please follow these rules:

  • bmh/high2: use for (1) small-ish interactive testing; (2) single-core snakemake jobs that submit other jobs; and (3), only if really needed, a single job that uses a reasonable amount of resources and that you really need to not get bumped. Jobs that fall into group 3 might be very long-running jobs that would otherwise always be interrupted on bmm/med2 or bml/low2 (e.g. > 5 days), or single jobs that need to be completed in time for a grant or presentation. If your single job on bmh/high2 will exceed 1/3 of the group's resources for either RAM or CPU, please notify the group before submitting it.
  • bmm/med2: don't submit more than 1/3 of the group's resources at once. This applies to both CPU (96 total, so max 32) and RAM (1 TB total, so max ~333 GB).
  • bml/low2: free for all! Go hog wild! Submit to your heart's content!

Note that the bmm/bml and med2/low2 queues have access to the full cluster, not just our machines; so if farm is not being heavily utilized, you may be able to run more jobs faster on those nodes than on bmh/high2.

Managing system modules

  • List all available modules: module avail
  • List currently loaded modules: module list
  • Load a module: module load <module_name>
    • Example, loading the GCC module: module load gcc/9.2.0
  • Unload a module: module unload <module_name>
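
For example, to compile a small C program with a newer compiler than the system default (the exact version string is just an example; check module avail for what's actually installed):

module load gcc/9.2.0
gcc -O2 -o hello hello.c
module unload gcc/9.2.0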