Merge pull request #3 from jbloomlab/feat/profiles
Adds a section on snakemake profiles
WillHannon-MCB authored Oct 28, 2024
2 parents d2eb3ae + 73b529e commit b6b7b9c
Showing 1 changed file, `sections/creating-workflows-and-pipelines/index.md`, with 67 additions and 0 deletions.
- **Actions (Shell Commands or Scripts)**: The commands to execute.

`Snakemake` automatically builds a workflow from these rules, determining the order of execution by analyzing the dependencies between them. The best way to learn `Snakemake` is by following the [tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) in its documentation.

#### Using snakemake with SLURM

To configure how Snakemake interacts with SLURM, newer versions of Snakemake (v8.0 and later) require you to use the [profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles) system, which replaces cluster configuration files. Conceptually, profiles work much like the old configuration files: they let you configure how jobs are submitted to SLURM, both globally and on a per-rule basis. This system is still under active development, so the advice here is current as of Snakemake v8.24.

##### Setup

To get started, first ensure that you have Snakemake v8 or later installed; you can check the version with `snakemake --version`. Starting with v8, interaction with job schedulers is abstracted away into executor plugins. For SLURM, you will need to install the [snakemake-executor-plugin-slurm](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html).
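
A quick way to confirm the version requirement before setting anything else up is sketched below. It assumes `snakemake` is on your `PATH` and that `snakemake --version` prints a bare version string such as `8.24.1`; the helper name `require_snakemake_v8` is hypothetical and not part of Snakemake.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Snakemake): succeed only if the installed
# Snakemake is v8 or newer, since executor plugins require Snakemake 8+.
require_snakemake_v8() {
  local version major
  # `snakemake --version` prints only the version, e.g. "8.24.1"
  version="$(snakemake --version)" || return 1
  major="${version%%.*}"   # keep everything before the first dot
  if [[ ! "$major" =~ ^[0-9]+$ ]]; then
    echo "Could not parse Snakemake version: ${version}" >&2
    return 1
  fi
  if (( major < 8 )); then
    echo "Found Snakemake ${version}; v8+ is required" >&2
    return 1
  fi
  echo "Snakemake ${version} OK"
}
```

Once this passes, the plugin can be installed into the same environment, for example with `pip install snakemake-executor-plugin-slurm` or from the bioconda channel.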

##### Profiles configuration

A profile is simply a directory containing a YAML file (`config.yaml`) that specifies job scheduler-specific parameters. In theory, you could have separate profiles depending on where the pipeline is being run, but in practice we'll mostly be dealing with SLURM. An example of a fairly minimal profile is below:

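```
executor: slurm        # submit jobs through the SLURM executor plugin
default-resources:
  - runtime=720        # default wall time per job, in minutes
jobs: 50               # maximum number of jobs submitted/running at once
use-conda: true        # use the conda environments declared in rules
```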
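
Because `--profile` points at a directory and Snakemake reads the `config.yaml` inside it, the minimal profile shown above has to be saved as, for example, `profiles/config.yaml`. A sketch of creating that layout, assuming the profile lives in a top-level `profiles/` directory as in the submission script further down; the helper name `write_minimal_profile` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the minimal SLURM profile to <dir>/config.yaml.
# `snakemake --profile <dir>` looks for config.yaml inside <dir>.
write_minimal_profile() {
  local dir="${1:-profiles}"
  mkdir -p "$dir"
  cat > "${dir}/config.yaml" <<'EOF'
executor: slurm
default-resources:
  - runtime=720
jobs: 50
use-conda: true
EOF
  # print the path that was written
  echo "${dir}/config.yaml"
}
```

Running `write_minimal_profile` from the repository root creates `profiles/config.yaml`, which `snakemake --profile profiles/` then picks up.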

You can also specify rule-specific resource requirements in the profile configuration. For example, the snippet below requests 16 CPUs whenever a job for the `bigjob` rule is submitted.

```
set-resources:
  bigjob:
    cpus_per_task: 16
```

:::tip
These can also be specified in the rules themselves as demonstrated in the section below. However, if they are set in both places, the value specified in the profile will override what is set in the rule.
:::

Further details on configuring the SLURM plugin can be found in its [documentation](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html).

##### Rule configuration

Rule-specific parameters can also be set in the rule definition itself. This is particularly useful for long-running steps (e.g., alignment) that can use multiple cores to speed things up. These are set with the `resources` keyword:

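```
rule bigjob:
    input: ...
    output: ...
    threads: 8              # threads available to the job, e.g. via {threads}
    resources:
        mem_mb=16000,       # memory requested from SLURM, in MB
        cpus_per_task=8     # CPUs requested from SLURM (keep equal to threads)
    shell: ...
```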

:::warning
Currently, if you run Snakemake from a job script that is itself submitted to SLURM (as in the submission script below), you must specify both `threads` and `cpus_per_task`; otherwise, the CPU request will not be properly propagated to SLURM. There is ongoing discussion of this [issue](https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/141), so it may be resolved in the future.
:::
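
One way to check whether a rule's CPU request actually reached SLURM is to inspect a finished rule job with `sacct`. The sketch below assumes SLURM accounting is enabled on your cluster and that you know the job ID of the rule job (for example from `squeue`); the helper name `check_alloc_cpus` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the CPUs SLURM allocated to a job with the
# number you expected (e.g. the cpus_per_task set for the rule).
# Usage: check_alloc_cpus <jobid> <expected_cpus>
check_alloc_cpus() {
  local jobid="$1" expected="$2" alloc
  # -n: no header, -P: pipe-delimited output, -X: job allocation only (no steps)
  alloc="$(sacct -n -P -X -j "$jobid" --format=AllocCPUS)" || return 1
  alloc="${alloc//[[:space:]]/}"   # strip any whitespace/newlines
  if [[ "$alloc" == "$expected" ]]; then
    echo "job ${jobid}: ${alloc} CPUs allocated as expected"
  else
    echo "job ${jobid}: expected ${expected} CPUs, got '${alloc}'" >&2
    return 1
  fi
}
```

For the `bigjob` rule above, `check_alloc_cpus <jobid> 8` should succeed if the request was propagated; a smaller allocation suggests the problem described in the warning.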

##### Submission script

The main script to submit your pipeline to SLURM should look something like the following:

```
#!/bin/bash
#SBATCH -c 1
#SBATCH --mem=1G
snakemake --profile profiles/ -s workflow/Snakefile
```

This requests just 1 core and 1 GB of memory for the main job, which runs the `snakemake` process; that process then submits separate SLURM jobs for each rule that needs to be run, so the resource request here can stay modest. The `snakemake` command itself simply points at your profile directory (`--profile`) and your main `Snakefile` (`-s`).
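
Submitting the pipeline can be sketched as follows, assuming the submission script above is saved as `run_snakemake.sh` (a hypothetical name) in the repository root. `sbatch --parsable` prints only the job ID (followed by `;cluster` on multi-cluster setups), so it can be captured in a variable; the helper name `submit_pipeline` is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit the Snakemake controller script and print its job ID.
submit_pipeline() {
  local script="${1:-run_snakemake.sh}" jobid
  # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
  jobid="$(sbatch --parsable "$script")" || return 1
  echo "${jobid%%;*}"
}
```

After `jobid=$(submit_pipeline)`, running `squeue -u "$USER"` should show the small controller job alongside the separate jobs Snakemake submits for individual rules.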