
[MRG] adding snakemake --profile #33

Open · wants to merge 7 commits into base: latest

Conversation

@mr-eyes (Member) commented Jan 23, 2022

Resolves #32

@mr-eyes changed the title from "adding snakemake --profile" to "[MRG] adding snakemake --profile" on Jan 23, 2022
@bluegenes (Member) commented Mar 2, 2022

Thanks Mo! It would be really great to provide an example snakemake rule where you set time, partition, etc. within the rule, so folks can see how that happens :)

I have some examples over here if you want to just swipe: http://bluegenes.github.io/hpc-snakemake-tips/
e.g. -

rule quality_trim:
    input: 
        reads="rnaseq/raw_data/{sample}.fq.gz",
        adapters="TruSeq2-SE.fa",
    output: "rnaseq/quality/{sample}.qc.fq.gz"
    threads: 1
    resources:
        mem_mb=1000,
        runtime=10
    shell:
        """
        trimmomatic SE {input.reads} {output} \
        ILLUMINACLIP:{input.adapters}:2:0:15 \
        LEADING:2 TRAILING:2 SLIDINGWINDOW:4:2 MINLEN:25    
        """

@bluegenes (Member)
One more thought -- I see the default jobs is 100 and default partition is med2 -- can we change these to follow our recommended queue usage?

Options: default to low2 and keep default jobs at 100, or keep med2 with default jobs <= 30.
Alternatively (or in addition), you can add resources: [cpus=30, mem_mb=350000] to limit cpu and memory allocation. The one caveat is that we don't need these limits for low2 or bml, so they may be annoying to have in the cluster profile when running on those queues.
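
In the profile's config.yaml (whose keys mirror snakemake's long-form command-line flags), those limits might look something like this (values illustrative, not a recommendation):

# cluster profile config.yaml (sketch)
jobs: 30                                  # cap on concurrent job submissions
# global caps; same meaning as --resources on the command line:
resources: [cpus=30, mem_mb=350000]
default-resources: [mem_mb=2000, time_min=120]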

@SichongP (Contributor) commented Mar 2, 2022

A little trick that worked for me is using cpus_med2 and cpus_bmm to separate resource use on different partitions. Then I only set resource limits for the med2 and bmm partitions using resources: [cpus_med2=30, cpus_bmm=30]. This way snakemake will limit resource usage on the medium-priority partitions but won't restrict low-partition usage.

Of course, you will have to set cpus_med2 or cpus_bmm in the resources keyword for each rule, instead of the default cpus parameter (see the example rule below).

As a bonus, you can use this function to automate which partition snakemake should submit your job to:

def getPartition(wildcards, resources):
    # Determine the partition for a rule based on the resources it requests
    for key in resources.keys():
        if 'bmm' in key and int(resources['cpus_bmm']) > 0:
            return 'bmm'
        elif 'med' in key and int(resources['cpus_med2']) > 0:
            return 'med2'
    # Fallback: jobs needing > 4 GB per cpu go to bml, everything else to low2
    if int(resources['mem_mb']) / int(resources['cpus']) > 4000:
        return 'bml'
    else:
        return 'low2'

And then in the rule definition:

...
params: partition=getPartition
...

In my profile, I set the following default resources:

default-resources: [cpus_bmm=0, cpus_med2=0, cpus=1, mem_mb_bmm=0, mem_mb_med2=0, mem_mb=2000, time_min=120, node=1, task=1, download=0]
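
Putting it together, a rule using this scheme might look something like this (rule name, files, and numbers are just for illustration):

rule example_med2_job:
    input: "data/{sample}.txt"
    output: "results/{sample}.txt"
    threads: 8
    resources:
        cpus_med2=8,             # counted against the cpus_med2=30 cap in the profile
        mem_mb=16000
    params:
        partition=getPartition   # resolves to 'med2' here, since cpus_med2 > 0
    shell:
        "cp {input} {output}"    # placeholder command

The cluster submission command in the profile can then reference {params.partition} to send the job to the right queue.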

@mr-eyes (Member, Author) commented Mar 3, 2022

One more thought -- I see the default jobs is 100 and default partition is med2 -- can we change these to follow our recommended queue usage?

Options: default to low2 and keep default jobs at 100, or keep med2 with default jobs <= 30. Alternatively (or in addition), you can add resources: [cpus=30, mem_mb=350000] to limit cpu and memory allocation. The one caveat is that we don't need these limits for low2 or bml, so they may be annoying to have in the cluster profile when running on those queues.

Thanks @bluegenes for the suggestions. I have edited the default partition parameter. I don't think setting the default mem_mb to 350 GB is a good idea, because that would tie up a lot of memory for jobs running on default parameters. Same with the cpu. What do you think?

@mr-eyes (Member, Author) commented Mar 3, 2022

A little trick that worked for me is using cpus_med2 and cpus_bmm to separate resource use on different partitions.

That's a cool workaround, thanks for sharing! I think controlling the default parameters for each partition separately can also work using Python functions with the partition name as input.

@bluegenes (Member) commented Mar 3, 2022

I don't think setting the default mem_mb to 350GB is a good idea because that will consume a lot of memory for the total running job on default parameters. Same with the cpu. What do you think?

As I've used it,resources at the top level doesn't actually allocate that memory (or cpu/etc), it just limits the total amount you can allocate at once. The resources within each rule does try to allocate that particular amount of memory/etc, as does default-resources which is used to fill in resources for rules missing any of the default resource parameters.
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#resources
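
A minimal illustration of the difference (rule and numbers hypothetical):

# Global cap: limits the total snakemake will schedule at once; allocates nothing.
# Set on the command line (snakemake --resources mem_mb=350000) or via the profile.

# Per-rule request: each job actually asks the scheduler for this amount.
rule big_assembly:
    resources:
        mem_mb=100000
    shell: "echo this job requested {resources.mem_mb} MB"

# default-resources only fills in mem_mb etc. for rules that don't set them;
# it is the per-rule, per-job amounts that actually get requested.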

That's a cool workaround, thanks for sharing! I think controlling the default parameters for each partition separately can also work using Python functions with the partition name as input.

This sounds like an excellent workaround. If we can set limits for the med and high partitions by default, and no limits for low, that would be really helpful. Of course, for rare cases (deadlines, huge jobs, etc.), users can override the limits by setting different ones on the command line, e.g. --resources mem_mb=XX.
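
For example, something like this (profile name and numbers made up):

snakemake --profile farm --resources mem_mb=500000 cpus=60 --jobs 100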

@ctb (Member) commented Mar 4, 2022

This is all Greek to me. Maybe we need (or could use) a lab meeting tutorial/demo on cool farm/snakemake hacks...

@bluegenes (Member) commented Mar 5, 2022

This is all Greek to me. Maybe we need (or could use) a lab meeting tutorial/demo on cool farm/snakemake hacks...

😂 I ran an ILLO on farm/snakemake (w/profiles and resource limitation hacks!) back in Aug 2020, but we could do another/up-to-date one? @mr-eyes, interested in doing this with me? Partition-specific allocation using this profile is already making my life better! @SichongP, I would also love your feedback on what we come up with if you have time, in case you have more/different tricks you use.

Back when profiles were newer, the hard part was figuring out how to introduce them without leaving folks behind who are newer to snakemake. But now I think profile setup is something we should just help everyone do as soon as possible, since it makes so many things easier (and doesn't add much complication, aside from setup).

ILLO from 8/24/2020 - http://bluegenes.github.io/hpc-snakemake-tips/
My practices have changed a little since then, but not a ton. I think for the next one, I would start with profiles and assume snakemake conda environment management :)

@mr-eyes (Member, Author) commented Mar 5, 2022

@mr-eyes, interested in doing this with me?

Sure!

Nice! Thanks, Tessa!

Co-authored-by: Tessa Pierce Ward <[email protected]>
Successfully merging this pull request may close: snakemake --cluster-config is deprecated.