Add support for rule grouping to improve workflow organization #3242

Open
jiangyun-fun opened this issue Dec 31, 2024 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem? Please describe.

When working with large Snakemake workflows (100+ rules), it becomes challenging
to manage and understand the relationships between rules. Currently, we use
rules.foo.input and rules.foo.output to reference file dependencies, but
this approach focuses on individual rules rather than logical groupings of
related steps. For example, using groups.database_annotation.output would be
more semantically meaningful than rules.step3_postprocess.output when
referring to the final output of a database annotation process.

Describe the solution you'd like

I propose introducing a group keyword (or similar) to define logical groups of
related rules. This grouping would be purely organizational and would not affect
how the workflow is executed. Example:

rule step1_preprocess:
    """Preprocess input data"""
    input: "a.txt"
    output: "b.txt"
    shell: "echo {input} > {output}"

rule step2_database_annotation:
    """Perform database annotation"""
    input: "b.txt"
    output: "c.txt"
    shell: "echo {input} > {output}"

rule step3_postprocess:
    """Post-process annotation results"""
    input: "c.txt"
    output: "d.txt"
    shell: "echo {input} > {output}"

# Define a logical group of related rules
group database_annotation:
    input: rules.step1_preprocess.input
    output: rules.step3_postprocess.output

# Reference the group instead of individual rules
rule step4:
    input: groups.database_annotation.output
    output: "e.txt"
    shell: "echo {input} > {output}"
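
To make the intended semantics concrete, here is a minimal sketch in plain Python rather than Snakemake. It assumes a group is nothing more than a named alias for the inputs and outputs of existing rules. The `Rule` and `Group` classes and the `rules` / `groups` namespaces are hypothetical stand-ins for illustration, not Snakemake's actual API.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a rule: only its declared files."""
    input: tuple
    output: tuple


@dataclass(frozen=True)
class Group:
    """Hypothetical group: an alias for files declared by existing rules."""
    input: tuple
    output: tuple


# Mirrors the three rules from the example above.
rules = SimpleNamespace(
    step1_preprocess=Rule(input=("a.txt",), output=("b.txt",)),
    step2_database_annotation=Rule(input=("b.txt",), output=("c.txt",)),
    step3_postprocess=Rule(input=("c.txt",), output=("d.txt",)),
)

# The group only re-exports the first rule's input and the last rule's output;
# it adds no jobs and changes nothing about execution.
groups = SimpleNamespace(
    database_annotation=Group(
        input=rules.step1_preprocess.input,
        output=rules.step3_postprocess.output,
    )
)

# What step4 would receive as its input under this reading.
step4_input = groups.database_annotation.output
```

Under this reading, `groups.database_annotation.output` and `rules.step3_postprocess.output` refer to the same files; the group only gives them a more meaningful name.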

Some potential benefits of this feature:

  1. Simplified DAG Visualization: Groups can collapse related rules in the
    DAG visualization, making it easier to understand the high-level workflow
    structure.
  2. Improved Readability: Groups provide semantic meaning to collections of
    rules, making the workflow more self-documenting.
  3. Alternative Modularization: Complements the existing include keyword by
    providing another way to organize workflow components.
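
The DAG simplification in point 1 could work roughly as follows. This is a sketch in plain Python under one assumption not spelled out above: that each group knows which rules belong to it (the `membership` mapping is hypothetical). Edges between rules in the same group disappear, and edges that cross a group boundary are redrawn from or to the group node.

```python
def collapse_dag(edges, membership):
    """Collapse rule-level edges into group-level edges.

    edges: iterable of (upstream_rule, downstream_rule) name pairs.
    membership: dict mapping a rule name to its group name; rules that are
        not in any group keep their own name as the node label.
    """
    collapsed = set()
    for upstream, downstream in edges:
        src = membership.get(upstream, upstream)
        dst = membership.get(downstream, downstream)
        if src != dst:  # drop edges that are internal to a single group
            collapsed.add((src, dst))
    return sorted(collapsed)


# Rule-level DAG of the example workflow.
edges = [
    ("step1_preprocess", "step2_database_annotation"),
    ("step2_database_annotation", "step3_postprocess"),
    ("step3_postprocess", "step4"),
]

# Hypothetical membership: the three annotation steps form one group.
membership = {
    "step1_preprocess": "database_annotation",
    "step2_database_annotation": "database_annotation",
    "step3_postprocess": "database_annotation",
}

collapsed_edges = collapse_dag(edges, membership)
# collapsed_edges == [("database_annotation", "step4")]
```

The four-node chain reduces to a single edge from `database_annotation` to `step4`, which is the high-level view a collapsed visualization would show.
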
jiangyun-fun added the enhancement label on Dec 31, 2024