06 Using Sub Workflows

Matin Nuhamunada edited this page Feb 21, 2024 · 2 revisions

Available sub workflows

Subworkflows are Snakefiles that can be run on top of the main workflow in BGCFlow. All available subworkflows are listed by bgcflow run -h. A subworkflow can be executed by running:

bgcflow run --workflow {workflow name or Snakefile}
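When the same subworkflow is launched repeatedly (for instance a dry run followed by the real run), the command line can be assembled in a small script. The sketch below is only an illustration: build_run_command is a hypothetical helper, not part of bgcflow_wrapper, and the -c and -n flags are assumed to behave as in the example command at the end of this page (number of cores and dry run, respectively).

```python
import shlex
import subprocess


def build_run_command(workflow, cores=None, dry_run=False):
    """Assemble a `bgcflow run` command line as a list of arguments.

    Hypothetical helper for illustration; it only uses flags shown on this page.
    `workflow` is a subworkflow name or the path to a Snakefile.
    """
    cmd = ["bgcflow", "run", "--workflow", workflow]
    if cores is not None:
        cmd += ["-c", str(cores)]
    if dry_run:
        cmd.append("-n")
    return cmd


if __name__ == "__main__":
    cmd = build_run_command("workflow/BGC", cores=2, dry_run=True)
    # Show the command that would be executed.
    print(shlex.join(cmd))
    # Uncomment to actually launch it from the BGCFlow directory:
    # subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without a shell.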

As of bgcflow_wrapper v0.3.5, these subworkflows are officially included:

  • BGC: Do comparative BGC analytics of selected antiSMASH BGC regions
  • Database: Build a DuckDB database. Same as running bgcflow build database
  • Report: Build Jupyter notebook markdown reports. Same as running bgcflow build report
  • Metabase: Serve a Metabase server. Same as running bgcflow serve --metabase
  • lsabgc: Run a population genetic analysis using the lsaBGC-Easy pipeline
  • ppanggolin: Build a graph-based pangenome and identify regions of genomic plasticity

Additional subworkflows that will be included in bgcflow v0.8.2:

  • Alleleome: Run Core-Alleleome to explore and analyze natural sequence variations within the Open Reading Frames (ORFs) of alleles of core genes in a species' pan-genome, both at the amino acid and nucleotide levels (Archana S. Harke et al., 2023). This can be run by providing the path to the Snakefile:
bgcflow run --workflow workflow/Alleleome

Running the comparative BGC workflow

This feature is used when you have a selection of antiSMASH BGC regions that you want to compare. You will typically run it after the main workflow has finished.

  1. Make a new project folder at config/<project_name> for the BGCs you want to compare. An example of the config format is available here: https://github.com/NBChub/bgcflow/tree/dev-0.6.1/.examples/lanthipeptide

  2. Prepare the samples CSV (example: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/lanthipeptide/df_antismash_6.1.1_bgc.csv). You can start from the regions table produced by the main workflow (tables/df_regions_antismash_6.1.1.csv) and add these two columns:

     • source (for now, write “bgcflow” as the source)
     • gbk_path (the path, preferably absolute, to the antiSMASH BGC region GenBank file; you can also point to your own BGCs)

  3. Create a project config file (example: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/lanthipeptide/project_config.yaml). The latest available rules are listed here: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/workflow/rules_bgc.yaml. The rules currently available are:

     • bigslice
     • query-bigslice
     • bigscape
     • clinker
     • interproscan
     • mmseqs2

  4. Add the project to the global config file config/config.yaml under the bgc_projects variable (see https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/_config_example.yaml#L27-L28):

     bgc_projects:
       - name: config/<project_name>/project_config.yaml

  5. You can then run the subworkflow, e.g. as a dry run (-n) on two cores (-c 2):

bgcflow run --workflow workflow/BGC -c 2 -n
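
The samples table described in step 2 can also be prepared with a short script instead of by hand. The following is a minimal sketch using only the Python standard library. It rests on assumptions that are not confirmed by this page: add_bgc_columns is a hypothetical helper, the regions table is assumed to have an identifier column (called bgc_id here), and each region file is assumed to be named <bgc_id>.gbk inside a single directory. Adjust these to match your own results.

```python
import csv
from pathlib import Path


def add_bgc_columns(regions_csv, samples_csv, gbk_dir, id_column="bgc_id"):
    """Copy a regions table and add the `source` and `gbk_path` columns.

    Assumptions (hypothetical, adapt to your data): `id_column` holds the
    region name and each GenBank file is stored as `<gbk_dir>/<id>.gbk`.
    """
    gbk_dir = Path(gbk_dir).resolve()  # absolute paths are preferred
    with open(regions_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        for column in ("source", "gbk_path"):
            if column not in fieldnames:
                fieldnames.append(column)
        rows = []
        for row in reader:
            row["source"] = "bgcflow"  # the only source value mentioned above
            row["gbk_path"] = str(gbk_dir / f"{row[id_column]}.gbk")
            rows.append(row)
    with open(samples_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call such as add_bgc_columns("tables/df_regions_antismash_6.1.1.csv", "config/<project_name>/samples.csv", "path/to/region_gbks") would write the new table; the output file name and the GenBank directory are placeholders, and you may want to filter the rows down to your BGCs of interest before running the subworkflow.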