beiko-lab/ARETE: Usage

Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files.

Introduction


The ARETE pipeline is designed as an end-to-end workflow manager for genome assembly, annotation, and phylogenetic analysis, beginning with read data. However, in some cases a user may wish to stop the pipeline prior to annotation, or use the annotation features of the workflow with pre-existing assemblies. Therefore, ARETE supports several different use cases:

  1. Run the full pipeline end-to-end.
  2. Input a set of reads and stop after assembly.
  3. Input a set of assemblies and perform QC.
  4. Input a set of assemblies and perform annotation and taxonomic analyses.
  5. Input a set of assemblies and perform genome clustering with PopPUNK.
  6. Input a set of assemblies and perform phylogenomic and pangenomic analysis.

This document will describe how to perform each workflow.


"Running the pipeline" will show some example command on how to use these different entries to ARETE.

Samplesheet input

No matter your use case, you will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the --input_sample_table parameter to specify its location. For full runs and assembly, it has to be a comma-separated file with 3 columns and a header row, as shown in the examples below.


Full workflow or assembly samplesheet


An example samplesheet has been provided with the pipeline.
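For illustration, a minimal sketch of what such a samplesheet might look like (the column names sample, fastq_1 and fastq_2 and the file paths are assumptions here; check the bundled example samplesheet for the exact header expected by your version of ARETE):

sample,fastq_1,fastq_2
SAMPLE_1,/path/to/SAMPLE_1_R1.fastq.gz,/path/to/SAMPLE_1_R2.fastq.gz
SAMPLE_2,/path/to/SAMPLE_2_R1.fastq.gz,/path/to/SAMPLE_2_R2.fastq.gz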

Annotation only samplesheet


The ARETE pipeline allows users to provide pre-existing assemblies to make use of the annotation and reporting features of the workflow. Users may use the assembly_qc entry point to perform QC on the assemblies. Note that the QC workflow does not automatically filter low-quality assemblies, it simply generates QC reports! The annotation, assembly_qc and poppunk workflows accept the same samplesheet format.

The sample sheet must be a two-column, comma-separated CSV file with a header.



An example samplesheet has been provided with the pipeline.
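As a sketch, assuming the two columns are named sample and fna_file_path (verify the header against the bundled example samplesheet) and using hypothetical file paths, such a samplesheet could look like:

sample,fna_file_path
SAMPLE_1,/path/to/SAMPLE_1.fna
SAMPLE_2,/path/to/SAMPLE_2.fna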


Phylogenomics and Pangenomics only samplesheet


The ARETE pipeline allows users to provide pre-existing assemblies to make use of the phylogenomic and pangenomic features of the workflow.


The sample sheet must be a two-column, comma-separated CSV file with a header.

Column        | Description
sample        | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample.
gff_file_path | Full path to GFF file for assembly or genome. File must have a .gff or .gff3 file extension. These files can be the ones generated by Prokka or Bakta in ARETE's annotation subworkflow.
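For example, a minimal samplesheet for this entry (the file paths here are hypothetical) could look like:

sample,gff_file_path
SAMPLE_1,/path/to/SAMPLE_1.gff
SAMPLE_2,/path/to/SAMPLE_2.gff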

Reference Genome

For full workflow or assembly, users may provide a path to a reference genome in fasta format for use in assembly evaluation.

--reference_genome ref.fasta

Running the pipeline
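A typical command for running the full pipeline end-to-end might look like the sketch below (the Docker profile and the bgmm PopPUNK model are assumptions here; see the entry-specific examples later in this section and the parameter documentation for the full set of options):

nextflow run beiko-lab/ARETE --input_sample_table samplesheet.csv --poppunk_model bgmm -profile docker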

Note that the pipeline will create the following files in your working directory:

.nextflow_log   # Log file from Nextflow
# Other nextflow hidden files, e.g. history of pipeline runs and old logs.

As described above, the pipeline also allows users to execute only parts of the workflow, such as assembly or annotation, via the entry points below.


Assembly Entry


To execute assembly (reference genome optional):

nextflow run beiko-lab/ARETE -entry assembly --input_sample_table samplesheet.csv --reference_genome ref.fasta  -profile docker
 

Assembly QC Entry

To execute QC on pre-existing assemblies (reference genome optional):

nextflow run beiko-lab/ARETE -entry assembly_qc --input_sample_table samplesheet.csv --reference_genome ref.fasta  -profile docker
 

Annotation Entry


To execute annotation of pre-existing assemblies (PopPUNK model can be either bgmm, dbscan, refine, threshold or lineage):

nextflow run beiko-lab/ARETE -entry annotation --input_sample_table samplesheet.csv --poppunk_model bgmm -profile docker
 

PopPUNK Entry


To execute genome clustering of pre-existing assemblies with PopPUNK (the model can be either bgmm, dbscan, refine, threshold or lineage):

nextflow run beiko-lab/ARETE -entry poppunk --input_sample_table samplesheet.csv --poppunk_model bgmm -profile docker

Phylogenomics and Pangenomics Entry


To execute phylogenomic and pangenomic analysis on pre-existing assemblies:

nextflow run beiko-lab/ARETE -entry phylogenomics --input_sample_table samplesheet.csv -profile docker

Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:

nextflow pull beiko-lab/ARETE

Core Nextflow arguments

-profile

Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.


Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud) - see below.


We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility.

The pipeline also dynamically loads configurations from https://github.com/nf-core/configs when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the nf-core/configs documentation.

Note that multiple profiles can be loaded, for example: -profile test,docker - the order of arguments is important!
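As a hedged example, the bundled test configuration (described below, which needs no other parameters) could be combined with Docker like this:

nextflow run beiko-lab/ARETE -profile test,docker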

  • docker
    • A generic configuration profile to be used with Docker
  • singularity
    • A generic configuration profile to be used with Singularity
  • podman
    • A generic configuration profile to be used with Podman
  • shifter
    • A generic configuration profile to be used with Shifter
  • charliecloud
    • A generic configuration profile to be used with Charliecloud
  • test
    • A profile with a complete configuration for automated testing
    • Can run on personal computers with at least 6GB of RAM and 2 CPUs
    • Includes links to test data so needs no other parameters

    Nextflow memory requirements

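In some cases the Nextflow Java virtual machine can start to request a large amount of memory. A common way to limit this (a general Nextflow convention, not an ARETE-specific requirement) is to set NXF_OPTS in your shell environment, for example in ~/.bashrc:

export NXF_OPTS='-Xms1g -Xmx4g'    # cap the Nextflow JVM heap at 4 GB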