Joon-Klaps/lasvphylo

LASV-phylo V1.0

Introduction

nf-core/lasvphylo is a bioinformatics "best-practice" analysis pipeline for creating a high-quality alignment of LASV segments and a maximum likelihood tree. I use this pipeline for fast phylogenetic analysis of newly identified LASV cases/outbreaks, to detect the emergence of new clades or mutations of interest.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

Pipeline summary

Figure: lasvphylo workflow diagram

  1. Orient and isolate the genes of LASV (MAFFT)
  2. Remove the reference sequences from the alignment (SeqTk)
  3. Concatenate the genes to each other (SeqKit)
  4. Align the concatenated genes to an existing alignment (MUSCLE)
  5. Perform a constrained tree search using a previous tree and new sequences (IQ-TREE)

Quick Start

  1. Install Nextflow (>=22.10.1)

  2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please use it within pipelines only as a last resort; see docs.

    • The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
    • If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
    • If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
  3. Download the pipeline (git clone) and fill in the correct variables:

    nextflow run lasvphylo/main.nf -profile YOURPROFILE -c <CONFIG> --outdir <OUTDIR>

    Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. You can chain multiple config profiles in a comma-separated string. In the config file passed with -c, you'll have to specify the following variables:

    • input_S : S segment of the new sequence(s)
    • input_L : L segment of the new sequence(s)
    • input_id_S : id for S segment files
    • input_id_L : id for L segment files
    • alignment_S : Previous alignment of S segment
    • alignment_L : Previous alignment of L segment
    • tree_S : Previous tree of S segment
    • tree_L : Previous tree of L segment
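
As a rough illustration of the variables listed above, a config file might look like the sketch below. It assumes the variables live in the standard Nextflow `params` scope, which this README does not state explicitly. The file name `lasv.config`, every path, and the ID values are hypothetical placeholders; adjust them to your own data.

```groovy
// lasv.config (hypothetical file name; all paths and IDs below are placeholders)
// Assumption: the pipeline reads these variables from the `params` scope.
params {
    // New sequence(s), one file per segment
    input_S     = 'data/new_sequences_S.fasta'
    input_L     = 'data/new_sequences_L.fasta'

    // IDs used for the S and L segment files
    input_id_S  = 'new_S'
    input_id_L  = 'new_L'

    // Previous alignments that the new sequences are added to
    alignment_S = 'data/previous_alignment_S.fasta'
    alignment_L = 'data/previous_alignment_L.fasta'

    // Previous trees used for the constrained tree search
    tree_S      = 'data/previous_tree_S.treefile'
    tree_L      = 'data/previous_tree_L.treefile'
}
```

Such a file would then take the place of `<CONFIG>` in the command above, for example `nextflow run lasvphylo/main.nf -profile docker -c lasv.config --outdir results`, where `results` is again only a placeholder.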