Skip to content

Commit

Permalink
Merge pull request #105 from nf-core/dev
Browse files Browse the repository at this point in the history
Dev > Master for v1.0.0 release
  • Loading branch information
drpatelh authored Jun 1, 2020
2 parents 259cad1 + 9a46ba7 commit 7422b0d
Show file tree
Hide file tree
Showing 44 changed files with 18,708 additions and 743 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/awstest.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
name: nf-core AWS test
# This workflow is triggered on PRs to the master branch.
# It runs the -profile 'test_full' on AWS batch

on:
push:
branches:
- master
release:
types: [published]

jobs:
run-awstest:
name: Run AWS test
runs-on: ubuntu-latest
steps:
- name: Setup Miniconda
uses: goanpeca/[email protected]
with:
auto-update-conda: true
python-version: 3.7
- name: Install awscli
run: conda install -c conda-forge awscli
- name: Start AWS batch job
env:
AWS_ACCESS_KEY_ID: ${{secrets.AWS_KEY_ID}}
AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_KEY_SECRET}}
TOWER_ACCESS_TOKEN: ${{secrets.TOWER_ACCESS_TOKEN}}
run: | # Submits job to AWS batch using a 'nextflow-big' instance. Setting JVM options to "-XX:+UseG1GC" for more efficient garbage collection when staging remote files.
aws batch submit-job --region eu-west-1 --job-name nf-core-viralrecon --job-queue 'default-8b3836e0-5eda-11ea-96e5-0a2c3f6a2a32' --job-definition nextflow-4GiB --container-overrides '{"command": ["nf-core/viralrecon", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://nf-core-awsmegatests/viralrecon/results-'"${GITHUB_SHA}"' -w s3://nf-core-awsmegatests/viralrecon/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}, {"name": "NXF_OPTS", "value": "-XX:+UseG1GC"}]}'
70 changes: 67 additions & 3 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,73 @@ jobs:
- name: Pull docker image
run: |
docker pull nfcore/viralrecon:dev
docker tag nfcore/viralrecon:dev nfcore/viralrecon:dev
docker tag nfcore/viralrecon:dev nfcore/viralrecon:1.0.0
- name: Run pipeline with test data
run: |
# TODO nf-core: You can customise CI pipeline run tests as required
# (eg. adding multiple test runs with different parameters)
nextflow run ${GITHUB_WORKSPACE} -profile test,docker
parameters:
env:
NXF_VER: '19.10.0'
NXF_ANSI_LOG: false
runs-on: ubuntu-latest
strategy:
matrix:
parameters: [--skip_adapter_trimming, --skip_markduplicates, --skip_variants, --skip_amplicon_trimming, --skip_kraken2, --skip_assembly]
steps:
- uses: actions/checkout@v2
- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Pull docker image
run: |
docker pull nfcore/viralrecon:dev
docker tag nfcore/viralrecon:dev nfcore/viralrecon:1.0.0
- name: Run pipeline with test amplicon data with various options
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker ${{ matrix.parameters }}
test_sra:
env:
NXF_VER: '19.10.0'
NXF_ANSI_LOG: false
runs-on: ubuntu-latest
strategy:
matrix:
parameters: [--skip_sra, '']
steps:
- uses: actions/checkout@v2
- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Pull docker image
run: |
docker pull nfcore/viralrecon:dev
docker tag nfcore/viralrecon:dev nfcore/viralrecon:1.0.0
- name: Run pipeline with minimal data via SRA ids and various options
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_sra,docker ${{ matrix.parameters }}
test_sispa:
env:
NXF_VER: '19.10.0'
NXF_ANSI_LOG: false
runs-on: ubuntu-latest
strategy:
matrix:
parameters: [--gff false, '']
steps:
- uses: actions/checkout@v2
- name: Install Nextflow
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Pull docker image
run: |
docker pull nfcore/viralrecon:dev
docker tag nfcore/viralrecon:dev nfcore/viralrecon:1.0.0
- name: Run pipeline with minimal SISPA data and various options
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_sispa,docker ${{ matrix.parameters }}
33 changes: 27 additions & 6 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,35 @@
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## v1.0dev - [date]
## [1.0.0] - 2020-06-01

Initial release of nf-core/viralrecon, created with the [nf-core](http://nf-co.re/) template.

### `Added`
This pipeline is a re-implementation of the [SARS_Cov2_consensus-nf](https://github.com/BU-ISCIII/SARS_Cov2_consensus-nf) and [SARS_Cov2_assembly-nf](https://github.com/BU-ISCIII/SARS_Cov2_assembly-nf) pipelines initially developed by [Sarai Varona](https://github.com/svarona) and [Sara Monzon](https://github.com/saramonzon) from [BU-ISCIII](https://github.com/BU-ISCIII). Porting both of these pipelines to nf-core was an international collaboration between numerous contributors and developers, led by [Harshil Patel](https://github.com/drpatelh) from the [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London. We appreciated the need to have a portable, reproducible and scalable pipeline for the analysis of COVID-19 sequencing samples and so the Avengers Assembled!

### `Fixed`
### Pipeline summary

### `Dependencies`

### `Deprecated`
1. Download samples via SRA, ENA or GEO ids ([`ENA FTP`](https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html), [`parallel-fastq-dump`](https://github.com/rvalieris/parallel-fastq-dump); *if required*)
2. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html); *if required*)
3. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
4. Adapter trimming ([`fastp`](https://github.com/OpenGene/fastp))
5. Variant calling
1. Read alignment ([`Bowtie 2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml))
2. Sort and index alignments ([`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
3. Primer sequence removal ([`iVar`](https://github.com/andersen-lab/ivar); *amplicon data only*)
4. Duplicate read marking ([`picard`](https://broadinstitute.github.io/picard/); *removal optional*)
5. Alignment-level QC ([`picard`](https://broadinstitute.github.io/picard/), [`SAMtools`](https://sourceforge.net/projects/samtools/files/samtools/))
6. Choice of multiple variant calling and consensus sequence generation routes ([`VarScan 2`](http://dkoboldt.github.io/varscan/), [`BCFTools`](http://samtools.github.io/bcftools/bcftools.html), [`BEDTools`](https://github.com/arq5x/bedtools2/) *||* [`iVar variants and consensus`](https://github.com/andersen-lab/ivar) *||* [`BCFTools`](http://samtools.github.io/bcftools/bcftools.html), [`BEDTools`](https://github.com/arq5x/bedtools2/))
* Variant annotation ([`SnpEff`](http://snpeff.sourceforge.net/SnpEff.html), [`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html))
* Consensus assessment report ([`QUAST`](http://quast.sourceforge.net/quast))
6. _De novo_ assembly
1. Primer trimming ([`Cutadapt`](https://cutadapt.readthedocs.io/en/stable/guide.html); *amplicon data only*)
2. Removal of host reads ([`Kraken 2`](http://ccb.jhu.edu/software/kraken2/))
3. Choice of multiple assembly tools ([`SPAdes`](http://cab.spbu.ru/software/spades/) *||* [`metaSPAdes`](http://cab.spbu.ru/software/meta-spades/) *||* [`Unicycler`](https://github.com/rrwick/Unicycler) *||* [`minia`](https://github.com/GATB/minia))
* Blast to reference genome ([`blastn`](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch))
* Contiguate assembly ([`ABACAS`](https://www.sanger.ac.uk/science/tools/pagit))
* Assembly report ([`PlasmidID`](https://github.com/BU-ISCIII/plasmidID))
* Assembly assessment report ([`QUAST`](http://quast.sourceforge.net/quast))
* Call variants relative to reference ([`Minimap2`](https://github.com/lh3/minimap2), [`seqwish`](https://github.com/ekg/seqwish), [`vg`](https://github.com/vgteam/vg), [`Bandage`](https://github.com/rrwick/Bandage))
* Variant annotation ([`SnpEff`](http://snpeff.sourceforge.net/SnpEff.html), [`SnpSift`](http://snpeff.sourceforge.net/SnpSift.html))
7. Present QC and visualisation for raw read, alignment, assembly and variant calling results ([`MultiQC`](http://multiqc.info/))
103 changes: 103 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# nf-core/viralrecon: Citations

## [nf-core](https://www.ncbi.nlm.nih.gov/pubmed/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. ReadCube: [Full Access Link](https://rdcu.be/b1GjZ).
## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
## Pipeline tools

* [ABACAS](https://www.ncbi.nlm.nih.gov/pubmed/19497936/)
> Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009 Aug 1;25(15):1968-9. doi: 10.1093/bioinformatics/btp347. Epub 2009 Jun 3. PubMed PMID: 19497936; PubMed Central PMCID: PMC2712343.
* [Bandage](https://www.ncbi.nlm.nih.gov/pubmed/26099265)
> Wick R.R., Schultz M.B., Zobel J. & Holt K.E. Bandage: interactive visualisation of de novo genome assemblies. Bioinformatics, 31(20), 3350-3352. doi: 10.1093/bioinformatics/btv383. PubMed PMID: 26099265; PubMed Central PCMID: PMC4595904.
* [BCFtools](https://www.ncbi.nlm.nih.gov/pubmed/21903627/)
> Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509. Epub 2011 Sep 8. PubMed PMID: 21903627; PubMed Central PMCID: PMC3198575.
* [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
* [BLAST](https://www.ncbi.nlm.nih.gov/pubmed/20003500/)
> Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421. PubMed PMID: 20003500; PubMed Central PMCID: PMC2803857.
* [Bowtie 2](https://www.ncbi.nlm.nih.gov/pubmed/22388286/)
> Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923. PubMed PMID: 22388286; PubMed Central PMCID: PMC3322381.
* [Cutadapt](http://dx.doi.org/10.14806/ej.17.1.200)
> Marcel, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, [S.l.], v. 17, n. 1, p. pp. 10-12, may 2011. ISSN 2226-6089. doi: 10.14806/ej.17.1.200.
* [fastp](https://www.ncbi.nlm.nih.gov/pubmed/30423086/)
> Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560. PubMed PMID: 30423086; PubMed Central PMCID: PMC6129281.
* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

* [iVar](https://www.ncbi.nlm.nih.gov/pubmed/30621750/)
> Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, Tan AL, Paul LM, Brackney DE, Grewal S, Gurfield N, Van Rompay KKA, Isern S, Michael SF, Coffey LL, Loman NJ, Andersen KG. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019 Jan 8;20(1):8. doi: 10.1186/s13059-018-1618-7. PubMed PMID: 30621750; PubMed Central PMCID: PMC6325816.
* [Kraken 2](https://www.ncbi.nlm.nih.gov/pubmed/31779668/)
> Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019 Nov 28;20(1):257. doi: 10.1186/s13059-019-1891-0. PubMed PMID: 31779668; PubMed Central PMCID: PMC6883579.
* [minia](https://www.ncbi.nlm.nih.gov/pubmed/24040893/)
> Chikhi R, Rizk G. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms Mol Biol. 2013 Sep 16;8(1):22. doi: 10.1186/1748-7188-8-22. PubMed PMID: 24040893; PubMed Central PMCID: PMC3848682.
* [Minimap2](https://www.ncbi.nlm.nih.gov/pubmed/29750242/)
> Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PubMed PMID: 29750242; PubMed Central PMCID: PMC6137996.
* [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
* [parallel-fastq-dump](https://github.com/rvalieris/parallel-fastq-dump)

* [picard-tools](http://broadinstitute.github.io/picard)

* [QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)
> Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013 Apr 15;29(8):1072-5. doi: 10.1093/bioinformatics/btt086. Epub 2013 Feb 19. PubMed PMID: 23422339; PubMed Central PMCID: PMC3624806.
* [R](https://www.R-project.org/)
> R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
* [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)
> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
* [seqwish](https://github.com/ekg/seqwish)

* [SnpEff](https://www.ncbi.nlm.nih.gov/pubmed/22728672/)
> Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012 Apr-Jun;6(2):80-92. doi: 10.4161/fly.19695. PubMed PMID: 22728672; PubMed Central PMCID: PMC3679285.
* [SnpSift](https://www.ncbi.nlm.nih.gov/pubmed/22435069/)
> Cingolani P, Patel VM, Coon M, Nguyen T, Land SJ, Ruden DM, Lu X. Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift. Front Genet. 2012 Mar 15;3:35. doi: 10.3389/fgene.2012.00035. eCollection 2012. PubMed PMID: 22435069; PubMed Central PMCID: PMC3304048.
* [SPAdes](https://www.ncbi.nlm.nih.gov/pubmed/24093227/)
> Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, Stepanauskas R, Clingenpeel SR, Woyke T, McLean JS, Lasken R, Tesler G, Alekseyev MA, Pevzner PA. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol. 2013 Oct;20(10):714-37. doi: 10.1089/cmb.2013.0084. PubMed PMID: 24093227; PubMed Central PMCID: PMC3791033.
* [SRA Toolkit](http://ncbi.github.io/sra-tools/)

* [Trimmomatic](https://www.ncbi.nlm.nih.gov/pubmed/24695404/)
> Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014 Aug 1;30(15):2114-20. doi: 10.1093/bioinformatics/btu170. Epub 2014 Apr 1. PubMed PMID: 24695404; PubMed Central PMCID: PMC4103590.
* [Unicycler](https://www.ncbi.nlm.nih.gov/pubmed/28594827/)
> Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun. PubMed PMID: 28594827; PubMed Central PMCID: PMC5481147.
* [VarScan 2](https://www.ncbi.nlm.nih.gov/pubmed/22300766/)
> Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012 Mar;22(3):568-76. doi: 10.1101/gr.129684.111. Epub 2012 Feb 2. PubMed PMID: 22300766; PubMed Central PMCID: PMC3290792.
* [vg](https://www.ncbi.nlm.nih.gov/pubmed/30125266/)
> Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET, Jones W, Garg S, Markello C, Lin MF, Paten B, Durbin R. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol. 2018 Oct;36(9):875-879. doi: 10.1038/nbt.4227. Epub 2018 Aug 20. PubMed PMID: 30125266; PubMed Central PMCID: PMC6126949.
## Software packaging/containerisation tools

* [Bioconda](https://www.ncbi.nlm.nih.gov/pubmed/29967506/)
> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
* [Anaconda](https://anaconda.com)
> Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
* [Singularity](https://www.ncbi.nlm.nih.gov/pubmed/28494014/)
> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
11 changes: 9 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,15 @@ LABEL authors="Sarai Varona and Sara Monzon" \
COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a

# For Bandage: otherwise it complains about missing libGL.so.1
RUN apt-get install -y libgl1-mesa-glx && apt-get clean -y

# Add conda installation dir to PATH (instead of doing 'conda activate')
ENV PATH /opt/conda/envs/nf-core-viralrecon-1.0dev/bin:$PATH
ENV PATH /opt/conda/envs/nf-core-viralrecon-1.0.0/bin:$PATH

# Dump the details of the installed packages to a file for posterity
RUN conda env export --name nf-core-viralrecon-1.0dev > nf-core-viralrecon-1.0dev.yml
RUN conda env export --name nf-core-viralrecon-1.0.0 > nf-core-viralrecon-1.0.0.yml

# Instruct R processes to use these empty files instead of clashing with a local version
RUN touch .Rprofile
RUN touch .Renviron
Loading

0 comments on commit 7422b0d

Please sign in to comment.