DADA2 pipeline used for the La Romaine project

This repository stores the scripts and files used to process the 16S rRNA sequencing reads of the La Romaine project. The project is part of the Industrial Research Chair in Carbon Biogeochemistry in Boreal Aquatic systems (CarBBAS Chair) led by Paul A. del Giorgio.

Workflow overview

Samples were sequenced on an Illumina (MiSeq and/or HiSeq) paired-end sequencing platform at Genome Quebec. This workflow processes reads from the V4 region of the 16S rRNA gene.

In brief, the workflow is largely inspired by the official DADA2 tutorials. Our pipeline is mainly designed for big data, but it is equally applicable to smaller datasets.

We start by renaming the sample files to simplify the names returned by our sequencing service, and then remove the primers using cutadapt. After quality checking, the samples are run through the dada step on a per-plate basis with 'pseudo-pooling' enabled to ensure the retention of singleton reads within samples; with pooling, singletons are removed only within sampling campaigns. We then cluster the derived ASVs into operational taxonomic units (OTUs) at a 99% similarity threshold to improve DNA-RNA matching, because the resolution of ASVs is believed to be too high to match DNA and RNA reads of the same taxon.
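As a rough illustration of what clustering at a 99% similarity threshold means, the sketch below groups toy sequences whose pairwise distance is at most 0.01. It is not the repository's method: the actual clustering is done in R in Scripts/5_clusterOTU_slurm.R from an alignment and a distance matrix. The function names, the mismatch-fraction distance, and the choice of single linkage are all assumptions made for this example.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of differing positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length, pre-aligned sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, cutoff=0.01):
    """Assign an OTU label to each sequence (single linkage, assumed here).

    Two sequences end up in the same OTU if they are connected by a chain
    of pairs whose distance is <= cutoff (0.01 corresponds to 99% similarity).
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if mismatch_fraction(seqs[i], seqs[j]) <= cutoff:
            parent[find(j)] = find(i)

    # Relabel cluster roots as consecutive OTU ids in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(seqs))]


# Hypothetical 200-nt toy ASVs: one substitution (0.5% distance) vs. a divergent read.
base = "ACGT" * 50
one_substitution = "T" + base[1:]
divergent = "T" * 10 + base[10:]
otu_labels = cluster_otus([base, one_substitution, divergent])
```

In this toy case the first two sequences share an OTU and the divergent one forms its own. Other linkage rules (for example complete linkage) can give different groupings at the same cutoff.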

Samples from 2015 were of lower quality, and many of their reads were lost through the pipeline. We therefore increased maxEE to maxEE = c(5,5) in the filterAndTrim step, which improved read retention for most samples.
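To make the effect of maxEE concrete, here is a minimal sketch of the expected-error calculation behind this filter, where the expected errors of a read are the sum of 10^(-Q/10) over its Phred quality scores. It is written in Python for illustration only; the pipeline itself relies on filterAndTrim in R. The function names and the toy quality scores are assumptions, and the (2, 2) threshold is the value commonly shown in the DADA2 tutorial, not necessarily one used in this repository.

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of per-base error probabilities."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(fwd_quals, rev_quals, max_ee=(5, 5)):
    """Keep a read pair only if both reads are at or below their maxEE threshold."""
    return (expected_errors(fwd_quals) <= max_ee[0]
            and expected_errors(rev_quals) <= max_ee[1])


# Hypothetical read pair: 250 bases, all at Q20 (1% error per base), so EE is about 2.5.
fwd = [20] * 250
rev = [20] * 250

kept_relaxed = passes_max_ee(fwd, rev, max_ee=(5, 5))   # relaxed threshold used here
kept_strict = passes_max_ee(fwd, rev, max_ee=(2, 2))    # stricter tutorial-style threshold
```

A pair like this one would be discarded at (2, 2) but retained at (5, 5), which is the kind of read the relaxed setting recovers.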

The original workflow was designed for samples that were split into several folders by year (e.g. 2015, 2016, 2017) and therefore makes frequent use of lists. This layout is now deprecated, and all samples are kept in a single folder. The list-apply version is still in the scripts, but commented out with #.

Steps that required more computing power and did not finish successfully on our lab computer (60 GB RAM, 12 cores) were run on Compute Canada's Cedar supercomputer. The equivalent scripts are provided as *_slurm.R, together with the corresponding .sh files.

Raw sequences can be found on SRA under the Bioproject number: PRJNA693020. Intermediate processing files are stored on Zenodo: 10.5281/zenodo.4611421.

Files are being uploaded as manuscripts are published.

Currently available files:

2015-2017: 16S rRNA gene and transcripts (DNA and cDNA) in spring, summer, autumn (shallow sequencing)
- Part of the manuscript: Stadler M, del Giorgio PA. Terrestrial connectivity, upstream aquatic history and seasonality shape bacterial community assembly within a large boreal aquatic network. ISME J 2022; 16: 937–947.
- All files that are part of this paper are named with 'paper1'.

Final scripts used:

Scripts/0_clean_unify_files.R
Scripts/1_remove_primers_cutadapt.R
Scripts/2_quality_filtering.R OR 2_quality_filt_slurm.R
Scripts/3_learnError_dada.R OR 3_learnError_dada_slurm.R
Scripts/4_chimrm_assigntax.R
Scripts/5_clusterOTU_slurm.R
Scripts/6_prepare_for_SRA.R

Final data files of all La Romaine data run together:

Cleaned sequences, assigned ASVs (run through DADA2): Objects/nochim_seqtab_2015-2018.rds
Assigned taxonomy using GTDB database v.95: Objects/taxtav_gtdb_r95_2015-18.rds
Distance matrix calculated as part of 99% OTU clustering: Objects/distmat_decipher_2015-18.rds
Sequences aligned for 99% OTU clustering: Objects/aln_decipher_2015-18.rds

Attribution:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Please cite the Zenodo DOI if you use either the data or the scripts in this GitHub repository.
