smk_emseq - A Snakemake-based workflow for EMseq data processing

📌 Acknowledgement/Disclaimer

This workflow is heavily based on https://github.com/seb-mueller/snakemake-bisulfite by @seb-mueller. I basically just streamlined/tailored the workflow to my needs.

❗ Needed/used software

The workflow is based on the following tools:

fastQC
bbduk part of the BBtools suite
bismark
samtools

The separate installation of the tools is not necessary, they are installed 'on the fly' (see Usage below).

Snakemake should be installed as outlined in its documentation for instance using conda/mamba. It is recommended to create a dedicated conda environment for Snakemake.

📘 Description of the workflow

The workflow can be equally used for the analysis of data derived from NEB's EMseq approach as well as data derived from WGBS (whole-genome bisulfite sequencing). Working with environmental microbes, we made the experience that the usage of commonly available bisulfite treatment kits lead to a strong loss of DNA. As a result, we ended up giving EMseq a try. For details about the methodology have a look at NEBs EMseq paper.

A reference genome stored in resources/ is bisulfite-treated in silico with bismark. Paired-end sequencing data (stored in data/) is subjected to quality-control and adapter-trimming using bbduk. Quality reports are written using fastQC before and after trimming.

Read pairs are subsequently mapped onto the bisulfite-treated genome, alignments with identical mapping positions are removed. Methylations are extracted for all three contexts (CpG, CHH, CHX) and used to generate a .bedGraph and coverage file. The latter can be used for downstream methylation analysis.

The below DAG graph outlines the different processes of the workflow.

🔨 Usage

Start by cloning the repository and move into respective directory.

git clone https://github.com/wegnerce/smk_emseq.git
cd smk_emseq

Place paired sequence data (R{1,2}.fastq.gz) in data/. The repository contains two pairs of exemplary files (La.1_R1.fastq.gz + La.1_R2.fastq.gz & Nd.1_R1.fastq.gz + Nd.1_R2.fastq.gz).

config/ contains, besides from the configuration of the workflow (config/config.yaml), a tab-separated table samples.tsv, which contains a list of all datasets, one per line. The workflow expects *.fastq.gzfiles and R{1,2} as prefixes for forward and reverse read files.

From the root directory of the workflow, processing the data can then be started.

# --use-conda makes sure that needed tools are installed based
# on the requirements specified in the respective *.yaml in /envs
snakemake  --use-conda

The directory structure of the workflow is shown below:

├── config
│   ├── config.yaml
│   └── samples.tsv
├── dag.svg
├── data
│   ├── La.1_R1.fastq.gz
│   ├── La.1_R2.fastq.gz
│   ├── Nd.1_R1.fastq.gz
│   └── Nd.1_R2.fastq.gz
├── logs
├── LICENSE
├── README.md
├── resources
│   ├── adapters.fa
│   └── RHAL1_chromosome_plasmid.fa
├── results
└── workflow
    ├── envs
    │   ├── bbmap.yaml
    │   ├── bismark.yaml
    │   ├── fastqc.yaml
    │   └── samtools.yaml
    ├── rules
    │   ├── bismark.smk
    │   ├── qc.smk
    │   └── sort.smk
    └── Snakefile

Output from the different steps of the workflow are stored in /results and /logs.

The resulting *.cov.gz files (results/04_coverage/*.cov.gz) can be used for downstream methylation analysis.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

smk_emseq - A Snakemake-based workflow for EMseq data processing

📌 Acknowledgement/Disclaimer

❗ Needed/used software

📘 Description of the workflow

🔨 Usage

About

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
config		config
data		data
resources		resources
workflow		workflow
LICENSE		LICENSE
README.md		README.md
dag.svg		dag.svg

License

wegnerce/smk_emseq

Folders and files

Latest commit

History

Repository files navigation

smk_emseq - A Snakemake-based workflow for EMseq data processing

📌 Acknowledgement/Disclaimer

❗ Needed/used software

📘 Description of the workflow

🔨 Usage

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages