Repli-seq processing pipeline

⚠️ For the time being, this pipeline can only be run on the graviton server. Work is in progress to make it run on other machines.

About the pipeline

This pipeline is heavily inspired by the Gilbert pipeline and the shart pipeline. For any information about the internal workings of the pipeline, please check these references.

The main purpose of this pipeline is to analyze multiple Repli-seq samples and organize their results in a consistent way.

Installing the pipeline

Clone the repository and enter the folder:

git clone https://github.com/CSOgroup/repliseq_pipeline.git
cd repliseq_pipeline

Install the dependencies using conda/mamba, creating a new environment (repliseq_pipeline):

conda env create -f environment.yml

Test that the pipeline works by running:

./run_repliseq_pipeline.sh test_input.csv

Processing repli-seq samples

This is the main step of the pipeline.

./run_repliseq_pipeline.sh <input-samples.csv>

Input format

The run_repliseq_pipeline.sh script takes a .csv file (with header) as input, with one line for each sample to be processed. The required columns are:

sample_path: path to the sample results
raw_path: path to the sample fastq files. Fastq files are assumed to be paired-end and located in the same folder. Read1 and Read2 are denoted by _R1_ and _R2_ inside the file names.
genome_assembly: genome assembly (hg19, mm10, etc.)
genome_sequence: path to the fasta file for the reference genome
chromsizes: path to the chromosome sizes file for the reference genome
blacklisted_regions (optional): path to a BED file containing regions that should be excluded from the analysis (like these)
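
The _R1_/_R2_ naming convention for raw_path can be illustrated with a small helper that derives the Read2 file name from a Read1 file name. This is a minimal sketch and not the pipeline's own code: the function name mate_for and the example file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): given a Read1 fastq path,
# print the expected Read2 path by replacing the first "_R1_" with "_R2_".
mate_for() {
    local r1="$1" dir base
    dir="$(dirname "$r1")"
    base="$(basename "$r1")"
    case "$base" in
        *_R1_*)
            # sed without the "g" flag only rewrites the first occurrence
            printf '%s/%s\n' "$dir" "$(printf '%s\n' "$base" | sed 's/_R1_/_R2_/')"
            ;;
        *)
            echo "not a Read1 file: $r1" >&2
            return 1
            ;;
    esac
}

# Placeholder file name, for illustration only
mate_for "raw/sampleA_S1_R1_001.fastq.gz"
```

For the placeholder path above, the helper prints raw/sampleA_S1_R2_001.fastq.gz. A file name without _R1_ makes it return a non-zero status.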

You can check the test_input.csv file for reference.
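
As a rough illustration of the format, the sketch below writes a two-sample input file and checks that the header contains every required column. All paths are placeholders, the column order simply follows the list above (whether the script depends on column order is not stated here), and check_header is a hypothetical pre-flight helper, not something the pipeline provides.

```shell
#!/usr/bin/env bash
# Write an example input file with one line per sample.
# Every path below is a placeholder; the column order is an assumption.
cat > example_input.csv <<'EOF'
sample_path,raw_path,genome_assembly,genome_sequence,chromsizes,blacklisted_regions
results/sampleA,raw/sampleA,hg19,ref/hg19.fa,ref/hg19.chrom.sizes,ref/hg19-blacklist.bed
results/sampleB,raw/sampleB,mm10,ref/mm10.fa,ref/mm10.chrom.sizes,ref/mm10-blacklist.bed
EOF

# Hypothetical check: fail if a required column is missing from the header line.
check_header() {
    local csv="$1" header col
    header="$(head -n 1 "$csv" | tr -d '\r')"
    for col in sample_path raw_path genome_assembly genome_sequence chromsizes; do
        case ",$header," in
            *",$col,"*) ;;
            *) echo "missing column: $col" >&2; return 1 ;;
        esac
    done
}

check_header example_input.csv && echo "header OK"
```

Running a check like this before launching the pipeline catches a misspelled column name early, before any alignment work starts.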

Steps

The pipeline will run the following analyses:

Fastq quality control with fastqc
Read alignment, filtering, sorting and statistics (bwa, samtools)
Read deduplication (samtools)
Computing repli-seq coverage (number of reads) in equally spaced bins (10 kb), giving the output as a .bedGraph
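
The binning in the last step can be sketched with awk on a toy list of read start positions. This only illustrates the arithmetic of assigning reads to fixed 10 kb bins; it is not how the pipeline computes coverage, and the two-column input format (chromosome, 0-based read start) and the function name bin_counts are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Count reads per fixed-size bin and print bedGraph lines:
# chrom, bin start, bin end, read count (tab-separated).
# Input on stdin: "chrom<TAB>read_start" (0-based); this format is an assumption.
bin_counts() {
    local bin_size="${1:-10000}"
    awk -v OFS='\t' -v b="$bin_size" '
        # Round the read start down to the start of its bin and count it
        { start = int($2 / b) * b; n[$1 OFS start]++ }
        END {
            for (k in n) {
                split(k, f, OFS)
                print f[1], f[2], f[2] + b, n[k]
            }
        }
    ' | sort -k1,1 -k2,2n
}

# Four toy reads: two fall in chr1:0-10000, one in chr1:10000-20000, one in chr2:20000-30000
printf 'chr1\t150\nchr1\t9999\nchr1\t10000\nchr2\t25000\n' | bin_counts 10000
```

The toy input yields three bedGraph lines with counts 2, 1 and 1. The sketch reports only bins that contain at least one read and does not clip the last bin of a chromosome to the lengths in the chromsizes file.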
