GitHub - MattBashton/aloft: Aloft: a parallel implementation of the Arriba workflow

Aloft

Aloft is a Bash script for running the excellent Arriba RNA-Seq fusion detector in parallel. It takes charge of:

Downloading and compiling an Arriba release (if required).
Downloading references and annotation (via Arriba's download_references.sh).
Producing STAR indexes if need be.
Creating the known fusion events file from the COSMIC [Complete Fusion Export](https://cancer.sanger.ac.uk/cosmic/download), and fixing non-HGNC-compliant gene symbols.
Running the STAR aligner in parallel on a set of samples via GNU parallel.
Running Arriba in parallel - as above.
Running SAMtools in parallel, which enables viewing of the STAR-derived BAM files with IGV.
Running Arriba's outstanding plotting script draw_fusions.R in parallel over all samples as with previous stages.

This script was inspired by the demo script run_arriba.sh supplied with Arriba. Aloft implements the recommended Arriba workflow. The only difference is that the STAR alignment is written to disk rather than piped into Arriba, so that it can subsequently be sorted, indexed and saved for manual inspection of fusions in, say, IGV.
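The per-sample sequence described above (align to disk, call fusions, then sort and index the BAM) could look roughly like the following. This is a minimal sketch, not Aloft's actual code: the function name `sample_commands`, the thread count, and all paths are hypothetical, and a real run needs more STAR and Arriba options than are shown here. The function only prints the commands as one line, so nothing is executed.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for one sample instead of running them.
# All values below are placeholders, not Aloft's real settings.

sample_commands() {
    local id="$1" r1="$2" r2="$3" outdir="$4"
    local threads=4                    # assumed thread count
    local index="STAR_index"           # assumed STAR index directory
    local assembly="assembly.fa"       # assumed genome FASTA
    local annotation="annotation.gtf"  # assumed gene annotation
    local prefix="$outdir/${id}_"
    # STAR appends "Aligned.out.bam" to the output prefix.
    local bam="${prefix}Aligned.out.bam"

    # One line per sample: align to disk, call fusions, sort, index.
    printf '%s && %s && %s && %s\n' \
        "STAR --runThreadN $threads --genomeDir $index --readFilesIn $r1 $r2 --readFilesCommand zcat --outSAMtype BAM Unsorted --outFileNamePrefix $prefix" \
        "arriba -x $bam -a $assembly -g $annotation -o $outdir/$id.fusions.tsv -O $outdir/$id.fusions.discarded.tsv" \
        "samtools sort -@ $threads -o $outdir/$id.sorted.bam $bam" \
        "samtools index $outdir/$id.sorted.bam"
}

sample_commands sample_1 sample_1_R1.fastq.gz sample_1_R2.fastq.gz out
```

Because each sample becomes a single command line, lines produced this way could be piped into GNU parallel (for example `| parallel -j 2`), which runs each input line as a job when no command is given.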

Requirements

Linux or a *nix-like OS with working make, Wget, Bash, GNU sed, GNU awk (gawk), GNU grep, gzip and Perl. Tested on Ubuntu 18.04 LTS.
STAR aligner
SAMtools
GNU Parallel
R and Bioconductor, specifically the packages GenomicRanges, circlize and GenomicAlignments.
The COSMIC Complete Fusion Export file.
Some RNA-Seq FASTQ files to analyse.
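Before a first run it may be worth confirming that the command-line tools listed above are on your PATH. The helper below is a hypothetical sketch and not part of Aloft. The executable names (`STAR`, `samtools`, `parallel`, `Rscript` and so on) are the usual ones for these tools, but they may differ on your system.

```shell
#!/usr/bin/env bash
# Sketch: print every named command that cannot be found on PATH.

missing_tools() {
    local tool
    local -a missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if [ "${#missing[@]}" -gt 0 ]; then
        printf '%s\n' "${missing[@]}"
    fi
}

# Prints nothing when every tool is available.
missing_tools make wget bash sed gawk grep gzip perl STAR samtools parallel Rscript
```

An empty result means all of the listed executables were found. The R packages would still need to be checked separately from within R.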

Central config file

aloftConfig.sh contains the settings used during execution, along with explanatory comments. This Bash script is sourced by the main aloft.sh, which therefore inherits the variables defined in it. Please review it before launching an analysis run.
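As an illustration of how sourcing works, the sketch below writes a tiny stand-in config file and sources it. The variable names `THREADS` and `STAR_INDEX_DIR` are invented for this example and are not necessarily the names used in aloftConfig.sh.

```shell
#!/usr/bin/env bash
# Sketch: a stand-in config file, written to a temporary location.
config="$(mktemp)"
cat > "$config" <<'EOF'
# Example settings (names are illustrative only)
THREADS=16
STAR_INDEX_DIR="references/STAR_index"
EOF

# Sourcing runs the file in the current shell, so its variables
# are visible to the rest of the calling script.
. "$config"
echo "Using $THREADS threads, index at $STAR_INDEX_DIR"

rm -f "$config"
```

Since the config is ordinary Bash, a syntax error in it will surface as soon as the main script sources it.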

Sample sheet file

Samples are defined in a tab delimited flat file taking the form of:

sample_1	sample_1_R1.fastq.gz	sample_1_R2.fastq.gz
sample_2	sample_2_R1.fastq.gz	sample_2_R2.fastq.gz
sample_3	sample_3_R1.fastq.gz	sample_3_R2.fastq.gz

Here the first column defines the sample ID. If more than one pair of FASTQ files exists for a sample, simply add them as extra tab-delimited columns. The sample sheet should contain only the file names, not their paths, as the path is given on the command line (see below).
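To make the layout concrete, here is a hedged sketch of how such a sheet could be read in Bash. It is not Aloft's own parser. The function name `parse_sample_sheet` is hypothetical, and the sketch assumes that extra columns alternate R1, R2, R1, R2 and that multiple files per mate are joined with commas, which is the form STAR's `--readFilesIn` accepts.

```shell
#!/usr/bin/env bash
# Sketch: print "ID <TAB> R1 files <TAB> R2 files" for each row,
# prefixing every file name with the FASTQ directory.

parse_sample_sheet() {
    local sheet="$1" fastq_dir="$2"
    local -a fields
    local i r1 r2
    # The "|| [ ... ]" part handles a final line without a newline.
    while IFS=$'\t' read -r -a fields || [ "${#fields[@]}" -gt 0 ]; do
        [ "${#fields[@]}" -ge 3 ] || continue   # skip blank or short rows
        r1="" r2=""
        # Columns 1,3,5,... are assumed to be R1; 2,4,6,... R2.
        for ((i = 1; i + 1 < ${#fields[@]}; i += 2)); do
            r1+="${r1:+,}$fastq_dir/${fields[i]}"
            r2+="${r2:+,}$fastq_dir/${fields[i + 1]}"
        done
        printf '%s\t%s\t%s\n' "${fields[0]}" "$r1" "$r2"
    done < "$sheet"
}

# Demo with a one-row sheet in a temporary file.
sheet="$(mktemp)"
printf 'sample_1\tsample_1_R1.fastq.gz\tsample_1_R2.fastq.gz\n' > "$sheet"
parse_sample_sheet "$sheet" /data/fastq
rm -f "$sheet"
```

Keeping paths out of the sheet means the same file can be reused if the FASTQ files are moved to another directory.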

Having made such a file and reviewed the settings in aloftConfig.sh, you can run the pipeline like so:

aloft.sh <tab delimited sample sheet> <input FASTQ path> <output dir for run>

Out of the box, Aloft is configured to use 16 cores and will consume about 64 GB of RAM during execution. The core count of the various stages, the number of concurrent jobs, the RAM allocated and so on can be adjusted in aloftConfig.sh.
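When adjusting these settings for a different machine, one rough way to choose the number of concurrent jobs is to take the smaller of what the CPU and the RAM allow. The helper below is a hypothetical sketch, not something aloftConfig.sh provides, and the per-job figures in the example call are assumptions rather than measured values.

```shell
#!/usr/bin/env bash
# Sketch: how many jobs fit, given per-job thread and RAM needs.

max_jobs() {
    local total_cores="$1" threads_per_job="$2"
    local total_ram_gb="$3" ram_per_job_gb="$4"
    local by_cpu=$(( total_cores / threads_per_job ))
    local by_ram=$(( total_ram_gb / ram_per_job_gb ))
    local jobs=$(( by_cpu < by_ram ? by_cpu : by_ram ))
    # Always allow at least one job.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

# 16 cores and 64 GB RAM, assuming 8 threads and 32 GB per job.
max_jobs 16 8 64 32   # prints 2
```

Integer division rounds down, so the result is conservative when the totals are not exact multiples of the per-job values.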
