GNU Make-driven workflow to download TCGA data via the TCGAbiolinks package

mschubert/TCGAbiolinks-downloader

TCGAbiolinks-downloader

This workflow uses the TCGAbiolinks package to download data from the NCI's Genomic Data Commons (GDC).

All files are stored as <cohort>.RData in their respective analysis directories.

Requirements

The following software is required to run this workflow:

  • GNU Make
  • R
  • TCGAbiolinks (R package)

Optionally, the following R packages can be used for post-processing:

  • edgeR - for log2 cpm transformation of RNA-seq reads
  • DESeq2 - for variance stabilizing transformation of RNA-seq reads
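
As a rough sketch of what these two transformations look like, the snippet below applies them to a raw count matrix with genes in rows and samples in columns. The `counts` matrix is simulated purely for illustration and is not produced by this workflow; with real data it would be extracted from the downloaded RNA-seq object, whose exact structure is not shown here.

```r
# Simulated raw counts (hypothetical stand-in for downloaded RNA-seq reads):
# 2000 genes in rows, 6 samples in columns, with gene-specific mean expression.
set.seed(1)
n_genes <- 2000
n_samples <- 6
gene_means <- runif(n_genes, min = 10, max = 1000)
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_means, times = n_samples), size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# log2 counts per million via edgeR (skipped if the package is not installed)
if (requireNamespace("edgeR", quietly = TRUE)) {
  log_cpm <- edgeR::cpm(counts, log = TRUE)
}

# Variance stabilizing transformation via DESeq2 (skipped if not installed);
# vst() accepts a plain count matrix and returns a matrix of the same shape.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  vst_counts <- DESeq2::vst(counts)
}
```

Both calls return a matrix with the same dimensions as the input, so the transformed values can be used wherever the raw counts were.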

Downloading the data

There are three options to download and save TCGA data:

# Download everything
make # add the -j<n> flag to run n data sets in parallel

# Selection by cohort
# - see projects.txt for valid cohorts
make <cohort> # e.g. 'TCGA-LUAD' for lung adenocarcinoma

# Selection by data type
# - valid types are: snv_mutect2, rna_seq_raw, cnv_segments, mirna_seq, clinical
make <data type> # e.g. 'clinical' for downloading clinical data

Data will be stored as RData files (containing a data.frame or SummarizedExperiment object) for each cohort in the respective data type directories.
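
One way to inspect a downloaded file from R is sketched below. The helper `load_rdata` is a hypothetical convenience function, not part of this workflow, and the names of the objects stored inside each file are not documented here, so the sketch lists them rather than assuming one. To stay runnable without any downloaded data, it demonstrates on a temporary file; a real path would presumably look like `clinical/TCGA-LUAD.RData`, which is an assumption based on the layout described above.

```r
# Load every object stored in an RData file into a named list, so that
# nothing in the global environment is overwritten.
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  as.list(env)
}

# Demonstration on a throwaway file with made-up content. With real data,
# replace `tmp` with a path such as "clinical/TCGA-LUAD.RData" (assumed layout).
tmp <- tempfile(fileext = ".RData")
clinical_example <- data.frame(patient = c("A", "B"), age = c(61, 58))
save(clinical_example, file = tmp)

objs <- load_rdata(tmp)
names(objs)       # which objects the file contains
class(objs[[1]])  # a data.frame here; other data types may hold a SummarizedExperiment
```

Loading into a separate environment avoids silently replacing variables that happen to share a name with an object in the file.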

Additional documentation

The data processing steps underlying the downloaded data are fully documented on the GDC website.
