This repository implements a pair of Jupyter notebook-based tools that use the MetaSRA to build structured datasets from the SRA, facilitating secondary analyses of the SRA’s human RNA-seq data:
- Case-Control Finder: Finds suitable case and control samples for a given disease or condition where the cases and controls are matched by tissue or cell type.
- Series Finder: Finds ordered sets of samples for the purpose of addressing biological questions pertaining to changes over a numerical property such as time.
For a more in-depth description of these tools, please see our publication:
A few things to note:
- These notebooks currently use MetaSRA version 1.6.
- This repository was copied and modified from the following repository: https://github.com/NCBI-Hackathons/RNA-Seq-in-the-Cloud/tree/master/Metadata. These tools were developed at an NCBI computational biology codeathon in March 2019 held in Chapel Hill, North Carolina.
The notebooks can be executed in the cloud via Google Colab:
The dependencies for these notebooks are described in requirements.txt. To install these dependencies, please run:
pip install -r requirements.txt
Furthermore, before running the notebooks, you must unpack the static metadata files from data.tar.gz. To do so, run the following command:
tar -zxf data.tar.gz
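If the `tar` command is not available on your system, the same unpacking step can be sketched with Python's standard `tarfile` module. This is a minimal sketch, not part of the repository: the helper name `unpack_metadata` is hypothetical, and it assumes data.tar.gz sits in the current working directory.

```python
import tarfile


def unpack_metadata(archive_path="data.tar.gz", dest="."):
    """Extract a gzip-compressed tar archive into dest.

    Hypothetical helper (not part of this repository).
    Returns the list of member names found in the archive.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Newer Python versions provide extraction filters; use the
        # conservative "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(path=dest, filter="data")
        else:
            archive.extractall(path=dest)
    return names
```

Calling `unpack_metadata()` from the repository root should leave the extracted metadata files where the notebooks expect them, provided the archive is laid out the same way the `tar` command would unpack it.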
To run the Case-Control Finder, run:
jupyter notebook case_control_finder.ipynb
To run the Series Finder, run:
jupyter notebook series_finder.ipynb
- Matthew Bernstein
- Emily Clough
- Ariella Gladstein
- Khun Zaw Latt
- Ben Busby
- Allissa Dillman