10x CellRanger SJP

10xGenomics CellRanger - Slurm Job Partitioner (SJP)

Background

The pipelines

Cell Ranger is a set of analysis pipelines that processes Chromium single cell 3’ RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis. Cell Ranger includes four main pipelines:

cellranger mkfastq
cellranger count
cellranger aggr
cellranger reanalyze

These pipelines combine Chromium-specific algorithms with the widely used RNA-seq aligner STAR. Output is delivered in standard BAM, MEX, CSV, HDF5 and HTML formats that are augmented with cellular information.

You can find more information about Chromium and Cell Ranger here.

The setup

I run the Cell Ranger pipelines (presented above) on a SLURM based cluster, to analyze the raw, single-cell sequencing data deriving from Illumina HiSeq instruments.

The problem

Most of the bioinformatics tools to date, are not mature enough to make efficient use of distributed sytems. Cell Ranger is not an exception. It doesn't seem to provide destributing capabilities in order to run it in more than one nodes, to parallelize the process. Moreover, the nodes on the cluster I am using, are not so powerful by their own. That means, one should split one big batch job into several smaller (each running on a different node), in an attempt to parallelize the process.

The workaround

Here I created a workaround (rather than a solution) to this problem. I made this Python script, which based on the given parameters it decides how to devide a project into small jobs that run individually in their own node.

Installation

Before you start

(This steps will soon become automatic.) Make sure your folder structure looks like this:

10x-CellRanger-SJP
  |
  +-- cellranger-1.2.0/
  |
  +-- deliverables
  |
  +-- projects
  |
  +-- references/
  |
  `-- scripts/

Make sure your cellranger installation is under the cellranger-1.2.0 directory. Place the references in the references folder. Then download the python script and the template files in the scripts folder.

Usage

Adapt the 10xGenomics Pipeline to your new Project.

$ python script_generator.py [-h] -d <Data_Path> -s <Samplesheet> -A <Slurm_Project> -J <Jobname> [--qos] [-r {mm,hg}] [--aggr-norm {mapped,raw,None}] [--version]

General arguments

Argument	Description
-h, --help	shows the help message and exits
--version	shows program's version number and exits

#SBATCH arguments

#SBATCH arguments are used for specifying your preferences for the SLURM job you plan to submit.

Argument	Description
-A	SLURM Project name.
-J	Give a name to this job.
-q, --qos	Use it for testing. Max 15mins, 4nodes. Very high priority.

Script Variables:

Variables about your project (e.g. file paths)

Argument	Description
-d, --hiseq-datapath	Path of the raw hiseq data (pref. absolute).
-r, --ref {mm,hg}	Choose a reference genome [mouse, human].
-s, --samplesheet	Path of the metadata samplesheet (pref. absolute).
--aggr-norm {mapped,raw,None}	Normalization depth across the input libraries.

NOTE: If you call the script without arguments it will enter the adaptive mode, where you will be asked specifically to add each of the necessary inputs. (currently under construction)

Make Deliverables

Once the projects you were running are done, you may want to deliver the data to someone, or keep them organized for yourself. To do that you can use the make_deliverable.py script, which will aggregate the data of the given projects into one folder. (The given projects have to be subject of a common biological project. That means part of the project name should much.)

Usage

./make_deliverable.py -p project_XXXXXXXXX_10X_YY_NNN_##_001 project_YYYYYYYYY_10X_YY_NNN_##_002 -f -o 10X_YY_NNN_##

Parameters

Argument	Description
-h, --help	show the help message and exits
-v, --version	shows program's version number and exits
-p, --project	Names of the projects you want to include in the deliverable.
-f, --fastq	If set, it will also include the fastqs in the deliverable.
-o, --output	(Optional) The name of the deliverable to be created.

NOTE:

You can give multiple projects by simply separete them with space. Note that the projects should be part of a bigger biological project. In other words, this part of the name: 10X_YY_NNN_## should be common.
If -o, --output argument is not given, the common part of the project names will be used for the naming of the output.

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
scripts		scripts
.gitignore		.gitignore
README.md		README.md
make_deliverable.py		make_deliverable.py
run_partitioner.py		run_partitioner.py
zipper.py		zipper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

10x CellRanger SJP

Background

The pipelines

The setup

The problem

The workaround

Installation

Before you start

Usage

General arguments

#SBATCH arguments

Script Variables:

Make Deliverables

Usage

Parameters

About

Releases

Packages

Languages

glrs/10x-CellRanger-SJP

Folders and files

Latest commit

History

Repository files navigation

10x CellRanger SJP

Background

The pipelines

The setup

The problem

The workaround

Installation

Before you start

Usage

General arguments

#SBATCH arguments

Script Variables:

Make Deliverables

Usage

Parameters

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages