IGVF Single-Cell Pipeline Setup and Execution Documentation

The default configuration for the IGVF single-cell pipeline operates in a Docker-based Terra cloud environment. However, Docker is not always available on high-performance computing (HPC) environments. To accommodate this, Singularity can be used as an alternative. This documentation outlines the setup for running the pipeline on a single host using Singularity. As of now, the Cromwell configuration for specific queuing systems has not been finalized, and individual experiments are being executed on a standalone machine.

Define some variables

root_dir	/home/user/proj
single_cell_pipeline_git	https://github.com/IGVF/single-cell-pipeline
diane_conda_url	https://woldlab.caltech.edu/~diane/conda/
conda_command	/home/user/bin/micromamba
environment	the IGVF single-cell pipeline

These variables can be accessed using ${defaults['variable_name']} in the shell scripts provided.
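
One way to make the table above available to a Bash script is an associative array named defaults. This is a minimal sketch, assuming Bash 4 or later; the values are copied from the table and should be replaced with your own paths. The environment entry is left out here and can be added the same way.

```shell
#!/usr/bin/env bash
# Associative arrays require Bash 4 or newer.
declare -A defaults=(
    [root_dir]="/home/user/proj"
    [single_cell_pipeline_git]="https://github.com/IGVF/single-cell-pipeline"
    [diane_conda_url]="https://woldlab.caltech.edu/~diane/conda/"
    [conda_command]="/home/user/bin/micromamba"
)

# Look up values by key, using the same syntax as the commands in this document.
root=${defaults['root_dir']}
echo "Working directory:   ${root}"
echo "Pipeline repository: ${defaults[single_cell_pipeline_git]}"
```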

Set up the conda environment

It is crucial to use a recent version of Cromwell, as older versions available on Conda may not be compatible with the pipeline’s requirements. Version 40, for example, is not supported.

The following block installs most of the necessary dependencies for the IGVF single-cell pipeline into a Conda environment. Micromamba is recommended for faster installation, though it is more limited than Conda or Mamba.

Make sure to replace ${defaults['conda_command']} and ${defaults['environment']} with your own settings.

${defaults['conda_command']} create -q \
           -n ${defaults['environment']} \
           -c conda-forge -c bioconda \
           'cromwell>=86' \
           singularity \
           gsutil \
           'synapseclient>=4' >/dev/null &

Once that process finishes, you can use the following run commands to make sure the main components were installed.

${defaults['conda_command']} run -n ${defaults['environment']} womtool --version
${defaults['conda_command']} run -n ${defaults['environment']} cromwell --version
${defaults['conda_command']} run -n ${defaults['environment']} singularity --version
${defaults['conda_command']} run -n ${defaults['environment']} gsutil --version
${defaults['conda_command']} run -n ${defaults['environment']} synapse --version
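
The same check can be done in one pass from a shell that already has the environment active. The sketch below is not part of the pipeline; missing_tools is a hypothetical helper name. It only reports whether each command is on PATH and does not check versions.

```shell
#!/usr/bin/env bash
# Print each name from the argument list that is not an executable on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tool names are the ones checked by the run commands above.
missing=$(missing_tools womtool cromwell singularity gsutil synapse)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on a single line.
    echo "Not found on PATH:" $missing
else
    echo "All expected tools are on PATH."
fi
```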

Check out the IGVF single-cell pipeline repository. As of Sep 2024, this is for release v1.

pushd ${defaults['root_dir']}
git clone ${defaults['single_cell_pipeline_git']} -b updated-qcs
popd

Configuring cromwell

To use Singularity instead of Docker with Cromwell, the following configuration is required. The example below is for running the pipeline on a single machine; integrating with a local queuing system requires additional adjustments. When launching the pipeline, pass the -Dconfig.file=cromwell.conf argument to Cromwell to use this configuration.

This configuration is based on the Containers documentation. Note that Synapse requires access to a configuration file and a cache directory to function properly. Ensure that ${HOME} paths in the configuration are replaced with absolute paths to avoid issues with Singularity.

include required(classpath("application"))

backend {
    default: singularity
    providers: {
        singularity {
            # The backend custom configuration.
            actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"

            config {
                run-in-background = true
                runtime-attributes = """
                  String? docker
                """
                submit-docker = """
                  singularity exec --containall --bind ${HOME}/.synapseConfig:${HOME}/.synapseConfig --bind ${HOME}/.synapseCache:${HOME}/.synapseCache --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${docker_script}
                """
            }
        }
    }
}
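
The ${HOME} replacement mentioned above could be scripted as follows. This is a minimal sketch, not part of the pipeline: expand_home is a hypothetical helper, cromwell.conf.template is a hypothetical name for a copy of the configuration that still contains ${HOME}, and the substitution assumes the home directory path contains no | or & characters.

```shell
#!/usr/bin/env bash
# Print the given file with every literal ${HOME} replaced by the absolute
# home directory. Other placeholders such as ${cwd} and ${docker} are left
# untouched so that Cromwell can still fill them in.
expand_home() {
    sed 's|[$]{HOME}|'"$HOME"'|g' "$1"
}

# Only write cromwell.conf when the (hypothetical) template file exists.
if [ -f cromwell.conf.template ]; then
    expand_home cromwell.conf.template > cromwell.conf
fi
```

Keeping the template separate from the generated cromwell.conf means the same file can be reused by different users on the same host.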

Running the IGVF Single-Cell Pipeline Locally

When running outside of Terra, the metadata .json file for Cromwell must be generated manually. Below is an example .json configuration for a customized 10x multiome experiment.

{
  "single_cell_pipeline.prefix": "Team_6-GM12878-10XMultiome",
  "single_cell_pipeline.chemistry": "10x_multiome",
  "single_cell_pipeline.genome_fasta": "https://api.data.igvf.org/reference-files/IGVFFI0653VCGH/@@download/IGVFFI0653VCGH.fasta.gz",
  "single_cell_pipeline.genome_tsv": "IGVF_human_v43_Homo_sapiens_genome_files_hg38_v43.tsv",
  "single_cell_pipeline.gtf": "https://api.data.igvf.org/reference-files/IGVFFI7217ZMJZ/@@download/IGVFFI7217ZMJZ.gtf.gz",
  "single_cell_pipeline.atac_barcode_offset": 0,
  "single_cell_pipeline.fastq_barcode": [
    "syn61457432",
    "syn61457437",
    "syn61457449",
    "syn61457459"
  ],
  "single_cell_pipeline.read1_atac": [
    "syn61457431",
    "syn61457436",
    "syn61457448",
    "syn61457458"
  ],
  "single_cell_pipeline.read2_atac": [
    "syn61457434",
    "syn61457438",
    "syn61457454",
    "syn61457460"
  ],
  "single_cell_pipeline.read1_rna": [
    "syn61457461",
    "syn61457463",
    "syn61457465",
    "syn61457469"
  ],
  "single_cell_pipeline.read2_rna": [
    "syn61457462",
    "syn61457464",
    "syn61457468",
    "syn61457476"
  ],
  "single_cell_pipeline.seqspecs": [
    "https://raw.githubusercontent.com/detrout/y2ave_seqspecs/main/Team_6_GM12878_10XMultiome-L001_seqspec.yaml"
  ],
  "single_cell_pipeline.whitelist_atac": [
    "737K-arc-v1_ATAC.txt.gz"
  ],
  "single_cell_pipeline.whitelist_rna": [
    "737K-arc-v1_GEX.txt.gz"
  ],
  "single_cell_pipeline.whitelists_tsv": "gs://broad-buenrostro-pipeline-genome-annotations/whitelists/whitelists.tsv",
  "single_cell_pipeline.check_read1_rna.disk_factor": 1,
  "single_cell_pipeline.check_read2_rna.disk_factor": 1,
  "single_cell_pipeline.check_read1_atac.disk_factor": 1,
  "single_cell_pipeline.check_read2_atac.disk_factor": 1,
  "single_cell_pipeline.check_fastq_barcode.disk_factor": 1
}
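
When the lists of Synapse IDs come from another script, writing the arrays by hand is error-prone. The sketch below shows one possible way to emit them from the shell. json_array is a hypothetical helper, it assumes the values contain no quotes or backslashes (true for Synapse IDs), and the output is only a fragment of the full inputs file shown above.

```shell
#!/usr/bin/env bash
# Print the arguments as a JSON array of strings.
# Assumes no argument contains a double quote or a backslash.
json_array() {
    local first=1 item
    printf '['
    for item in "$@"; do
        [ "$first" -eq 1 ] || printf ', '
        printf '"%s"' "$item"
        first=0
    done
    printf ']\n'
}

# Emit a fragment of the inputs file; the remaining keys follow the same pattern.
cat <<EOF
{
  "single_cell_pipeline.prefix": "Team_6-GM12878-10XMultiome",
  "single_cell_pipeline.fastq_barcode": $(json_array syn61457432 syn61457437 syn61457449 syn61457459)
}
EOF
```

If womtool is installed, running womtool validate with the --inputs option against the WDL file should catch missing or misspelled keys before a full run is started.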

Running cromwell

To execute the pipeline, ensure the script and configuration files are properly referenced, including the Cromwell configuration file (cromwell.conf), the WDL script, and the generated JSON file.

PATH=/woldlab/loxcyc/home/user/proj/single-cell-pipeline/src/bash/:$PATH \
  cromwell -Dconfig.file=cromwell.conf run \
  ../single-cell-pipeline/single_cell_pipeline.wdl \
  -i tiny-13a.json

Finding results with cromwell.

Cromwell stores results in a nested directory structure under cromwell-executions/. To locate error or output logs, navigate to the appropriate subdirectory based on the workflow step. The example below shows a partial directory tree from a failed run:

cromwell-executions
- single_cell_pipeline
  - ${random_uuid}
    - call-atac
    - call-barcode_mapping
    - call-check_fastq_barcode
    - call-check_read1_atac
    - call-check_read1_rna
      - shard-0
        - execution
          - check_inputs_monitor.log
          - files
          - glob-aae8b15f635ae9fc31e845b03c8537e4
          - glob-aae8b15f635ae9fc31e845b03c8537e4.list
          - rc
          - script
          - script.background
          - script.submit
          - stderr
          - stderr.background
          - stdout
          - stdout.background
          - tmp.${suffix}
      - shard-1
      - shard-2
      - shard-3
    - call-check_read2_atac
    - call-check_read2_rna
    - call-check_seqspec

To find specific files, use the following command: find cromwell-executions -name ${filename}
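
To narrow a failed run down to the tasks that failed, the rc files shown in the tree above can be scanned for non-zero values. This is a minimal sketch, assuming each rc file holds the exit code of its task as a single number; failed_calls is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List the execution directories under the given root (default:
# cromwell-executions) whose rc file contains anything other than 0.
failed_calls() {
    local root=${1:-cromwell-executions} rc_file
    find "$root" -type f -name rc | sort | while IFS= read -r rc_file; do
        if [ "$(tr -d '[:space:]' < "$rc_file")" != "0" ]; then
            dirname "$rc_file"
        fi
    done
}

# Only scan when a run has actually produced an executions directory.
if [ -d cromwell-executions ]; then
    failed_calls cromwell-executions
fi
```

The stderr and stdout files in each listed directory are the natural next place to look.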

Finding the right files in this structure can take some trial and error. For easier results management, consider a tool such as Caper; managing the results manually, as demonstrated above, also works.

We extend our gratitude to @detrout for their contribution in providing the initial draft of this document.
