- Introduction
- Installation
- Running Workflows
- Inspecting Outputs
- Supported Platforms
- Using Singularity
## Introduction

Here are some notes on running and using this pipeline. Using Caper is the canonical, supported, and official way to run ENCODE Uniform Processing Pipelines. The example below uses the command `caper run`, which is the simplest way to run a single pipeline. For running multiple pipelines in a production setting, using `caper server` is recommended; to find details on setting up the server, refer to the Caper documentation.

This repository contains several workflows. For details on them, please see the reference.
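For orientation, here is a minimal sketch contrasting the two modes; the workflow and input file names are placeholders, and exact behavior depends on your Caper version and configuration, so treat it as an illustration rather than an official recipe.

```bash
# Single pipeline: Caper launches Cromwell, runs the workflow, and exits when done.
caper run hic.wdl -i input.json

# Production setting: keep a long-lived server running (e.g. inside screen/tmux)...
caper server

# ...and submit workflows to it from another shell.
caper submit hic.wdl -i input.json
```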
## Installation

- Git clone this pipeline.

$ git clone https://github.com/ENCODE-DCC/hic-pipeline
- Install Caper. This requires `java` >= 1.8 and `python` >= 3.6; Caper >= 0.8.2.1 is required to run the pipeline. Caper is a Python wrapper for Cromwell.

$ pip install caper  # use pip3 if it doesn't work
- Follow Caper's README carefully to configure it for your platform (local, cloud, cluster, etc.).

IMPORTANT: Configure your Caper configuration file `~/.caper/default.conf` correctly for your platform.

If using local Docker, go to Docker Preferences > Resources > Advanced > Memory and set the max memory to 12 GB; otherwise the pipeline may fail due to resource issues.
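After these steps you may want to verify the prerequisites and generate a starter configuration. This is only a convenience sketch, not part of the official instructions: `caper init` exists only in newer Caper releases and `local` is just an example backend name, so fall back to editing `~/.caper/default.conf` by hand per Caper's README if it does not match your installation.

```bash
# Check the prerequisites: java >= 1.8 and python >= 3.6.
java -version
python3 --version

# Confirm Caper is installed and meets the >= 0.8.2.1 requirement.
pip show caper

# Optionally generate a starter ~/.caper/default.conf for a local backend,
# then review and adjust it before running anything.
caper init local
cat ~/.caper/default.conf
```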
## Running Workflows

Make sure you have properly installed the pipeline as described in the installation instructions, and run the following commands from the root of the repository (i.e. `cd hic-pipeline` if you have not done so already).
- Prepare the input JSON file. This file contains the user-specified files and parameters to run the pipeline with. Different examples of input JSON files are available here. Details about the different input parameters are available here. Copy and paste the entirety of the following command into your terminal (it uses heredoc syntax) and press enter/return to create a file called `input.json` pointing to the test data in this repo as pipeline input:
cat << EOF > input.json
{
  "hic.assembly_name": "ce10",
  "hic.chrsz": "tests/data/ce10_selected.chrom.sizes.tsv",
  "hic.fastq": [
    [
      [
        "tests/data/merged_read1.fastq.gz",
        "tests/data/merged_read2.fastq.gz"
      ]
    ]
  ],
  "hic.reference_index": "tests/data/ce10_selected.tar.gz",
  "hic.restriction_enzymes": [
    "MboI"
  ],
  "hic.restriction_sites": "tests/data/ce10_selected_MboI.txt.gz"
}
EOF
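To confirm the file you just created is well-formed, you can optionally pretty-print it with Python's built-in JSON tool; this is purely a convenience check and will either print the parsed JSON or report a syntax error.

```bash
python3 -m json.tool input.json
```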
- Run the pipeline using Caper. The `-m` flag gives a memorable name to the metadata JSON file that the pipeline will produce once it has finished; this file describes the run. More details about the metadata JSON can be found in the Cromwell documentation.

$ caper run hic.wdl -i input.json -m hic_testrun_metadata.json
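Once the run finishes, the metadata JSON records the outcome in its top-level `status` field (standard Cromwell metadata). Assuming you have `jq` installed, a quick check looks like this:

```bash
# Prints "Succeeded" when the workflow completed without errors.
jq -r .status hic_testrun_metadata.json
```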
## Inspecting Outputs

Rather than digging through the highly nested Cromwell output directories or complex JSON metadata, Croo can be used to generate a more legible HTML table of paths to outputs. To invoke `croo`, run the following, passing a Cromwell metadata JSON file as input:

$ croo "${PATH_TO_METADATA_JSON}"
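Croo is distributed as a separate Python package. A minimal sketch, assuming the metadata file name from the test run above:

```bash
# Install Croo if the croo command is not already available.
pip install croo

# Generate the output table from the run's metadata JSON.
croo hic_testrun_metadata.json
```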
## Supported Platforms

This pipeline can be run on a variety of platforms via Caper. For a list of supported platforms, see Caper's list of built-in backends. These include local machines, Google Cloud Platform, Amazon Web Services, and a selection of HPC clusters, namely Slurm, PBS, and SGE. Furthermore, Caper provides the ability to use a custom backend, which can be useful for getting it to work with your particular cluster or cluster configuration.
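As an illustration of switching platforms (backend names and the availability of `caper init` vary between Caper releases, so consult Caper's documentation for your version), the pipeline invocation itself stays the same and only the Caper configuration changes:

```bash
# Example: generate a starter configuration for a Slurm cluster
# (sge, pbs, gcp, and aws are analogous backend names).
caper init slurm

# Fill in the cluster-specific settings in ~/.caper/default.conf
# (partition, account, output directories, etc.), then run as usual:
caper run hic.wdl -i input.json -m hic_testrun_metadata.json
```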
## Using Singularity

Caper comes with built-in support for using Singularity containers instead of Docker via the `--singularity` option. This is useful in HPC environments where Docker usage is restricted. See the Caper documentation for more information. Please note that the GPU-enabled tasks (HiCCUPS and DELTA) will not work with Singularity.
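A minimal sketch of running the same test with Singularity instead of Docker, using the `--singularity` flag mentioned above and the file names from the earlier steps:

```bash
caper run hic.wdl -i input.json -m hic_testrun_metadata.json --singularity
```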