This repository was archived by the owner on Aug 29, 2023. It is now read-only.

Merging latest changes for running on Compute Canada clusters #8

Open
wants to merge 50 commits into
base: dev-organize-output
75e0f90
Update readme.md
sitag Nov 12, 2019
05083fa
Update readme.md
sitag Nov 12, 2019
5cf17c3
file tracking
May 1, 2019
b56a763
track all encode extensions
May 3, 2019
a71e2f8
cleanup config
Jul 29, 2019
e334f1b
cleanup filt.bam + master filelist
Jul 29, 2019
d9c3b44
change md5 script generation
Jul 29, 2019
ad5f03f
add unrecognized file patterns
Aug 1, 2019
01b541b
cleaup updates
Aug 1, 2019
9acfa57
trapping find (really should do it differently)
Aug 1, 2019
6048281
add rmsize tracking
Aug 15, 2019
a70dafe
add cleanup notes
sitag Sep 3, 2019
520bead
add organising output section
sitag Sep 3, 2019
e3e1ee8
enable passing output directories
Oct 1, 2019
1df6a00
fix documentation
Oct 1, 2019
ab778de
Adding slurm singularity backend support
paulstretenowich Oct 1, 2019
7e9acc1
Singularity usage with cleanenv
paulstretenowich Oct 2, 2019
5529e7c
Readme update
paulstretenowich Oct 3, 2019
de56ad8
Fixing Local for IHEC test
paulstretenowich Oct 11, 2019
815ddf4
Merge branch 'master' of https://github.com/paulstretenowich/integrat…
paulstretenowich Jan 31, 2020
2fcbb39
Merge pull request #1 from IHEC/dev-organize-output
paulstretenowich Jan 31, 2020
7fabada
Fixing sambamba issue + pbs support
paulstretenowich Jan 31, 2020
0f34fea
PBS support fix
paulstretenowich Jan 31, 2020
82d44c5
PBS support readme modification
paulstretenowich Jan 31, 2020
d2a34aa
Update docker image
paulstretenowich Feb 3, 2020
077a00b
Path to cromwell fix inside template for piperunner
paulstretenowich Feb 13, 2020
f2d0509
Adding cleanenv at singularity calling
paulstretenowich Feb 13, 2020
4b6f7fb
Merge branch 'master' into dev-organize-output
paulstretenowich Aug 27, 2020
59badf4
Merge remote-tracking branch 'upstream/dev-organize-output' into dev-…
paulstretenowich Aug 27, 2020
0eddb1f
Test
paulstretenowich Sep 24, 2020
1a0c331
test
paulstretenowich Sep 25, 2020
ed2dc1d
Test
paulstretenowich Sep 30, 2020
3b68f5c
Test
paulstretenowich Oct 6, 2020
0357c05
Test
paulstretenowich Oct 6, 2020
a735a2e
Test
paulstretenowich Oct 6, 2020
7e52f38
Test
paulstretenowich Oct 7, 2020
6de370e
Changing the usage and updating the doc
paulstretenowich Oct 7, 2020
ee6c610
Doc update
paulstretenowich Oct 7, 2020
e5fea87
Doc update
paulstretenowich Oct 7, 2020
8e4a668
Fixing backend.conf
paulstretenowich Oct 7, 2020
6f7b247
Changing back
paulstretenowich Oct 8, 2020
8e6a18c
Changing Compute Canada behaviour
paulstretenowich Mar 3, 2021
1a9a13f
Debug
paulstretenowich Apr 9, 2021
b7ade1a
Updating md5s for new singularity image
paulstretenowich Jun 9, 2021
e6f44ba
Adding execute permission for compute canada launchers
paulstretenowich Jun 9, 2021
40f9aeb
Updating new singularity image
paulstretenowich Jun 9, 2021
4d43c3d
Updating documentation
paulstretenowich Jun 9, 2021
5ab13d0
Updating documentation
paulstretenowich Jun 9, 2021
f9bba75
Updating documentation
paulstretenowich Jul 7, 2021
0bfd39c
Merge pull request #2 from paulstretenowich/dev-organize-output
paulstretenowich Jul 7, 2021
Readme update
paulstretenowich committed Jan 31, 2020
commit 5529e7c500869145a11d62c2b14f03c697da69c0
34 changes: 17 additions & 17 deletions encode-wrapper/readme.md
@@ -12,17 +12,17 @@ Documemtation on how to define configs for IHEC standard workflows: [IHEC standa

## Downloading test data

- First run `./get_encode_resources.sh` to get encode test dataset and hg38 genome files.
+ First run `./get_encode_resources.sh` to get encode test dataset and hg38 genome files.

- By default it will use git over http. If you want to use ssh, then pass `ssh` as first argument
+ By default it will use git over http. If you want to use ssh, then pass `ssh` as first argument.

Run `chip.py -get` to get IHEC ChIP test data for MCF10A cell line.

## Pulling Singularity image and generating wrapper scripts

Check singularity version with `singularity --version` to make sure it's at least `2.5.2`.

- Then run `python chip.py -pullimage -bindpwd`. `bindpwd` will mount the current directory (equivalent to arguments `-B $PWD`). Note that this means singularity must be a recent enough version to be able to bind to directories that do not exist on the image, since your `$PWD` may not exist on the image. Otherwise see `-pwd2ext0` option that binds $PWD to `/mnt/ext_0`.
+ Then run `python chip.py -pullimage -bindpwd`. `bindpwd` will mount the current directory (equivalent to arguments `-B $PWD`). Note that this means singularity must be a recent enough version to be able to bind to directories that do not exist on the image, since your `$PWD` may not exist on the image. Otherwise see `-pwd2ext0` option that binds $PWD to `/mnt/ext_0`.

This will write:

@@ -42,29 +42,29 @@ This will write:

If you are running in `Local` mode using `./chip.py -pullimage -bindpwd $PWD/data_b $PWD/data_a` will mount `$PWD/data_b` as `/mnt/ext_0`, `$PWD/data_a` as `/mnt/ext_1` and so on, and it binds `$PWD` to `$PWD`. If you are on older systems without support for overlayFS, then passing `-pwd2ext0` will bind `$PWD` `/mnt/ext_0` and shift other bind points further along `ext_$i`'s.

- For example,
+ For example,

python ./chip.py -pullimage -bindpwd -pwd2ext0 $PWD/v2/ihec

will set up all binds so that after downloading the cemt0007 test data, you can just use `cemt0007_h3k27me3_mnt_ext_0.json` out of the box like:

$ ./singularity_wrapper.sh cemt0007_h3k27me3_mnt_ext_0.json

- without needing to do `chip.py -maketests` as later described.
+ without needing to do `chip.py -maketests` as later described.

This will also create the singularity image in `./images`.

Do `chmod +x ./*sh`.

- You can pass `-nobuild` if you just want to regenerate the wrapper scripts without pulling the singularity image again.
+ You can pass `-nobuild` if you just want to regenerate the wrapper scripts without pulling the singularity image again.

- If you did not use `python ./chip.py -pullimage -bindpwd -pwd2ext0 $PWD/v2/ihec` then you will not be able to use `cemt0007_h3k*_mnt_ext_0.json` for tests, as the test data may not be mapped to `/ext/mnt_0`. See running tests below.
+ If you did not use `python ./chip.py -pullimage -bindpwd -pwd2ext0 $PWD/v2/ihec` then you will not be able to use `cemt0007_h3k*_mnt_ext_0.json` for tests, as the test data may not be mapped to `/ext/mnt_0`. See running tests below.

## Running tests

- To run ENCODE test tasks, do `./singularity_encode_test_tasks.sh Local try1` to run it locally. The first argument is the config argument to cromwell (see ENCODE pipeline documentation). The output of tests will be written in `test_tasks_results_try1`. If you are on HPC and prefer to use SLURM, do `./encode_test_tasks_run_ihec_slurm_singularity.sh <installation_dir> slurm_singularity try1`.
+ To run ENCODE test tasks, do `./singularity_encode_test_tasks.sh try1` to run it locally. The first argument is the config argument to cromwell (see ENCODE pipeline documentation). The output of tests will be written in `test_tasks_results_try1`. If you are on HPC and prefer to use SLURM, do `./encode_test_tasks_run_ihec_slurm_singularity.sh <installation_dir> slurm_singularity try1`.

- Make sure all test pass, by looking through jsons generated. `./status_encode_tasks.py` can be used here.
+ Make sure all test pass, by looking through jsons generated. `./status_encode_tasks.py` can be used here.

python ./status_encode_tasks.py ./test_tasks_results_try1
# ok:./test_tasks_results_try1/test_spr.test_task_output.json
@@ -87,7 +87,7 @@ Make sure all test pass, by looking through jsons generated. `./status_encode_ta
"#ok": 14
}

- Doing `python chip.py -maketests` will write ChIP test configurations (you also need to pass `-pwd2ext0` if you set `$PWD` to `/ext/mnt_0`) :
+ Doing `python chip.py -maketests` will write ChIP test configurations (you also need to pass `-pwd2ext0` if you set `$PWD` to `/ext/mnt_0`):

* ./v2/ihec/cemt0007_h3k4me3.json

@@ -101,20 +101,20 @@ Or using SLURM with:

`./piperunner_ihec_slurm_singularity.sh ./v2/ihec/cemt0007_h3k4me3.json slurm_singularity h3k4me3_out` and `./piperunner_ihec_slurm_singularity.sh ./v2/ihec/cemt0007_h3k27me3.json slurm_singularity h3k27me3_out`

- The provided configuration files are for 75bp PET only. Standard configration files for SET and read lengths will be provided. The ENCODE documentation discusses other modes.
+ The provided configuration files are for 75bp PET only. Standard configration files for SET and read lengths will be provided. The ENCODE documentation discusses other modes.

- To compute md5s of generated file, use `computemd5s.py <output_dir> <script_suffix>` with `<output_dir>` being the output directory of previous step and `<script_suffix>` being the suffix to add at file output basename `computemd5s_`. This will locate peak calls and bam files, and generate scripts to compute the md5s. Note the bam md5s are generated without teh bam header as that may contain full paths names.
+ To compute md5s of generated file, use `computemd5s.py <output_dir> <script_suffix>` with `<output_dir>` being the output directory of previous step and `<script_suffix>` being the suffix to add at file output basename `computemd5s_`. This will locate peak calls and bam files, and generate scripts to compute the md5s. Note the bam md5s are generated without teh bam header as that may contain full paths names.

- As an example, supose output of `./singularity_wrapper.sh ./v2/ihec/cemt0007_h3k4me3.json` is in `$PWD/h3k4me3_out`. So do
+ As an example, supose output of `./singularity_wrapper.sh ./v2/ihec/cemt0007_h3k4me3.json` is in `$PWD/h3k4me3_out`. So do

python computemd5s.py $PWD/h3k4me3_out test
chmod +x ./computemd5s_test
./computemd5s_test > log_h3k4me3
python status_cemt.py log_h3k4me3 expected_md5s_h3k4me3.json

- This will match md5s for cemt0007 H3K4me3 analysis. And similarly for H3K27me3.
+ This will match md5s for cemt0007 H3K4me3 analysis. And similarly for H3K27me3.

- $ python status_cemt.py computemd5s_0.out ./expected_md5s_h3k27me3.json
+ $ python status_cemt.py computemd5s_0.out ./expected_md5s_h3k27me3.json
ok ChIP-Seq.IX1239-A26688-GGCTAC.134224.D2B0LACXX.2.1.merged.nodup.pr2_x_ctl_for_rep1.pval0.01.500K.narrowPeak.gz 1c9554fe8b67e61fd7c69a1881ec2e3a
ok conservative_peak.narrowPeak.hammock.gz b78724bb667cc7bbfece8a587c10c915
ok ChIP-Seq.IX1239-A26688-GGCTAC.134224.D2B0LACXX.2.1.merged.nodup.pr1_x_ctl_for_rep1.pval0.01.500K.bfilt.narrowPeak.hammock.gz defd886ab7923b952e04ee033a722fac
@@ -145,6 +145,6 @@ See output of `./trackoutput.sh <cromwell_directory_for_analysis>` to see what f
./unresolvedfiles.list # files that will be kept, but cannot be accessed as they may be hardlinks that cannot be resolved
./unexpectedfiles.list # extraneous cromwell files that do not match patterns for expected cromwell files

- The recommended workflow if to remove files from `delete.list` only (in case diskspace is an issue). And then symlink files from `masterfiles.list` in an empty directory. So all files other than input files and intermediate bam files are still available inside the cromwell directory but the output directory is organized and free of extra logs files and scripts.
+ The recommended workflow if to remove files from `delete.list` only (in case diskspace is an issue). And then symlink files from `masterfiles.list` in an empty directory. So all files other than input files and intermediate bam files are still available inside the cromwell directory but the output directory is organized and free of extra logs files and scripts.

- It's expected that `unresolvedfiles.list` and `unexpectedfiles.list` are empty. If they are not empty, the files listed there will need to be looked at. Please review files before deleting to ensure nothing useful is removed.
+ It's expected that `unresolvedfiles.list` and `unexpectedfiles.list` are empty. If they are not empty, the files listed there will need to be looked at. Please review files before deleting to ensure nothing useful is removed.
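
The symlinking step of the recommended workflow described in the readme could look roughly like the sketch below. It is not part of the wrapper: the function name `symlink_master_files` is hypothetical, and it assumes `masterfiles.list` holds one file path per line, which the readme does not confirm. It deletes nothing and reports listed paths that no longer exist instead of failing on them.

```python
import os
import tempfile


def symlink_master_files(list_path, dest_dir):
    """Symlink every existing path listed in list_path into dest_dir.

    Assumption: the list file has one path per line; blank lines are skipped.
    Returns a tuple (linked, missing) of link paths and absent source paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    if os.listdir(dest_dir):
        raise ValueError("destination directory must be empty: " + dest_dir)
    linked, missing = [], []
    with open(list_path) as handle:
        for line in handle:
            src = line.strip()
            if not src:
                continue
            if not os.path.exists(src):
                # Keep going, but remember what could not be linked.
                missing.append(src)
                continue
            link = os.path.join(dest_dir, os.path.basename(src))
            if os.path.lexists(link):
                # Two listed files share a basename; refuse to overwrite.
                raise FileExistsError("duplicate basename: " + link)
            os.symlink(os.path.abspath(src), link)
            linked.append(link)
    return linked, missing


if __name__ == "__main__":
    # Small self-contained demo on throwaway files.
    work = tempfile.mkdtemp()
    peak = os.path.join(work, "demo.narrowPeak.gz")
    open(peak, "w").close()
    listing = os.path.join(work, "masterfiles.list")
    with open(listing, "w") as out:
        out.write(peak + "\n")
    linked, missing = symlink_master_files(listing, os.path.join(work, "organized"))
    print(len(linked), "linked;", len(missing), "missing")
```

Raising on a duplicate basename is a deliberate choice in this sketch: a flat output directory cannot hold two files with the same name, so the conflict is surfaced rather than silently resolved.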