[Add] README for global haplo benchmark study
1 parent 6beafd4, commit 30e3121
Showing 1 changed file with 20 additions and 0 deletions.
resources/auxiliary_workflows/benchmark/resources/multi_setup/README.md
@@ -0,0 +1,20 @@
This repository contains the Snakemake workflow to reproduce the benchmarking study for the global haplotype reconstruction methods presented in https://doi.org/10.1101/2023.10.16.562462.

The notebooks in the directory `workflow/notebooks/` can be used to reproduce Figure 4 of the manuscript.

Here is a step-by-step guide on how to run this workflow.
1. Clone the repository of V-pipe 3.0 into your working directory: `git clone https://github.com/cbg-ethz/V-pipe.git`
2. Change into the directory of the benchmarking study for global haplotype reconstruction: `cd V-pipe/resources/auxiliary_workflows/benchmark/resources/multi_setup`
3. The parameters to reproduce the synthetic dataset of varying coverage are in `config_distance_varycoverage/params.csv`. The accompanying configuration file `config_distance_varycoverage/config.yaml` defines the simulation mode, the number of replicates, and the methods to be executed.
4. The parameters to reproduce the synthetic dataset of varying distance pattern are in `config_distance_varyparams/params.csv`. The accompanying configuration file `config_distance_varyparams/config.yaml` defines the simulation mode, the number of replicates, and the methods to be executed.
5. The parameters to reproduce the real dataset are in `config_realdata/params.csv`. The accompanying configuration file `config_realdata/config.yaml` defines the number of replicates and the methods to be executed.
6. Each method to be executed must be defined in a Python script in the directory `V-pipe/resources/auxiliary_workflows/benchmark/resources/method_definitions`:
    - Haploclique: `V-pipe/resources/auxiliary_workflows/benchmark/resources/method_definitions/haploclique.py`
    - PredictHaplo: `V-pipe/resources/auxiliary_workflows/benchmark/resources/method_definitions/predicthaplo.py`
    - HaploConduct: `V-pipe/resources/auxiliary_workflows/benchmark/resources/method_definitions/haploconduct.py`
    - CliqueSNV: `V-pipe/resources/auxiliary_workflows/benchmark/resources/method_definitions/cliquesnv.py`
7. The workflow is now ready. Go back to the directory `V-pipe/resources/auxiliary_workflows/benchmark/resources/multi_setup`.
8. To install the required Conda environments, execute `snakemake --conda-create-envs-only --use-conda -c1`.
9. To submit the workflow to an LSF cluster, execute `./run_workflow.sh`; otherwise, execute the workflow with `snakemake --use-conda -c1`.
10. The workflow writes its results to the directory `results`.
11. Once the workflow has finished and all result files have been generated, Figure 4 of the manuscript can be reproduced by executing the notebooks in `workflow/notebooks/`.
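
The steps above can be collected into a small shell script. The following is only a sketch: the function names `check_configs` and `run_benchmark` are hypothetical helpers introduced here and are not part of V-pipe, and the script assumes that `git`, `snakemake`, and Conda are already installed. It only uses the paths and commands listed in the guide, and nothing is executed until one of the functions is called.

```shell
#!/usr/bin/env bash
# Sketch of the guide as two helper functions (names are illustrative only).

# Directory of the benchmarking study inside a V-pipe checkout (step 2).
STUDY_DIR="V-pipe/resources/auxiliary_workflows/benchmark/resources/multi_setup"

# Steps 3-5: print every documented configuration file that is missing below
# the given directory (default: current directory). Succeeds only if all six
# files exist. The body runs in a subshell so its variables stay local.
check_configs() (
    dir="${1:-.}"
    missing=0
    for setup in config_distance_varycoverage config_distance_varyparams config_realdata; do
        for file in params.csv config.yaml; do
            if [ ! -f "$dir/$setup/$file" ]; then
                echo "missing: $setup/$file"
                missing=1
            fi
        done
    done
    [ "$missing" -eq 0 ]
)

# Steps 1, 2, 8 and 9 without an LSF cluster. On an LSF cluster, the last
# command would be replaced by ./run_workflow.sh, as described in step 9.
# The subshell keeps the `cd` from changing the caller's working directory.
run_benchmark() (
    git clone https://github.com/cbg-ethz/V-pipe.git &&
    cd "$STUDY_DIR" &&
    check_configs . &&
    snakemake --conda-create-envs-only --use-conda -c1 &&
    snakemake --use-conda -c1
)
```

After sourcing the script, `check_configs path/to/multi_setup` can serve as a quick pre-flight check of an existing checkout, and `run_benchmark` runs the full sequence from a fresh working directory. The results then appear in the `results` directory mentioned in step 10.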