Evaluation

This folder contains the functions to evaluate the results of our model and of the benchmarking tools, among others via TAD size calculation, statistical measures such as the Jaccard index, and Venn diagrams. An exact description of the scripts that run the benchmarking tools can be found in the folder ./tools_benchmarking and its associated README. The folder structure is shown below. The purpose of each main script, which can be run independently, is discussed below, together with how to call it.

├── tad_detection
│   ├── evaluation
│   │   ├── tad_dicts
│   │   ├── evaluate.py
│   │   ├── parameters.json
│   │   └── utils_evaluate.py

The folder tad_dicts contains dictionaries with the TAD region information for different methods, cell lines, and resolutions. The functions in utils_evaluate.py serve as helper functions to calculate the Jaccard index and further statistics, generate Venn diagrams, and calculate TAD region sizes.
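The TAD size helper can be sketched roughly as follows. Note that the dictionary layout (chromosome mapped to a list of (start_bin, end_bin) pairs) and the function name are assumptions for illustration, not the actual interface of utils_evaluate.py:

```python
import pickle

def tad_sizes(tad_dict, scaling_factor):
    """Compute TAD sizes in base pairs from a dict mapping
    chromosome -> list of (start_bin, end_bin) pairs.

    The dict layout is an assumed example; scaling_factor is the
    Hi-C resolution in bp per bin (e.g. 100000)."""
    sizes = {}
    for chromosome, tads in tad_dict.items():
        # Each TAD spans (end - start) bins of scaling_factor bp each.
        sizes[chromosome] = [(end - start) * scaling_factor for start, end in tads]
    return sizes

# A pickled dictionary such as Arrowhead_GM12878_100kb_dict.p could then
# be loaded along these lines (path shortened here on purpose):
# with open("./tad_detection/evaluation/tad_dicts/...", "rb") as f:
#     tad_dict = pickle.load(f)
```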

Scripts

evaluate.py

usage: evaluate.py [-h] --path_parameters_json PATH_PARAMETERS_JSON

Run the evaluation pipeline on multiple experiments and compare results by
calculating statistics (TAD sizes, TAD counts, Jaccard index, etc.), creating
Venn diagrams and Hi-C maps with TAD regions.

optional arguments:
  -h, --help            show this help message and exit
  --path_parameters_json PATH_PARAMETERS_JSON
                        path to JSON with parameters.

The results of this script are located in the given output directory.
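As a rough illustration of the Jaccard index used for comparing TAD prediction methods, two predictions can be compared at the level of genomic bins covered by any TAD. The bin-set representation below is an assumption for illustration, not the exact computation in evaluate.py:

```python
def jaccard_index(tads_a, tads_b):
    """Jaccard index of two TAD predictions, compared as sets of
    genomic bins covered by any TAD (an assumed representation).

    Each prediction is a list of (start_bin, end_bin) pairs."""
    bins_a = {b for start, end in tads_a for b in range(start, end)}
    bins_b = {b for start, end in tads_b for b in range(start, end)}
    if not bins_a and not bins_b:
        # Two empty predictions are treated as identical.
        return 1.0
    # |intersection| / |union| of the covered bins.
    return len(bins_a & bins_b) / len(bins_a | bins_b)
```

The same bin sets are also what a two-set Venn diagram of two prediction methods would visualize.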

Parameters

The parameters.json file contains the parameters used by the different functions in this folder. Several variables can be set in parameters.json; they are described below:

parameters.json

variables:
  cell_line: cell line for which the evaluation script is run (and which kind of data is used as input)
                                                        "GM12878"
  chromosomes: chromosomes for which the dataset in preprocessing.py, the node annotations in node_annotations.py, or the chromosome length dict in chr_len_dict.py should be created
                                                        "all", ["1", "2", ...]
  dataset_name: name of dataset
                                                        "gm12878_no_filter_no_binary_graph_conv_supervised"
  node_feature_encoding: genomic annotations used in dataset creation
                                                        ["CTCF", "RAD21", "SMC3", "Number_HousekeepingGenes"]
  output_directory: output directory for the statistics and plots generated by evaluate.py, for example:
                                                        "./output/"
  paths_predicted_tads_per_tad_prediction_methods: paths of dictionaries with predicted TADs
                                                        ["./tad_detection/evaluation/results/Arrowhead_GM12878_100kb_dict.p",
                                                        "./tad_detection/evaluation/results/TopDom_GM12878_100kb_dict.p"]
  path_topdom_bed: path to a bed file
                                                        "./tad_detection/evaluation/tools_benchmarking/TopDomTests/bin/bintest.bed"
  scaling_factor: resolution of Hi-C adjacency matrix
                                                        25000, 100000
  tad_prediction_methods: list of strings with the TAD prediction methods to evaluate
                                                        ["Arrowhead","TopDom"]
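
Putting the variables above together, a parameters.json could look like the following sketch (values taken from the examples above; adjust the paths to your setup):

```json
{
    "cell_line": "GM12878",
    "chromosomes": "all",
    "dataset_name": "gm12878_no_filter_no_binary_graph_conv_supervised",
    "node_feature_encoding": ["CTCF", "RAD21", "SMC3", "Number_HousekeepingGenes"],
    "output_directory": "./output/",
    "paths_predicted_tads_per_tad_prediction_methods": [
        "./tad_detection/evaluation/results/Arrowhead_GM12878_100kb_dict.p",
        "./tad_detection/evaluation/results/TopDom_GM12878_100kb_dict.p"
    ],
    "path_topdom_bed": "./tad_detection/evaluation/tools_benchmarking/TopDomTests/bin/bintest.bed",
    "scaling_factor": 100000,
    "tad_prediction_methods": ["Arrowhead", "TopDom"]
}
```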