MinCutTAD: Interpretable graph neural network-driven TAD prediction from Hi-C chromatin interactions and chromatin states
- A GNN algorithm driven by spectral clustering to detect TADs. It is constructed with GraphConv, a message-passing layer, and, in the unsupervised variant, a MinCut pooling layer.
- Message passing refers to smoothing each node's features with those of its directly neighboring nodes.
- Pooling refers to the aggregation of strongly similar nodes, thereby reducing the graph domain and forming subclusters.
- Uses Hi-C matrix data and genomic annotations (CTCF, RAD21, SMC3, number of housekeeping genes) for the provided genomic loci of chromosomes.
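As a rough illustration of the two building blocks described above, the following dependency-free Python sketch applies one mean-aggregation message-passing step and then pools nodes with a cluster assignment matrix. It is not the MinCutTAD implementation, which uses GraphConv and MinCut pooling layers. The toy contact matrix, the single feature per bin, the hard-coded assignment matrix and the helper names `message_pass` and `pool` are all assumptions made for illustration.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the MinCutTAD code base.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def message_pass(adj, feats):
    """One smoothing step: every node takes the contact-weighted mean of
    its own features (self-loop of weight 1) and its neighbours' features."""
    out = []
    for i, row in enumerate(adj):
        weights = list(row)
        weights[i] += 1.0  # self-loop so the node keeps part of its own signal
        total = sum(weights)
        out.append([sum(w * feats[j][f] for j, w in enumerate(weights)) / total
                    for f in range(len(feats[0]))])
    return out


def pool(adj, feats, assign):
    """Coarsen the graph with assignment matrix S:
    A_pool = S^T A S and X_pool = S^T X."""
    s_t = transpose(assign)
    return matmul(matmul(s_t, adj), assign), matmul(s_t, feats)


# Toy Hi-C-like contact matrix for 4 genomic bins:
# bins 0-1 and bins 2-3 interact strongly, cross-contacts are weak.
adj = [[0, 5, 1, 0],
       [5, 0, 0, 1],
       [1, 0, 0, 5],
       [0, 1, 5, 0]]

# One hypothetical annotation value per bin (for example a binding signal).
feats = [[1.0], [1.0], [0.0], [0.0]]

# Hard assignment of bins to 2 clusters (a trained model would learn a soft one).
assign = [[1, 0], [1, 0], [0, 1], [0, 1]]

smoothed = message_pass(adj, feats)
adj_pool, feats_pool = pool(adj, smoothed, assign)

print("smoothed features:", smoothed)
print("pooled adjacency:", adj_pool)
print("pooled features:", feats_pool)
```

After pooling, the diagonal of the pooled adjacency holds the contact weight inside each cluster and the off-diagonal entries hold the weight between clusters, which is the quantity a MinCut-style objective tries to keep small.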
Two approaches:
- Supervised: uses Arrowhead solutions as labels for the genomic bins and optimizes toward classifying the graph nodes according to those labels.
- Unsupervised: no labels are provided to the model; it determines whether regions belong to a TAD and aggregates them. Its main goal is therefore to cluster the regions belonging to a single TAD together.
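The difference between the two training signals can be sketched as follows. The supervised case is shown as a per-bin binary cross-entropy against labels, and the unsupervised case as the cut term of the MinCut pooling objective, -Tr(S^T A S) / Tr(S^T D S), where S is the cluster assignment matrix, A the adjacency matrix and D the degree matrix. Whether MinCutTAD uses exactly these losses, a binary TAD / non-TAD label encoding, or additional regularization terms is an assumption here, and all function names and toy values are hypothetical.

```python
# Illustrative sketch only: loss choices, names and toy values are assumptions.
import math


def cross_entropy(probs, labels):
    """Supervised signal: mean binary cross-entropy between predicted
    per-bin probabilities and 0/1 labels (e.g. derived from Arrowhead)."""
    eps = 1e-12  # avoids log(0)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def mincut_loss(adj, assign):
    """Unsupervised signal: -Tr(S^T A S) / Tr(S^T D S).
    More negative means more contact weight stays inside clusters."""
    n, k = len(adj), len(assign[0])
    degree = [sum(row) for row in adj]
    within = sum(assign[i][c] * adj[i][j] * assign[j][c]
                 for c in range(k) for i in range(n) for j in range(n))
    volume = sum(assign[i][c] ** 2 * degree[i]
                 for c in range(k) for i in range(n))
    return -within / volume


# Toy contact matrix: bins 0-1 and bins 2-3 interact strongly.
adj = [[0, 5, 1, 0],
       [5, 0, 0, 1],
       [1, 0, 0, 5],
       [0, 1, 5, 0]]

# Supervised: hypothetical labels and two sets of predictions.
labels = [1, 1, 0, 0]
confident = cross_entropy([0.9, 0.8, 0.2, 0.1], labels)
uninformed = cross_entropy([0.5, 0.5, 0.5, 0.5], labels)

# Unsupervised: an assignment that follows the strong contacts, and one that does not.
good = mincut_loss(adj, [[1, 0], [1, 0], [0, 1], [0, 1]])
bad = mincut_loss(adj, [[1, 0], [0, 1], [1, 0], [0, 1]])

print("cross-entropy (confident vs. uninformed):", confident, uninformed)
print("MinCut term (good vs. bad clustering):", good, bad)
```

In both cases a lower value is better: the supervised loss falls as predictions agree with the labels, while the cut term falls as strongly interacting bins end up in the same cluster, without any labels being involved.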
Further details can be found in our 10-page report (Report_TeamHA1.pdf) or our 2-page digest (Digest_TeamHA1.pdf).
The folder structure of the repository is shown below. The folders ./TopResults, ./cmap_files, ./node_annotations and ./ressources contain files necessary for running the scripts in the folder ./tad_detection.
├── cmap_files
│   ├── 25kb
│   │   ├── GM12878
│   │   │   └── intra
│   │   └── IMR-90
│   │       └── intra
│   └── 100kb
│       ├── GM12878
│       │   ├── inter
│       │   └── intra
│       └── IMR-90
│           └── intra
├── node_annotations
├── ressources
├── tad_detection
│   ├── evaluation
│   ├── model
│   ├── preprocessing
│   └── utils_general.py
├── Digest_TeamHA1.pdf
├── LICENSE
├── README.md
├── Report_TeamHA1.pdf
└── environment.yml
The scripts developed as part of this project can be found in the folder ./tad_detection and its subfolders.
A detailed description of the preprocessing scripts can be found in the folder ./tad_detection/preprocessing and the associated README.
A detailed description of the training scripts can be found in the folder ./tad_detection/model and the associated README.
A detailed description of the evaluation scripts can be found in the folder ./tad_detection/evaluation and the associated README.
A detailed description of the benchmarking tool scripts can be found in the folder ./tad_detection/evaluation/tools_benchmarking and the associated README.
The tools must be run with ./MeetEU as the working directory. An environment.yml file listing all packages required by our model and scripts is available in the repository. Please note that some of the packages may only be available for UNIX-based operating systems. Using an HPC system with access to a GPU is highly recommended for training the model.
Sample data to run this algorithm can be found here.