MDkNN

Chromatin DoD(Degree of Disorder) measurements using MDkNN(Mean Distance to k-Nearest Neighbors) method.

This method is similar to XiaoTaoWang's Aggregate Preference method[1], but allow differential analysis(statistical comprasion) between two samples.

Installing dependencies

We recommand use conda to manage the environment and installing dependencies:

$ conda create -n mdknn python=3.6
$ conda activate mdknn
(mdknn)$ conda install numpy scipy matplotlib pandas scikit-learn statsmodels tqdm
(mdknn)$ conda install -c bioconda cooler

Usage

Calculate DoD value of single sample

The input files is a tab splited TAD(or any chromosome region) list like this:

chr1	520000	1280000
chr1	2120000	2320000
chr1	2320000	2600000
chr1	2600000	2640000
chr1	2680000	3360000
chr1	3360000	3800000
...

And the corrsponding Hi-C matrix stored in .cool format.

To calculate their DoD using the following command:

(mdknn)$ python mdknn.py -p sample.cool -t tads.txt -O output.txt -k 3 --ww 7 --pw 4

The sample.cool and tads.txt is your input Hi-C matrix and TAD list. -O specified the output path. -k is a parameter represent the number of significant interaction's neighbors used for calculate the DoD. In most cases, 3 is a appropriate value. -ww and -pw is parameters for interaction selection and noise filtering. They are sensitive to the matrix resolution, recommand set pw to 4 and ww to 7 at 5 kb, and (2, 5) at 10 kb.

Statistical comparison of two samples

The input is almost same to the single sample version, but need one more Hi-C matrix of second sample. Note that matrices of two samples to be compared are best build with same amount of contact pairs.

(mdknn)$ python mdknn.py -p sample1.cool -p2 sample2.cool -t tads.txt -O output.txt -k 3 --ww 7 --pw 4

Parameters

(mdknn)$ python mdknn.py --help
usage: mdknn.py [options]

MDKNN -- Mean Distance of k Nearest Neighbors

optional arguments:
  -h, --help            show this help message and exit
  -O OUTPUT, --output OUTPUT
                        Output file name. (default: None)

Relate to the input:
  -p PATH, --path PATH  Path to the cool URI (default: .)
  -p2 PATH_2, --path-2 PATH_2
                        Path to the cool URI (default: None)
  -t TAD_FILE, --tad-file TAD_FILE
                        Path to the TAD file. (default: None)
  --out-long-range OUT_LONG_RANGE
                        Output the position of long-range points. (default:
                        None)

Feature calculation:
  -k K                  Number of nearest neighbors. (default: 3)
  --pw PW               Width of the interaction region surrounding the peak.
                        According to experience, we set it to 1 at 20 kb, 2 at
                        10 kb, and 4 at 5 kb. (default: 2)
  --ww WW               Width of the donut region Set it to 3 at 20 kb, 5 at
                        10 kb, and 7 at 5 kb. (default: 5)
  --top TOP             Parameter for noisy interaction filtering. By default,
                        30 percent noisy interactions will be eliminated.
                        (default: 0.7)
  --ratio RATIO         Specifies the sample ratio of significant interactions
                        for TAD. (default: 0.05)
  --gap GAP             Maximum gap ratio. (default: 0.2)

References

[1] Wang, Xiao-Tao, et al. "Structural heterogeneity and functional diversity of topologically associating domains in mammalian genomes." Nucleic acids research 43.15 (2015): 7237-7246.

Thanks

Thanks XiaoTaoWang for the implementation is based on his TADLib.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
doc/img		doc/img
example_data		example_data
README.md		README.md
mdknn.py		mdknn.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MDkNN

Installing dependencies

Usage

Calculate DoD value of single sample

Statistical comparison of two samples

Parameters

References

Thanks

About

Releases 1

Packages

Languages

GangCaoLab/MDkNN

Folders and files

Latest commit

History

Repository files navigation

MDkNN

Installing dependencies

Usage

Calculate DoD value of single sample

Statistical comparison of two samples

Parameters

References

Thanks

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages