Scripts used to extract nodes and edges from MINER data, first in 2016 and again in 2019.
The Utils directory contains general-purpose scripts that can be used to process multiple datasets.
In addition, each of the following Modes and CrossNets has its own directory, with a readme describing how to generate the corresponding snap tables:
Modes
- Genes
    - HUGO (http://www.genenames.org/cgi-bin/statistics)
    - GeneOntology (http://geneontology.org/page/download-go-annotations)
- Proteins
    - STRING (http://string-db.org/cgi/download.pl)
- Functions
    - GeneOntology (http://geneontology.org/page/download-ontology)
- Chemicals
    - Drugbank (http://www.drugbank.ca/)
- Diseases
    - DiseaseOntology (http://disease-ontology.org/)
    - CTD (http://ctdbase.org)
    - OMIM (http://www.omim.org/)

CrossNets
- Gene-Protein
    - ENSEMBL Genes, Human genes (http://www.ensembl.org/biomart/martview)
- Protein-Protein
    - STRING (http://string-db.org/cgi/download.pl)
- Gene-Function
    - GeneOntology (http://geneontology.org/docs/download-go-annotations/)
- Function-Function
    - GeneOntology (http://geneontology.org/page/download-ontology)
- Chemical-Chemical
    - Drugbank (http://www.drugbank.ca/)
- Chemical-Gene
    - Drugbank (http://www.drugbank.ca/)
- Disease-Disease
    - DiseaseOntology (http://disease-ontology.org/)
- Disease-Gene
    - CTD (http://ctdbase.org)
- Disease-Chemical
    - CTD (http://ctdbase.org)
- Disease-Function
    - CTD (http://ctdbase.org)
The new datasets can be found at /dfs/scratch2/MINER-BIO/data-miner-201908.
The old datasets can be found at /dfs/scratch2/MINER-BIO/data-miner.
The latest graph can be found at /dfs/scratch2/MINER-BIO/work-data-miner-v2/farzaan/snap.
Here's a quick look at the new miner dataset:
-------------------------------
Modes     | Nodes
-------------------------------
Chemical  |     13,339
Protein   | 22,406,877
Gene      |    106,536
Function  |     48,969
Disease   |     25,969

--------------------------------------
CrossNets          | Edges
--------------------------------------
Chemical-Chemical  |     2,712,183
Chemical-Gene      |        20,644
Function-Function  |       249,828
Gene-Function      |       481,543
Gene-Protein       |        18,650
Disease-Disease    |         9,383
Disease-Gene       |    64,109,210
Disease-Function   |     2,138,340
Disease-Chemical   |     2,643,750
Protein-Protein    | 2,147,483,643
The old miner dataset at a glance:
-------------------------------
Modes     | Nodes
-------------------------------
Chemical  |     11,367
Protein   |  8,254,694
Gene      |    104,004
Function  |     46,564
Disease   |     22,299

--------------------------------------
CrossNets          | Edges
--------------------------------------
Chemical-Chemical  |        95,246
Chemical-Gene      |        15,424
Function-Function  |       119,464
Gene-Function      |       481,733
Gene-Protein       |        17,930
Disease-Disease    |         6,877
Disease-Gene       |    42,475,361
Disease-Function   |       784,457
Disease-Chemical   |     1,334,088
Protein-Protein    | 1,847,117,370
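
To see how much each Mode and CrossNet changed between the two releases, the counts above can be compared directly. The following is a minimal sketch, not part of the repository's scripts: the numbers are copied by hand from the tables in this readme, and the names growth and report are hypothetical helpers introduced only for illustration.

```python
# Counts copied by hand from the tables in this readme (not read from the data files).
NEW_NODES = {
    "Chemical": 13_339, "Protein": 22_406_877, "Gene": 106_536,
    "Function": 48_969, "Disease": 25_969,
}
OLD_NODES = {
    "Chemical": 11_367, "Protein": 8_254_694, "Gene": 104_004,
    "Function": 46_564, "Disease": 22_299,
}
NEW_EDGES = {
    "Chemical-Chemical": 2_712_183, "Chemical-Gene": 20_644,
    "Function-Function": 249_828, "Gene-Function": 481_543,
    "Gene-Protein": 18_650, "Disease-Disease": 9_383,
    "Disease-Gene": 64_109_210, "Disease-Function": 2_138_340,
    "Disease-Chemical": 2_643_750, "Protein-Protein": 2_147_483_643,
}
OLD_EDGES = {
    "Chemical-Chemical": 95_246, "Chemical-Gene": 15_424,
    "Function-Function": 119_464, "Gene-Function": 481_733,
    "Gene-Protein": 17_930, "Disease-Disease": 6_877,
    "Disease-Gene": 42_475_361, "Disease-Function": 784_457,
    "Disease-Chemical": 1_334_088, "Protein-Protein": 1_847_117_370,
}


def growth(old, new):
    """Return {name: (absolute change, percent change)} for every key in old."""
    return {
        name: (new[name] - old[name], 100.0 * (new[name] - old[name]) / old[name])
        for name in old
    }


def report(title, old, new):
    """Print one line per entry with the old count, new count, and change."""
    print(title)
    for name, (delta, pct) in growth(old, new).items():
        print(f"  {name:<18} {old[name]:>15,} -> {new[name]:>15,}  ({delta:+,}, {pct:+.1f}%)")


if __name__ == "__main__":
    report("Modes (nodes)", OLD_NODES, NEW_NODES)
    report("CrossNets (edges)", OLD_EDGES, NEW_EDGES)
```

Running the script prints one line per Mode and CrossNet. Most counts grow in the new dataset, while Gene-Function shrinks slightly (481,733 to 481,543).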