(*) add tests for Rule consistency checks
(*) SMILES_parser : atoms defined via [..] should not be filled up with protons;
only the protons explicitly given within the [..] label are to be added
(as done by Daylight SMILES)
(*) MoleculeUtil::insertGroupID(..) : replace pre-specified groups with
identifiers (e.g. ATP, CoA, ...)
- while doing so, ensure that only the proxy node has additional edges.
(*) logo
(*) Tutorials for standard tasks using SGM/GGL
- graph definition/loading
- (sub)graph matching
- ring perception
- graph loading from or writing to GML (definition of graph via GML)
- GG rule loading from GML
- GG rule application
- SMILES parsing / writing
- reaction creation via GG application
(*) ITS with node mapping, ie.
- each atom with ID as class information
- each bond/atom that is changed during reaction gets a "BEFORE@AFTER" label
such that all after labels are generated by removing everything "*@"; if
this leads to an empty label, the bond is removed.
-> to be used for reaction rate prediction via NSPDK etc.
(*) write a pattern parser or extend the graph parser
(*) allow and parse constraints only in the 'left' context of the rules ?!?
(*) merge chem/GS_SMILES_* since there is a lot of code redundancy
(*) implementation of SSSR by Figueras
(*) consider that GML chemical rules have to be tuned to and aware of aromatic labels !!!
(*) BUG / CHECK : GA_OrderCheck only works for rules without node INDEL !!!
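
Sketch for the ITS item above ("BEFORE@AFTER" labels): a minimal Python
illustration of how the after-reaction labels could be derived by stripping
everything up to and including the '@'. The function names and the
dict-of-bonds representation are hypothetical and not part of GGL; it also
assumes that plain labels never contain '@' themselves.

```python
from typing import Dict, Optional, Tuple


def after_label(its_label: str) -> Optional[str]:
    """Return the post-reaction label, or None if the element is removed."""
    # Unchanged atoms/bonds carry a plain label without '@'.
    if "@" not in its_label:
        return its_label
    # Remove everything matching "*@", i.e. the BEFORE part and the '@'.
    after = its_label.split("@", 1)[1]
    # An empty AFTER part means the bond does not exist after the reaction.
    return after or None


def product_bonds(its_bonds: Dict[Tuple[int, int], str]) -> Dict[Tuple[int, int], str]:
    """Project ITS bond labels onto the product side, dropping removed bonds."""
    result = {}
    for bond, label in its_bonds.items():
        after = after_label(label)
        if after is not None:
            result[bond] = after
    return result
```

Example: "-@=" yields "=", "-@" yields no bond, "@-" (bond formed) yields "-".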
=============
road map : (17.02.2011 Xtof, Fabrizio, Martin)
=============
ok (1) graph kernel integration
ok - generation of graph features for Graph_Interface objects
ok - SVM model class
(2) graph isomorphism filter based on hash of graph features
- if equal hash : run GM_vf2
- check different D/R values and their impact
ok (3) aromaticity perception
ok - create new aromaticity SVM model subclass
ok - for each ring : graph-kernel features : predict with SVM model
ok - relabeling according to prediction
ok --> different aromaticity predictors based on different data sets (PubChem, ChEBI,..)
(4) aromaticity rewrite in rule application
ok - integrate (3) in MR_ApplyRule via new GS_* instance
- apply (3) if ITS produces a ring OR touches a ring
ok - handle cases where relabeling is not unique / not working
(5) Jankowski implementation
ok - general implementation
- Alberty-conformant pH correction of energy terms
(6) reaction rate estimation
ok - based on Arrhenius and delta energy based on (4)
(7) reaction rate calculation based on ITS molecular dynamics
--> or learned via NSPDK based on that data
- try classification task first : "likely/unlikely to happen"
--> later via regression directly rate/deltaE prediction
- apply to reaction planner
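
Sketch for road map item (2), the isomorphism filter: graphs are bucketed by a
numbering-invariant feature key, and the exact check only runs within a
bucket. Everything here is an assumption for illustration: the
(labels, edges) graph representation, the simple neighbourhood feature (not
the graph kernel features), and the brute-force is_isomorphic, which merely
stands in for GM_vf2 and is only usable for tiny graphs.

```python
from collections import defaultdict
from itertools import permutations

# A graph is (labels, edges): labels[i] is the label of node i and edges is a
# set of (i, j) tuples with i < j. Edge labels are omitted for brevity.


def feature_key(labels, edges):
    """Numbering-invariant key: sorted (label, sorted neighbour labels) per node."""
    neighbours = defaultdict(list)
    for i, j in edges:
        neighbours[i].append(labels[j])
        neighbours[j].append(labels[i])
    return tuple(sorted((labels[i], tuple(sorted(neighbours[i])))
                        for i in range(len(labels))))


def is_isomorphic(g1, g2):
    """Brute-force exact check (stand-in for GM_vf2)."""
    (labels1, edges1), (labels2, edges2) = g1, g2
    if len(labels1) != len(labels2) or len(edges1) != len(edges2):
        return False
    n = len(labels1)
    for perm in permutations(range(n)):
        if all(labels1[i] == labels2[perm[i]] for i in range(n)):
            mapped = {tuple(sorted((perm[i], perm[j]))) for i, j in edges1}
            if mapped == edges2:
                return True
    return False


def unique_graphs(graphs):
    """Keep one representative per isomorphism class."""
    buckets = defaultdict(list)
    kept = []
    for g in graphs:
        bucket = buckets[feature_key(*g)]
        # The expensive exact check only runs on graphs with an equal key.
        if not any(is_isomorphic(g, other) for other in bucket):
            bucket.append(g)
            kept.append(g)
    return kept
```

A real version would hash the feature key and compare hashes first; the
bucket structure stays the same.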
======================
SVM / graph kernel : (17.02.2011 Xtof, Fabrizio, Martin)
======================
(1) aromaticity detection based on NMR data
- data (xtof)
- ring classification extraction from NMR spectrum
- train and test
--> new aromaticity predictor
(2) reaction rate prediction
- given a "simple" reaction (linear free energy relation ...)
--> DATA ?!?!?
- derive set of <educt/rule/product> triples + features to learn
+ energies from homologs...
+ quantum mechanics simulation
+ ...
--> see (7) from above: using ITS instead of triple
- train and test
--> extend to other reactions, if working
(3) molecule energy prediction
- data based on
+ Jankowski decomposition
+ MD simulation ???
- evaluation
+ compare SVM to Jankowski
+ what is closer to MD: SVM/Jankowski
(4) graph kernel feature-based canonicalization for SMILES generation
[STUDENT PROJECT ?!?]
- define graph orbits by graph kernel features
- calculate SMILES based on these orbits
- test if unique for large set of molecules
--> generate X node numbering permutations per molecule : test SMILES
(5) active learning of reaction rate prediction (04.07.2011 Fabrizio, Martin)
- problem: reaction rate calculation to be learned via MD is hard
--> want to have as few MD runs for training as possible
- start with data set and test on a large set
--> take those most uncertain as new MD candidates to increase training set
--> iterate
--> show this is fucking great! (lower number of MD, ...)
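
Sketch for item (4), the uniqueness test via node numbering permutations: a
small harness that renumbers a molecule graph at random and checks that a
writer always returns the same string. The (labels, edges) representation
and both writers are hypothetical stand-ins; sorted_writer is numbering
invariant but is not a SMILES writer, and a real test would plug in the
canonical SMILES writer instead.

```python
import random


def permute(labels, edges, perm):
    """Renumber nodes: old index i becomes perm[i]."""
    new_labels = [None] * len(labels)
    for old, new in enumerate(perm):
        new_labels[new] = labels[old]
    new_edges = {tuple(sorted((perm[i], perm[j]))) for i, j in edges}
    return new_labels, new_edges


def is_numbering_invariant(labels, edges, writer, trials=20, seed=0):
    """True if writer gives the same string for `trials` random renumberings."""
    rng = random.Random(seed)
    reference = writer(labels, edges)
    for _ in range(trials):
        perm = list(range(len(labels)))
        rng.shuffle(perm)
        if writer(*permute(labels, edges, perm)) != reference:
            return False
    return True


def sorted_writer(labels, edges):
    """Stand-in writer: sorted node labels plus sorted bond label pairs."""
    bonds = sorted("-".join(sorted((labels[i], labels[j]))) for i, j in edges)
    return "".join(sorted(labels)) + "|" + ",".join(bonds)


def index_order_writer(labels, edges):
    """Stand-in writer that depends on the node numbering (should fail)."""
    return "".join(labels)
```

Running is_numbering_invariant over a large molecule set with the real
writer would give the "unique for a large set of molecules" check.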
=============
to implement:
=============
+ ToyChemUtil : utility class with static members that collects the central
functionality (everything from bin/toyChemUtil.hh etc.)
+ via OpenBabel
- molecule size
- orbital information
- filling up with protons / H atoms
- aromaticity prediction
library:
- check OpenBabel for SMILES writer
--> ggl::chem::SMILESwriterOB v2.1 : PROBLEM : currently no canonical SMILES !!
======================
to create tests for :
======================
* ggl/chem/MoleculeUtil::isConsistent :
+ one check for each error that can be thrown
+ ggl/chem/RC_*
===========
to check:
===========
- does symmetry breaking work with constraints? not for all constraints (e.g. noEdge)
ok - is using ggl::GS_STL_pushUnique faster than ggl::chem::GS_SMILES
--> no
===================
molecule-checker
===================
where needed:
- user input : implemented
- after rule application : implemented
which features are needed:
- aromaticity prediction