TODO.txt


(*) add tests for Rule consistency checks

(*) SMILES_parser : atoms defined in brackets [..] should not be filled up with
    protons; they carry only the hydrogens explicitly given within the [..] label
    (as done by Daylight SMILES)
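    Possible sketch of the intended rule. hydrogensToAdd and its parameters are
    hypothetical names, not part of the current parser; aromatic atoms and
    charges are ignored here. The valence table is the usual one for the
    SMILES organic subset.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not GGL API): number of hydrogens to attach to an
// atom read from a SMILES string.
//  symbol       : element symbol, e.g. "C" or "Cl"
//  inBrackets   : true if the atom was written as [..]
//  bracketH     : H count given inside the brackets (0 if none was given)
//  bondOrderSum : sum of the bond orders of all explicit bonds of the atom
int hydrogensToAdd(const std::string& symbol, bool inBrackets,
                   int bracketH, int bondOrderSum)
{
    // Bracket atoms: take exactly what was written, never fill up.
    if (inBrackets) {
        return bracketH;
    }
    // Organic subset: fill up to the lowest normal valence that is
    // not smaller than the current bond order sum.
    static const std::map<std::string, std::vector<int>> normalValences = {
        {"B", {3}}, {"C", {4}}, {"N", {3, 5}}, {"O", {2}},
        {"P", {3, 5}}, {"S", {2, 4, 6}},
        {"F", {1}}, {"Cl", {1}}, {"Br", {1}}, {"I", {1}}
    };
    const auto it = normalValences.find(symbol);
    if (it == normalValences.end()) {
        return 0; // unknown element: add nothing
    }
    for (int valence : it->second) {
        if (valence >= bondOrderSum) {
            return valence - bondOrderSum;
        }
    }
    return 0; // bond order sum exceeds all normal valences
}
```

    Example: "C" gets 4 hydrogens, "[C]" gets none, "[NH4+]" gets the 4 it states.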

(*) MoleculeUtil::insertGroupID(..) : replace predefined groups with their
    identifier (e.g. ATP, CoA, ...)
    - ensure that only the proxy node has additional edges.

(*) logo

(*) Tutorials for standard tasks using SGM/GGL
 - graph definition/loading
 - (sub)graph matching
 - ring perception
 - graph loading from or writing to GML (definition of graph via GML)
 - GG rule loading from GML
 - GG rule application
 - SMILES parsing / writing
 - reaction creation via GG application
 
(*) ITS with node mapping, i.e.
   - each atom with ID as class information
   - each bond/atom that is changed during reaction gets a "BEFORE@AFTER" label
     such that all after labels are generated by removing everything "*@"; if 
     this leads to an empty label, the bond is removed.
   -> to be used for reaction rate prediction via NSPDK etc.
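   Minimal sketch of the "BEFORE@AFTER" label handling described above.
   afterLabel and isRemovedAfterReaction are hypothetical names; the sketch
   assumes a label contains at most one '@' and that labels without '@'
   belong to unchanged atoms/bonds.

```cpp
#include <string>

// Hypothetical helper: derive the label after the reaction from an ITS
// label by removing everything up to and including the '@'.
std::string afterLabel(const std::string& itsLabel)
{
    const std::string::size_type at = itsLabel.find('@');
    if (at == std::string::npos) {
        return itsLabel; // no '@': atom/bond is not changed by the reaction
    }
    return itsLabel.substr(at + 1);
}

// An empty after-label means the bond does not exist after the reaction.
bool isRemovedAfterReaction(const std::string& itsLabel)
{
    return afterLabel(itsLabel).empty();
}
```

   Example: "=@-" becomes "-" (double bond turns into single bond), while
   "-@" yields an empty label, i.e. the bond is removed.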
 
(*) write a pattern parser or extend the graph parser
 
(*) allow and parse constraints only in the 'left' context of the rules ?!?

(*) merge chem/GS_SMILES_* since there is a lot of code redundancy

(*) implementation of SSSR by Figueras

(*) consider that GML chemical rules have to be tuned to and aware of aromatic labels !!! 

(*) BUG / CHECK : GA_OrderCheck works only for rules without node INDEL !!!

 =============
  road map :  (17.02.2011 Xtof, Fabrizio, Martin)
 =============

ok (1) graph kernel integration
ok - generation of graph features for Graph_Interface objects
ok - SVM model class 

(2) graph isomorphism filter based on hash of graph features
 - if equal hash : run GM_vf2
 - check different D/R values and their impact
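 Rough sketch of such a filter. Features, featureHash and IsoFilter are
 hypothetical names; the exact isomorphism test (GM_vf2 in the note above) is
 only represented by a callback, since its actual interface is not shown here.
 The hash is order-independent, so the feature list does not need sorting.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical representation: one integer ID per extracted graph feature.
using Features = std::vector<std::uint32_t>;

// Order-independent hash: each feature is mixed on its own and the mixed
// values are summed, so any ordering of the same features gives the same hash.
std::uint64_t featureHash(const Features& features)
{
    std::uint64_t h = 0;
    for (std::uint32_t f : features) {
        std::uint64_t x = f + 0x9e3779b97f4a7c15ULL;
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        h += x;
    }
    return h;
}

// Stores graph IDs in buckets by feature hash; the expensive exact check
// is only run against graphs that share the same hash.
class IsoFilter {
public:
    using ExactCheck = std::function<bool(std::size_t, std::size_t)>;

    explicit IsoFilter(ExactCheck exact) : exact_(std::move(exact)) {}

    // Returns true if graph `id` was stored as new, false if the exact
    // check reports it isomorphic to an already stored graph.
    bool insertIfNew(std::size_t id, const Features& features)
    {
        std::vector<std::size_t>& bucket = buckets_[featureHash(features)];
        for (std::size_t other : bucket) {
            if (exact_(id, other)) {
                return false;
            }
        }
        bucket.push_back(id);
        return true;
    }

private:
    ExactCheck exact_;
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> buckets_;
};
```

 Different hash means no exact check at all; how well this separates graphs
 depends on the D/R values used for feature generation.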

ok (3) aromaticity perception 
ok - create new aromaticity SVM model subclass
ok - for each ring : graph-kernel features : predict with SVM model
ok - relabeling according to prediction
ok --> different aromaticity predictors based on different data sets (PubChem, ChEBI,..)

(4) aromaticity rewrite in rule application
ok - integrate (3) in MR_ApplyRule via new GS_* instance 
 - apply (3) if ITS produces a ring OR touches a ring
ok - handle cases where relabeling is not unique / not working

(5) Jankowski implementation
ok - general implementation
 - pH correction of energy terms conforming to Alberty

(6) reaction rate estimation
ok - based on Arrhenius and delta energy based on (4)
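 For reference, a sketch of the Arrhenius equation k = A * exp(-Ea / (R*T)).
 arrheniusRate is a hypothetical name; how the activation energy is obtained
 from the delta energy is not covered here, it is simply taken as input.

```cpp
#include <cmath>

// Arrhenius rate constant.
//  preExponential   : factor A (same unit as the returned rate constant)
//  activationEnergy : Ea in J/mol (assumed to be derived elsewhere)
//  temperature      : T in Kelvin
double arrheniusRate(double preExponential, double activationEnergy,
                     double temperature)
{
    const double gasConstant = 8.314462618; // R in J/(mol*K)
    return preExponential
           * std::exp(-activationEnergy / (gasConstant * temperature));
}
```

 A higher activation energy gives a lower rate at the same temperature.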
 
(7) reaction rate calculation based on ITS Molecular dynamics
 --> or learned via NSPDK based on that data
 - try classification task first : "likely/unlikely to happen"
   --> later via regression directly rate/deltaE prediction
 - apply for reaction planner


 ======================
  SVM / graph kernel : (17.02.2011 Xtof, Fabrizio, Martin)
 ======================
 
(1) aromaticity detection based on NMR data
 - data (xtof)
 - ring classification extraction from NMR spectrum
 - train and test
 --> new aromaticity predictor

(2) reaction rate prediction
 - given a "simple" reaction (linear free energy relation ...)
   --> DATA ?!?!?
 - derive set of <educt/rule/product> triples + features to learn
   + energies from homologs... 
   + quantum mechanics simulation
   + ...
   --> see (7) from above: using ITS instead of triple
 - train and test
 --> extend to other reactions, if working

(3) molecule energy prediction
 - data based on
   + Jankowski decomposition
   + MD simulation ???
 - evaluation
   + compare SVM to Jankowski
   + what is closer to MD: SVM/Jankowski


(4) graph kernel feature-based canonicalization for SMILES generation
    [STUDENT PROJECT ?!?]
 - defines graph orbits by graph kernel features
 - calculate SMILES based on these orbits
 - test if unique for large set of molecules
   --> generate X node numbering permutations per molecule : test SMILES
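 Possible test harness for the permutation check. LabeledGraph, permute and
 isPermutationInvariant are hypothetical names; the string generator under
 test (e.g. a SMILES writer) is passed in as a callback.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal graph: node labels plus undirected edges by node index.
struct LabeledGraph {
    std::vector<std::string> nodeLabel;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};

// Renumber the nodes: old node i becomes node newIndex[i].
LabeledGraph permute(const LabeledGraph& g,
                     const std::vector<std::size_t>& newIndex)
{
    LabeledGraph p;
    p.nodeLabel.resize(g.nodeLabel.size());
    for (std::size_t i = 0; i < g.nodeLabel.size(); ++i) {
        p.nodeLabel[newIndex[i]] = g.nodeLabel[i];
    }
    for (const auto& e : g.edges) {
        p.edges.emplace_back(newIndex[e.first], newIndex[e.second]);
    }
    return p;
}

// Generate `permutations` random node numberings of g and check that
// toString returns the same string for each of them.
bool isPermutationInvariant(
    const LabeledGraph& g,
    const std::function<std::string(const LabeledGraph&)>& toString,
    unsigned permutations, unsigned seed)
{
    const std::string reference = toString(g);
    std::vector<std::size_t> newIndex(g.nodeLabel.size());
    std::iota(newIndex.begin(), newIndex.end(), 0);
    std::mt19937 rng(seed);
    for (unsigned i = 0; i < permutations; ++i) {
        std::shuffle(newIndex.begin(), newIndex.end(), rng);
        if (toString(permute(g, newIndex)) != reference) {
            return false;
        }
    }
    return true;
}
```

 Passing this check for X permutations only suggests uniqueness, it does not
 prove it.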

(5) active learning of reaction rate prediction (04.07.2011 Fabrizio, Martin)
 - problem: reaction rate calculation to be learned via MD is hard
   --> want as few MD runs for training as possible
 - start with data set and test on a large set
   --> take those most uncertain as new MD candidates to increase training set
     --> iterate
     --> show this is fucking great! (lower number of MD, ...)
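 Sketch of the selection step (uncertainty sampling), assuming the classifier
 from (7) yields a probability for "likely to happen" per candidate reaction.
 mostUncertain is a hypothetical name; candidates closest to 0.5 are taken as
 the most uncertain ones.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the k candidates whose predicted probability is
// closest to 0.5, most uncertain first. These would be the next MD runs.
std::vector<std::size_t> mostUncertain(const std::vector<double>& probLikely,
                                       std::size_t k)
{
    std::vector<std::size_t> idx(probLikely.size());
    std::iota(idx.begin(), idx.end(), 0);
    k = std::min(k, idx.size());
    std::partial_sort(
        idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
        [&probLikely](std::size_t a, std::size_t b) {
            return std::abs(probLikely[a] - 0.5)
                 < std::abs(probLikely[b] - 0.5);
        });
    idx.resize(k);
    return idx;
}
```

 Each iteration: run MD for the returned candidates, add the results to the
 training set, retrain, predict again.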
     

 =============
 to implement:
 =============

 + ToyChemUtil : utility class with static members that collects the central
   functionality (everything from bin/toyChemUtil.hh etc.)
 + via OpenBabel
   - molecule size
   - orbital information
   - PROTON/H-atom filling
   - aromaticity prediction
 
 library:
 - check OpenBabel for SMILES writer 
   --> ggl::chem::SMILESwriterOB v2.1 : PROBLEM : no canonical SMILES at the moment !!

 
 ======================
  to create test for :
 ======================
 
 * ggl/chem/MoleculeUtil::isConsistent : 
   + one check for each error that can be thrown
   
 + ggl/chem/RC_*
 
 
 ===========
  to check:
 ===========
 
 - does symmetry breaking work with constraints? not for all constraints (e.g. noEdge)
 
ok - is using ggl::GS_STL_pushUnique faster than ggl::chem::GS_SMILES
   --> no


===================
 molecule-checker
===================

where needed:
 - user input : implemented
 - after rule application : implemented
 
which features are needed:
 - aromaticity prediction