Are aromatic rings kekulized before assigning parameters? #12
Replies: 7 comments
-
Short answer: Yes Medium answer: Aromaticity is perceived using the MDL model both in the For reasons I vaguely understand (and was surprised to learn!) but would do a poor job of reiterating, MDL considers the bonds in thiophene to be non-aromatic: In [1]: from openeye import oechem
In [2]: from openff.toolkit import Molecule
In [3]: oemol = Molecule.from_smiles("c1ccSc1").to_openeye()
In [4]: oechem.OEAssignAromaticFlags(oemol, oechem.OEAroModel_MDL)
In [5]: for atom in oemol.GetAtoms(): # Same results if you look at bonds
...: print(atom.GetIdx(), atom.IsAromatic())
...:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False so, given this, it's not a surprise that it's using the generic single- and double-bonded sp2 carbon parameters. The trick here is that aromaticity information is stored separately from bond order in In [2]: [b.is_aromatic for b in Molecule.from_smiles("c1ccSc1").bonds]
Out[2]: [False, False, False, False, False, False, False, False, False]
In [3]: [b.is_aromatic for b in Molecule.from_smiles("c1ccccc1").bonds]
Out[3]: [True, True, True, True, True, True, False, False, False, False, False, False]
In [4]: [b.bond_order for b in Molecule.from_smiles("c1ccSc1").bonds]
Out[4]: [2, 1, 2, 1, 1, 1, 1, 1, 1]
In [5]: [b.bond_order for b in Molecule.from_smiles("c1ccccc1").bonds]
Out[5]: [2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1] So you're getting the behavior: based on MDL definitions, thiophene should get Longer answer: This is a long-standing point of friction in our infrastructure and related tools, mostly because molecule inputs sometimes contain incomplete information. If you're somehow short on reading material, openforcefield/openff-toolkit#697 includes a thorough summary of how a few corner cases can compound upon each other. |
Beta Was this translation helpful? Give feedback.
-
Thank you very much for the detailed explanation! @mattwthompson
I tried this in rdkit: rkmol = Chem.MolFromSmiles('c1cscc1')
Chem.SanitizeMol(rkmol, Chem.SANITIZE_ALL ^ Chem.SANITIZE_CLEANUPCHIRALITY)
Chem.SetAromaticity(rkmol, Chem.AromaticityModel.AROMATICITY_MDL) # same result if using AROMATICITY_RDKIT
print([str(b.GetBondType()) for b in rkmol.GetBonds()])
print([at.GetIsAromatic() for at in rkmol.GetAtoms()])
Chem.Kekulize(rkmol)
print([str(b.GetBondType()) for b in rkmol.GetBonds()])
print([at.GetIsAromatic() for at in rkmol.GetAtoms()]) and get ['AROMATIC', 'AROMATIC', 'AROMATIC', 'AROMATIC', 'AROMATIC']
[True, True, True, True, True]
['SINGLE', 'DOUBLE', 'SINGLE', 'SINGLE', 'DOUBLE']
[True, True, True, True, True] rdkit does have a note saying the MDL aromaticity model is not well documented: Did I do anything wrong in this example or does this mean thiophene is a case where OpenEye and RDKit behaves differently? |
Beta Was this translation helpful? Give feedback.
-
I'm not sure why that code is reporting aromaticity, I don't get the same result while going through OpenFF: In [4]: rdmol = Molecule.from_smiles("c1ccSc1").to_rdkit()
In [5]: [b.GetIsAromatic() for b in rdmol.GetBonds()]
Out[5]: [False, False, False, False, False, False, False, False, False] the docs you linked include a mention of five-membered rings always being non-aromatic so I suspect some detail is off in your code. Somebody else who is better with RDKit will probably be able to be of more help than me! |
Beta Was this translation helpful? Give feedback.
-
Chem.SanitizeMol(rkmol, Chem.SANITIZE_ALL ^ Chem.SANITIZE_CLEANUPCHIRALITY) I think that the Chem.SetAromaticity call may not be overwriting the previous values, which are set by the SanitizeMol call. If I instead have the sanitize step skip setting aromaticity, it shows what we would expect: from rdkit import Chem
rkmol = Chem.MolFromSmiles('c1cscc1')
Chem.SanitizeMol(rkmol, Chem.SANITIZE_ALL ^ Chem.SANITIZE_CLEANUPCHIRALITY ^ Chem.SANITIZE_SETAROMATICITY)
#Chem.Kekulize(rkmol, clearAromaticFlags=True) # This is another way to blow away pre-existing aro info
Chem.SetAromaticity(rkmol, Chem.AromaticityModel.AROMATICITY_MDL)
print([str(b.GetBondType()) for b in rkmol.GetBonds()])
print([at.GetIsAromatic() for at in rkmol.GetAtoms()])
Chem.Kekulize(rkmol)
print([str(b.GetBondType()) for b in rkmol.GetBonds()])
print([at.GetIsAromatic() for at in rkmol.GetAtoms()])
|
Beta Was this translation helpful? Give feedback.
-
Thank you. It is somewhat related to the Kekulize function in rdkit: if I use this, as is being used in openff in the above link Chem.Kekulize(rkmol, clearAromaticFlags=True) I got the same result as openff-toolkit. |
Beta Was this translation helpful? Give feedback.
-
Excellent. Thanks for opening the high-quality issue @wenyan4work. We've noticed a similar issue in our work to make a protein force field - The MDL aromaticity model doesn't like some amino acid side chains with 5-membered rings. So we try (and at least mostly succeed) to write the SMIRKS in our force fields so that they assign parameters that keep these sorts of rings flat, even if the aromaticity model doesn't cooperate :-P So I think the answer is "everything is working as intended, it just looks funny". Did this answer your original question? |
Beta Was this translation helpful? Give feedback.
-
Thank you @j-wags . Yes my original question has been answered. I think probably the answer is 'rdkit may benefit from some crowd-funded document and demostration' :) |
Beta Was this translation helpful? Give feedback.
-
I was playing with the 'inspect_assigned_parameters' script in example folder, and tried to label a thiophene ring. The only thing I changed is
molecule = Molecule.from_smiles("c1ccSc1")
in the following codeI got bonds:
All CC and CS bonds are either single or double.
Same thing happens for a simpler benzene. I got bonds:
There is actually 'aromatic' bond definition in the offxml file (b5):
What is the correct behavior?
Beta Was this translation helpful? Give feedback.
All reactions