Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDkit MCS Subgraph search failed #60

Open
ano302 opened this issue Dec 16, 2024 · 7 comments
Open

RDkit MCS Subgraph search failed #60

ano302 opened this issue Dec 16, 2024 · 7 comments

Comments

@ano302
Copy link

ano302 commented Dec 16, 2024

I have a problem, that the MCS routine fails depending on atom ordering.
I manipulate a molecule and try to find my original atoms back.
Important is, that it fails when 'removeHs = False'.
Without H it works.

Here is a minimal example:

from rdkit import Chem
import lomap


print(lomap.__version__)
print('First example.')
small_mol = Chem.MolFromSmiles('O=CN1C[C@H]2C[C@H]2C1')
parent_mol = Chem.MolFromSmiles('O=CN1C[C@H]2C[C@H]2C1')

mapping = dict(lomap.MCS.getMapping(small_mol, parent_mol, hydrogens=True))
print(mapping)

print('second example')
mol1_str = """mol1
  MOE2024           3D

 17 18  0  0  1  0  0  0  0  0999 V2000
   -2.6702    1.3462    3.2300 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4087    1.7256    2.1070 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7301    2.7069    1.7906 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7166    0.9354    1.2379 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6849    1.4846    0.3171 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1285    1.7012   -0.6378 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2432    2.3779    0.7159 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.3589    0.3840    0.1762 C   0  0  1  0  0  0  0  0  0  0  0  0
    0.8340    0.3334   -0.7872 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1364   -0.1545    1.3376 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1299    0.3888    2.2666 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.0783   -0.6388    1.1487 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1239   -0.8317    0.8873 C   0  0  1  0  0  0  0  0  0  0  0  0
   -0.3276   -1.8878    0.7103 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4645   -0.5000    1.5383 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.2530   -1.0931    1.1098 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4191   -0.6769    2.5995 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
  4  5  1  0  0  0  0
  4 15  1  0  0  0  0
  5  6  1  0  0  0  0
  5  7  1  0  0  0  0
  5  8  1  0  0  0  0
  8  9  1  0  0  0  0
  8 10  1  0  0  0  0
  8 13  1  0  0  0  0
 10 11  1  0  0  0  0
 10 12  1  0  0  0  0
 10 13  1  0  0  0  0
 13 14  1  0  0  0  0
 13 15  1  0  0  0  0
 15 16  1  0  0  0  0
 15 17  1  0  0  0  0
M  END
$$$$
"""
mol2_str="""mol2      
  MOE2024           3D

 17 18  0  0  1  0  0  0  0  0999 V2000
   -1.7017    0.3482    0.1455 N   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3996    1.3949   -0.3379 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4504    2.3158    0.2240 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9721    1.3024   -1.4066 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5629   -1.5416    2.7219 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2277   -1.1501    1.9520 C   0  0  1  0  0  0  0  0  0  0  0  0
   -2.6433   -1.7097    1.7980 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6057   -1.9776    0.7060 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.3576   -2.9039    0.1878 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4953   -0.9910   -0.4764 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9445    0.2700    1.4164 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4411   -1.0028    1.5705 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.9255   -2.5453    2.4386 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5155   -1.0703   -0.9473 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.2842   -1.1803   -1.2043 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3044    1.0204    2.1202 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.1222    0.4016    1.2340 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1 10  1  0  0  0  0
  1 11  1  0  0  0  0
  2  3  1  0  0  0  0
  2  4  2  0  0  0  0
  5  6  1  0  0  0  0
  6  7  1  0  0  0  0
  6  8  1  0  0  0  0
  6 11  1  0  0  0  0
  7  8  1  0  0  0  0
  7 12  1  0  0  0  0
  7 13  1  0  0  0  0
  8  9  1  0  0  0  0
  8 10  1  0  0  0  0
 10 14  1  0  0  0  0
 10 15  1  0  0  0  0
 11 16  1  0  0  0  0
 11 17  1  0  0  0  0
M  END
"""
mol1 = Chem.MolFromMolBlock(mol1_str, removeHs = False)
mol2 = Chem.MolFromMolBlock(mol2_str, removeHs = False)

print('Mapping')
mapping = dict(lomap.MCS.getMapping(mol1, mol2, hydrogens=True))
@pbuslaev
Copy link

But why do you need to set hydrogens=True? In general, rdkit MCS works much slower with hydrogens, and can produce poor results.

@ano302
Copy link
Author

ano302 commented Dec 16, 2024

I defined a dihedral angle by the atoms with ids 0,1,2,3.
Then I make some transformations. (Normally i start from a big atom and truncate it). Now I want to know, what are the atom ids in this transformed molecule so I can make a mapping.
image

@pbuslaev
Copy link

I believe, that with hydrogen=False the MCS should still return the mapping that returns hydrogens. The only difference is that hydrogens are not taken into account when computing MCS, which makes sense.

Also, I think that you can achieve what you want, without using a get_mapping method. Just running a wrapper should be enough:

mcs = lomap_mcs.MCS(mol1, mol2)
mapping_string = mcs.all_atom_match_list()
mapping_dict = dict(
    (map(int, v.split(':')) for v in mapping_string.split(','))
)

@ano302
Copy link
Author

ano302 commented Dec 17, 2024

With the example above, I get this:

mapping = dict(lomap.MCS.getMapping(small_mol, parent_mol, hydrogens=True))
print("Hydrogen = True",mapping) # 17 atoms are matched  {0: 2, 1: 1, 2: 3, 3: 0, 4: 9, 5: 13, 6: 14, 7: 7, 12: 5, 14: 10, 15: 15, 16: 16, 9: 6, 10: 11, 11: 12, 13: 4, 8: 8}


mapping = dict(lomap.MCS.getMapping(small_mol, parent_mol, hydrogens=False))
print("Hydrogen = False",mapping) # 8 atoms are matched {0: 2, 1: 1, 2: 0, 3: 6, 4: 5, 6: 3, 7: 7, 5: 4}

mcs = lomap.MCS(mol1, mol2)
mapping_string = mcs.all_atom_match_list()
print('mapping_string',mapping_string)
mapping_dict = dict(
    (map(int, v.split(':')) for v in mapping_string.split(','))
)
print("Mapping dict", mapping_dict) # 2 atoms are matched, {1: 1, 2: 2} 

It works (definition of works: No error, I get a mapping) when I change the following lines


        if mcs_mol.HasSubstructMatch(mcs_mol):
            mcs_sub = mcs_mol.GetSubstructMatch(mcs_mol)
        else:
            raise ValueError('RDkit MCS Subgraph search failed')

to


        if mcs_mol.HasSubstructMatch(mcs_mol):
            mcs_sub = mcs_mol.GetSubstructMatch(mcs_mol)
        #else:
            #raise ValueError('RDkit MCS Subgraph search failed')

mcs_sub is only needed if a png file is generated. Maybe, this code block could go into if fname:?

@pbuslaev
Copy link

I am getting 17 atoms matched:

mcs = MCS(mol1, mol2, threed=True)
mapping_string = mcs.all_atom_match_list()
print("mapping_string", mapping_string)
mapping_dict = dict(
    (map(int, v.split(":")) for v in mapping_string.split(","))
)
print("Mapping dict", mapping_dict)

# mapping_string 0:3,1:1,2:2,3:0,4:10,5:15,6:16,7:5,8:4,9:6,10:11,11:12,12:7,13:8,14:9,15:14,16:13
# Mapping dict {0: 3, 1: 1, 2: 2, 3: 0, 4: 10, 5: 15, 6: 16, 7: 5, 8: 4, 9: 6, 10: 11, 11: 12, 12: 7, 13: 8, 14: 9, 15: 14, 16: 13}

Note, that I am using threed=True

@ano302
Copy link
Author

ano302 commented Dec 17, 2024

With threed, I get the same results. I have to test, if this method will give me the right results for all other molecules.
(I inheritated the program, so have to add some test cases here.)
Is it an option for you to move the block I showed in my last comment to the if clause?
For me, it seems it may be a problem with HasSubstructMatch().

@pbuslaev
Copy link

I am not sure what getMapping method is doing. It has a part of MCS class functionality, but it is not doing lomap-style mcs. Also, the method is not tested and is not wrapped in the gufe_bindings. So, I think it is better to not use the method, and use the class instead, since the mapping in the class is supported, up-to-date, and well tested. As for getMapping method, it either should be reworked, or documentation has to be updated. Currently, at least to me, it is not 100% clear what is the purpose of the method and how it is related to the lomap mapping rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants