not able to smirksify double bonds in different clusters for cis/trans butene #100

Open
wutobias opened this issue Apr 29, 2022 · 1 comment

@wutobias

I am trying to use chemper.smirksify.SMIRKSifier to build a list of SMARTS patterns from clusters for cis- and trans-butene. In my clustering, the C-C single bonds should be distinguished between cis- and trans-butene: they share one cluster within each molecule but belong to different clusters in the two molecules (see CC_single_different below). When I run chemper.smirksify.SMIRKSifier to build the SMARTS list, I get the following error message:

ClusteringError: 
                      SMIRKSifier was not able to create SMIRKS for the provided
                      clusters with 5 layers. Try increasing the number of layers
                      or changing your clusters

I suppose this happens because no matches with the reference types are found, but this is just a guess. Is there a way to build a SMARTS list with SMIRKSifier that discriminates the two different C-C single bonds? If not, where in the code would be a good place to start implementing that?

Here is an example that illustrates the problem (see the attached Buten.zip for the RDKit molecules as JSON):

from chemper.mol_toolkits import mol_toolkit
from chemper.smirksify import SMIRKSifier, print_smirks
from rdkit import Chem

with open("./cis-Buten.json", "r") as fopen:
    cis_buten = Chem.JSONToMols(
        fopen.read()
    )[0]
with open("./trans-Buten.json", "r") as fopen:
    trans_buten = Chem.JSONToMols(
        fopen.read()
    )[0]


# Each entry is (label, [bonds in cis-butene, bonds in trans-butene]);
# a bond is a tuple of atom indices.
# Here the C-C single bonds get a different label in each molecule.
CC_single_different = [
    ('cc_single1', [[(0, 1), (2, 3)], []]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
    ('cc_single2', [[], [(0, 1), (2, 3)]]),
]

# Here the C-C single bonds share one label across both molecules.
CC_single_same = [
    ('cc_single', [[(0, 1), (2, 3)], [(0, 1), (2, 3)]]),
    ('ch_single', [[(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)],
                   [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]]),
    ('cc_double', [[(1, 2)], [(1, 2)]]),
]

molecules = [cis_buten, trans_buten]

### The following works nicely.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_same, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)

### The following does not work: it raises the ClusteringError shown above.
bond_smirksifier = SMIRKSifier(
    molecules, 
    CC_single_different, 
    max_layers=5,
    strict_smirks=True,
    verbose=False
)
smirks3k = bond_smirksifier.reduce(max_its=100, verbose=False)
print_smirks(smirks3k)
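
To make the difference between the two cluster definitions explicit, the lists can be compared on their own, without ChemPer. This is only a minimal sketch using the cluster data from the example above; the helpers `labels_by_bond` and `split_bonds` are hypothetical names introduced here for illustration and are not part of ChemPer.

```python
# Cluster definitions copied from the example above:
# (label, [bonds in cis-butene, bonds in trans-butene])
CH_BONDS = [(0, 4), (0, 5), (0, 6), (1, 7), (2, 8), (3, 9), (3, 10), (3, 11)]

CC_single_different = [
    ("cc_single1", [[(0, 1), (2, 3)], []]),
    ("ch_single", [CH_BONDS, CH_BONDS]),
    ("cc_double", [[(1, 2)], [(1, 2)]]),
    ("cc_single2", [[], [(0, 1), (2, 3)]]),
]


def labels_by_bond(clusters, n_mols=2):
    """Hypothetical helper: map each bond to the label it has in every molecule."""
    table = {}
    for label, per_mol in clusters:
        for mol_idx, bonds in enumerate(per_mol):
            for bond in bonds:
                table.setdefault(bond, [None] * n_mols)[mol_idx] = label
    return table


def split_bonds(clusters):
    """Hypothetical helper: bonds whose label differs between molecule 0 and 1."""
    return sorted(
        bond
        for bond, labels in labels_by_bond(clusters).items()
        if labels[0] != labels[1]
    )


print(split_bonds(CC_single_different))  # → [(0, 1), (2, 3)]
```

The only bonds labelled differently are the two C-C single bonds, which have the same atom indices and the same connectivity in both molecules. Any pattern that separates them therefore has to encode the geometry around the double bond.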
@davidlmobley
Member

Per discussion on Slack, it looks like we don't sample decorators for chirality, so this would have to be added. I think we'd be willing to take a pull request for that if you'd like to add it.
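
For context, cis- and trans-butene share the same bond graph, so the distinction is purely geometric: it shows up in the C0-C1=C2-C3 dihedral angle. The following is a rough sketch of that idea, not a proposal for how ChemPer should implement stereo decorators. The coordinates are hypothetical, idealized planar positions (not taken from Buten.zip), and `dihedral_deg` and `classify` are illustrative names.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    norm_b2 = math.sqrt(dot(b2, b2))
    unit_b2 = tuple(x / norm_b2 for x in b2)
    return math.degrees(math.atan2(dot(cross(n1, n2), unit_b2), dot(n1, n2)))


def classify(p0, p1, p2, p3):
    """Label the double bond 'cis' if the substituents are on the same side."""
    return "cis" if abs(dihedral_deg(p0, p1, p2, p3)) < 90.0 else "trans"


# Hypothetical idealized carbon positions (angstrom), all in the xy-plane.
c0 = (-0.75, 1.30, 0.0)
c1 = (0.0, 0.0, 0.0)
c2 = (1.34, 0.0, 0.0)
c3_cis = (2.09, 1.30, 0.0)     # same side of the double bond as c0
c3_trans = (2.09, -1.30, 0.0)  # opposite side

print(classify(c0, c1, c2, c3_cis))
print(classify(c0, c1, c2, c3_trans))
```

A pattern language can only capture this if it carries stereo information on or around the double bond, which is what the missing decorators would need to provide.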
