Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmallMoleculeComponent.to_openff() does not preserve the molecule protonation state #297

Open
LilDojd opened this issue Mar 14, 2024 · 7 comments · May be fixed by #462
Open

SmallMoleculeComponent.to_openff() does not preserve the molecule protonation state #297

LilDojd opened this issue Mar 14, 2024 · 7 comments · May be fixed by #462
Assignees
Milestone

Comments

@LilDojd
Copy link
Contributor

LilDojd commented Mar 14, 2024

Expected Behavior

All explicit hydrogens are conserved when converting to OpenFF Molecule

Current Behavior

OFFMolecule is constructed directly from SmallMoleculeComponent._rdkit with __init__:

def to_openff(self):
"""OpenFF Toolkit Molecule representation of this molecule
Note
----
This is a copy of this object, and modifying the OpenFF copy does not
alter the original object.
"""
from openff.toolkit.topology import Molecule as OFFMolecule
m = OFFMolecule(self._rdkit, allow_undefined_stereo=True)
m.name = self.name
return m

This essentially calls

OFFMolecule.from_rdkit(rdmol, allow_undefined_stereo=True, hydrogens_are_explicit=False)
which leads to rdkit adding hydrogens

Possible Solution

Use .from_rdkit() constructor directly with hydrogens_are_explicit set to True

Steps to Reproduce

  1. Use sdf in gdp_correct.sdf.zip
gdp = SmallMoleculeComponent.from_sdf_file("gdp_correct.sdf")
gdp.to_openff().to_file("gdp_ugly.sdf", "sdf")

> gdp_ugly.sdf:

@LilDojd
Copy link
Contributor Author

LilDojd commented Mar 14, 2024

I know this particular SDF has incorrect bond orders, but I think users' input protonation state should still generally be respected

@ijpulidos
Copy link

ijpulidos commented Jul 24, 2024

That's a good question, is there a special reason why we don't want hydrogens to be explicit? I know there's an use case for this flag, but I can't think of why we would prefer to default to False here.

@mikemhenry
Copy link
Contributor

if a SmallMoleculeComponent is supposed to be explicit (I think it is), then we do need to add hydrogens_are_explicit=True

@IAlibay
Copy link
Member

IAlibay commented Jan 20, 2025

Agreed with everyone here - the base assumption for SMCs is that all hydrogens have been explicitly set. We should enforce this with hydrogens_are_explicit=True.

@IAlibay IAlibay added this to the Release 1.3 milestone Jan 20, 2025
@IAlibay
Copy link
Member

IAlibay commented Jan 20, 2025

@dotsdl I'm adding this to the v1.3 milestone since @jthorton is seeing cases for this happen in standard PLB systems.

@dotsdl
Copy link
Member

dotsdl commented Jan 21, 2025

Sounds good, thanks @IAlibay! Will discuss in today's dev call.

@IAlibay
Copy link
Member

IAlibay commented Jan 21, 2025

Do we know if there are any gotchas we need to be aware when using hydrogens_are_explicit=True?
I.e. is there a reason why this flag isn't on by default with OpenFF (aside from the likely "hydrogens are usually implicit by default in RDKit")?

Possibly pinging @j-wags here.

@dotsdl dotsdl moved this from Sprint - Available to Sprint - In Progress in gufe : advancement sprints Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Sprint - In Progress
Development

Successfully merging a pull request may close this issue.

6 participants