
ANI2x subset #276

Open
chrisiacovella opened this issue Oct 10, 2024 · 0 comments

Comments

@chrisiacovella
Member

@wiederm mentioned interest in having a smaller ANI2x dataset (larger than our testing set) for examining training.
@jchodera suggested limiting it to molecules containing only C, H, and O, which I think is a good idea. This would let us compare more directly with PhAlkEthOH.
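A minimal sketch of the C/H/O filter, assuming each molecule's atomic numbers can be read from the ANI2x files. The `molecules` mapping and both function names are hypothetical and do not come from the project's loader.

```python
from typing import Dict, Sequence

# Atomic numbers for hydrogen, carbon, and oxygen.
CHO_ATOMIC_NUMBERS = frozenset({1, 6, 8})


def is_cho_only(atomic_numbers: Sequence[int]) -> bool:
    """Return True if every atom in the molecule is H, C, or O."""
    return set(atomic_numbers) <= CHO_ATOMIC_NUMBERS


def filter_cho(molecules: Dict[str, Sequence[int]]) -> Dict[str, Sequence[int]]:
    """Keep only molecules composed exclusively of H, C, and O.

    `molecules` is a hypothetical mapping from a molecule identifier to its
    per-atom atomic numbers; the real dataset layout may differ.
    """
    return {name: z for name, z in molecules.items() if is_cho_only(z)}
```

For example, given methanol `[6, 8, 1, 1, 1, 1]`, methylamine `[6, 7, 1, 1, 1, 1, 1]`, and water `[8, 1, 1]`, only methanol and water survive the filter because methylamine contains nitrogen.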

PhAlkEthOH has 12,271 unique molecules; ANI2x has 16,514 unique molecules. I'm not sure how many ANI2x molecules contain only C, H, and O, but if that number is less than PhAlkEthOH's, we can create a smaller subset of PhAlkEthOH to match.
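One way to create a size-matched subset is to randomly sample unique molecule IDs from whichever dataset is larger, using a fixed seed so the subset is reproducible. This is a sketch under that assumption. Sampling per molecule (rather than per conformer) is a design choice made here so that all conformers of a chosen molecule stay together. The function name and seed are hypothetical.

```python
import random
from typing import List, Sequence


def subsample_to_match(molecule_ids: Sequence[str], target_size: int, seed: int = 42) -> List[str]:
    """Reproducibly choose `target_size` unique molecule IDs at random.

    Sampling is per molecule, so every conformer of a chosen molecule
    would be kept together in the resulting subset.
    """
    unique_ids = sorted(set(molecule_ids))  # sort so the result depends only on the seed
    if target_size >= len(unique_ids):
        return unique_ids  # already small enough; nothing to trim
    rng = random.Random(seed)
    return sorted(rng.sample(unique_ids, target_size))
```

For instance, if the C/H/O subset of ANI2x turned out to be the smaller set, `subsample_to_match(phalkethoh_ids, n_ani2x_cho)` would trim PhAlkEthOH to the same number of molecules. Both argument names here are placeholders.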

It might be interesting to look at how much these datasets overlap. The ANI2x dataset does not include SMILES strings for its molecules, but we could probably do other relevant comparisons. Something as simple as comparing molecular weight distributions (since we are limited to C, H, and O) would probably be good. We could also split this into two plots: one for molecules containing O and one for molecules without O.
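Because molecular weight needs only atomic numbers, not SMILES, it can be computed directly for both datasets. The sketch below assumes per-molecule atomic numbers are available for both datasets and uses standard atomic weights for H, C, and O. It splits molecules by whether they contain oxygen and bins the weights into shared fixed-width histograms that could be plotted side by side. The overlap score is histogram intersection, a metric swapped in here for illustration and not something proposed in this issue. All function names, bin widths, and the 500 g/mol cap are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Standard atomic weights (g/mol) for the only elements in a CHO subset.
ATOMIC_WEIGHTS = {1: 1.008, 6: 12.011, 8: 15.999}


def molecular_weight(atomic_numbers: Sequence[int]) -> float:
    """Sum of atomic weights; raises KeyError for any non-CHO element."""
    return sum(ATOMIC_WEIGHTS[z] for z in atomic_numbers)


def split_by_oxygen(molecules: Dict[str, Sequence[int]]) -> Tuple[List[float], List[float]]:
    """Return (weights of molecules with O, weights of molecules without O)."""
    with_o: List[float] = []
    without_o: List[float] = []
    for z in molecules.values():
        (with_o if 8 in z else without_o).append(molecular_weight(z))
    return with_o, without_o


def histogram(values: Sequence[float], bin_width: float = 10.0, max_weight: float = 500.0) -> List[int]:
    """Count values in fixed-width bins [0, w), [w, 2w), ...

    Using the same bins for both datasets keeps their counts comparable.
    Values at or above max_weight are placed in the last bin.
    """
    n_bins = int(max_weight // bin_width)
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    return counts


def overlap_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Intersection of two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        return 0.0
    return sum(min(x / total_a, y / total_b) for x, y in zip(a, b))
```

Running `split_by_oxygen` on each dataset and comparing the "with O" histograms to each other (and likewise the "without O" histograms) gives the two comparisons described above. The bin counts could then be drawn with a standard plotting library such as matplotlib.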
