-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XlogP node generates different values #5
Comments
The cdk one now uses smarts as well, I will check back in see if there was a regression but TBH it should really just use JPLogP. Again could be something subtle like the aromatic bond status if coming from molfile or missing expl H which reading the CDK needs to be present. yeah Xlogp sucks in my opinion |
👍 - I was going to expose the new descriptors but at the moment I'm working my way through the 'breaking' changes by switching out to 2.7. |
More investigation needed. The LogP on PubChem is 0.72, PC also reports XLogP3 as 0.8 so shows the implementation still isn't perfect but 0.47 is closer :-). For ref: ALogP - 0.7122999999999997 The JP is probably over fitted on XLogP, IRRC he left out CDK ALogP because it was broken at the time. There is a test case for this though that notes:
It was ignored as [nH0] used to be rejected. |
cdk/cdk@1d71596 before cdk/cdk@30738c7 - 0.47 Regression somewhere here: |
There were no changes between cdk/cdk@f8f22b0 and cdk/cdk@142caf1 in the XLogP descriptor so it is something external. 2 year gap... so need a better approach to find the issue. |
OK tracked it down to the AminoAcid smarts check
Hmm I'm not sure that molecule should be an amino acid :-) |
OK found it and can see how it happened. I'll can post a PR but not super happy as in my world this pattern is NOT an amino acid, indeed checking the XLogP paper it shows it should NOT be -1.69. Also shows the implementation is still not quite right and TBH so much of the CDK descriptors are partially finished I wouldn't trust them at all. How the regression happened. We updated the query atom container creation "QueryAtomContainer.createBasic" to include aromaticity status on the atoms (cdk/cdk@b154fea#diff-6ef2498d7f5198cfda274535ea47ff7b916aa4c6f1a9aa5b50e9e26b3915cde5R50). Previously it would create queries like this "[#6]:[#6]" - since an aromatic bond implies it's atoms here are aromatic it's cleaner to write this as "c:c". This is all fine except the case where the caller then tries to mess around with the pattern after the fact. In XLogP it would convert Here was where it was dicking around with the query:
Then later on I can and removed all this special query manipulation and replaced it with a SMARTS pattern. Since this was now a no-op (can't have aromatic bond between aliphatic atoms) I made it:
A more accurate pattern to what it needed would have been before the basicQueryContainer change would be:
Indeed changing it to that reverts the value to -1.69 but as I highlight above this is wrong anyways. |
Ref molecule is picolinic acid 689 not 688 (as per test case). What a mess :-) |
Thanks for the info John. If there's already a going to be a some loss in functionality with the upgrade of the KNIME nodes to CDK v2 (ambit/sketcher) maybe there's not all that much utility in bringing some of the nodes with better replacements forward. Could just bin the XLogP node and create a JPLogP (or just a molecular descriptors node that allows the selection of JPLogP |
And why is it calling a copied version of the class named SmartXLogPDescriptor instead of calling the CDK implementation?
The text was updated successfully, but these errors were encountered: