CHEBI:53020 #8

Open
amalik01 opened this issue Dec 13, 2023 · 7 comments

@amalik01

https://wwwdev.ebi.ac.uk/chebi/beta/CHEBI:53020

The beta page seems to add the number 2 in front of the molecular formula, which needs fixing.

The calculated Mass, InChI and InChIKey also differ from the ones generated by ACD, which is unusual.

@eloyfelix
Member

For some reason the SGROUP is detected twice. That is probably why the molecular formula has the 2 in front and why the weights differ from the ACD calculations. I expect ACD shows smaller values? We'll need to look into it.

The InChI and InChIKey values are the ones provided by RDKit, which calls the InChI software directly, so that one is much trickier.
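To narrow down where two InChI strings diverge before digging into the InChI software itself, one option is to compare them layer by layer. The following is only a rough sketch in plain Python, not something taken from the ChEBI code; `split_layers` and `differing_layers` are hypothetical helper names, and the ethanol / dimethyl ether strings are just stand-in inputs.

```python
def split_layers(inchi):
    """Split an InChI string into its '/'-separated layers.

    Layers are keyed by their one-letter prefix (c, h, b, t, m, s, ...).
    Values are lists because a prefix can occur more than once.
    """
    parts = inchi.strip().split("/")
    layers = {"version": [parts[0]]}
    rest = parts[1:]
    # The formula layer has no prefix letter; it starts with an element
    # symbol or a component count, so an uppercase letter or a digit.
    if rest and (rest[0][:1].isupper() or rest[0][:1].isdigit()):
        layers["formula"] = [rest[0]]
        rest = rest[1:]
    for part in rest:
        if part:
            layers.setdefault(part[0], []).append(part[1:])
    return layers


def differing_layers(inchi_a, inchi_b):
    """Return the sorted names of the layers that differ between two InChIs."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


# Stand-in inputs: ethanol vs. dimethyl ether share a formula layer
# but differ in the connectivity (c) and hydrogen (h) layers.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
dimethyl_ether = "InChI=1S/C2H6O/c1-3-2/h1-2H3"
print(differing_layers(ethanol, dimethyl_ether))
```

If only the formula layer differs, the problem is more likely in how the structure is read (for example the SGROUP handling) than in the InChI version.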

@amalik01
Author

The formula and mass generated by ACD are:
image

The InChI and InChIKey are:

InChI=1S/C87H150N2O72P2.C20H34/c1-20(2)5-6-136-162(130,131)161-163(132,133)160-76-37(89-22(4)102)48(113)65(31(15-98)145-76)150-75-36(88-21(3)101)47(112)66(30(14-97)144-75)151-85-64(129)72(157-86-74(53(118)42(107)26(10-93)142-86)159-87-73(52(117)41(106)27(11-94)143-87)158-84-63(128)70(44(109)29(13-96)141-84)156-83-62(127)69(43(108)28(12-95)140-83)154-81-58(123)51(116)40(105)25(9-92)139-81)46(111)35(149-85)19-135-78-61(126)71(155-82-60(125)55(120)68(33(17-100)147-82)153-80-57(122)50(115)39(104)24(8-91)138-80)45(110)34(148-78)18-134-77-59(124)54(119)67(32(16-99)146-77)152-79-56(121)49(114)38(103)23(7-90)137-79;1-7-18(4)12-9-14-20(6)16-10-15-19(5)13-8-11-17(2)3/h20,23-87,90-100,103-129H,5-19H2,1-4H3,(H,88,101)(H,89,102)(H,130,131)(H,132,133);7,11,14-15H,8-10,12-13,16H2,1-6H3/b;18-7+,19-15+,20-14+/t23-,24-,25-,26-,27-,28-,29-,30-,31-,32-,33-,34-,35-,36-,37-,38-,39-,40-,41-,42-,43-,44-,45-,46-,47-,48-,49+,50+,51+,52+,53+,54-,55-,56+,57+,58-,59+,60+,61+,62-,63+,64+,65-,66-,67-,68-,69+,70+,71+,72+,73+,74+,75+,76-,77+,78+,79-,80-,81-,82-,83-,84-,85+,86-,87-;/m1./s1

DSXMJBFOHIPMOF-DSJWLFSWSA-N
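As a cross-check on the formula, the element totals can be read straight from the formula layer of the InChI above. This is a minimal sketch in plain Python: the helper names (`formula_layer`, `element_totals`, `hill_formula`) are made up for illustration, and it assumes the usual InChI convention that components are separated by `.` and identical components carry a leading count (as in `2ClH.Ca`).

```python
import re
from collections import Counter


def formula_layer(inchi):
    """Return the formula layer, i.e. the field right after the 'InChI=1S' prefix."""
    return inchi.split("/")[1]


def element_totals(formula):
    """Sum element counts over all '.'-separated components of a formula layer."""
    totals = Counter()
    for component in formula.split("."):
        # An optional leading integer multiplies the whole component.
        match = re.match(r"(\d*)(.*)", component)
        multiplier = int(match.group(1) or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2)):
            totals[element] += multiplier * int(count or 1)
    return dict(totals)


def hill_formula(totals):
    """Format element totals in Hill order (C, H, then alphabetical)."""
    if "C" in totals:
        rest = sorted(e for e in totals if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in totals else []) + rest
    else:
        order = sorted(totals)
    return "".join(e + (str(totals[e]) if totals[e] != 1 else "") for e in order)


# Formula layer of the ACD InChI quoted above (two components).
acd_formula = "C87H150N2O72P2.C20H34"
print(hill_formula(element_totals(acd_formula)))  # C107H184N2O72P2
```

Comparing this total with the formula shown on the beta page would show whether a group is being counted twice.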

@eloyfelix
Member

Do you know which version of InChI ACD is using?

@amalik01
Author

amalik01 commented Dec 13, 2023

I think it is version 1.05

image

@eloyfelix
Member

eloyfelix commented Feb 13, 2024

More molecules with potential issues:
37633
51133
51133
53334
53019
53020
53022
53023
53325
53398
28427
81539
81539
84166
141517
81539
53571
53498
53723
53742
59081
59085

All of them seem to have multiple nested SGROUPs (with SPL, the parent list). We should maybe not calculate them until we can find a fix. Some of the structures may be incorrect as well.
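One way to flag such entries so they can be skipped is to scan the molfile for `M  SPL` lines. This is a hedged sketch, not the check used in the ChEBI code: `parent_links` and `has_nested_sgroups` are hypothetical names, it only looks at V2000 molblocks (V3000 encodes the parent differently and is not handled), and the molblock fragment is a made-up stub containing just the property lines.

```python
def parent_links(molblock):
    """Collect child -> parent SGROUP indices from V2000 'M  SPL' lines.

    Each line has the form 'M  SPLnn8 ccc ppp ...': an entry count
    followed by (child, parent) index pairs.
    """
    links = {}
    for line in molblock.splitlines():
        if line.startswith("M  SPL"):
            fields = line[6:].split()
            count = int(fields[0])
            pairs = fields[1:1 + 2 * count]
            for child, parent in zip(pairs[0::2], pairs[1::2]):
                links[int(child)] = int(parent)
    return links


def has_nested_sgroups(molblock):
    """True if at least one SGROUP declares a non-zero parent."""
    return any(parent != 0 for parent in parent_links(molblock).values())


# Made-up stub with only the property block of a molfile:
# two SRU groups, group 2 nested inside group 1.
stub = "\n".join([
    "M  STY  2   1 SRU   2 SRU",
    "M  SPL  1   2   1",
    "M  END",
])
print(parent_links(stub), has_nested_sgroups(stub))
```

Entries for which `has_nested_sgroups` returns `True` could be excluded from the formula and mass calculation until the nesting is handled properly.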

@eloyfelix
Member

Using the index and PARENT properties of the SGROUPs may give enough information to calculate them:

from rdkit import Chem
# mol is assumed to be an RDKit Mol already parsed from the molfile
for sg in Chem.GetMolSubstanceGroups(mol):
    print(sg.GetPropsAsDict())
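Building on the property dictionaries printed above, the nesting can be reconstructed by following each group's PARENT back to a top-level group. The sketch below is an illustration only and not necessarily how the fix was implemented. It works on plain dictionaries so it runs without RDKit, and it assumes that each dictionary carries an `index` key, that nested groups carry a `PARENT` key, and that a missing or zero `PARENT` means top level. `nesting_depths` and `top_level_groups` are hypothetical names.

```python
def nesting_depths(sgroup_props):
    """Map each SGROUP index to its nesting depth (0 = top level)."""
    # Treat a missing or zero PARENT as "no parent" (assumption).
    parent_of = {p["index"]: (p.get("PARENT") or None) for p in sgroup_props}
    depths = {}
    for index in parent_of:
        depth, current, seen = 0, index, {index}
        while parent_of.get(current) is not None:
            current = parent_of[current]
            if current in seen:
                raise ValueError(f"cyclic PARENT chain at SGROUP {current}")
            seen.add(current)
            depth += 1
        depths[index] = depth
    return depths


def top_level_groups(sgroup_props):
    """Indices of SGROUPs that are not nested inside another group."""
    return [i for i, d in nesting_depths(sgroup_props).items() if d == 0]


# Hand-written stand-in for
# [sg.GetPropsAsDict() for sg in Chem.GetMolSubstanceGroups(mol)]
props = [
    {"index": 1, "TYPE": "SRU"},
    {"index": 2, "TYPE": "SRU", "PARENT": 1},
    {"index": 3, "TYPE": "SRU", "PARENT": 2},
]
print(nesting_depths(props), top_level_groups(props))
```

Processing only the top-level groups and descending into their children from there is one way to avoid visiting a nested group twice.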

@eloyfelix
Member

eloyfelix commented Feb 16, 2024

now fixed in the code
