Query for Complexes without an active subunit in Reactome #327

Open
vanaukenk opened this issue Nov 6, 2024 · 17 comments

Comments

@vanaukenk
Collaborator

  1. Find all complexes in Reactome that are the catalyst for a reaction - keep these
  2. Filter this for complexes that contain gene products - keep these
  3. Filter for complexes that don't have an active subunit - keep these

Output
A list of Reactome complexes and, for each one, whether the complex has an active subunit and the pathways in which it catalyzes a reaction
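
The three filter steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Complex` and `Reaction` records, their field names, and the identifiers are all made up for the example and do not reflect the Reactome schema or the code that generates the actual report.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Complex:
    # Hypothetical record; field names are assumptions for this sketch.
    stable_id: str
    gene_products: List[str] = field(default_factory=list)  # empty if none
    active_unit: Optional[str] = None                        # None if not annotated


@dataclass
class Reaction:
    stable_id: str
    pathways: List[str]
    catalysts: List[Complex] = field(default_factory=list)


def complexes_without_active_unit(reactions: List[Reaction]) -> Dict[str, List[str]]:
    """Map each qualifying complex ID to the pathways where it catalyzes a reaction."""
    found: Dict[str, set] = {}
    for rxn in reactions:
        for cx in rxn.catalysts:            # step 1: catalyst complexes only
            if not cx.gene_products:        # step 2: must contain gene products
                continue
            if cx.active_unit is not None:  # step 3: must lack an active unit
                continue
            found.setdefault(cx.stable_id, set()).update(rxn.pathways)
    return {cid: sorted(paths) for cid, paths in found.items()}


# Made-up records, not real Reactome identifiers.
no_unit = Complex("CPLX-1", gene_products=["GP-A", "GP-B"])
with_unit = Complex("CPLX-2", gene_products=["GP-C"], active_unit="GP-C")
ions_only = Complex("CPLX-3")
reactions = [
    Reaction("RXN-1", ["Pathway X"], [no_unit, with_unit]),
    Reaction("RXN-2", ["Pathway Y"], [no_unit, ions_only]),
    Reaction("RXN-3", ["Pathway Z"], []),  # no catalyst, so never reported
]
report = complexes_without_active_unit(reactions)
```

In this toy data only `CPLX-1` survives all three filters, and it is reported with both pathways in which it acts as a catalyst.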

@sjm41
Collaborator

sjm41 commented Nov 7, 2024

@ukemi

ukemi commented Nov 14, 2024

We will take a look at these results on the 'weeds' call.

@deustp01
Collaborator

> We will take a look at these results on the 'weeds' call.

I haven't looked at the list yet - do you want to wait until next week? By then I should have a much better sense of what needs to be patched and whether some of the patching could be at least partly automated.

@ukemi

ukemi commented Nov 14, 2024

Also see #302

@ukemi

ukemi commented Nov 14, 2024

We could, but I have found a couple issues with the query. It seems that there are complexes that aren't controllers/enablers/catalysts included, which inflates the numbers. Let's try to clean that up and then get into the real weeds of the biology next week, ok?

@deustp01
Collaborator

If it's possible to generate a list of these irregular ones, that would be useful.

@ukemi

ukemi commented Nov 14, 2024

It would probably be easier to modify the query. It looks like step 1 didn't work right.

@ukemi

ukemi commented Nov 14, 2024

For example, R-HSA-109857 is a binding reaction without a catalyst. We will eventually merge it into the next reaction since it is substrate binding, but it should not be in the results of this query.

@ukemi

ukemi commented Nov 14, 2024

Hi @dustine32
Here are some reactions that have no complex as the enabler/catalyst:
R-HSA-109857 is a binding reaction without a catalyst
R-HSA-6790025 is a black box reaction with a positive regulator, but no catalyst/enabler
R-HSA-111966 is a binding reaction with a negative regulator, but no catalyst/enabler
R-HSA-1801587 is a black box reaction with positive regulators, but no catalyst.

Side note:
R-HSA-1237129 is a reaction with a catalyst that is a complex, but the complex is a single gene product coupled to an ion. Would it be possible to make the gene product the catalyst in cases like this? It would mean that we don't have to hand edit these to choose an active subunit.

@ukemi

ukemi commented Nov 14, 2024

PS. Let me know if you want a separate ticket for the side note.

@deustp01
Collaborator

deustp01 commented Nov 14, 2024

> but the complex is a single gene product coupled to an ion. Would it be possible to make the gene product the catalyst in cases like this?

I'm imagining a script could take the list of components of a complex, remove all items that are not gene products (EWAS class members), and uniquify** that list. If only one EWAS is left, that is the candidate active unit. If more than one, human review would be needed to determine whether there is an emergent function.

**But if two different EWASs both derive from the same UniProt gene product and differ in their modifiedResidue attributes, that's a distinction that is not of interest to GO-CAM (it's a kind of homodimer at the granularity of GO-CAM, I think), and we should figure out how to take one of them as the active unit.
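
A minimal sketch of the script imagined above, assuming the complex has already been flattened into a plain list of components. The `Component` record, the class labels `"EWAS"` and `"SmallMolecule"`, and the accessions are placeholders for illustration, not the Reactome data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    # Hypothetical record: schema_class and uniprot are assumed field names.
    schema_class: str               # "EWAS" for gene products, anything else otherwise
    uniprot: Optional[str] = None   # accession behind an EWAS, if any
    label: str = ""                 # free text, e.g. to tell modified forms apart


def candidate_active_unit(components: List[Component]) -> Optional[str]:
    """Return the single UniProt accession left after filtering, else None.

    None means "needs human review": either no gene product is present or
    more than one distinct gene product remains.
    """
    accessions = {
        c.uniprot
        for c in components
        if c.schema_class == "EWAS" and c.uniprot  # drop non-gene-products
    }
    return next(iter(accessions)) if len(accessions) == 1 else None


# Placeholder accessions, not real UniProt entries.
protein_plus_ion = [Component("EWAS", "ACC_1"), Component("SmallMolecule")]
heterodimer = [Component("EWAS", "ACC_1"), Component("EWAS", "ACC_2")]
modified_pair = [
    Component("EWAS", "ACC_1", "unmodified"),
    Component("EWAS", "ACC_1", "phosphorylated"),
]
```

Because the set is keyed on the accession rather than on the EWAS itself, `modified_pair` collapses to one candidate, which is the homodimer-like treatment suggested in the footnote. Whether that collapse is the right policy is the open question in this thread.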

@ukemi

ukemi commented Nov 14, 2024

Good question about the homodimer. Since in many cases we are summing up to the level of the gene, maybe just take the gene. But this would result in a curation policy to never curate at the level of modified or processed forms. In MGI I think we would use a PRO id if it were available for these. So is it a homodimer?
It's kind of like the old dilemma about homotypic cell adhesion. Is it homotypic cell adhesion if two neurons stick to one another even if one is glutamatergic and one is adrenergic and you don't know that they are not identical? At what level is the homotypic cutoff?

@dustine32
Collaborator

@ukemi Thanks for the examples! I see in the first example the complex in the report is actually a regulator, not a catalyst. This quickly points to the report bug, which is checking active sites for any reaction's Control (like a regulator) and not specifically Catalysis. I'll make this code update and regenerate the report.
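
The difference between the two checks can be illustrated with a small sketch. It assumes a model in which catalysis is one kind of control, alongside regulation; the class and function names here are invented for the example and are not taken from the report code.

```python
from typing import List


class Control:
    """Any controller attached to a reaction (catalyst or regulator)."""

    def __init__(self, controller: str):
        self.controller = controller


class Catalysis(Control):
    """A control whose controller actually catalyzes the reaction."""


class Regulation(Control):
    """A positive or negative regulator; not a catalyst."""


def controllers_any_control(controls: List[Control]) -> List[str]:
    # Too broad: regulators are returned along with catalysts.
    return [c.controller for c in controls if isinstance(c, Control)]


def controllers_catalysis_only(controls: List[Control]) -> List[str]:
    # Narrowed: only controllers of Catalysis are returned.
    return [c.controller for c in controls if isinstance(c, Catalysis)]


# Placeholder complex names.
controls = [Regulation("CPLX-REG"), Catalysis("CPLX-CAT")]
```

With the broad check both complexes are reported, which matches the inflated counts noted earlier; with the narrowed check only `CPLX-CAT` remains.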

@dustine32
Collaborator

> PS. Let me know if you want a separate ticket for the side note.

@ukemi It might be best to clarify this request in the existing #302 ticket you referred to earlier since that also involves breaking down complexes into single protein enablers.

Basically, the code would try shedding all ions (could I just discard all SmallMolecule components?) and then uniquify the proteins to see if only a single protein class remains?

@deustp01
Collaborator

> could I just discard all SmallMolecule components? ...

Yes, absolutely. Most small molecules / chemicals that form complexes with gene products may be ions, but the key difference for Reactome and GO-CAM is gene-product versus not-gene-product.

> ... and then uniquify the proteins to see if only one single protein class remains?

Yes again, where the process exactly is to identify the UniProt IDs associated with all EWAS instances in the complex, and uniquify on those UniProt IDs.
Anyway, that's what I think we should do first. @ukemi points out that this approach will not distinguish among forms of a single gene product (UniProt instance) that differ functionally due to different covalent modifications. This is an information loss going from Reactome to UniProt, but my best guess is that it's a reasonable trade-off if it gives us a reliable, easy-to-maintain conversion process that does not require additional ontologized identifiers from REACTO or (hypothetically, because they aren't implemented yet and may be hard to maintain) PRO, especially as the lost information is preserved in the Reactome instance, which is reliably mapped to the stripped-down GO-CAM one.
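
One way to keep that mapping back to the Reactome instances while uniquifying is to group EWAS identifiers under their UniProt accession instead of discarding them. The sketch below assumes the input is a simple list of `(ewas_id, uniprot_accession)` pairs; the function name and all identifiers are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_ewas_by_uniprot(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group EWAS IDs by accession, preserving input order within each group."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ewas_id, accession in pairs:
        grouped[accession].append(ewas_id)
    return dict(grouped)


# Placeholder IDs: two modified forms of one gene product plus a second protein.
pairs = [
    ("EWAS-1", "ACC_1"),
    ("EWAS-2", "ACC_1"),
    ("EWAS-3", "ACC_2"),
]
by_accession = group_ewas_by_uniprot(pairs)
```

The keys give the uniquified gene products, and each value lists the EWAS instances that were collapsed into that key, so the modified forms remain traceable.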

@nataled
Collaborator

nataled commented Nov 15, 2024

@deustp01 when you say "they aren't implemented yet" you refer to PRO in GO-CAMs? Because I've had the Reactome-mapped terms for quite some time.

@deustp01
Collaborator

@nataled We need some work on the Reactome end, and probably some discussion about maintenance once the mappings are installed. My fault - I let this one drop.


6 participants