How to extract just metabolic subset of genes? #34

Open
avelar-ageing opened this issue Feb 26, 2024 · 3 comments
Labels
question Further information is requested

Comments

@avelar-ageing

I am interested in downloading the metabolic enzymes from pathways. For example, the omega-3 senescence pathway (https://www.wikipathways.org/pathways/WP5424.html) contains various genes that are not directly linked to metabolism, including p21. I think it should be possible to identify the metabolic genes as those involved in MIM conversion interactions. Is there a way to extract just these genes, rather than all genes in the pathway, using the R package?

Thanks

@egonw
Member

egonw commented Feb 26, 2024

@DeniseSl22, didn't we write a SPARQL query for this at some point in time? Or was that just on my long wish-/todo list?

egonw self-assigned this Feb 26, 2024
egonw added the question Further information is requested label Feb 26, 2024
@egonw
Member

egonw commented Feb 26, 2024

The pathway WP5424 is not in the RDF yet, but the following SPARQL query should give you an idea of how to do this:

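```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?wpid ?catalyst ?source ?target WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  ?catalysis a wp:Catalysis ;
    dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;   # label of the catalysing gene product
    wp:participants ?reaction .          # the reaction being catalysed
  ?reaction a wp:Interaction .
  OPTIONAL { ?reaction wp:source ?source }
  OPTIONAL { ?reaction wp:target ?target }
} ORDER BY ASC(?catalysis)
```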
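The query above runs over every pathway. A minimal Python sketch (standard library only, rather than R) for narrowing it to a single pathway and preparing a request to an endpoint could look like the following. It assumes the public WikiPathways SPARQL endpoint lives at `https://sparql.wikipathways.org/sparql` and that pathway IDs are stored as plain string literals such as `"WP5424"`; the helper names `restrict_to_pathway` and `build_request` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the public WikiPathways SPARQL endpoint URL; check that it is current.
ENDPOINT = "https://sparql.wikipathways.org/sparql"

# The query from this comment, with explicit PREFIX lines so it does not rely
# on prefixes predefined by the endpoint.
QUERY = """\
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?wpid ?catalyst ?source ?target WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  ?catalysis a wp:Catalysis ;
    dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction .
  OPTIONAL { ?reaction wp:source ?source }
  OPTIONAL { ?reaction wp:target ?target }
} ORDER BY ASC(?catalysis)
"""


def restrict_to_pathway(query: str, wpid: str) -> str:
    """Bind ?wpid to one pathway ID by injecting a SPARQL 1.1 VALUES clause
    directly after the first 'WHERE {' in the query (hypothetical helper)."""
    marker = "WHERE {"
    idx = query.find(marker)
    if idx == -1:
        raise ValueError("query has no 'WHERE {' clause")
    end = idx + len(marker)
    return query[:end] + f'\n  VALUES ?wpid {{ "{wpid}" }}' + query[end:]


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
```

For example, `urllib.request.urlopen(build_request(restrict_to_pathway(QUERY, "WP5424")))` would fetch the rows for WP5424 as JSON; since that pathway is not in the RDF yet, the result set will be empty until a later release. Injecting a `VALUES` clause leaves the rest of the query untouched, so the same helper should work for any query here that binds `?wpid`.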

@DeniseSl22

@avelar-ageing, thanks for your question!
I've modified the query of @egonw slightly, see below.

I believe that reactions without a clear source and/or target are not relevant in this case (and they require some curation on our side). There are also a number of interactions between two metabolites whose catalysis was drawn with a regular arrow rather than with the MIM-Catalysis interaction type. I've therefore put the `?catalysis a wp:Catalysis` restriction on its own line in the SPARQL query below and commented it out; uncomment it to see the difference in the results (# starts a comment in SPARQL).
When only interactions of type MIM-Catalysis are included, the query returns 5296 results; with that line commented out, it returns 6189 results (893 more). I've also added a way to map the metabolite annotations to a single database (Wikidata here; others such as HMDB, ChEBI, or PubChem are possible), in case you want to merge the data at a later stage. The enzyme annotations can be unified in a similar manner (to HGNC, Ensembl, UniProt, etc.).

Also note that this covers all pathways (WikiPathways and Reactome) and all species.
I hope this helps; if not, feel free to ask another question here.

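```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?wpid ?catalyst ?source ?sourceDb ?target ?targetDb WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  # ?catalysis a wp:Catalysis .   # uncomment to keep only MIM-Catalysis interactions
  ?catalysis dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction .

  # both sides of the reaction must be metabolites; map each to Wikidata if possible
  ?reaction wp:source ?source .
  ?source a wp:Metabolite .
  OPTIONAL { ?source wp:bdbWikidata ?sourceDb . }

  ?reaction wp:target ?target .
  ?target a wp:Metabolite .
  OPTIONAL { ?target wp:bdbWikidata ?targetDb . }
} ORDER BY ASC(?source)
```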
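The original question asked for just the metabolic genes rather than every gene in a pathway. Once the query results are saved in the standard SPARQL 1.1 JSON results format, that list is simply the distinct `?catalyst` values per `?wpid`. The Python sketch below assumes such a file exists; the function names are hypothetical and not part of any WikiPathways package, and the same grouping is straightforward in R.

```python
import json
from collections import defaultdict
from typing import Optional


def load_results(path: str) -> dict:
    """Load a SPARQL 1.1 JSON results document saved from the endpoint."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _value(row: dict, var: str) -> Optional[str]:
    """Return a variable's bound value, or None if it is unbound
    (unbound OPTIONAL variables are simply absent from a result row)."""
    cell = row.get(var)
    return cell["value"] if cell else None


def catalysts_by_pathway(results: dict) -> dict[str, list[str]]:
    """Group the distinct catalyst (enzyme) labels by pathway ID."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for row in results["results"]["bindings"]:
        wpid, catalyst = _value(row, "wpid"), _value(row, "catalyst")
        if wpid and catalyst:
            grouped[wpid].add(catalyst)
    return {wpid: sorted(names) for wpid, names in grouped.items()}


def wikidata_conversions(results: dict) -> set[tuple[str, str, str]]:
    """Collect (catalyst, source, target) triples in which both metabolites
    have a Wikidata mapping, ready to merge with other Wikidata-keyed data."""
    triples = set()
    for row in results["results"]["bindings"]:
        catalyst = _value(row, "catalyst")
        source, target = _value(row, "sourceDb"), _value(row, "targetDb")
        if catalyst and source and target:
            triples.add((catalyst, source, target))
    return triples
```

For example, `catalysts_by_pathway(load_results("results.json"))` gives a per-pathway list of enzyme labels, where `results.json` is a hypothetical file holding the endpoint's JSON response. Note that `?catalyst` is the label as drawn in the pathway, so mapping the enzymes to HGNC, Ensembl, or UniProt is still worthwhile before merging across pathways.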
