How to extract just metabolic subset of genes? #34

avelar-ageing · 2024-02-26T15:59:33Z

I am interested in downloading the metabolic enzymes from pathways. For example, the omega3 senescence pathway (https://www.wikipathways.org/pathways/WP5424.html) contains various genes that are not directly linked to metabolism, including p21. I think it should be possible to identify the metabolic genes as those involved in MIM conversion interactions. Is there a way to extract only these genes using the R package, rather than all genes in the pathway?

Thanks

egonw · 2024-02-26T16:02:07Z

@DeniseSl22, didn't we write a SPARQL query for this at some point in time? Or was that just on my long wish-/todo list?

egonw · 2024-02-26T20:19:15Z

The pathway WP5424 is not in the RDF yet, but the following SPARQL should give you some idea how to do this:

SELECT ?wpid ?catalyst ?source ?target WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  ?catalysis a wp:Catalysis ;
    dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction .
  OPTIONAL { ?reaction wp:source ?source }
  OPTIONAL { ?reaction wp:target ?target }
} ORDER BY ASC(?catalysis)
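
For anyone who wants to run a query like the one above from a script rather than from the web interface, here is a minimal Python sketch using only the standard library. The endpoint URL and the function names `build_request` and `run_query` are assumptions for illustration, not something given in this thread; the prefixes are declared explicitly so the query does not depend on server-side defaults.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; check the WikiPathways documentation for the current one.
ENDPOINT = "https://sparql.wikipathways.org/sparql"

# Prefixes used by the queries in this thread, declared explicitly.
PREFIXES = """\
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results (nothing is sent yet)."""
    params = urllib.parse.urlencode({"query": PREFIXES + query})
    return urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the list of result rows (bindings)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query("SELECT ... WHERE { ... }")` with the body of the query pasted in as a string should return one dictionary per result row, keyed by variable name (`wpid`, `catalyst`, and so on).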

DeniseSl22 · 2024-03-01T12:51:55Z

@avelar-ageing, thanks for your question!
I've modified the query of @egonw slightly, see below.

I believe that the reactions without a clear source and/or target are not relevant in this case (and require some curation on our side). There are also a number of interactions between two metabolites which have not been drawn with the MIM-Catalysis interaction type, but with a regular arrow. I've put that restriction on a separate line in the SPARQL query (see below), so you can comment it in or out to see the difference in response (# is used for comments in SPARQL).
When only including interactions of type MIM:Catalysis, you receive 5296 results; with this line commented out, you get 6189 results (so ~900 more). I've also added a way to unify the metabolite annotations to one database type (Wikidata here; others are possible, e.g. HMDB, ChEBI, PubChem), in case you want to merge the data at a later stage. Unifying the enzyme annotations can be done in a similar manner (to HGNC, Ensembl, UniProt, etc.).

Also note that this covers all pathways (WikiPathways and Reactome) and all species.
Hope the above helps; if not, feel free to ask another question here.

SELECT DISTINCT ?wpid ?catalyst ?source ?sourceDb ?target ?targetDb WHERE {
  ?pathway a wp:Pathway ;
      dc:identifier / dcterms:identifier ?wpid .
  # Uncomment the next line to keep only interactions drawn as MIM-Catalysis:
  # ?catalysis a wp:Catalysis .
  ?catalysis dcterms:isPartOf ?pathway ;
    wp:source / rdfs:label ?catalyst ;
    wp:participants ?reaction .
  ?reaction a wp:Interaction .
  ?reaction wp:source ?source .
  ?source a wp:Metabolite . 
  OPTIONAL{?source wp:bdbWikidata ?sourceDb .}
  
  ?reaction wp:target ?target .
  ?target a wp:Metabolite . 
  OPTIONAL{?target wp:bdbWikidata ?targetDb .}
} ORDER BY ASC(?source)
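
The query returns one row per catalyst/source/target combination, so the same enzyme can appear many times. To get back to the original question (a plain list of metabolic genes per pathway), the rows can be collapsed afterwards. Below is a minimal Python sketch, assuming the results were downloaded in the standard SPARQL JSON format, where each row maps a variable name to a `{"value": ...}` object. The function name `catalysts_by_pathway` and the example rows are made up for illustration and are not real WikiPathways content.

```python
from collections import defaultdict


def catalysts_by_pathway(bindings):
    """Collapse SPARQL JSON result rows into sorted, unique catalyst labels per pathway ID."""
    genes = defaultdict(set)
    for row in bindings:
        # Each row maps a variable name to an object holding the value.
        genes[row["wpid"]["value"]].add(row["catalyst"]["value"])
    return {wpid: sorted(labels) for wpid, labels in genes.items()}


# Hypothetical rows for illustration only (placeholder IDs and labels).
example_rows = [
    {"wpid": {"value": "WP_EXAMPLE_1"}, "catalyst": {"value": "ENZYME_B"}},
    {"wpid": {"value": "WP_EXAMPLE_1"}, "catalyst": {"value": "ENZYME_A"}},
    {"wpid": {"value": "WP_EXAMPLE_1"}, "catalyst": {"value": "ENZYME_A"}},
    {"wpid": {"value": "WP_EXAMPLE_2"}, "catalyst": {"value": "ENZYME_C"}},
]

metabolic_genes = catalysts_by_pathway(example_rows)
print(metabolic_genes)
```

Note that `?catalyst` holds the label as drawn in the pathway, so the same gene may appear under different spellings; unifying to one identifier database first, as described above, would make the resulting list more reliable.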

egonw self-assigned this Feb 26, 2024

egonw added the "question" label Feb 26, 2024
