Methodology details, and `write.gmt` helper functions? #18

dereckmezquita · 2022-04-17T19:25:53Z

Hi, I came across your package, which could potentially save me a lot of work, so thank you.

Could you publish the details of your methods for converting from human to species X? I need this information in order to be able to cite you in my research.

Also, would you consider adding helper functions to convert the data.frame output to a type that can easily be written as a .gmt pathway file?

igordot · 2022-04-18T01:11:28Z

Thank you for your interest. The gene conversion happens using a different package, babelgene. The vignette includes some background info, but let me know if anything is unclear. The code for pre-processing the data is available as well if you really want to dive deep.

There are a few different GMT writer functions available, such as cmapR::write_gmt, pathwayPCA::write_gmt, immcp::write_gmt, and rWikiPathways::writeGMT. I have not tried any of them, but I am not sure another function would be solving any new problems.

dereckmezquita · 2022-04-18T01:41:41Z

Thank you for that; I will look into babelgene.

And thank you for pointing those write gmt functions out for me.

I've written one myself in the past. I suppose what I was really asking for is helper functions that select a collection (for example, hallmark), extract each gene set's name, description URL, and genes (in their original order), and put them into a format that could then be written to a file as a GMT.

For example, convert the HALLMARK collection to a list of character vectors (one list element per pathway/gene set). This should be a list of 50 elements, as HALLMARK has only 50 pathways. Each element of this list holds a character vector with the pathway (gene set) name first, then the description URL as in the standard GMT distributed by Broad, and then the genes.

This object could then be written line by line using a \t separator.
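The list-of-character-vectors layout described above could be sketched roughly as follows. This is only an illustration on a toy table: the column names `gs_name`, `gs_url`, and `gene_symbol` are assumptions about the long-format data, and `as_gmt_list()` is a hypothetical helper, not something the package provides.

```r
# Toy stand-in for a long-format gene set table (one row per gene per set).
# The column names gs_name, gs_url and gene_symbol are assumptions.
gene_sets <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_B", "SET_B", "SET_B"),
  gs_url = c(
    "http://example.org/SET_A", "http://example.org/SET_A",
    "http://example.org/SET_B", "http://example.org/SET_B",
    "http://example.org/SET_B"
  ),
  gene_symbol = c("TP53", "MYC", "EGFR", "KRAS", "BRAF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one character vector per gene set, laid out as
# name, description URL, then genes in the row order of the input table.
as_gmt_list <- function(df) {
  lapply(split(df, df$gs_name), function(set) {
    c(set$gs_name[1], set$gs_url[1], unique(set$gene_symbol))
  })
}

gmt_list <- as_gmt_list(gene_sets)

# Each list element becomes one tab-separated line of the GMT file.
gmt_lines <- vapply(gmt_list, paste, character(1), collapse = "\t")
gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)
```

Note that `split()` sorts the gene sets by name, and the genes within a set simply follow the row order of the input table, so this sketch does not by itself guarantee the ordering used in the Broad files.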

The tricky parts I am facing in accomplishing this task are extracting the elements relating to specific gene set collections and getting the original order of the genes in a given gene set.

Might you be able to give me some information on how I could recover the original order of the genes in a given gene set? As I understand it, the genes within each gene set in a GSEA gmt file are in a specific order, from most to least important. I don't see this ordering included in the datasets offered here; am I missing something?

As proof of concept I would like to be able to convert the Homo sapiens data back to separate gmt files, which match those distributed by Broad. I don't know how I would get the gene order though.

I am looking for a way to extract the genes relating to these 5 specific pathway collections:

msigdb.v7.5.1.symbols.gmt.txt
c2.cp.kegg.v7.5.1.symbols.gmt.txt
c2.cp.reactome.v7.5.1.symbols.gmt.txt
c5.go.bp.v7.5.1.symbols.gmt.txt
h.all.v7.5.1.symbols.gmt.txt
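One way to pull out the rows for each of these files is to filter on the category and sub-category columns. The sketch below uses a toy table; the column names `gs_cat` and `gs_subcat`, the values such as `"CP:KEGG"`, and the mapping from file prefix to category are all assumptions that should be checked against the actual data, and `select_collection()` is a hypothetical helper.

```r
# Toy stand-in for the gene set table; the gs_cat / gs_subcat column names
# and their values are assumptions and should be checked against real data.
gene_sets <- data.frame(
  gs_cat = c("H", "C2", "C2", "C5", "C5"),
  gs_subcat = c("", "CP:KEGG", "CP:REACTOME", "GO:BP", "GO:MF"),
  gs_name = c("HALLMARK_X", "KEGG_X", "REACTOME_X", "GOBP_X", "GOMF_X"),
  gene_symbol = c("TP53", "MYC", "EGFR", "KRAS", "BRAF"),
  stringsAsFactors = FALSE
)

# Assumed mapping from Broad file prefix to (category, sub-category).
# NULL means "do not filter on this column".
collections <- list(
  "msigdb" = list(category = NULL, subcategory = NULL),
  "c2.cp.kegg" = list(category = "C2", subcategory = "CP:KEGG"),
  "c2.cp.reactome" = list(category = "C2", subcategory = "CP:REACTOME"),
  "c5.go.bp" = list(category = "C5", subcategory = "GO:BP"),
  "h.all" = list(category = "H", subcategory = NULL)
)

# Hypothetical helper: keep the rows matching the requested collection.
select_collection <- function(df, category = NULL, subcategory = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(category)) keep <- keep & df$gs_cat == category
  if (!is.null(subcategory)) keep <- keep & df$gs_subcat == subcategory
  df[keep, , drop = FALSE]
}

# One subset of the table per target file.
subsets <- lapply(collections, function(x) {
  select_collection(gene_sets, x$category, x$subcategory)
})
```

Each element of `subsets` can then be converted to GMT lines and written to its own file.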

Finally, thank you again for the package; it is a lot of work. Matching human and species X gene names is not a trivial task.

rLannes · 2022-11-22T20:34:41Z

Hi, I made a custom function.
This function is very BASIC and assumes the output file does not already exist.
Feel free to change it, as I am very foreign to the R way.

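```r
# Write a GMT file to <out_file> from the data frame passed in <data>,
# using the column named by <gene_id> for the gene identifiers.
# <gene_id> must be one of colnames(data); <data> also needs the
# gs_name and gs_description columns.
to_gmt <- function(data, gene_id, out_file) {
  stopifnot(gene_id %in% colnames(data))
  # Group the gene identifiers by gene set name.
  sets <- split(data[[gene_id]], data$gs_name)
  for (name_set in names(sets)) {
    # Use the first description found for this gene set.
    description <- data$gs_description[data$gs_name == name_set][1]
    # One line per gene set: name, description, then genes, tab-separated.
    line <- paste(c(name_set, description, sets[[name_set]]), collapse = "\t")
    # Lines are appended, so <out_file> should not already exist.
    cat(line, "\n", sep = "", file = out_file, append = TRUE)
  }
}
```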

igordot added the enhancement (New feature or request) and question (Further information is requested) labels · Apr 18, 2022
