finding BioProject associated with a SRA SRP id #5

Open
bswhite opened this issue May 5, 2020 · 4 comments

bswhite commented May 5, 2020

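library(rentrez)

# Look up the GEO DataSets (gds) UIDs that match an SRA study accession.
lookup.srp <- function(srp) {
  r_search <- entrez_search(db="gds", term=paste0(srp, "[ACCN]"))
  r_search$ids
}

# Map an SRA study accession (SRP) to a BioProject accession via the matching GEO record.
# NOTE: get.gse.bioproject() is not defined in this snippet; it is assumed to be defined elsewhere.
get.sra.bioproject <- function(srp) {
  ids <- lookup.srp(srp)
  # Fail on zero matches as well as on multiple matches.
  if(length(ids) != 1) { stop("Expected exactly one id for ", srp, ", got ", length(ids), "\n") }
  gse.id <- entrez_summary(db="gds", id=ids)$accession
  get.gse.bioproject(gse.id)
}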

> get.sra.bioproject("SRP212810")

[1] "PRJNA552370"

jaeddy commented May 8, 2020

@bswhite I copied this to a wiki page to keep as a reference.

jaeddy closed this as completed May 8, 2020

bswhite commented May 13, 2020

@jaeddy do you know how to translate a BioProject PRJNA ID into an SRA SRP ID, i.e., the reverse of the above? I spent an hour screwing around with rentrez last night to no avail. I don't understand the various NCBI databases and their links. Is there a data model? I have seen entrez_dbs() and entrez_db_links(), but they're only so helpful.

bswhite reopened this May 13, 2020

jaeddy commented May 13, 2020

@bswhite probably the best way for now is just to use this table. I combined the BioProject metadata with all of the matched study, run, experiment, and sample metadata from SRA; I think most of the relevant GEO/GSE information should be covered as well.

I can share the code I used, but it's pretty convoluted. Let me know if you run into any cases where you can't find a match. The BPs in the table are all those matched to our PubMed IDs, so it's possible we're missing some (for datasets that aren't linked to publications).
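
For anyone who wants to try the reverse lookup (BioProject to SRP) with rentrez alone, here is a minimal sketch. It is not the code used to build the table. It assumes the SRA database can be searched by the `[BioProject]` field and that each SRA summary carries an `expxml` field containing a `<Study acc="..."/>` element. The function names `extract.srp` and `get.bioproject.srp` are made up for this example.

```r
# Sketch only: reverse lookup from a BioProject accession to SRA study accessions.
# Assumptions (not confirmed in this thread): the SRA db accepts "[BioProject]" as a
# search field, and the "expxml" summary field contains <Study acc="SRP..." .../>.

# Pull study accessions (SRP/ERP/DRP) out of one or more expxml strings.
extract.srp <- function(expxml) {
  pattern <- '<Study acc="([A-Z]RP[0-9]+)"'
  # regmatches() keeps only the elements that actually contain a match.
  hits <- regmatches(expxml, regexpr(pattern, expxml))
  unique(sub(pattern, "\\1", hits))
}

# Search SRA for records tied to the BioProject, then read the study accession
# from their summaries. retmax is kept small because records in one project
# usually share a study; raise it if a project spans several studies.
get.bioproject.srp <- function(prjna, retmax = 20) {
  r_search <- rentrez::entrez_search(db = "sra",
                                     term = paste0(prjna, "[BioProject]"),
                                     retmax = retmax)
  if (length(r_search$ids) == 0) { stop("No SRA records found for ", prjna, "\n") }
  summaries <- rentrez::entrez_summary(db = "sra", id = r_search$ids)
  # extract_from_esummary() handles both a single summary and a list of them.
  expxml <- rentrez::extract_from_esummary(summaries, "expxml")
  extract.srp(unlist(expxml))
}
```

Calling `get.bioproject.srp("PRJNA552370")` would be the counterpart of the `get.sra.bioproject("SRP212810")` example above; the result has not been verified here, so check it against the table.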

vpchung transferred this issue from mc2-center/csbc-pson-dcc Aug 30, 2022

vpchung commented Aug 30, 2022

@mc2-center/data-team I forget, is BioProject still part of any of the Publication tables/manifests? If not, I will close this issue.
