Refactor Exon to map against an association table between the transcript and exon tables #85

Open
jsaquing opened this issue Sep 24, 2021 · 2 comments
Labels
design (high-level design ideas, brainstorming)
postponed (This should be handled in the future, but is low priority right now)

Comments

@jsaquing
Contributor

Exon objects in Biosurfer are considered different if they belong to different Transcript objects, even if they have the same coordinates. However, this duplicates information in the exon table of the underlying SQL database and leads to long insert query times when populating the database. (I haven't measured select query times specifically, but I imagine they are affected as well.)

A possible solution might be to have the exon table only store a single row for each unique exon (as defined by its genomic coordinates), while somehow still mapping each Exon object to a combination of a genomic exon and a transcript from the database. It seems like this can be accomplished by having a transcript-exon association table (where column 1 holds transcript IDs, column 2 holds exon IDs, and each row represents a transcript-includes-exon relationship) and then mapping Exon against a join of the association and exon tables, as per [1].
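
The table layout described above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not Biosurfer's actual ORM models. Every table name, column name, and the `add_exon` helper are hypothetical, and the ORM step of mapping Exon against the join is not shown, only the underlying join query.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transcript (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- One row per unique exon, as defined by its genomic coordinates.
CREATE TABLE exon (
    id         INTEGER PRIMARY KEY,
    chromosome TEXT NOT NULL,
    strand     TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    UNIQUE (chromosome, strand, start_pos, end_pos)
);
-- Each row represents a transcript-includes-exon relationship.
CREATE TABLE transcript_exon (
    transcript_id INTEGER NOT NULL REFERENCES transcript(id),
    exon_id       INTEGER NOT NULL REFERENCES exon(id),
    PRIMARY KEY (transcript_id, exon_id)
);
""")


def add_exon(conn, transcript_name, chromosome, strand, start_pos, end_pos):
    """Link a transcript to an exon, reusing the exon row if it already exists."""
    coords = (chromosome, strand, start_pos, end_pos)
    conn.execute(
        "INSERT OR IGNORE INTO transcript (name) VALUES (?)", (transcript_name,)
    )
    conn.execute(
        "INSERT OR IGNORE INTO exon (chromosome, strand, start_pos, end_pos) "
        "VALUES (?, ?, ?, ?)",
        coords,
    )
    conn.execute(
        "INSERT OR IGNORE INTO transcript_exon (transcript_id, exon_id) "
        "SELECT t.id, e.id FROM transcript t, exon e "
        "WHERE t.name = ? AND e.chromosome = ? AND e.strand = ? "
        "AND e.start_pos = ? AND e.end_pos = ?",
        (transcript_name, *coords),
    )


# Two made-up transcripts that share the exon at chr1:100-200.
add_exon(conn, "tx1", "chr1", "+", 100, 200)
add_exon(conn, "tx1", "chr1", "+", 300, 400)
add_exon(conn, "tx2", "chr1", "+", 100, 200)
add_exon(conn, "tx2", "chr1", "+", 500, 600)

# The shared exon is stored once in the exon table.
exon_count = conn.execute("SELECT COUNT(*) FROM exon").fetchone()[0]

# Joining the association and exon tables yields one row per
# (transcript, exon) pair, which is what an Exon object would map to.
rows = conn.execute(
    "SELECT t.name, e.chromosome, e.start_pos, e.end_pos "
    "FROM transcript_exon te "
    "JOIN transcript t ON t.id = te.transcript_id "
    "JOIN exon e ON e.id = te.exon_id "
    "ORDER BY t.name, e.start_pos"
).fetchall()
```

With this layout the composite key (transcript_id, exon_id) still distinguishes exons that belong to different transcripts, while the coordinates themselves are stored only once per unique exon.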

@gsheynkman
Member

How much of an issue is this? Does this fall under "nice to optimize" or "we have to do this because we are dead in the water"?

@jsaquing
Contributor Author

It's something I'd like to try optimizing eventually, but it isn't necessary right now (hence the postponed label). I just wanted to write this down for future reference.
