should i add a figure for pseudogenes? #14

Open
taylorreiter opened this issue Jun 27, 2022 · 0 comments

@taylorreiter

For genomes that had at least species-level representatives in GTDB, the largest source of error was non-coding reads being predicted as coding (Figure @fig:orpheum_fig A).
We hypothesized that these reads originated from pseudogenes: such sequences would likely not be annotated as coding in the genomes from which the reads were simulated, but may retain some k-mers contained in the database.
To assess this hypothesis, we used annotation files produced by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), which annotates pseudogenes, for the 23 genomes for which these files were available [@doi:10.1093/nar/gkw569; @doi:10.1093/nar/gkaa1105].
On average, 12.4% (SD = 13.8%) of non-coding reads that were predicted to be coding fell within pseudogenes annotated by PGAP.
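
If a figure is added, the per-genome values behind the mean and SD would be the natural thing to plot. The calculation could be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function name `percent_in_pseudogenes`, the `(contig, start, end)` tuple layout, the 0-based half-open coordinates, and the choice to count a read if it overlaps a pseudogene by at least one base are all assumptions, and the toy intervals are made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def percent_in_pseudogenes(reads, pseudogenes):
    """Return the percentage of reads that overlap any pseudogene interval.

    Both arguments are lists of (contig, start, end) tuples with
    0-based, half-open coordinates (an assumption of this sketch).
    """
    if not reads:
        return 0.0

    # Group pseudogene intervals by contig so each read is only
    # compared against intervals on its own contig.
    by_contig = defaultdict(list)
    for contig, start, end in pseudogenes:
        by_contig[contig].append((start, end))

    hits = 0
    for contig, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < p_end and p_start < end for p_start, p_end in by_contig[contig]):
            hits += 1
    return 100.0 * hits / len(reads)


# Toy data for two hypothetical genomes; these are not real results.
genomes = {
    "genome_a": {
        "reads": [("c1", 10, 160), ("c1", 500, 650), ("c2", 0, 150), ("c2", 900, 1050)],
        "pseudogenes": [("c1", 100, 400)],
    },
    "genome_b": {
        "reads": [("c1", 0, 150), ("c1", 200, 350)],
        "pseudogenes": [("c1", 120, 220)],
    },
}

# One percentage per genome, then the across-genome mean and sample SD.
per_genome = {
    name: percent_in_pseudogenes(g["reads"], g["pseudogenes"])
    for name, g in genomes.items()
}
summary_mean = mean(list(per_genome.values()))
summary_sd = stdev(list(per_genome.values()))
```

In practice the pseudogene intervals would come from the PGAP annotation files and the read intervals from the simulation coordinates. The linear scan over pseudogenes per read is fine for a sketch, but an interval index would scale better for whole genomes.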

olga commented: Is there a figure for noncoding reads in pseudogenes?
