For genomes that had at least species-level representatives in GTDB, the largest source of error was non-coding reads being predicted as coding (Figure @fig:orpheum_fig A).
We hypothesized that these reads originated from pseudogenes, as these sequences would likely not be annotated as coding in the genomes from which the reads were simulated, but may retain some k-mers contained in the database.
To assess this hypothesis, we used annotation files produced by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), which annotates pseudogenes, for the 23 genomes for which these files were available [@doi:10.1093/nar/gkw569; @doi:10.1093/nar/gkaa1105].
On average, 12.4% (SD = 13.8%) of non-coding reads that were predicted to be coding fell within pseudogenes annotated by PGAP.
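
A minimal sketch of how a per-genome percentage and its mean and SD could be computed is given here for illustration; it is not the analysis code behind the numbers above. It assumes reads and pseudogenes are available as half-open `(start, end)` coordinates on the same sequence, that any overlap counts as "falling within" a pseudogene, and that the SD is the sample standard deviation. The function names and toy coordinates are hypothetical.

```python
from statistics import mean, stdev


def overlaps(read, feature):
    """Return True if two half-open (start, end) intervals share a base."""
    return read[0] < feature[1] and feature[0] < read[1]


def percent_in_pseudogenes(reads, pseudogenes):
    """Percent of reads that overlap at least one pseudogene interval."""
    if not reads:
        return 0.0
    hits = sum(any(overlaps(r, p) for p in pseudogenes) for r in reads)
    return 100.0 * hits / len(reads)


# Hypothetical toy data: genome -> (misclassified non-coding reads, pseudogenes).
per_genome = {
    "genome_a": ([(10, 160), (500, 650), (900, 1050), (2000, 2150)], [(100, 400)]),
    "genome_b": ([(0, 150), (300, 450)], [(350, 800), (1000, 1200)]),
}

# One percentage per genome, then the summary across genomes.
percents = [percent_in_pseudogenes(r, p) for r, p in per_genome.values()]
summary = (mean(percents), stdev(percents))  # sample SD, an assumption
```

A stricter definition (read fully contained in the pseudogene) would only require changing `overlaps`; which definition was used in the manuscript is not stated in the excerpt.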
olga commented: Is there a figure for noncoding reads in pseudogenes?