Releases: Gaius-Augustus/GALBA

v1.0.11 - deleting transcripts with CDS features on opposite strands

22 Dec 12:43

X-Mas Release 2023: DIAMOND denoises AUGUSTUS predictions

15 Dec 16:04

This release was inspired by the manuscript

Newly Sequenced Genomes Reveal Patterns of Gene Family Expansion in select Dragonflies (Odonata: Anisoptera)

by

Ethan R. Tolman, Christopher D. Beatty, Paul B. Frandsen, Jonas Bush, Or R. Bruchim, Ella Simone Driever, Kathleen M. Harding, Dick Jordan, Manpreet K. Kohli, Jiwoo Park, Seojun Park, Kelly Reyes, Mira Rosari, Jisong L. Ryu, Vincent Wade, Jessica L. Ware

https://doi.org/10.1101/2023.12.11.569651

The authors state in the manuscript:

"While our genome annotations initially had a high (>50,000) number of genes compared to the annotation of P. flavescens, by conservatively retaining only genes which had a BLAST hit to a protein sequence from P. flavescens [27], we were able to generate highly complete annotations (fig 1. A,B), further supporting the efficacy of this pipeline in insects."

I adopted the idea and added a new script, filter_gtf_by_diamond_against_ref.py, that does the same thing using DIAMOND. I chose DIAMOND only for its speed; the results should be highly similar to those obtained with BLAST.

The script is called from within galba.pl. This approach can substantially increase specificity for a marginal tradeoff in sensitivity.
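
The filtering idea can be illustrated with a minimal sketch. This is not the code of filter_gtf_by_diamond_against_ref.py; the function names are hypothetical, and it assumes that DIAMOND was run with BLAST-style tabular output (query ID in the first column) and that every GTF feature line carries a transcript_id attribute:

```python
import re

# Hypothetical helper names; this is an illustration, not GALBA's implementation.
_TX_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_with_hits(diamond_tsv_lines):
    """Collect query IDs (first column) from BLAST-style tabular DIAMOND output."""
    hits = set()
    for line in diamond_tsv_lines:
        line = line.strip()
        if line:
            hits.add(line.split("\t")[0])
    return hits


def filter_gtf(gtf_lines, supported_tx):
    """Keep comment lines and feature lines whose transcript has a hit."""
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        match = _TX_ID.search(line)
        # Assumption: feature lines without a transcript_id are dropped.
        if match and match.group(1) in supported_tx:
            kept.append(line)
    return kept
```

A transcript without any hit against the reference proteins is simply dropped, which is why specificity rises while sensitivity changes only slightly.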

Accuracy comparison before and after DIAMOND filter for denoising AUGUSTUS predictions in GALBA:

D. melanogaster

before

gene_Sn	71.07
gene_Sp	71.09
trans_Sn	48.45
trans_Sp	63.74
cds_Sn	78.45
cds_Sp	87.54

after

gene_Sn	71.02
gene_Sp	73.28
trans_Sn	48.42
trans_Sp	65.42
cds_Sn	78.43
cds_Sp	88.90

Mus musculus

before

gene_Sn	70.64
gene_Sp	38.34
trans_Sn	28.70
trans_Sp	35.26
cds_Sn	77.43
cds_Sp	82.34

after

gene_Sn	70.29
gene_Sp	66.63
trans_Sn	28.55
trans_Sp	56.33
cds_Sn	77.10
cds_Sp	92.23
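
To make the tradeoff easier to read, the change per metric can be computed directly from the numbers above. The following is a small sketch that assumes the values are percentages; the dictionary and function names are only illustrative:

```python
# Values copied from the tables above (assumed to be percentages).
RESULTS = {
    "D. melanogaster": {
        "before": {"gene_Sn": 71.07, "gene_Sp": 71.09, "trans_Sn": 48.45,
                   "trans_Sp": 63.74, "cds_Sn": 78.45, "cds_Sp": 87.54},
        "after": {"gene_Sn": 71.02, "gene_Sp": 73.28, "trans_Sn": 48.42,
                  "trans_Sp": 65.42, "cds_Sn": 78.43, "cds_Sp": 88.90},
    },
    "Mus musculus": {
        "before": {"gene_Sn": 70.64, "gene_Sp": 38.34, "trans_Sn": 28.70,
                   "trans_Sp": 35.26, "cds_Sn": 77.43, "cds_Sp": 82.34},
        "after": {"gene_Sn": 70.29, "gene_Sp": 66.63, "trans_Sn": 28.55,
                  "trans_Sp": 56.33, "cds_Sn": 77.10, "cds_Sp": 92.23},
    },
}


def metric_changes(species):
    """Return after-minus-before differences in percentage points."""
    before = RESULTS[species]["before"]
    after = RESULTS[species]["after"]
    return {name: round(after[name] - before[name], 2) for name in before}


for species in RESULTS:
    print(species, metric_changes(species))
```

In both species every sensitivity value drops by less than half a percentage point, while every specificity value increases.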

Acknowledgement

We thank Tolman et al. for describing this very simple but highly effective idea!

Debugged accuracy evaluation, improved training gene selection

18 Sep 11:30
  • @tomasbruna changed miniprothint to additionally output only the best gene per locus (instead of several); these genes are now used as training genes in GALBA
  • debugged automated accuracy evaluation

Improved training gene selection

15 Sep 15:03
  • @tomasbruna extended miniprothint to output training genes for GALBA. His implementation is much better than the original implementation in GALBA. GALBA therefore now uses this miniprothint functionality
  • @tomasbruna also improved the specificity of hints; it should now be safer to use proteins from more distantly related species (large-scale accuracy tests are still pending)
  • galba_cleanup has been ported to python (no change in functionality)

Fixing redundant augargs

15 May 14:22

Related to this issue: #32 (comment)
I fixed a bug: augargs are no longer passed twice to pygustus when running AUGUSTUS in ab initio mode

Alternative transcript prediction restored

30 Mar 08:51
f96a026

The key difference from the previous release is a bugfix that restores prediction of alternative transcripts when evidence for them is present

Running miniprot only once

29 Mar 14:56
4c05e45

What's Changed

Full Changelog: v1.0.4...v1.0.5

Better jsonfile protection

15 Mar 20:03
82a8276
  • the pygustus jsonfile is now locked during fixing, which makes it safe to run multiple GALBA processes in parallel
  • usexisting disappears from instructions in case of error
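
As a generic illustration of why locking makes parallel runs safe, the sketch below rewrites a JSON file under an exclusive advisory lock. It is not GALBA's implementation; the function name is hypothetical, and it assumes a POSIX system where Python's fcntl module is available:

```python
import fcntl
import json


def update_json_locked(path, fix):
    """Apply fix(data) to a JSON file while holding an exclusive lock."""
    with open(path, "r+") as handle:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            data = fix(json.load(handle))
            handle.seek(0)
            handle.truncate()
            json.dump(data, handle, indent=2)
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Because each process waits for the lock before reading, two concurrent runs cannot both read the outdated file and overwrite each other's changes.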

pygustus json config file fix

12 Mar 10:34
  • automatically update an outdated (typo-containing) json file with pygustus and augustus parameters in $AUGUSTUS_CONFIG_PATH/parameters/ if necessary
  • redirect miniprot stderr output to file
  • catch star-containing lines in miniprot output
  • bugfix in miniprothint (for case of only 1 reference proteome/coverage 1, actual fix is in miniprothint repository)

miniprothint integration

08 Mar 12:38
  • initial miniprothint integration boosts accuracy
  • iterative training boosts accuracy
  • runtime is much worse than in the previous release
  • measures to speed up GALBA will be taken in a future release