Annotated FASTA, effective transcript renaming, and e-value cutoffs
This version includes several new command line options, a new FASTA output, and a major under-the-hood refactor. For end users, the relevant changes are:
- An annotated FASTA file, named like <transcriptome_filename>.dammit.fasta, which has a summary of the annotations in the headers and renamed transcript IDs.
- A --name option specifying how to rename the transcript IDs in the FASTA header.
- A CSV file mapping the new names back to the original names, stored as <transcriptome_filename>.names.csv.
- An --evalue option specifying the e-value cutoff to pass to HMMER, Infernal, crb-blast, and LAST.
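Since downstream tools will now see the renamed IDs, it can help to load the names CSV into a lookup table. The following is a minimal sketch only: the column headers "original" and "renamed" are assumptions, not something stated here, and the transcript IDs in the example are made up. Check the header row of your own names file and adjust the arguments.

```python
import csv
import io


def load_name_map(handle, new_col="renamed", old_col="original"):
    """Return {new_name: original_name} from an open CSV file handle.

    The default column headers are assumptions; adjust them to match
    the header row of your own <transcriptome_filename>.names.csv.
    """
    reader = csv.DictReader(handle)
    return {row[new_col]: row[old_col] for row in reader}


# Tiny in-memory stand-in for a names CSV (made-up IDs, assumed headers).
example = io.StringIO(
    "original,renamed\n"
    "TRINITY_DN1_c0_g1_i1,Transcript_0\n"
    "TRINITY_DN2_c0_g1_i1,Transcript_1\n"
)
name_map = load_name_map(example)
```

With a real file, the same function can be called on open("<transcriptome_filename>.names.csv", newline="") in place of the in-memory example.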
For developers, the most significant changes can be found in the AnnotateHandler class in annotate.py. Each set of tasks has been put into a separate method, and the get_tasks() method is now an iterator over all those tasks. Furthermore, the output filenames are now all stored as attributes on the AnnotateHandler instead of in a results dictionary. The result is that it is now easier to remix this class, which is a necessary step toward a future API for accessing dammit results in Python with the pydata stack.
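To illustrate the general shape of that design, here is a rough sketch. It is not the actual AnnotateHandler code: the class SketchHandler, its method names other than get_tasks(), and the dictionary-based task placeholders are all hypothetical. Only the two output filename suffixes come from this release.

```python
class SketchHandler:
    """Hypothetical stand-in for the pattern; not dammit's AnnotateHandler."""

    def __init__(self, transcriptome_fn):
        self.transcriptome_fn = transcriptome_fn
        # Output filenames are plain attributes instead of entries
        # in a results dictionary.
        self.names_fn = transcriptome_fn + ".names.csv"
        self.annotated_fn = transcriptome_fn + ".dammit.fasta"

    def rename_tasks(self):
        # One method per group of tasks (placeholder task dicts).
        yield {"name": "rename", "targets": [self.names_fn]}

    def output_tasks(self):
        yield {"name": "annotated-fasta", "targets": [self.annotated_fn]}

    def get_tasks(self):
        # Iterate over every group in order.
        yield from self.rename_tasks()
        yield from self.output_tasks()


class RenameOnlyHandler(SketchHandler):
    """A 'remix': reuse one task group and skip the other."""

    def get_tasks(self):
        yield from self.rename_tasks()


all_names = [task["name"] for task in SketchHandler("txome.fa").get_tasks()]
remix_names = [task["name"] for task in RenameOnlyHandler("txome.fa").get_tasks()]
```

Because each group is its own generator method, a subclass can drop, reorder, or extend groups without touching the others, and the filename attributes can be read directly without running any tasks.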
As usual, dammit can be upgraded with:
pip install dammit --upgrade
Users having problems with database dependencies should try removing the databases.doit.db file in their databases directory. Additionally, resume functionality will fail for runs done with older versions of dammit, because the renamed transcriptome file is a dependency for almost all the tasks, so take caution with older runs. Conveniently, execution of HMMER in particular is much faster now that there is a sane e-value cutoff, so rerunning older analyses is less prohibitive.
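If you would rather script the cleanup of the databases.doit.db file, something along these lines should work. This is only a sketch: remove_doit_db is a hypothetical helper, not part of dammit, and the location of your databases directory is something you need to supply yourself. The demo runs in a throwaway temporary directory so that no real database directory is touched.

```python
import tempfile
from pathlib import Path


def remove_doit_db(database_dir, db_name="databases.doit.db"):
    """Delete the doit state file if present; return True if it was removed."""
    db_path = Path(database_dir) / db_name
    if db_path.is_file():
        db_path.unlink()
        return True
    return False


# Demo in a temporary directory standing in for a databases directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "databases.doit.db").touch()
    first = remove_doit_db(tmp)   # file existed and was removed
    second = remove_doit_db(tmp)  # nothing left to remove
```

Only the state file is deleted here; the downloaded databases themselves are left in place.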
For any other problems, please file an issue on the tracker!