Release Annotated FASTA, effective transcript renaming, and e-value cutoffs · dib-lab/dammit

This version includes several new command line options, a new FASTA output, and a major under-the-hood refactor. For end users, the relevant changes are:

An annotated FASTA file, named like <transcriptome_filename>.dammit.fasta, which has a summary of the annotations in the headers and renamed transcript IDs.
A --name option specifying how to rename the transcript IDs in the FASTA header.
A CSV file mapping the new names back to the original names, stored as <transcriptome_filename>.names.csv.
A --evalue option specifying the e-value cutoff to pass to HMMER, Infernal, crb-blast, and LAST.
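
As a rough illustration of how the names CSV could be used downstream, the sketch below loads a mapping from renamed transcript IDs back to the originals. The column names ("original" and "renamed") and the sample IDs are assumptions made for this example, not taken from dammit; check the header of your own <transcriptome_filename>.names.csv and adjust accordingly.

```python
import csv
import io


def load_name_map(handle, renamed_col="renamed", original_col="original"):
    """Return a dict mapping renamed transcript IDs to original IDs.

    The default column names are assumptions; pass the real header
    names from your names CSV if they differ.
    """
    reader = csv.DictReader(handle)
    return {row[renamed_col]: row[original_col] for row in reader}


# In-memory stand-in for <transcriptome_filename>.names.csv (made-up IDs).
example = io.StringIO(
    "original,renamed\n"
    "contig_A,Transcript_0\n"
    "contig_B,Transcript_1\n"
)
name_map = load_name_map(example)
print(name_map["Transcript_1"])
```

With a real file, open it with `open(path, newline="")` and pass the handle to `load_name_map`.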

For developers, the most significant changes can be found in the AnnotateHandler class in annotate.py. Each set of tasks has been put into a separate method, and the get_tasks() method is now an iterator over all those tasks. Furthermore, the output filenames are now all stored as attributes on the AnnotateHandler instead of in a results dictionary. As a result, this class is now easier to remix, which is a necessary step toward a future API for accessing dammit results in Python with the pydata stack.
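
The pattern described above can be sketched roughly as follows. This is a hypothetical stand-in, not dammit's actual AnnotateHandler: the class name, the per-stage method names, and the task dictionaries are all invented for illustration. Only the two output filename patterns come from the release notes.

```python
class PipelineSketch:
    """Hypothetical illustration of the refactored layout, not dammit's code."""

    def __init__(self, transcriptome_fn):
        self.transcriptome_fn = transcriptome_fn
        # Output filenames live on the instance instead of in a results dict.
        self.names_fn = transcriptome_fn + ".names.csv"
        self.annotated_fn = transcriptome_fn + ".dammit.fasta"

    def rename_tasks(self):
        # One method per set of tasks (method name is an assumption).
        yield {"name": "rename", "targets": [self.names_fn]}

    def annotate_tasks(self):
        yield {"name": "annotate", "targets": [self.annotated_fn]}

    def get_tasks(self):
        # A single iterator over every task from every stage.
        yield from self.rename_tasks()
        yield from self.annotate_tasks()


handler = PipelineSketch("txome.fasta")
task_names = [task["name"] for task in handler.get_tasks()]
print(task_names)
print(handler.annotated_fn)
```

Keeping each stage in its own generator method means a subclass can override or skip one stage without touching the others, which is the kind of remixing the refactor is meant to enable.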

As usual, dammit can be upgraded with:

pip install dammit --upgrade

Users having problems with database dependencies should try removing the databases.doit.db file in their databases directory. Additionally, resume functionality will fail for runs done with older versions of dammit, because the renamed transcriptome file is now a dependency for almost all of the tasks, so take care with older runs. Conveniently, HMMER in particular runs much faster now that there is a sane e-value cutoff, so rerunning older analyses is less prohibitive.
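
For those who prefer to script the cleanup, here is a minimal sketch of removing the databases.doit.db file. The helper name is made up for this example, and the databases directory must be supplied by the caller, since its location depends on how dammit was set up. The demonstration runs against a temporary directory rather than a real installation.

```python
import tempfile
from pathlib import Path


def remove_doit_db(database_dir):
    """Delete databases.doit.db from database_dir.

    Returns True if a file was removed, False if none was found.
    (Hypothetical helper, not part of dammit.)
    """
    db_file = Path(database_dir) / "databases.doit.db"
    if db_file.is_file():
        db_file.unlink()
        return True
    return False


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "databases.doit.db").write_text("placeholder")
    first_call = remove_doit_db(tmp)   # file exists, so it is removed
    second_call = remove_doit_db(tmp)  # nothing left to remove
    print(first_call, second_call)
```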

For any other problems, please file an issue on the tracker!
