Annotated FASTA, effective transcript renaming, and e-value cutoffs

@camillescott camillescott released this 10 Dec 04:44
· 416 commits to master since this release

This version includes several new command line options, a new FASTA output, and a major under-the-hood refactor. For end users, the relevant changes are:

  • An annotated FASTA file, named like <transcriptome_filename>.dammit.fasta, which has a summary of the annotations in the headers and renamed transcript IDs.
  • A --name option specifying how to rename the transcript IDs in the FASTA header.
  • A CSV file mapping the new names back to the original names, stored as <transcriptome_filename>.names.csv.
  • An --evalue option specifying the e-value cutoff to pass to HMMER, Infernal, crb-blast, and LAST.
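
Since downstream tools will now see the renamed IDs, it can be handy to translate them back to the originals. The following is a minimal sketch using only the standard library. The column names ("original" and "renamed") and the example IDs are assumptions made for illustration, so check the header of your own <transcriptome_filename>.names.csv and pass the right column names.

```python
import csv
import io


def load_name_map(handle, new_col="renamed", old_col="original"):
    """Return a dict mapping renamed transcript IDs to original IDs.

    new_col and old_col are assumed column names, not confirmed
    against dammit's actual output; adjust them to match your file.
    """
    reader = csv.DictReader(handle)
    return {row[new_col]: row[old_col] for row in reader}


# Made-up data standing in for a real names.csv file.
example = io.StringIO(
    "original,renamed\n"
    "TRINITY_DN1_c0_g1_i1,Transcript_0\n"
    "TRINITY_DN2_c0_g1_i1,Transcript_1\n"
)
name_map = load_name_map(example)
print(name_map["Transcript_1"])
```

With a real file, open it with open(path, newline="") and pass the handle in place of the StringIO object.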

For developers, the most significant changes can be found in the AnnotateHandler class in annotate.py. Each set of tasks has been put into a separate method, and the get_tasks() method is now an iterator over all those tasks. Furthermore, the output filenames are now all stored as attributes on the AnnotateHandler instead of in a results dictionary. As a result, the class is now easier to remix, which is a necessary step toward a future API for accessing dammit results in Python with the pydata stack.
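
To illustrate the shape of that design, here is a rough sketch of the pattern. It is not dammit's actual code: the class name SketchHandler, its method names other than get_tasks(), its attribute names, and the task dictionaries are all hypothetical. Only the two output filename suffixes come from the notes above.

```python
import os


class SketchHandler:
    """Hypothetical stand-in for the pattern, not dammit's AnnotateHandler."""

    def __init__(self, transcriptome, directory="."):
        base = os.path.basename(transcriptome)
        # Output filenames live on the instance as attributes,
        # rather than in a results dictionary.
        self.transcriptome_fn = transcriptome
        self.names_fn = os.path.join(directory, base + ".names.csv")
        self.final_fasta_fn = os.path.join(directory, base + ".dammit.fasta")

    def rename_tasks(self):
        # One method per set of tasks (hypothetical task description).
        yield {"name": "rename", "targets": [self.names_fn]}

    def output_tasks(self):
        yield {
            "name": "annotated-fasta",
            "file_dep": [self.names_fn],
            "targets": [self.final_fasta_fn],
        }

    def get_tasks(self):
        # Iterate over every task from every group, in order.
        yield from self.rename_tasks()
        yield from self.output_tasks()


handler = SketchHandler("txome.fasta")
for task in handler.get_tasks():
    print(task["name"], task["targets"])
```

Under this layout, a subclass can override or drop a single task group without touching the others, and callers can read paths such as handler.final_fasta_fn directly.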

As usual, dammit can be upgraded with:

pip install dammit --upgrade

Users having problems with database dependencies should try removing the databases.doit.db file in their databases directory. Additionally, resume functionality will fail for runs done with older versions of dammit, because the renamed transcriptome file is a dependency for almost all the tasks, so take caution with older runs. Conveniently, HMMER in particular runs much faster now that there is a sane e-value cutoff, so rerunning older analyses is less prohibitive.

For any other problems, please file an issue on the tracker!