Increase in generalizability of "kallisto bus"

@Yenaled Yenaled released this 17 Jan 05:02
83bde90

New features

  • kallisto quant-tcc: This new command runs the EM algorithm on a supplied transcript-compatibility counts (TCC) matrix file, such as one generated by "bustools count", to produce transcript-level abundance estimates. When a gene-mapping file is supplied, gene-level abundances are also output. Effective length normalization is performed only if both a kallisto index and fragment length information are supplied.
  • New technologies were added to "kallisto bus": -x SmartSeq3 (--tag can be used to supply a 5′ tag sequence that identifies UMI-containing reads), -x BDWTA (BD Rhapsody), -x Visium (10x Visium), -x SPLIT-SEQ (SPLiT-seq preprocessing), and -x Bulk (for preprocessing non-demultiplexed bulk RNA-seq files).
  • "kallisto bus" can be run with no technology specified: In this case, it either processes a batch file (supplied via --batch), as the old "kallisto pseudo" did, or processes FASTQ files supplied directly on the command line, treating each FASTQ file (or each pair of FASTQ files, if --paired is specified) as an individual sample. This is useful for generating BUS files when each sample is in a separate FASTQ file. Together with bustools and kallisto quant-tcc, this feature effectively supersedes the old "kallisto pseudo".
  • Strand-specificity is now enabled by default for the 10X, SureCell, CelSeq, BD Rhapsody, and Smart-seq3 UMI technologies (unstranded is the default for other technologies). The user can override the default by supplying the --fr-stranded, --rf-stranded, or --unstranded option.
  • Various performance improvements, mostly in data ingestion throughput.
  • A minimal form of the kallisto index is written to a file named index.saved, and a file containing fragment length distributions (flens.txt) is written when "kallisto bus" is run on paired-end reads (specified via the --paired option). These files let kallisto quant-tcc perform effective length normalization should the need arise.
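
As a rough illustration of how the features above fit together, the following Python sketch assembles (but does not run) command lines for "kallisto bus" without a technology and for a follow-up "kallisto quant-tcc". The helper names bus_command and quant_tcc_command are hypothetical. The options -x, --paired, --fr-stranded, --rf-stranded, and --unstranded come from these notes; the long option names --index, --output-dir, --ec-file, --fragment-file, and --genemap are assumptions that should be checked against the usage text printed by each command.

```python
from typing import List, Optional

# Strand overrides named in the release notes.
STRAND_OPTIONS = ("fr-stranded", "rf-stranded", "unstranded")


def bus_command(index: str, out_dir: str, fastqs: List[str],
                technology: Optional[str] = None,
                paired: bool = False,
                strand: Optional[str] = None) -> List[str]:
    """Build (but do not run) an argv list for `kallisto bus`."""
    if not fastqs:
        raise ValueError("at least one FASTQ file is required")
    if strand is not None and strand not in STRAND_OPTIONS:
        raise ValueError(f"unknown strand option: {strand!r}")
    if paired and len(fastqs) % 2 != 0:
        raise ValueError("--paired expects FASTQ files in pairs")
    # Assumption: --index and --output-dir are the long option names.
    cmd = ["kallisto", "bus", "--index", index, "--output-dir", out_dir]
    if technology is not None:
        # Omitting -x selects the "no technology" mode described above.
        cmd += ["-x", technology]
    if paired:
        cmd.append("--paired")
    if strand is not None:
        cmd.append("--" + strand)
    return cmd + list(fastqs)


def quant_tcc_command(tcc_matrix: str, out_dir: str, index: str,
                      ec_file: Optional[str] = None,
                      flens: Optional[str] = None,
                      genemap: Optional[str] = None) -> List[str]:
    """Build (but do not run) an argv list for `kallisto quant-tcc`."""
    # Assumption: all long option names below; verify before use.
    cmd = ["kallisto", "quant-tcc", "--output-dir", out_dir, "--index", index]
    if ec_file is not None:
        cmd += ["--ec-file", ec_file]
    if flens is not None:
        # Fragment length information enables effective length normalization.
        cmd += ["--fragment-file", flens]
    if genemap is not None:
        # A gene-mapping file adds gene-level abundances to the output.
        cmd += ["--genemap", genemap]
    return cmd + [tcc_matrix]
```

The returned lists can be printed with shlex.join for inspection or passed to subprocess.run once the option names have been verified. All file names passed in are placeholders chosen by the caller.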

Deprecation

  • "kallisto pseudo" is now deprecated and will be removed in a future release; users should supply batch files of FASTQ file names to "kallisto bus" (via --batch) instead.
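
For users migrating from "kallisto pseudo", a batch file can be generated programmatically. The Python sketch below assumes the batch layout used by "kallisto pseudo": one whitespace-separated line per sample, holding a sample ID followed by one or two FASTQ paths, with lines starting with "#" treated as comments. The function name batch_file_text and the sample names in the usage note are hypothetical.

```python
from typing import Dict, List


def batch_file_text(samples: Dict[str, List[str]]) -> str:
    """Return batch-file text: one line per sample (ID, then FASTQ paths)."""
    lines = ["#id file1 file2"]  # comment header, assumed to be ignored
    for sample_id, fastqs in samples.items():
        if not 1 <= len(fastqs) <= 2:
            raise ValueError(f"{sample_id}: expected 1 or 2 FASTQ files")
        fields = [sample_id, *fastqs]
        # Whitespace separates fields, so it cannot appear inside one.
        if any(not f or any(ch.isspace() for ch in f) for f in fields):
            raise ValueError(f"{sample_id}: fields must be non-empty and contain no whitespace")
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"
```

For example, batch_file_text({"cell1": ["cell1_1.fastq.gz", "cell1_2.fastq.gz"]}) yields a header line followed by one line for cell1; the text can then be written to a file and passed to "kallisto bus" via --batch.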

Fixes

  • Issue #319: header import
  • Issue #272: "kallisto quant" and "kallisto pseudo" inconsistency (now fixed)