Version 0.9.0

@etal etal released this 17 Aug 18:45
· 573 commits to master since this release

In addition to bug fixes, documentation updates, and usability improvements, this release includes some larger changes:

  • The off-target bins in .cnn and .cnr files are now assigned the label "Antitarget" instead of "Background" in the "gene" column.

    The label "Background" in existing files will still be handled the same way, but new output files generated with CNVkit 0.9.0 and later will use the "Antitarget" label -- so, earlier versions of CNVkit may have problems with files produced by CNVkit 0.9.0. Some command line options and API keyword arguments similarly replace "background" with "antitarget", with shims in place for compatibility with existing scripts. (#171)

  • The sub-packages 'genome' and 'tabio' are now in a separate top-level package 'skgenome', still included in the CNVkit distribution. (See "Python API" below.)

    This does not affect the command-line usage of CNVkit, but clears the way to extract a scikit-genome package that can be installed and used separately from CNVkit for computing with genomic intervals.
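For scripts that read .cnn or .cnr files directly, the label change above means the "gene" column may hold either value, depending on which CNVkit version wrote the file. The following is a minimal sketch of one way to accept both labels. The helper name `is_antitarget` and the example rows are hypothetical and are not part of CNVkit.

```python
import csv
import io

# Labels used for off-target bins: "Antitarget" from 0.9.0 on,
# "Background" in files written by earlier versions.
OFF_TARGET_LABELS = {"Antitarget", "Background"}


def is_antitarget(gene_label):
    """Return True if a bin's "gene" value marks it as off-target."""
    return gene_label in OFF_TARGET_LABELS


# Hypothetical tab-separated rows in the style of a .cnr file.
example = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr1\t1000\t2000\tBRCA2\t0.12\n"
    "chr1\t2000\t52000\tBackground\t-0.05\n"
    "chr1\t52000\t102000\tAntitarget\t0.01\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
off_target = [r for r in rows if is_antitarget(r["gene"])]
print(len(off_target), "of", len(rows), "bins are off-target")
```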

Documentation

  • Link to an example VCF file that contains matched tumor and normal samples and will work nicely with CNVkit.
  • Describe the breaks command's output columns. (#220)
  • Show a Python code example customizing a plot with matplotlib.pyplot. (#196)

Dependencies

  • pysam: Raise minimum to 0.10; support new version 0.11.2.1 (#218; thanks @chapmanb)
  • pandas: Support new version 0.20.1 (#215)
  • numpy: Support new version 1.13 (#235, #238)

Commands

batch:

  • Log the CNVkit version number at the start of the run.
  • Print a message at the end if no tumor/test samples were specified. (#214)
  • Clarify error messages for bad option combinations. (#216)
  • Removed the deprecated, suppressed/invisible option --split. It was a shim in the 0.8 series to support old scripts.

reference:

  • Ensure the inferred chromosomal sex matches between the targets and antitargets for the same sample. If the inferences do not match, prefer antitargets. (#234, #237)

fix:

  • Warn & don't reweight bins if most antitargets have no/low coverage. This avoids a variety of surprising downstream problems when the input was specified as hybrid capture (the default), but is actually from targeted amplicon sequencing, or otherwise has no reads mapped to most off-target bins.

segment:

  • Log the segmentation method and p-value/q-value threshold.

call:

  • Add option --center-at, for re-centering log2 values at a user-specified neutral value.
  • The option --center can be used without an argument, in which case it uses the default centering method 'median'.
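As a rough illustration of the two centering options above, re-centering can be thought of as shifting every log2 value so that a chosen value becomes 0. The sketch below assumes the shift is a plain subtraction and uses an unweighted median for simplicity; the function names are hypothetical and this is not CNVkit's own code.

```python
from statistics import median


def center_at(log2_values, neutral):
    """Shift log2 ratios so that `neutral` maps to 0."""
    return [v - neutral for v in log2_values]


def center_median(log2_values):
    """Shift log2 ratios so that their (unweighted) median maps to 0."""
    return center_at(log2_values, median(log2_values))


log2 = [-0.5, 0.25, 0.25, 1.0]
shifted = center_at(log2, 0.25)      # user-specified neutral value
by_median = center_median(log2)      # default-style centering
print(shifted)
print(by_median)
```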

diagram:

  • New option --title to add a custom title to the top of the generated figure. (#239; thanks @micknudsen)

export vcf:

  • When given a .cnr file corresponding to the usual segmented input file (.cns), emit the CIPOS and CIEND tags in the generated VCF. These indicate the "fuzzy" coordinates of segment breakpoints. Here, the ranges are simply the widths of the underlying bins adjacent to each segment breakpoint. These tags can help meta-methods aggregate/harmonize CNVkit's calls with those of other structural variant callers. (#72)
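To make the bin-width idea concrete, here is a small sketch of how confidence-interval offsets around a breakpoint could be derived from the neighboring bins. It is one plausible reading of the description above, not CNVkit's implementation: it assumes the left offset is the negated width of the bin ending at the breakpoint and the right offset is the width of the bin starting there. The function names and coordinates are hypothetical.

```python
def breakpoint_ci(bins, position):
    """Return (left, right) offsets around a breakpoint at `position`.

    `bins` is a sorted list of (start, end) tuples on one chromosome.
    Assumption: the left offset is minus the width of the bin ending at
    the breakpoint, the right offset is the width of the bin starting
    there. A missing neighbor contributes 0.
    """
    left = right = 0
    for start, end in bins:
        if end == position:
            left = -(end - start)
        if start == position:
            right = end - start
    return left, right


def format_ci(tag, offsets):
    """Format an INFO entry such as CIPOS=-100,150."""
    return "{}={},{}".format(tag, *offsets)


# Hypothetical bins and one segment spanning the two middle bins.
bins = [(0, 100), (100, 250), (250, 300), (300, 500)]
segment = (100, 300)
cipos = breakpoint_ci(bins, segment[0])
ciend = breakpoint_ci(bins, segment[1])
info = ";".join([format_ci("CIPOS", cipos), format_ci("CIEND", ciend)])
print(info)
```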

import-picard:

  • Don't accept directory as an argument (was deprecated).
  • Be a little more flexible in filenames accepted: instead of requiring input files to be named *.targetcoverage.??? or *.antitargetcoverage.???, strip the full suffix and default to 'targetcoverage.cnn' output suffix, or 'antitargetcoverage.cnn' if input filename contains 'antitarget'. Works the same for filenames following the earlier convention, but now is pretty safe for amplicon targets with arbitrary filenames, and behavior is generally less surprising.
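The filename rule above can be sketched as follows. This assumes "the full suffix" means everything from the first dot of the file's base name onward; the function name `output_name` and the example filenames are hypothetical.

```python
import os


def output_name(input_path):
    """Derive an output filename from a Picard coverage filename.

    Assumption: the "full suffix" is everything after the first dot
    in the base name.
    """
    base = os.path.basename(input_path)
    stem = base.split(".", 1)[0]
    if "antitarget" in base:
        suffix = "antitargetcoverage.cnn"
    else:
        suffix = "targetcoverage.cnn"
    return stem + "." + suffix


for path in (
    "Sample1.targetcoverage.csv",
    "runs/Sample1.antitargetcoverage.csv",
    "amplicon_panel.txt",
):
    print(path, "->", output_name(path))
```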

Bug fixes

  • antitarget: Don't crash if -g/--access is not given (#207)
  • batch: Don't crash in 'wgs' mode when given just targets (-t) without a FASTA reference genome sequence (-f)
  • call --filter ampdel: Drop segments with copy number (cn field) between 0 and 5, exclusive, as the documentation indicates. Previously, it was just merging adjacent segments with copy number 1--4, but not dropping them. (#222)
  • export cdt: Match the CDT spec. Fix a regression in which columns could be swapped/misaligned versus the header. Add a dummy "EWEIGHT" row to ensure Java TreeView starts reading data from the correct line in the file.
  • export theta: Don't crash on bins where reference is NaN. (#168)
  • metrics, descriptives: Handle degenerate/trivial cases consistently. (#202)
  • segment: Handle sample names that are integers with leading zeros. (#213)
  • sex: Don't crash if chromosomes X and Y are both missing. (#236)
  • VCF parsing (call, scatter, segment):
    • Safely handle small or empty VCF files that previously could trigger a crash during BAF calculation. Now, with an empty VCF an all-blank "baf" will be emitted. (#218, #224; thanks @chapmanb)
    • Improve handling of Mutect2 VCF files, somewhat. Mutect2 VCFs are still not recommended as input to CNVkit; try FreeBayes or GATK HaplotypeCaller instead. (#195)
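The corrected call --filter ampdel behavior listed among the bug fixes above amounts to keeping only segments whose copy number is 0 or at least 5. A minimal sketch of that rule, using plain dictionaries in place of CNVkit's data structures (the function name and segment values are hypothetical):

```python
def filter_ampdel(segments):
    """Drop segments whose copy number is strictly between 0 and 5."""
    return [seg for seg in segments if not (0 < seg["cn"] < 5)]


# Hypothetical segments with an absolute copy number in the "cn" field.
segments = [
    {"chromosome": "chr1", "start": 0, "end": 1000, "cn": 0},
    {"chromosome": "chr1", "start": 1000, "end": 5000, "cn": 1},
    {"chromosome": "chr1", "start": 5000, "end": 9000, "cn": 2},
    {"chromosome": "chr2", "start": 0, "end": 4000, "cn": 4},
    {"chromosome": "chr2", "start": 4000, "end": 8000, "cn": 5},
    {"chromosome": "chr3", "start": 0, "end": 2000, "cn": 8},
]
kept = filter_ampdel(segments)
print([seg["cn"] for seg in kept])
```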

Python API

Moved sub-packages 'genome' and 'tabio' to separate top-level package 'skgenome'
(#201). The top-level cnvlib API is mostly the same otherwise, but supporting
modules were refactored to decouple skgenome from cnvlib and remove
redundancies. In particular:

  • Split module cnvlib.core into skgenome.tabio and cnvlib.cmdutil
  • Remove GenomicArray static method row2label in favor of functions to_label and from_label in new module skgenome.rangelabel.
  • The SEG writer in 'tabio' now replaces chromosome names with 1-based integer indices, per SEG spec/convention. The export seg command now uses this writer directly.
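To illustrate the chromosome renaming in the SEG writer, the sketch below maps chromosome names to 1-based integers. It assumes indices are assigned in order of first appearance, which may differ from how 'tabio' orders them; the function name is hypothetical.

```python
def chromosome_indices(chrom_names):
    """Map each chromosome name to a 1-based index.

    Assumption: indices follow the order of first appearance.
    """
    index = {}
    for name in chrom_names:
        if name not in index:
            index[name] = len(index) + 1
    return index


chroms = ["chr1", "chr1", "chr2", "chrX", "chrX"]
ids = chromosome_indices(chroms)
encoded = [ids[c] for c in chroms]
print(encoded)
```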

Scripts

  • Remove the script coverage_bin_size.py, previously deprecated in favor of the autobin command.
  • Add skg_convert.py to convert between tabular formats (including BED and UCSC RefFlat).
  • Deprecate refFlat2bed.py in favor of skg_convert.py.
  • Add cnn_annotate.py to replace the "gene" field for each bin in a .cnn or .cnr file, given a gene annotation database like refFlat.txt. The need for this comes up occasionally when users notice at the end of an analysis that vendor-annotated targets are not the desired gene names.