Version 0.9.0
In addition to bug fixes, documentation updates, and usability improvements, this release includes some larger changes:
- The off-target bins in .cnn and .cnr files are now assigned the label "Antitarget" instead of "Background" in the "gene" column. The label "Background" in existing files will still be handled the same way, but new output files generated with CNVkit 0.9.0 and later will use the "Antitarget" label, so earlier versions of CNVkit may have problems with files produced by CNVkit 0.9.0. Some command-line options and API keyword arguments similarly replace "background" with "antitarget", with shims in place for compatibility with existing scripts. (#171)
- The sub-packages 'genome' and 'tabio' are now in a separate top-level package 'skgenome', still included in the CNVkit distribution. (See "Python API" below.) This does not affect the command-line usage of CNVkit, but clears the way to extract a scikit-genome package that can be installed and used separately from CNVkit for computing with genomic intervals.
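Because the "gene" column may now hold either "Antitarget" (new files) or "Background" (older files), downstream code that reads .cnn/.cnr bins should accept both labels. A minimal sketch, assuming bins are plain dicts with a "gene" key (an illustration, not CNVkit's own array classes); the helper names is_antitarget and normalize_label are hypothetical:

```python
# Hypothetical helpers, not part of cnvlib: treat the legacy and the new
# off-target labels as equivalent when reading bins from either version.
OFF_TARGET_LABELS = {"Antitarget", "Background"}


def is_antitarget(gene_label: str) -> bool:
    """Return True if a bin's "gene" value marks it as off-target."""
    return gene_label in OFF_TARGET_LABELS


def normalize_label(gene_label: str) -> str:
    """Rewrite the pre-0.9.0 "Background" label as "Antitarget"."""
    return "Antitarget" if gene_label == "Background" else gene_label


bins = [
    {"chromosome": "chr1", "start": 1000, "end": 1200, "gene": "GENE1"},
    {"chromosome": "chr1", "start": 5000, "end": 9000, "gene": "Background"},
    {"chromosome": "chr1", "start": 9000, "end": 13000, "gene": "Antitarget"},
]
off_target = [b for b in bins if is_antitarget(b["gene"])]
print(len(off_target))  # 2
print([normalize_label(b["gene"]) for b in bins])
```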
Documentation
- Link to an example VCF file that contains matched tumor and normal samples and will work nicely with CNVkit.
- Describe the breaks command's output columns. (#220)
- Show a Python code example customizing a plot with matplotlib.pyplot. (#196)
Dependencies
- pysam: Raise minimum to 0.10; support new version 0.11.2.1 (#218; thanks @chapmanb)
- pandas: Support new version 0.20.1 (#215)
- numpy: Support new version 1.13 (#235, #238)
Commands
batch:
- Log the CNVkit version number at the start of the run.
- Print a message at the end if no tumor/test samples were specified. (#214)
- Clarify error messages for bad option combinations. (#216)
- Remove the deprecated, suppressed (invisible) option --split. It was a shim in the 0.8 series to support old scripts.
reference:
- Ensure the inferred chromosomal sex matches between the targets and antitargets for the same sample. If the inferences do not match, prefer antitargets. (#234, #237)
fix:
- Warn & don't reweight bins if most antitargets have no/low coverage. This avoids a variety of surprising downstream problems when the input was specified as hybrid capture (the default), but is actually from targeted amplicon sequencing, or otherwise has no reads mapped to most off-target bins.
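The guard described for fix can be sketched as a simple check on the fraction of poorly covered antitarget bins. This is an illustration only: the cutoffs MIN_DEPTH and MAX_LOW_FRACTION below are assumed values chosen for the example, not CNVkit's actual thresholds, and should_reweight is a hypothetical name.

```python
import logging
from typing import Sequence

# Assumed cutoffs for illustration only; CNVkit's real thresholds may differ.
MIN_DEPTH = 1.0         # depth below which a bin counts as "low coverage"
MAX_LOW_FRACTION = 0.5  # "most" antitargets means more than half


def should_reweight(antitarget_depths: Sequence[float],
                    min_depth: float = MIN_DEPTH,
                    max_low_fraction: float = MAX_LOW_FRACTION) -> bool:
    """Return False (and warn) when most antitarget bins have little coverage."""
    if not antitarget_depths:
        logging.warning("No antitarget bins; skipping bin reweighting")
        return False
    n_low = sum(1 for depth in antitarget_depths if depth < min_depth)
    frac_low = n_low / len(antitarget_depths)
    if frac_low > max_low_fraction:
        logging.warning("%.0f%% of antitarget bins have low coverage; "
                        "skipping reweighting (amplicon data?)", 100 * frac_low)
        return False
    return True


print(should_reweight([0.0, 0.0, 0.2, 35.0]))    # False: 3 of 4 bins are low
print(should_reweight([12.0, 30.5, 0.0, 28.1]))  # True: only 1 of 4 is low
```

Skipping the reweighting step entirely, rather than trying to correct it, matches the behavior described above: amplicon-style data with empty off-target bins simply has no usable antitarget signal to weight.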
segment:
- Log the segmentation method and p-value/q-value threshold.
call:
- Add option --center-at, for re-centering log2 values at a user-specified neutral value.
- The option --center can be used without an argument, in which case it uses the default centering method 'median'.
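The two re-centering options differ only in where the shift comes from: a constant supplied by the user, or an estimate of the center computed from the data. A minimal sketch, assuming that --center-at subtracts its value from every log2 ratio; the function recenter is hypothetical, implements only "median" and "mean", and is unweighted (CNVkit may weight segments differently):

```python
import statistics
from typing import List, Optional, Sequence


def recenter(log2_values: Sequence[float],
             center_at: Optional[float] = None,
             method: str = "median") -> List[float]:
    """Shift log2 ratios so the chosen neutral value becomes 0.0.

    center_at mimics --center-at (subtract a user-given constant);
    otherwise a center is estimated, defaulting to the median as a bare
    --center does. Unweighted; only "median" and "mean" are sketched here.
    """
    if center_at is not None:
        shift = center_at
    elif method == "median":
        shift = statistics.median(log2_values)
    elif method == "mean":
        shift = statistics.mean(log2_values)
    else:
        raise ValueError(f"unsupported method in this sketch: {method}")
    return [value - shift for value in log2_values]


segment_log2 = [-0.25, -0.1, -0.1, 0.6]
print(recenter(segment_log2))                   # median (-0.1) moves to 0.0
print(recenter(segment_log2, center_at=-0.25))  # -0.25 moves to 0.0
```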
diagram:
- New option --title to add a custom title to the top of the generated figure. (#239; thanks @micknudsen)
export vcf:
- When given a .cnr file corresponding to the usual segmented input file (.cns), emit the CIPOS and CIEND tags in the generated VCF. These indicate the "fuzzy" coordinates of segment breakpoints. Here, the ranges are simply the widths of the underlying bins adjacent to each segment breakpoint. These tags can help meta-methods aggregate/harmonize CNVkit's calls with those of other structural variant callers. (#72)
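One way to read "the widths of the underlying bins adjacent to each segment breakpoint" is that a segment's start may lie anywhere within its first bin and its end anywhere within its last bin. The sketch below computes CIPOS/CIEND offsets under that assumed convention; the exact offsets and signs CNVkit writes may differ, and breakpoint_ci is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based half-open, as in .cnr/.cns


def breakpoint_ci(segment: Interval,
                  bins: List[Interval]) -> Tuple[Interval, Interval]:
    """Return (CIPOS, CIEND) offsets for one segment.

    Assumed convention: the start breakpoint is uncertain across the
    segment's first bin, the end breakpoint across its last bin.
    `bins` are the bins inside the segment, sorted by position.
    """
    seg_start, seg_end = segment
    first_bin_end = bins[0][1]
    last_bin_start = bins[-1][0]
    cipos = (0, first_bin_end - seg_start)
    ciend = (last_bin_start - seg_end, 0)
    return cipos, ciend


seg = (10000, 50000)
seg_bins = [(10000, 12000), (12000, 30000), (30000, 47000), (47000, 50000)]
cipos, ciend = breakpoint_ci(seg, seg_bins)
print(f"CIPOS={cipos[0]},{cipos[1]};CIEND={ciend[0]},{ciend[1]}")
# CIPOS=0,2000;CIEND=-3000,0
```

Because the intervals come straight from bin widths, wide off-target bins produce wide confidence intervals, which is the information other structural variant callers need when merging breakpoints.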
import-picard:
- Don't accept a directory as an argument (this was deprecated).
- Be a little more flexible in the filenames accepted: instead of requiring input files to be named *.targetcoverage.??? or *.antitargetcoverage.???, strip the full suffix and default to the 'targetcoverage.cnn' output suffix, or 'antitargetcoverage.cnn' if the input filename contains 'antitarget'. This works the same for filenames following the earlier convention, but is now reasonably safe for amplicon targets with arbitrary filenames, and the behavior is generally less surprising.
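The new naming rule for import-picard can be sketched in a few lines. This assumes that "strip the full suffix" means dropping everything after the first "." in the base filename, which is an interpretation rather than a documented detail; picard_output_name is a hypothetical helper.

```python
import os


def picard_output_name(input_path: str) -> str:
    """Derive a .cnn output filename from a Picard coverage filename.

    Assumption: the "full suffix" is everything after the first "." of the
    base name. The output suffix is antitargetcoverage.cnn if the input name
    contains "antitarget", otherwise targetcoverage.cnn.
    """
    base = os.path.basename(input_path)
    sample_id = base.split(".", 1)[0]
    if "antitarget" in base:
        suffix = "antitargetcoverage.cnn"
    else:
        suffix = "targetcoverage.cnn"
    return f"{sample_id}.{suffix}"


for name in ["picard/S1.targetcoverage.csv",
             "picard/S1.antitargetcoverage.csv",
             "S2_amplicons.hs_metrics.txt"]:
    print(name, "->", picard_output_name(name))
```

Files that follow the older *.targetcoverage.??? convention map to the same output names as before, while arbitrary amplicon filenames fall back to the target suffix instead of being rejected.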
Bug fixes
- antitarget: Don't crash if -g/--access is not given. (#207)
- batch: Don't crash in 'wgs' mode when given just targets (-t) without a FASTA reference genome sequence (-f).
- call --filter ampdel: Drop segments with copy number (cn field) between 0 and 5, exclusive, as the documentation indicates. Previously, it was just merging adjacent segments with copy number 1 through 4, but not dropping them. (#222)
- export cdt: Match the CDT spec. Fix a regression in which columns could be swapped/misaligned versus the header. Add a dummy "EWEIGHT" row to ensure Java TreeView starts reading data from the correct line in the file.
- export theta: Don't crash on bins where the reference is NaN. (#168)
- metrics, descriptives: Handle degenerate/trivial cases consistently. (#202)
- segment: Handle sample names that are integers with leading zeros. (#213)
- sex: Don't crash if chromosomes X and Y are both missing. (#236)
- VCF parsing (call, scatter, segment):
  - Safely handle small or empty VCF files that previously could trigger a crash during BAF calculation. Now, with an empty VCF, an all-blank "baf" column will be emitted. (#218, #224; thanks @chapmanb)
  - Improve handling of Mutect2 VCF files, somewhat. Mutect2 VCFs are still not recommended as input to CNVkit; try FreeBayes or GATK HaplotypeCaller instead. (#195)
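The corrected call --filter ampdel rule keeps only complete deletions (cn = 0) and amplifications (cn of 5 or more). A minimal sketch, assuming segments are plain dicts with an integer "cn" key rather than CNVkit's own array type; filter_ampdel is a hypothetical name:

```python
from typing import Dict, List


def filter_ampdel(segments: List[Dict]) -> List[Dict]:
    """Drop segments whose integer copy number lies strictly between 0 and 5."""
    return [seg for seg in segments if not 0 < seg["cn"] < 5]


calls = [
    {"chromosome": "chr2", "start": 100, "end": 900, "cn": 0},
    {"chromosome": "chr2", "start": 900, "end": 1500, "cn": 2},
    {"chromosome": "chr7", "start": 300, "end": 800, "cn": 4},
    {"chromosome": "chr7", "start": 800, "end": 1200, "cn": 8},
]
print([seg["cn"] for seg in filter_ampdel(calls)])  # [0, 8]
```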
Python API
Moved the sub-packages 'genome' and 'tabio' to a separate top-level package 'skgenome' (#201). The top-level cnvlib API is otherwise mostly the same, but supporting modules were refactored to decouple skgenome from cnvlib and remove redundancies. In particular:
- Split the module cnvlib.core into skgenome.tabio and cnvlib.cmdutil.
- Remove the GenomicArray static method row2label in favor of the functions to_label and from_label in the new module skgenome.rangelabel.
- The SEG writer in 'tabio' now replaces chromosome names with 1-based integer indices, per the SEG spec/convention. The export seg command now uses this writer directly.
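Replacing chromosome names with 1-based integers can be sketched as numbering each chromosome in order of first appearance. This assumes the rows are already sorted in genomic order and leaves coordinates untouched; the real skgenome writer may order or number chromosomes differently, and to_seg is a hypothetical name. The column names follow the common SEG header (ID, chrom, loc.start, loc.end, num.mark, seg.mean).

```python
from typing import Dict, Iterable, List, Tuple

# (sample ID, chromosome, start, end, number of markers, segment mean)
Row = Tuple[str, str, int, int, int, float]


def to_seg(rows: Iterable[Row]) -> Tuple[List[tuple], Dict[str, int]]:
    """Replace chromosome names with 1-based indices in order of first appearance."""
    chrom_index: Dict[str, int] = {}
    out = []
    for sample, chrom, start, end, n_marks, seg_mean in rows:
        if chrom not in chrom_index:
            chrom_index[chrom] = len(chrom_index) + 1
        out.append((sample, chrom_index[chrom], start, end, n_marks, seg_mean))
    return out, chrom_index


rows = [
    ("S1", "chr1", 0, 5000, 40, 0.02),
    ("S1", "chr1", 5000, 9000, 31, -0.6),
    ("S1", "chr2", 0, 7000, 55, 0.1),
    ("S1", "chrX", 0, 3000, 20, -1.0),
]
out, index = to_seg(rows)
print("\t".join(["ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean"]))
for row in out:
    print("\t".join(map(str, row)))
```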
Scripts
- Remove the script coverage_bin_size.py, previously deprecated in favor of the autobin command.
- Add skg_convert.py to convert between tabular formats (including BED and UCSC RefFlat).
- Deprecate refFlat2bed.py in favor of skg_convert.py.
- Add cnn_annotate.py to replace the "gene" field for each bin in a .cnn or .cnr file, given a gene annotation database like refFlat.txt. This need comes up occasionally when users notice at the end of an analysis that the vendor-annotated target labels are not the desired gene names.
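The re-annotation step that cnn_annotate.py performs can be illustrated with a brute-force overlap scan. This is a sketch under stated assumptions, not the script's actual logic: genes are (name, chrom, start, end) tuples with half-open coordinates, a bin takes the de-duplicated names of all genes it overlaps, and antitarget bins or bins with no overlapping gene keep their original label. annotate_bins is a hypothetical name.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str, int, int]  # (name, chrom, start, end), half-open


def annotate_bins(bins: List[Dict], genes: List[Gene]) -> List[Dict]:
    """Return copies of bins with "gene" replaced by overlapping gene names.

    Assumptions: antitarget bins and bins overlapping no gene keep their
    label; repeated names (e.g. several refFlat transcripts of one gene)
    are collapsed. O(bins x genes), fine for a demo only.
    """
    annotated = []
    for b in bins:
        new = dict(b)
        if b["gene"] != "Antitarget":
            hits = [name for name, chrom, start, end in genes
                    if chrom == b["chromosome"]
                    and start < b["end"] and b["start"] < end]
            if hits:
                # dict.fromkeys de-duplicates while preserving order
                new["gene"] = ",".join(dict.fromkeys(hits))
        annotated.append(new)
    return annotated


genes = [("GENE_A", "chr3", 1000, 5000),
         ("GENE_A", "chr3", 1200, 4800),
         ("GENE_B", "chr3", 4500, 9000)]
bins = [
    {"chromosome": "chr3", "start": 1100, "end": 1300, "gene": "target_1"},
    {"chromosome": "chr3", "start": 4600, "end": 4700, "gene": "target_2"},
    {"chromosome": "chr3", "start": 10000, "end": 10200, "gene": "target_3"},
    {"chromosome": "chr3", "start": 20000, "end": 25000, "gene": "Antitarget"},
]
print([b["gene"] for b in annotate_bins(bins, genes)])
```

A real genome-wide run would want an interval index or a sorted sweep instead of the nested scan, but the labeling rule is the same.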