Version 0.8

@etal etal released this 13 Sep 22:48

This is a larger release and the first update since our publication.

CNVkit now runs under Python 3 as well as 2.7. (#3, #101; thanks @mpschr)

File format changes:

  • New "depth" column in .cnn, .cnr, .cns
  • In .cns, "weight" is the sum, not mean, of bin-level weights within the segment
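
To illustrate the change to the .cns "weight" column, here is a minimal sketch in plain Python. The bin weights and function names are hypothetical and are not taken from CNVkit's code; the sketch only contrasts the two conventions named above.

```python
# Hypothetical bin-level weights for the bins covered by one segment.
bin_weights = [0.9, 0.8, 1.0, 0.7]


def segment_weight_mean(weights):
    """Mean of the bin weights (the convention this release moves away from)."""
    return sum(weights) / len(weights)


def segment_weight_sum(weights):
    """Sum of the bin weights (the convention described for v0.8)."""
    return sum(weights)


old_weight = segment_weight_mean(bin_weights)
new_weight = segment_weight_sum(bin_weights)
```

With a sum, a segment's weight grows with the number of bins it covers, so weights in .cns files written before and after this release are not directly comparable.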

New script cnn_updater.py can be used to add the "depth" column to existing .cnn, .cnr and .cns files. However, most CNVkit commands should still work with pre-v0.8 files without using this script first. For best results, rebuild the .cnr and .cns for an ongoing study using the existing targetcoverage, antitargetcoverage and reference .cnn files.

Algorithmic changes:

  • reference, gender, call, diagram, export: Gender, or chromosomal sex, is now inferred with a statistical test instead of a fixed threshold, significantly improving the inferences on noisy or aneuploid samples. (#116)
  • reference, fix, call: Center log2 values by median of chromosome medians, by default. (#114)
  • reference, metrics, segmetrics: Improve the calculation of biweight location and biweight midvariance (now in descriptives.py).
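
The "median of chromosome medians" centering can be sketched as follows. This is an illustration under stated assumptions, not CNVkit's implementation: the function name and the (chromosome, log2) tuple layout are hypothetical, and the sketch uses every chromosome it is given, since these notes do not say whether any chromosomes are excluded.

```python
from collections import defaultdict
from statistics import median


def center_by_chromosome_medians(bins):
    """Shift log2 values so the median of per-chromosome medians is zero.

    bins: list of (chromosome, log2) tuples (hypothetical layout).
    """
    by_chrom = defaultdict(list)
    for chrom, log2 in bins:
        by_chrom[chrom].append(log2)
    # One median per chromosome, then the median of those medians.
    shift = median(median(vals) for vals in by_chrom.values())
    return [(chrom, log2 - shift) for chrom, log2 in bins]


example_bins = [
    ("chr1", 0.1), ("chr1", 0.3), ("chr1", 0.2),
    ("chr2", 0.4), ("chr2", 0.6),
    ("chr3", -0.1), ("chr3", 0.1),
]
centered = center_by_chromosome_medians(example_bins)
```

Compared with a single genome-wide median, this gives each chromosome equal influence regardless of how many bins it contains.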

These deprecated components (since 0.7.x) have been removed:

  • Commands rescale and loh -- use call and scatter, respectively, instead
  • Some options in export bed and export theta -- use call first instead
  • Script genome2access.py -- use cnvkit.py access instead

Updated commands:

batch:

  • New option --method, with choices "hybrid" (default), "wgs", "amplicon", to simplify/streamline usage with whole-genome or amplicon sequencing protocols. See documentation for details; in short, "wgs" and "amplicon" do not use antitargets or the edge/density bias correction; "wgs" by default uses the sequencing-accessible genome as the targets, and uses a more stringent significance threshold for segmentation.
  • Hide/deprecate --split option; it's always on now. To ensure bin coordinates do not change between batch runs (they generally won't anyway), use the -r/--reference option instead of specifying -t and -a in batch.
  • Add --drop-low-coverage option, which is passed to segment internally.
  • The -p/--processes option is also passed to coverage and segment internally (see below).

antitarget:

  • Increase the default average bin size from 100kb to 200kb.

coverage:

  • Parallelize coverage calculation over BED rows. The number of threads can be specified with the -p option. (#121; thanks @brentp)
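
As a rough illustration of parallelizing over BED rows, the sketch below maps a per-row depth function across a worker pool. Everything here is hypothetical: the READS list stands in for a BAM file, row_depth and parallel_coverage are made-up names, and a thread pool is used only to keep the example portable. It does not reproduce how CNVkit itself reads alignments or divides the work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read alignments as (chromosome, start, end) intervals.
READS = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 10, 60)]


def row_depth(row):
    """Mean read depth over one BED row given as (chrom, start, end)."""
    chrom, start, end = row
    covered_bases = sum(
        max(0, min(end, r_end) - max(start, r_start))
        for r_chrom, r_start, r_end in READS
        if r_chrom == chrom
    )
    return covered_bases / (end - start)


def parallel_coverage(bed_rows, processes=2):
    """Compute per-row depths with a pool of workers."""
    with ThreadPoolExecutor(max_workers=processes) as pool:
        # Executor.map returns results in the same order as the input rows.
        return list(pool.map(row_depth, bed_rows))


depths = parallel_coverage(
    [("chr1", 100, 200), ("chr1", 200, 300), ("chr2", 0, 100)]
)
```

Because each row is independent, the results can be collected in input order without any coordination between workers.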

segment:

  • Parallelize CBS and Haar segmentation methods across chromosomes. (#123, #125; thanks @brentp)

call:

  • New --filter option, with choices 'cn', 'ampdel', 'ci', 'sem' implemented.
  • With VCF b-allele frequencies (-v, 'baf'), always calculate the allele-specific integer copy numbers 'cn1' and 'cn2' so that 'cn1' is the larger one. BAF mirror direction stays majority-rules. (#105; thanks @mpschr)
  • If b-allele frequencies are used and total copy number is zero, report allelic copy numbers as 0, not NaN.
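
The two b-allele behaviors above can be sketched as a small function. This is an assumption-laden illustration, not the code in the call command: the function name, the rounding rule, and the use of max(baf, 1 - baf) as the major-allele fraction are all choices made for this example.

```python
import math


def allelic_copy_numbers(total_cn, baf):
    """Split an integer total copy number into (cn1, cn2) with cn1 >= cn2.

    total_cn: non-negative integer copy number of the segment.
    baf: b-allele frequency in [0, 1].
    """
    if total_cn == 0:
        # Nothing to split: report zeros rather than NaN.
        return 0, 0
    major_fraction = max(baf, 1.0 - baf)
    # Round half up so that cn1 is never smaller than cn2.
    cn1 = math.floor(total_cn * major_fraction + 0.5)
    cn2 = total_cn - cn1
    return cn1, cn2
```

For example, a total copy number of 3 with a BAF of 0.33 splits into 2 and 1, and a total copy number of 0 yields 0 and 0 whatever the BAF.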

scatter:

  • Add --title option.
  • Allow selecting & labeling gene(s) w/ only segments as input.

heatmap, scatter:

  • Allow saving plots in any image file format supported by matplotlib. The file format is determined by the output filename's extension, e.g. 'png' saves in PNG format -- making it easier to integrate CNVkit plots with HTML reports. (#120; thanks @chapmanb)

diagram:

  • Add -g/--gender option to specify sample's known gender.

gainloss:

  • Make output tables more consistent across options. Show individual gene names (rather than all genes grouped within a segment in 1 row); don't show rows with no gene name; report the segment probe count instead of number of probes within the gene; show any extra columns present in the input .cns file. (#107, #108; thanks @mpschr)

gender:

  • Show column headers and Y-chromosome log2 values in the output table.

segmetrics:

  • Add stats options for mean, median, mode
  • Add MSE, SEM stats as options
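
For readers unfamiliar with these statistics, here is a minimal sketch computed over the bin-level log2 values of one segment. The definitions are assumptions for illustration: MSE is taken as the mean squared deviation from the segment mean, and SEM as the sample standard deviation divided by the square root of the bin count. The function name is hypothetical, and mode is omitted because its definition for continuous data is not given in these notes.

```python
from math import sqrt
from statistics import mean, median, stdev


def segment_stats(log2_values):
    """Summary statistics for the bins within one segment (needs >= 2 bins)."""
    center = mean(log2_values)
    return {
        "mean": center,
        "median": median(log2_values),
        # Assumed definition: mean squared deviation from the segment mean.
        "mse": mean([(x - center) ** 2 for x in log2_values]),
        # Standard error of the mean, using the sample standard deviation.
        "sem": stdev(log2_values) / sqrt(len(log2_values)),
    }


stats = segment_stats([0.1, 0.2, 0.3, 0.4])
```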

metrics, segmetrics:

  • Add --drop-low-coverage option (like in segment and gainloss)

Internals:

  • New sub-package tabio: a more robust I/O framework unifying support for tabular formats, including CNVkit's .cnn/.cnr/.cns, BED, SEG, VCF, GATK/Picard interval list, and text coordinates (chr:start:end). Base class GenomicArray and its derived classes CopyNumArray and VariantArray do not implement their own I/O, but rather are instantiated via tabio. The "import-" commands use this as well.
  • Removed rary.RegionArray; all functionality is now in tabio and GenomicArray.
  • New module "descriptives.py" implements descriptive statistics on plain numpy arrays or pandas Series instances, independent of CNVkit.
  • Better testing on Travis, covering Python 2.7, 3.4 and 3.5, on both Linux and OS X (thanks @kyleabeauchamp, @rmcgibbo, and @mpharrigan; #110)
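
As background for the biweight location statistic mentioned in these notes, the sketch below shows one step of the textbook Tukey biweight estimate in plain Python. It is not the code in descriptives.py; the tuning constant c=6.0 and the epsilon floor on the scale are assumptions chosen for this example.

```python
from statistics import median


def biweight_location(values, c=6.0, epsilon=1e-3):
    """One-step Tukey biweight location, a robust estimate of the center.

    c and epsilon are illustrative defaults, not CNVkit's documented values.
    """
    m = median(values)
    mad = median([abs(x - m) for x in values])
    scale = max(c * mad, epsilon)  # avoid dividing by zero when MAD is 0
    numerator = 0.0
    denominator = 0.0
    for x in values:
        u = (x - m) / scale
        if abs(u) < 1:  # points beyond the cutoff get zero weight
            w = (1 - u * u) ** 2
            numerator += (x - m) * w
            denominator += w
    if denominator == 0:
        return m
    return m + numerator / denominator


robust_center = biweight_location([0.0, 0.1, -0.1, 0.05, -0.05, 5.0])
```

In this example the outlier at 5.0 receives zero weight, so the estimate stays close to zero, whereas the arithmetic mean of the same values is pulled well above it.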

Bug fixes:

  • batch: Errors in parallel processes will immediately be raised as exceptions at the top level, rather than dying silently. Previously, no error would occur until a missing output file was needed later in the pipeline. (#55)
  • segment:
    • Skip possible R warning text when parsing CBS output (#106) and run Rscript with the --vanilla option (#112; thanks @jsmedmar). Non-isolated R processes were prone to add various warning messages to the expected SEG output, which could crash the "segment" command for some users.
    • Handle zero-weight bins better (#128; thanks @chapmanb).
  • scatter:
    • Handle selected segments with an empty gene name (#104; thanks @mpschr).
    • Don't crash on zero-length GenomicArray/CopyNumArray inputs.
  • VCF parsing (now within tabio) improved:
    • More robust to missing genotype (GT) & depth (DP) fields (#102)
    • Handle VCFs from MuTect2 (#122)
  • export theta: don't crash when SNP VCF is a single, unpaired sample, or if segmented input (.cns) is empty.
  • heatmap: Avoid a possible crash if a sample is missing a chromosome.

Packaging:

  • Universal wheels are enabled for installation with pip (setup.cfg).

New & updated dependencies:

  • futures
  • futurize
  • numpy raised to version 1.9
  • pandas raised to version 0.18.1
  • pysam version 0.9.1.1 is specifically excluded