Version 0.8.3

@etal etal released this 18 Jan 02:02

Bug fixes and a few usability improvements. Notably, for the whole-genome sequencing workflow (batch -m wgs), bin size is now inferred from a sample's genome-wide coverage depth instead of using a fixed value, which should yield better results by default.

Dependencies

  • scipy: Raise minimum version to 0.15 (for the function scipy.stats.median_test)

New scripts

  • coverage_bin_size.py: Quickly estimate on- and off-target read depths to suggest reasonable bin sizes to use with the target and antitarget or batch commands. (#170)
  • guess_baits.py: In case the baited regions for a target capture panel are not known, use sample BAM files from sequencing with that panel to infer the likely captured regions. Works either guided, given a list of potential targets (e.g. all exons in a genome), or unguided, scanning all sequencing-accessible bases in the genome to find areas with elevated coverage.

Both scripts are preliminary and may be removed in a future release.

Global changes

  • Infer read lengths automatically from the given sample BAM files where needed (coverage and batch). Remove the hard-coded parameter cnvlib.params.READ_LEN. (#74)
  • Handle VCFs generated by LoFreq. This program does not emit sample genotypes; locus depths and allele frequencies are found in the INFO column instead, which is unusual but technically within the VCF spec. (#173)
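
To illustrate the LoFreq case, here is a minimal sketch of pulling depth and allele frequency out of an INFO string. It assumes the INFO keys are named DP and AF, as in typical LoFreq output; the function names parse_info and depth_and_freq are hypothetical and are not CNVkit's actual parsing code.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag entries (no '=') map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def depth_and_freq(info_field):
    """Return (depth, alt allele frequency) from INFO.

    Assumes the keys are DP and AF (an assumption about the caller's output).
    """
    info = parse_info(info_field)
    return int(info["DP"]), float(info["AF"])


# Made-up example record contents, for illustration only
example = "DP=120;AF=0.25;SB=3;DP4=45,45,15,15"
print(depth_and_freq(example))
```

A record lacking either key raises KeyError in this sketch; a real parser would need to decide how to handle such records.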

Commands

batch, coverage, segment:

  • The option -p/--processes can now be given without an argument to parallelize across all available CPUs. The now-optional argument value is the maximum number of CPUs to use. The special value -p 0, previously the way to request all CPUs, still works.
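
An option with this behavior can be sketched with argparse's nargs="?" and const. This is an illustration only, not necessarily how CNVkit defines the option: the default of 1 process when -p is absent and the helper resolve_processes are assumptions.

```python
import argparse
import multiprocessing


def build_parser():
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "-p", "--processes",
        nargs="?",   # the value is optional
        type=int,
        const=0,     # bare "-p" behaves like "-p 0"
        default=1,   # assumed default when -p is not given at all
        help="Maximum number of CPUs to use; no value or 0 means all CPUs.",
    )
    return parser


def resolve_processes(value):
    """Map the parsed value to a worker count (0 or less means all CPUs)."""
    if value < 1:
        return multiprocessing.cpu_count()
    return value


parser = build_parser()
for argv in ([], ["-p"], ["-p", "0"], ["-p", "4"]):
    args = parser.parse_args(argv)
    print(argv, "->", args.processes)
```

One caveat of nargs="?": a bare -p placed directly before a positional argument will try to consume that argument as its value, so the bare form is safest at the end of the command line or before another option.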

batch:

  • Automatically estimate a reasonable average bin size in the whole-genome workflow, -m wgs, using a fast estimate of a given normal/control sample's genome-wide average coverage depth. (If multiple normals are given, the median-sized sample is used for this calculation.) This allows CNVkit to handle low-coverage/low-pass WGS data better by default. (#170)
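
The idea of scaling bin size to coverage can be sketched as follows. This is not CNVkit's actual implementation: the target of 100,000 sequenced bases per bin, the clamping limits, and both function names are hypothetical values chosen for illustration.

```python
def pick_median_sample(sizes):
    """Given {sample_name: size}, return the name of the median-sized sample.

    With an even number of samples this picks the upper of the two middle ones.
    """
    ordered = sorted(sizes, key=sizes.get)
    return ordered[len(ordered) // 2]


def suggest_bin_size(mean_depth, bases_per_bin=100000,
                     min_size=200, max_size=500000):
    """Suggest a bin size so each bin holds roughly `bases_per_bin` read bases.

    All default values here are assumptions for the sketch.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    size = int(round(bases_per_bin / mean_depth))
    # Keep the suggestion within a sane range
    return max(min_size, min(max_size, size))


print(pick_median_sample({"a": 10, "b": 30, "c": 20}))
print(suggest_bin_size(30))    # typical-depth WGS
print(suggest_bin_size(0.5))   # low-pass WGS gets much larger bins
```

Lower coverage yields larger bins, which is the behavior that lets low-pass WGS data work without manual tuning.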

coverage:

  • With --count, count all reads that overlap a region, but exclude the portions of each read aligned outside the region from the number of bases counted. The result should now be closer to the result obtained without --count.
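
The trimming described for --count can be sketched in plain Python. The sketch assumes half-open (start, end) coordinates and that depth is the trimmed base total divided by the region length; the function names are hypothetical and this is not the actual CNVkit code.

```python
def overlap_bases(read_start, read_end, region_start, region_end):
    """Number of read bases inside the region (half-open coordinates)."""
    return max(0, min(read_end, region_end) - max(read_start, region_start))


def region_depth_by_count(reads, region_start, region_end):
    """Mean depth over a region, counting only the in-region part of each read."""
    total = sum(overlap_bases(start, end, region_start, region_end)
                for start, end in reads)
    return total / (region_end - region_start)


# Reads as (start, end); the region is 100-200
reads = [(50, 150), (120, 170), (190, 260), (300, 350)]
print(region_depth_by_count(reads, 100, 200))
```

Without trimming, the first and third reads would each contribute their full length even though only part of each lies inside the region, inflating the depth of small regions.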

scatter:

  • In chromosome-level plots, the displayed x-axis range now matches the specified region (via -c or -g + -w) exactly. Previously, the displayed range depended on the bin locations. (#180)

Bug fixes

  • antitarget: Handle empty off-target regions safely. (bcbio/bcbio-nextgen#1696)
  • export theta: Rename argument --min-depth to --min-variant-depth, matching the equivalent argument in other commands. (#178; thanks @myronpeto)
  • scatter: Warn, don't crash, if a region in --region-list covers no bins. (#174; thanks @gabeng)

API changes

  • New module cnvlib.samutil for convenience functions on BAM files, using pysam.
  • New module cnvlib.autobin supporting the script coverage_bin_size.py. (#170)
  • Removed sub-package cnvlib.ngfrills, moving most functionality to samutil and tabio.
  • genome.GenomicArray: New method total_range_size, similar to pybedtools total_coverage().
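
As a rough illustration of what a total-range-size calculation involves, the sketch below sums the number of bases covered by a set of intervals, counting overlapping bases once. It works on plain (chromosome, start, end) tuples with half-open coordinates, which is an assumption for the example; it is a standalone function, not the GenomicArray method itself.

```python
def total_range_size(intervals):
    """Total bases covered by (chrom, start, end) intervals, overlaps counted once."""
    total = 0
    last_chrom, last_end = None, 0
    for chrom, start, end in sorted(intervals):
        if chrom != last_chrom:
            # New chromosome: nothing on it has been covered yet
            last_chrom, last_end = chrom, start
        # Skip the part already covered by earlier intervals
        start = max(start, last_end)
        if end > start:
            total += end - start
            last_end = end
    return total


regions = [
    ("chr1", 0, 100),
    ("chr1", 50, 150),   # overlaps the first interval by 50 bases
    ("chr1", 200, 300),
    ("chr2", 10, 20),
]
print(total_range_size(regions))
```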