Version 0.8.3
Bug fixes and a few usability improvements. Notably, for the whole-genome sequencing workflow (`batch -m wgs`), bin size is now inferred from a sample's genome-wide coverage depth instead of using a fixed value, which should yield better results by default.
Dependencies
- scipy: Raise minimum version to 0.15 (for the function `scipy.stats.median_test`)
New scripts
- `coverage_bin_size.py`: Quickly estimate on- and off-target read depths to suggest reasonable bin sizes to use with the `target` and `antitarget` or `batch` commands. (#170)
- `guess_baits.py`: In case the baited regions for a target capture panel are not known, use sample BAM files from sequencing with that panel to infer the likely captured regions. Works either guided, given a list of potential targets (e.g. all exons in a genome), or unguided, scanning all sequencing-accessible bases in the genome to find areas with elevated coverage.
Both scripts are preliminary and may be removed in a future release.
Global changes
- Infer read lengths automatically from the given sample BAM files where needed (`coverage` and `batch`). Remove the hard-coded parameter `cnvlib.params.READ_LEN`. (#74)
- Handle VCFs generated by LoFreq. This program does not emit sample genotypes, but locus depths and allele frequencies can be found in the INFO column instead -- unusual but technically within the VCF spec. (#173)
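To illustrate the LoFreq case, here is a minimal sketch of reading depth and allele frequency from the INFO column when no genotype columns are present. It is not CNVkit's actual parser; the function names and the sample record are hypothetical, and it assumes the `DP` and `AF` keys that LoFreq writes into INFO.

```python
def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag keys without '=' map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def depth_and_freq(vcf_line):
    """Return (depth, alt allele frequency) taken from the INFO column.

    Assumes INFO carries DP and AF (as LoFreq output does); either value
    is None if its key is missing.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th VCF column
    depth = int(info["DP"]) if "DP" in info else None
    # AF may be comma-separated for multi-allelic sites; take the first value
    freq = float(info["AF"].split(",")[0]) if "AF" in info else None
    return depth, freq


# Made-up record for illustration: 8 columns, no FORMAT or sample columns
record = "chr1\t12345\t.\tA\tG\t87\tPASS\tDP=120;AF=0.25;SB=3;DP4=45,45,15,15"
depth, freq = depth_and_freq(record)
```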
Commands
`batch`, `coverage`, `segment`:
- The option `-p`/`--processes` can now be used without an argument to specify parallelizing across all available CPUs. The now-optional argument value is the maximum number of CPUs to use; the special value `-p 0` was previously used to specify all CPUs (this still works).
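An option with an optional value like this can be expressed with `argparse` using `nargs="?"`. The sketch below shows one way to get the described behavior; it is not CNVkit's source, the helper names are hypothetical, and the default of a single process when `-p` is absent is an assumption.

```python
import argparse
import multiprocessing


def build_parser():
    """Build a parser whose -p option takes an optional integer value."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "-p", "--processes",
        nargs="?",   # the value may be omitted
        type=int,
        const=0,     # bare "-p" is treated like "-p 0"
        default=1,   # assumed default when the option is absent
        help="Number of CPUs to use; no value or 0 means all available.",
    )
    return parser


def resolve_processes(n):
    """Map the parsed value to a worker count; values below 1 mean all CPUs."""
    if n < 1:
        return multiprocessing.cpu_count()
    return n


args = build_parser().parse_args(["-p"])
n_workers = resolve_processes(args.processes)
```

Using `const=0` keeps the older `-p 0` spelling and the new bare `-p` on the same code path.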
`batch`:
- Automatically estimate a reasonable average bin size in the whole-genome workflow, `-m wgs`, using a fast estimate of a given normal/control sample's genome-wide average coverage depth. (If multiple normals are given, the median-sized sample is used for this calculation.) This allows CNVkit to handle low-coverage/low-pass WGS data better by default. (#170)
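The idea of scaling bin size with depth can be sketched as follows. This is an illustration under stated assumptions, not the `cnvlib.autobin` implementation: the function names are hypothetical, "median-sized" is taken to mean BAM file size, and the target of 100,000 sequenced bases per bin is an assumed parameter.

```python
def pick_median_sample(file_sizes):
    """Return the name of the median-sized sample.

    file_sizes maps sample name -> file size in bytes (assumed measure of
    "size"). With an even number of samples, the upper-middle one is chosen.
    """
    ordered = sorted(file_sizes, key=file_sizes.get)
    return ordered[len(ordered) // 2]


def estimate_bin_size(mean_depth, bases_per_bin=100000):
    """Choose a bin width so each bin holds about bases_per_bin read bases.

    A bin of width W at average depth D holds roughly W * D sequenced bases,
    so W = bases_per_bin / D. Lower coverage therefore yields wider bins.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return int(round(bases_per_bin / mean_depth))


normals = {"normal_a.bam": 10, "normal_b.bam": 30, "normal_c.bam": 20}
chosen = pick_median_sample(normals)
deep_bin = estimate_bin_size(30.0)      # typical deep WGS
low_pass_bin = estimate_bin_size(0.5)   # low-pass WGS gets much wider bins
```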
`coverage`:
- With `--count`, count all reads that overlap a region, but trim any portions of each read aligned outside the region from the number of bases counted. The result should now be closer to that without `--count`.
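The trimming rule can be sketched with plain intervals. This is a simplified model, not CNVkit's code: reads are represented as hypothetical half-open `(start, end)` alignment spans, and depth is taken as trimmed bases divided by region length.

```python
def count_and_depth(reads, region_start, region_end):
    """Count reads overlapping a region and compute mean depth from trimmed bases.

    reads: iterable of (start, end) half-open alignment coordinates.
    Only the part of each read inside [region_start, region_end) contributes
    bases, so reads hanging over the region edges do not inflate the depth.
    """
    n_reads = 0
    n_bases = 0
    for read_start, read_end in reads:
        overlap = min(read_end, region_end) - max(read_start, region_start)
        if overlap > 0:
            n_reads += 1
            n_bases += overlap
    region_len = region_end - region_start
    return n_reads, n_bases / region_len


reads = [(50, 150), (120, 180), (190, 290), (200, 300)]
n_reads, depth = count_and_depth(reads, 100, 200)
```

In this example the last read starts exactly at the region end, so it is not counted; the first and third reads are counted but contribute only their in-region portions.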
`scatter`:
- In chromosome-level plots, the displayed x-axis range now matches the specified region (via `-c` or `-g` + `-w`) exactly. Previously, the displayed range depended on the bin locations. (#180)
Bug fixes
- `antitarget`: Handle empty off-target regions safely. (bcbio/bcbio-nextgen#1696)
- `export theta`: Rename argument `--min-depth` to `--min-variant-depth`, matching the equivalent argument in other commands. (#178; thanks @myronpeto)
- `scatter`: Warn, don't crash, if a region in `--region-list` covers no bins. (#174; thanks @gabeng)
API changes
- New module `cnvlib.samutil` for convenience functions on BAM files, using pysam.
- New module `cnvlib.autobin` supporting the script `coverage_bin_size.py`. (#170)
- Removed sub-package `cnvlib.ngfrills`, moving most functionality to `samutil` and `tabio`.
- genome.GenomicArray: New method `total_range_size`, similar to pybedtools `total_coverage()`.
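For readers unfamiliar with pybedtools `total_coverage()`, it reports the number of bases covered after merging overlapping intervals. The standalone sketch below shows that calculation on plain tuples; it does not use the `GenomicArray` class, and whether `total_range_size` handles overlaps the same way is an assumption based on the stated similarity.

```python
def total_range_size(intervals):
    """Sum the bases covered by intervals, counting overlapping bases once.

    intervals: iterable of (chromosome, start, end) with half-open coordinates.
    """
    total = 0
    cur_chrom, cur_end = None, 0
    for chrom, start, end in sorted(intervals):
        if chrom != cur_chrom:
            # New chromosome: reset the furthest position covered so far
            cur_chrom, cur_end = chrom, start
        # Skip any part of this interval that was already counted
        start = max(start, cur_end)
        if end > start:
            total += end - start
            cur_end = end
    return total


regions = [
    ("chr1", 0, 100),
    ("chr1", 50, 150),   # overlaps the previous interval by 50 bases
    ("chr1", 200, 250),
    ("chr2", 0, 10),
]
covered = total_range_size(regions)
```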