Releases: cbg-ethz/V-pipe
V-pipe 3.0 - prerelease 1
Peer-review for publication
V-pipe 3.0 - prerelease 0
Prerelease for publication.
2.99.3 Primers
New features:
- primer trimming using iVar (by default) or samtools
- per-sample primer protocol using the new 4-column sample.tsv format (e.g., for long-running projects where protocols change to adapt to new variants)
- frameshift_deletions_checks: report on stop codons
Bug fixes:
- correctly compute runtime requirement within groups
Preview feature:
- first re-introduction of benchmarking features since V-pipe 2.0
see: resources/auxiliary_workflows/benchmark/
Documentation:
- Improved tutorial available inside subdirectory docs/
- V-pipe now relies on snakemake 7.11 to properly handle the 'runtime' resource. Consider upgrading.
- by default, the samples' tables coverage.tsv.gz and basecnt.tsv.gz are now 1-based: the first position of the genome is named '1' (as is standard in genome notation, in tools like samtools, and in formats like VCF).
  This can be switched back to 0-based (numbering starts at 0, as in Python tools like pysam and in the BED format) in the config file, in section general, option tsvbased
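The difference between the two numbering conventions can be illustrated with a minimal Python sketch. The function names and the (position, value) rows are hypothetical and do not reflect the actual layout of coverage.tsv.gz or basecnt.tsv.gz; the sketch only shows the off-by-one shift.

```python
def to_one_based(pos0: int) -> int:
    """Convert a 0-based position (BED/pysam style) to 1-based (VCF/samtools style)."""
    if pos0 < 0:
        raise ValueError("0-based positions start at 0")
    return pos0 + 1


def to_zero_based(pos1: int) -> int:
    """Convert a 1-based position (VCF/samtools style) to 0-based (BED/pysam style)."""
    if pos1 < 1:
        raise ValueError("1-based positions start at 1")
    return pos1 - 1


def shift_rows(rows, offset):
    """Shift the position (first field) of each (position, value) row by offset."""
    return [(pos + offset, value) for pos, value in rows]


# Hypothetical rows: (position, value), numbered from 0
zero_based_rows = [(0, 12), (1, 15), (2, 9)]
one_based_rows = shift_rows(zero_based_rows, +1)
print(one_based_rows)  # [(1, 12), (2, 15), (3, 9)]
```

Downstream scripts that mix these tables with 0-based tools need to apply the shift exactly once.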
2.99.2 - Reports and uploads
New features:
- Mass-importer tools to assist in building the samples/ directory structure from .fastq.gz files.
  (see: utils/README.md)
- Dehuman: V-pipe can now generate compressed .cram files from raw reads that have been depleted of host (human) reads, to assist upload to public databases such as SRA.
  (see config manual, sections "output" and "dehuman")
- Diversity: computation of various diversity indices for the underlying samples, following the review doi:10.1016/j.coviro.2021.06.002
  (see config manual, sections "output" and "diversity")
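As an illustration of what a diversity index measures, here is a minimal Python sketch of Shannon entropy over variant frequencies. These notes do not state which indices V-pipe computes or how; the choice of Shannon entropy and the function name shannon_entropy are assumptions made for this example only.

```python
import math


def shannon_entropy(frequencies):
    """Shannon entropy (natural log) of a list of non-negative frequencies.

    Frequencies are normalised to sum to 1 before the entropy is computed.
    """
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    entropy = 0.0
    for f in frequencies:
        if f < 0:
            raise ValueError("frequencies must be non-negative")
        if f > 0:  # zero-frequency variants contribute nothing
            p = f / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical haplotype frequencies within one sample
print(shannon_entropy([0.7, 0.2, 0.1]))
```

A sample dominated by a single variant scores near 0, while evenly mixed variants score higher.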
Bugfixes:
- frameshift_deletions_checks: English-language reports now correctly cover insertions.
Preview features:
- a series of tools and features that can be customized to assist in uploading results and raw reads to public databases such as ENA.
(see config manual, sections "output" and "upload",
see: workflow/scripts/prepare_upload_symlinks.sh)
2.99.1 Standardized usage
New features
- directory structure follows more closely the snakemake workflow catalog's standardized usage
- configuration manual generated automatically from JSON schema
  see: config/config.html
For legacy V-pipe 1.x/2.x users:
- the new directory structure requires adapting old INI files
please refer to config/README.md
NOTE:
- currently only the analysis of NGS viral data is fully tested and guaranteed stable.
- For other more advanced functionality (e.g., benchmarking) you might want to wait until a future release.
Toward a new gen V-pipe
New features
This release incorporates numerous upgrades:
- Visualisation
- frameshift_deletions_checks
- sample consensus fasta generated by bcftools
- predicthaplo as an additional global haplotype engine
- support for extremely large cohorts
  (e.g., stats computed per-sample and merged, instead of all BAMs)
- snakemake resources
- Automatic testing
- Automatic Docker generation
- Snakemake standard JSON/YAML configuration
- virus_base_config instead of separate branches (like 'sars-cov2')
Legacy users
This is the last version that:
- uses a working directory structure that is still similar to…
- and can directly import configurations verbatim from…
…legacy versions of V-pipe 1.x/2.x and virus-branches (sars-cov2)
Working functionality
- currently only the analysis of NGS viral data is fully tested and guaranteed stable.
- For other more advanced functionality (e.g., benchmarking) you might want to wait until a future release.
V-pipe 2.0: Benchmark functionalities
- simBench and testBench modules added to simulate reads from virus populations and evaluate read alignment bias and SNV calls, respectively
- vpipeBench allows automated execution of the benchmark, from the generation of the in silico virus population to the evaluation of SNV calls
- vpipeBenchRunner enables simultaneous execution of multiple pipeline configurations
This version of V-pipe was used to run the computations reported in Posada-Céspedes et al. (doi:10.1101/2020.06.09.142919)
NOTE: During the Gibbs sampling performed by ShoRAH, several clusters may generate the same haplotype representative. Such collisions result in inflated posterior values. Also, the averaging of the haplotype abundances across iterations can be affected by floating-point precision problems. Fortunately, ShoRAH also reports the number of reads assigned to each haplotype per iteration which we use to correct the aforementioned quantities in post-processing. We are currently implementing the changes required to resolve these issues in future releases of ShoRAH.
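The kind of post-processing described in this note can be sketched in Python as follows. This is not V-pipe's or ShoRAH's actual code: the input layout (one list of (haplotype, read_count) pairs per iteration) and the function name merged_abundances are assumptions for illustration. The sketch merges clusters that share the same haplotype representative and derives abundances from integer read counts, so no per-iteration floating-point averages are accumulated.

```python
from collections import defaultdict


def merged_abundances(iterations):
    """Estimate haplotype abundances from per-iteration read counts.

    iterations: list of lists of (haplotype, read_count) pairs, one inner
    list per sampling iteration (hypothetical layout). Clusters with the
    same haplotype are merged by summing their read counts.
    """
    totals = defaultdict(int)
    n_reads = 0
    for clusters in iterations:
        for haplotype, read_count in clusters:
            totals[haplotype] += read_count  # colliding clusters are merged here
            n_reads += read_count
    if n_reads == 0:
        return {}
    # A single division per haplotype, done after all integer sums
    return {hap: count / n_reads for hap, count in totals.items()}


# Hypothetical example: in iteration 1, two clusters collide on "ACG"
iterations = [
    [("ACG", 6), ("ACG", 2), ("ATG", 2)],
    [("ACG", 7), ("ATG", 3)],
]
print(merged_abundances(iterations))  # {'ACG': 0.75, 'ATG': 0.25}
```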
V-pipe 1.0
- Downgrade LoFreq version to 2.1.3 due to a reported issue (CSB5/lofreq#89) and build the reference sequence index as a separate rule
- To build consensus sequences, report ambiguous base (N) when coverage is below 2 reads
- Generalize module to detect flow-cell cross-contamination
- Basic report on number of reads after QC and alignment
- Allow calling SNVs with respect to a reference sequence, instead of the consensus sequences built from all data sets analysed within a single run of the pipeline
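The consensus rule listed above (report N when coverage is below 2 reads) can be illustrated with a minimal Python sketch. The function name mask_low_coverage and its inputs are hypothetical and are not taken from V-pipe's code; the default threshold of 2 is the only value taken from these notes.

```python
def mask_low_coverage(consensus: str, coverage: list, min_coverage: int = 2) -> str:
    """Replace consensus bases with 'N' where coverage is below min_coverage.

    consensus: called bases, one character per position.
    coverage: number of reads covering each position, same length as consensus.
    """
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage must have the same length")
    return "".join(
        base if depth >= min_coverage else "N"
        for base, depth in zip(consensus, coverage)
    )


# Positions with coverage 1 and 0 are masked; coverage 2 is kept
print(mask_low_coverage("ACGT", [5, 1, 2, 0]))  # ANGN
```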
V-pipe 1.0 release candidate
This is a release candidate of V-pipe 1.0.
- It has been tested on Linux and macOS, using Python 3.8 and snakemake 5.14.0. There are known limitations with executing SAVAGE for haplotype reconstruction on macOS.
- Apart from VICUNA, all dependencies are managed by conda, and we strongly recommend using --use-conda. VICUNA is only needed when reads are aligned with ngshmmalign and a references/initial_consensus.fasta file is not provided.
V-pipe pre-release
Changes:
- Split rules into separate files to modularize pipeline code