Release Notes
=============
2.28.3
- bugfix: major: a refactor introduced a bug in variant filtering (a boolean was
flipped) that let overlapping and duplicate variants through unfiltered.
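The overlap and duplicate filtering these fixes touch can be sketched as follows. This is a minimal illustration, not Mitty's filter: the `Variant` type and `filter_variants` function are hypothetical, variants are assumed sorted by POS, and "overlap" is taken to mean that the half-open reference spans `[POS, POS + len(REF))` intersect. Flipping the `start < last_end` comparison is exactly the kind of one-character bug that lets overlaps through.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    # Hypothetical stand-in for a VCF record (not Mitty's own type)
    pos: int  # 1-based VCF POS
    ref: str
    alt: str


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Drop exact duplicates and variants whose REF span overlaps a variant
    that was already accepted. Assumes input is sorted by pos."""
    kept: List[Variant] = []
    last_end = 0  # one past the last reference base consumed by a kept variant
    for v in variants:
        if kept and v == kept[-1]:
            continue  # exact duplicate of the previous kept record
        start, end = v.pos, v.pos + len(v.ref)  # half-open [start, end)
        if start < last_end:
            continue  # overlaps a kept variant: reject
        kept.append(v)
        last_end = end
    return kept
```

With this rule a SNP at position 10 followed by a deletion anchored at 11 (REF=CG) passes as a legitimate crowded pair, while a SNP at 12, inside the deleted base, is rejected.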
2.28.2
- bugfix: major: duplicate variants are now discarded properly
2.28.1
- bugfix: major: crowded variants are now handled properly
The previous modification to allow VCFs where a SNP is immediately followed by an
INS or DEL introduced different bugs (it allowed genuinely illegal variants to
pass). The current filtering correctly rejects such illegal variants but
passes legitimate crowded variants.
- enhancement: Notebook-based alignment analysis
2.27.0
- enhancement: aaftoolz (parallelization of composable analysis for AAF files)
- enhancement: tool to collate alignment accuracy vs variant content vs MQ data
from an AAF and write it out as a CSV
2.26.0
- enhancement: bamtoolz (parallelization of composable analysis) implemented
- enhancement: Paired-Alignment-Histogram implemented
- bugfix: tlen computation now handles reads fully inside insertions
- enhancement: SNPs directly adjacent to deletions and insertions are properly handled
No algorithm was changed - just two checks were made more permissive.
2.25.1
- bugfix: bam2illumina discards reads with tlen = 0
It also discards reads with tlen < 0.95 * read len. This lets us keep some reads with tlen = rlen
in the simulation (which tests some edge cases in the aligner) but not so many that they
skew our stats.
- bugfix: god-aligner now writes out tlen, following the SAM spec
- enhancement: new model illuminaHiSeq-FCA.pkl added
- bugfix: god-aligner now sets .mate_is_reverse too
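For reference, the SAM spec defines TLEN for two mates on the same reference as the number of bases from the leftmost mapped base to the rightmost mapped base, positive for the leftmost mate and negative for the other. The sketch below shows that rule together with the bam2illumina filter described above. Both function names are hypothetical, the tie-break when mates start at the same position is an arbitrary choice, and applying `abs()` to tlen in the filter is an assumption.

```python
from typing import Tuple


def sam_tlen(pos1: int, end1: int, pos2: int, end2: int) -> Tuple[int, int]:
    """Signed TLEN for two mates mapped to the same reference.

    pos/end are the 1-based leftmost and rightmost mapped bases of each mate.
    The leftmost mate gets +span and the other mate gets -span.
    """
    span = max(end1, end2) - min(pos1, pos2) + 1
    if pos1 <= pos2:  # tie: mate 1 treated as leftmost (arbitrary choice)
        return span, -span
    return -span, span


def keep_for_bam2illumina(tlen: int, read_len: int) -> bool:
    """Hypothetical filter: drop tlen == 0 and |tlen| < 0.95 * read length."""
    return tlen != 0 and abs(tlen) >= 0.95 * read_len
```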
2.25.0
- enhancement: flat coverage option added to read generation model
Flat coverage implemented for the Illumina model. An example of usage is included.
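The arithmetic behind a flat-coverage target is the usual relation coverage = (templates * bases per template) / region length. The sketch below, assuming uniformly placed templates, computes how many templates are needed and draws their start positions. The function names are hypothetical and this is not a description of Mitty's read model internals.

```python
import math
import random
from typing import List


def templates_for_coverage(coverage: float, region_len: int, read_len: int,
                           paired: bool = True) -> int:
    """Templates needed so total sequenced bases ~= coverage * region_len."""
    bases_per_template = read_len * (2 if paired else 1)
    return math.ceil(coverage * region_len / bases_per_template)


def flat_starts(n: int, region_len: int, template_len: int,
                seed: int = 0) -> List[int]:
    """Draw n template start positions (0-based) uniformly over the region."""
    rng = random.Random(seed)
    # randint is inclusive on both ends, so templates never run off the end
    return [rng.randint(0, region_len - template_len) for _ in range(n)]
```

For example, 30x over 1 Mb with 2 x 100 bp reads needs 150,000 templates.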
2.24.0
- enhancement: Composable BAM analysis using cytoolz and pandas
2.23.0
- enhancement: Program now prints out version during run
2.22.0
- enhancement: placeholder figure for automated workflows
Automated workflows sometimes process real (not simulated) data. For such workflows we simply
plot a placeholder figure for inclusion in reports, triggered by passing a dummy long qname
containing the magic word 'deadbeef'.
2.21.0
- enhancement: bam-to-truth tool implemented
2.20.0
- workaround: qname bug in htslib
- bugfix: hidden bug in qname parsing
- The qname now ends in a "*" instead of a "|".
- This is a breaking change, as it changes the qname format, but it fixes an edge case (bug)
that, surprisingly, we never ran into. With the old-style qname we detected truncation (and
hence when to look in the overflow file) by checking whether the last character was not a '|'.
A truncated qname can end in a '|', so there was a potential bug there. This is fixed by
having an explicit, distinct termination character, '*'.
- Having an explicit termination character means we do not need to know the truncation length;
we just have to check for the '*' character.
- A tool is provided that will convert FASTQs and BAMs from the old to the new format.
- The qname is now restricted to 240 chars by default, which works around the htslib bug.
- enhancement: subset-bam now saves paired reads
- enhancement: qname-stats tool
- bugfix: long qname file loading fix
- enhancement: vcf-complexity tool added
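The terminator rule above can be sketched as follows. The '|' separator, '*' terminator and 240-character default come from these notes. Everything else is an assumption for illustration only: the function names, the assumption that no field contains '*', and keying the overflow side-car by the truncated name. Mitty's actual side-car layout may differ.

```python
from typing import Dict, List, Optional, Tuple

SEP = '|'
TERMINATOR = '*'


def encode_qname(fields: List[str], max_len: int = 240) -> Tuple[str, Optional[str]]:
    """Join fields with '|' and end with '*'. If the result exceeds max_len,
    return the truncated name plus the full name for the overflow side-car.
    Assumes no field itself contains '*'."""
    full = SEP.join(fields) + TERMINATOR
    if len(full) <= max_len:
        return full, None
    return full[:max_len], full


def decode_qname(qname: str, overflow: Dict[str, str]) -> str:
    """A name ending in '*' is complete; anything else was truncated."""
    if qname.endswith(TERMINATOR):
        return qname
    # Keying the side-car by the truncated name is purely illustrative
    return overflow[qname]
```

If truncation happens to land just after a separator, the truncated name ends in '|'. The old "last character is not '|'" test would wrongly treat it as complete, while the '*' test still flags it as truncated.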
2.19.4
- bugfix: alignment analysis plotting now properly handles simulations with no variants
2.19.3
- bugfix: Fix to alignment scoring. Affects data with soft-clips and insertions
- bugfix: subset-bam: fixed v range to be inclusive
- enhancement: read count heatmap added to alignment analysis
- enhancement: read fate bar chart now has read count numbers
2.19.2
- Enhancement: god-aligner outputs V1 CIGARs by default, V2 CIGARs if requested
2.19.1
- workaround: pysam.index is finicky (on Linux). Added an option to god-aligner to skip indexing.
This is used on the platform and on other platforms where pysam.index is finicky.
2.19.0
- Enhancement: subset-bam tool (formerly poor-alignments) now performs many
functions related to extracting reads from a BAM based on d_err and variant size
2.18.0
- Enhancement: Added mean MQ heatmap over d_err and variant size
2.17.0
- Enhancement: unpaired read generation implemented
- Enhancement: synthetic read model generator
2.16.0
- Enhancement: poor-alignments implemented
2.15.2
- Bugfix: alignment-analysis bin_size is now adjusted such that we get at least two bins given the variant size range
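One way to guarantee at least two bins is sketched below, assuming integer variant sizes binned over the inclusive range [vmin, vmax]. With n distinct sizes, any bin size of at most n - 1 yields two or more bins. Both helpers are hypothetical and not the alignment-analysis code itself.

```python
import math


def n_bins(bin_size: int, vmin: int, vmax: int) -> int:
    """Bins needed to cover the inclusive integer range [vmin, vmax]."""
    return math.ceil((vmax - vmin + 1) / bin_size)


def adjust_bin_size(bin_size: int, vmin: int, vmax: int) -> int:
    """Shrink bin_size, if needed, so [vmin, vmax] spans at least two bins.

    A range holding a single size can only ever fill one bin, so the
    result is floored at 1.
    """
    n_values = vmax - vmin + 1
    return max(1, min(bin_size, n_values - 1))
```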
2.15.1
- Enhancement: `call-fate`. Simplified code (removing a bug), changed output format to single VCF
2.15.0
- Enhancement: `call-fate` implemented
2.14.1
- Bugfix: `filter-variants` now keeps at least one 0/0 or 0 entry per contig. Fixes issue #4
2.14.0
- Enhancement: added copy sequence insertion model
- docs: removed reference to `sampled-genome` which is not implemented yet
- Enhancement: can now truncate reads to any length shorter than the original model
2.13.1
- Bugfix: xmv code now processes unmapped reads at the end of the file and only considers primary alignments
2.13.0
- Enhancement: xmv code made faster
- Bugfix: logic error in alignment scoring fixed. Only affected scoring of reads from inside insertions
- Bugfix: simulate-variants no longer generates 0 length deletions (looking like Ref=A, Alt=A etc. in VCF)
- Improvements to alignment analysis plots
- Bugfix: Reads in long insertions now properly placed just after the anchoring reference base
- Enhancement: Improved CLI for processing/replotting alignment metrics. The CLI is now split into two subcommands. You no longer have to pass in the BAM and qname side-car file when you just want to replot an existing alignment metrics data file.
2.12.2
- 'd_err_strict' option added to partition-bams
2.12.1
- Bugfix: fixed tag error in partition_bams code
Note: partition-bams is not parallelized and can only run serially through a BAM. For large BAMs (e.g. > 50 GB)
it will usually run out of memory.
2.12.0
- Changed alignment error algorithm
- Alignment analysis reports reinstated, now with further detail and break down by variant size
- Added utility program to plot variant size distribution in a VCF file
2.11.2
- Bugfix: other IUPAC codes will no longer appear in reads
2.11.0
- Simple variant simulation implemented
2.10.0
- BAM partitioner implemented
2.9.2
- Bugfix: Forgot to commit benchmarking/__init__.py
2.9.1
- Fixed ploidy sniffing (was previously only looking at first variant in region)
- Added sample empty VCF files for human male and female, plus instructions on how to
generate reads from the reference
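Ploidy can be read off a VCF GT value by counting its alleles, which are separated by '/' (unphased) or '|' (phased). The sketch below scans every GT in a region rather than just the first one. Taking the maximum across the region is an assumption made for illustration, and both function names are hypothetical.

```python
import re
from typing import Iterable


def ploidy_from_gt(gt: str) -> int:
    """Count alleles in a VCF GT value: '0|1' -> 2, '1' -> 1."""
    return len(re.split(r'[/|]', gt))


def sniff_ploidy(gts: Iterable[str]) -> int:
    """Ploidy for a region, looking at every GT rather than only the first.
    Taking the maximum is an assumption of this sketch."""
    return max(ploidy_from_gt(gt) for gt in gts)
```

A first-variant-only check on ['1', '0|1'] would report haploid, while scanning the whole region reports diploid.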
2.9.0
- Added check for illegal overlaps in variants in variant filtering
- Fixed bug in filtering complex variants
- Added programs to process eval.vcf from VCF benchmarking pipeline
2.8.7
- god-aligner now writes RG tag to reads (required for GATK)
- In read model description: Now plotting adjusted BQ, rather than simple average BQ
2.8.4
- Bugfix: PHRED (Base quality) score for perfect reads is now 40 ('I') to avoid confusing
GATK and other tools that think 60 (the old value) is erroneous.
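In the standard Phred+33 encoding used by FASTQ and SAM, a quality score q is stored as the character with code q + 33. That is why Q40 appears as 'I', while the old value Q60 would appear as ']'. The helper names below are illustrative only.

```python
def phred_to_char(q: int, offset: int = 33) -> str:
    """Encode a Phred quality score as a Phred+33 character."""
    return chr(q + offset)


def char_to_phred(c: str, offset: int = 33) -> int:
    """Decode a Phred+33 character back to a quality score."""
    return ord(c) - offset


def error_probability(q: int) -> float:
    """Phred definition: P(error) = 10^(-q/10)."""
    return 10 ** (-q / 10)
```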
2.8.3
- Bugfix: bracketed entries and breakends are now discarded from VCF file
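In VCF, symbolic alleles use angle brackets (e.g. `<DEL>`), and mated breakends use square brackets (e.g. `G]17:198982]`). Single breakends instead have a '.' before or after the sequence. The check below is a minimal sketch of discarding such records based on the ALT field. The function name is hypothetical, and treating single breakends the same way is an assumption.

```python
def keep_record(alt_field: str) -> bool:
    """Return False if any ALT allele is symbolic or a breakend."""
    for alt in alt_field.split(','):  # multi-allelic ALTs are comma-separated
        if any(c in alt for c in '<>[]'):
            return False  # symbolic allele or mated breakend
        if len(alt) > 1 and (alt.startswith('.') or alt.endswith('.')):
            return False  # single breakend, e.g. 'G.' or '.A'
    return True
```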
2.8.0 (Internal testing release)
- qnames longer than 254 characters are now handled (using a side-car file)
- An MD-tag-like string is used to describe read corruption for corrupted reads
2.7.3 (Internal testing release)
- t_len < r_len bug in Illumina model fixed
- queue length explicitly set to avoid out of memory errors in Linux
2.7.1 (Internal testing release)
- Reads from N regions are discarded.
(Initially I thought to do this via the BED file, but the reference is
peppered with stretches of 'N's and it is cumbersome for the user to craft such
a detailed BED file.)
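Discarding reads from N regions amounts to rejecting any candidate read whose reference window contains an 'N'. The sketch below assumes 0-based start positions on a plain reference string. The function names are hypothetical, and Mitty may perform this check at a different stage of read generation.

```python
from typing import List


def read_touches_n(ref_seq: str, start: int, length: int) -> bool:
    """True if the reference window [start, start + length) contains an N."""
    return 'N' in ref_seq[start:start + length].upper()


def keep_read_positions(ref_seq: str, starts: List[int], length: int) -> List[int]:
    """Drop candidate read starts whose window overlaps an N stretch."""
    return [s for s in starts if not read_touches_n(ref_seq, s, length)]
```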
2.7.0 (Internal testing release)
- Added five empirical read models
2.6.0 (Internal testing release)
- MQ plots
- D_err plots
- New read model format
- Tool to extract read model from BAM
- Read corruption
2.5.0 (Internal testing release)
- Added god-aligner
2.4.0 (Internal testing release)
Algorithm changes
-----------------
- Read POS and CIGAR generation algorithm redesigned
- Ploidy of genome now inferred from VCF file.
Simulation will properly handle XY chromosomes and polyploidy IF the VCF GT is properly set
- Standard BED file is used to select regions
- BED file should avoid 'NNN' regions
- Read generation order is much less serial
Data changes
------------
- qname contains list of variant sizes carried by read
This makes variant-based analyses of alignments easier
- CIGAR for reads from inside long insertion properly handled
- Name of sample included in read
- Can mix in viral contamination
- Can do tumor/normal mixes
Program design changes
----------------------
- Written for Python 3
- One entry command ('mitty')
This gives convenient access to all mitty commands
- Better support for UNIX paradigms such as pipes and process substitution
- Better parallelization