-
Notifications
You must be signed in to change notification settings - Fork 6
/
RESULTS
167 lines (136 loc) · 8.15 KB
/
RESULTS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
# Note: these results will vary somewhat from OS to OS, because
# some algorithms call rand().
First, comparing with published results
feb89 oct89 feb91 sep92 avg
2.77 4.02 3.30 6.29 4.10 % from ICASSP'99 paper by Povey & Woodland on Frame Discrimination (ML baseline)
3.20 4.10 2.86 6.06 4.06 % from decode_tri2c (which is triphone + CMN)
exp/decode_mono/wer:Average WER is 14.234421 (1784 / 12533) # Monophone system, subset
exp/decode_tri1/wer:Average WER is 4.420330 (554 / 12533) # First triphone pass
exp/decode_tri1_fmllr/wer:Average WER is 3.837868 (481 / 12533) # + fMLLR
exp/decode_tri1_regtree_fmllr/wer:Average WER is 3.789994 (475 / 12533) # + regression-tree
# results in decode_tri1_latgen with varying acoustic
# scale: looks like we had not tuned this too well (10 is the
# default).
exp/decode_tri1_latgen/wer:Average WER is 4.420330 (554 / 12533)
exp/decode_tri1_latgen/wer_6:Average WER is 3.917657 (491 / 12533)
exp/decode_tri1_latgen/wer_7:Average WER is 3.813931 (478 / 12533)
exp/decode_tri1_latgen/wer_8:Average WER is 3.869784 (485 / 12533)
exp/decode_tri1_latgen/wer_9:Average WER is 4.013405 (503 / 12533)
exp/decode_tri1_latgen/wer_10:Average WER is 4.085215 (512 / 12533)
exp/decode_tri1_latgen/wer_11:Average WER is 4.188941 (525 / 12533)
exp/decode_tri1_latgen/wer_12:Average WER is 4.420330 (554 / 12533)
exp/decode_tri1_latgen/wer_13:Average WER is 4.555972 (571 / 12533)
# Lattice oracle error rate for exp/decode_tri1_latgen/
# when acoustic scale is set to 10
Beam 0.01 Average WER is 4.085215 (512 / 12533)
Beam 0.5 Average WER is 3.702226 (464 / 12533)
Beam 1 Average WER is 3.279343 (411 / 12533)
Beam 5 Average WER is 1.412272 (177 / 12533)
Beam 10 Average WER is 0.582462 ( 73 / 12533)
# Results on a second pass of triphone system building--
# various configurations.
exp/decode_tri2a/wer:Average WER is 3.973510 (498 / 12533) # Second triphone pass
exp/decode_tri2a_fmllr/wer:Average WER is 3.590521 (450 / 12533) # + fMLLR
exp/decode_tri2a_fmllr_utt/wer:Average WER is 3.933615 (493 / 12533) # [ fMLLR per utterance ]
exp/decode_tri2a_dfmllr/wer:Average WER is 3.861805 (484 / 12533) # + diagonal fMLLR
exp/decode_tri2a_dfmllr_utt/wer:Average WER is 3.933615 (493 / 12533) # [ diagonal fMLLR per utterance]
exp/decode_tri2a_dfmllr_fmllr/wer:Average WER is 3.622437 (454 / 12533) # diagonal fMLLR, then estimate fMLLR and re-decode
exp/decode_tri2b/wer:Average WER is 3.143701 (394 / 12533) # Exponential transform
exp/decode_tri2b_fmllr/wer:Average WER is 3.055932 (383 / 12533) # +fMLLR
exp/decode_tri2b_utt/wer:Average WER is 3.295300 (413 / 12533) # [adapt per-utt]
exp/decode_tri2c/wer:Average WER is 3.957552 (496 / 12533) # Cepstral mean subtraction (per-spk)
exp/decode_tri2d/wer:Average WER is 4.316604 (541 / 12533) # MLLT (= global STC)
exp/decode_tri2e/wer:Average WER is 4.659698 (584 / 12533) # splice-9-frames + LDA features
exp/decode_tri2f/wer:Average WER is 3.885742 (487 / 12533) # splice-9-frames + LDA + MLLT
exp/decode_tri2g/wer:Average WER is 3.303279 (414 / 12533) # Linear VTLN
exp/decode_tri2g_diag/wer:Average WER is 3.135722 (393 / 12533) # Linear VTLN; diagonal adapt in test
exp/decode_tri2g_diag_fmllr/wer:Average WER is 3.063911 (384 / 12533) # as above but then est. fMLLR (another decoding pass)
exp/decode_tri2g_diag_utt/wer:Average WER is 3.399027 (426 / 12533)
exp/decode_tri2g_vtln/wer:Average WER is 3.239448 (406 / 12533) # Use warp factors -> feature-level VTLN + offset estimation
exp/decode_tri2g_vtln_diag/wer:Average WER is 3.127743 (392 / 12533) # feature-level VTLN + diag fMLLR
exp/decode_tri2g_vtln_diag_utt/wer:Average WER is 3.407006 (427 / 12533) # as above, per utt.
exp/decode_tri2g_vtln_nofmllr/wer:Average WER is 3.694247 (463 / 12533) # feature-level VTLN but no fMLLR
exp/decode_tri2h/wer:Average WER is 4.252773 (533 / 12533) # Splice-9-frames + HLDA
exp/decode_tri2i/wer:Average WER is 3.981489 (499 / 12533) # Triple-deltas + HLDA
exp/decode_tri2j/wer:Average WER is 3.853826 (483 / 12533) # Triple-deltas + LDA + MLLT
exp/decode_tri2k/wer:Average WER is 3.071890 (385 / 12533) # LDA + exponential transform (ET)
exp/decode_tri2k_utt/wer:Average WER is 3.039974 (381 / 12533) # per-utterance adaptation
exp/decode_tri2k_fmllr/wer:Average WER is 2.641028 (331 / 12533) # fMLLR (per-spk)
exp/decode_tri2k_regtree_fmllr/wer:Average WER is 2.688901 (337 / 12533) # +regression-tree
exp/decode_tri2l/wer:Average WER is 2.704859 (339 / 12533) # Splice-9-frames + LDA + MLLT + SAT (fMLLR in test)
exp/decode_tri2l_utt/wer:Average WER is 4.930982 (618 / 12533) # [ as decode_tri2l but per-utt in test. ]
# linear-VTLN on top of LDA+MLLT features.
exp/decode_tri2m/wer:Average WER is 3.223490 (404 / 12533) # offset-only transform after VTLN part of LVTLN
exp/decode_tri2m_diag/wer:Average WER is 3.119764 (391 / 12533) # diagonal transform after VTLN part of LVTLN
exp/decode_tri2m_diag_fmllr/wer:Average WER is 2.784649 (349 / 12533) # + fMLLR
exp/decode_tri2m_diag_utt/wer:Average WER is 3.279343 (411 / 12533) # [per-utt]
exp/decode_tri2m_vtln/wer:Average WER is 4.747467 (595 / 12533) # feature-space VTLN, plus offset-only transform
# (for some reason it failed)
exp/decode_tri2m_vtln_diag/wer:Average WER is 3.087848 (387 / 12533) # + diagonal transform
exp/decode_tri2m_vtln_diag_utt/wer:Average WER is 4.340541 (544 / 12533) # [per-utterance]
exp/decode_tri2m_vtln_nofmllr/wer:Average WER is 5.784728 (725 / 12533) # feature-space VTLN, with no fMLLR
# sgmma is SGMM without speaker vectors.
exp/decode_sgmma/wer:Average WER is 3.319237 (416 / 12533)
exp/decode_sgmma_fmllr/wer:Average WER is 2.928269 (367 / 12533)
exp/decode_sgmma_fmllr_utt/wer:Average WER is 3.303279 (414 / 12533)
exp/decode_sgmma_fmllrbasis_utt/wer:Average WER is 3.191574 (400 / 12533)
# sgmmb is SGMM with speaker vectors.
exp/decode_sgmmb/wer:Average WER is 2.521344 (316 / 12533)
exp/decode_sgmmb_fmllr/wer:Average WER is 2.377723 (298 / 12533)
exp/decode_sgmmb_utt/wer:Average WER is 2.728796 (342 / 12533)
# sgmmc is like sgmmb but with gender dependency
exp/decode_sgmmc/wer:Average WER is 2.720817 (341 / 12533)
exp/decode_sgmmc_fmllr/wer:Average WER is 2.489428 (312 / 12533)
# sgmmd is like sgmmb but with LDA+MLLT features.
exp/decode_sgmmd/wer:Average WER is 2.656986 (333 / 12533)
exp/decode_sgmmd_fmllr/wer:Average WER is 2.409639 (302 / 12533)
# sgmme is like sgmmb but with LDA+ET features.
exp/decode_sgmme/wer:Average WER is 2.337828 (293 / 12533)
exp/decode_sgmme_fmllr/wer:Average WER is 2.266018 (284 / 12533)
#### Note: stuff below this line may be out of date / not computed
# with most recent version of toolkit.
# note: when changing (phn,spk) dimensions from (40,39) -> (30,30),
# WER in decode_sgmmb/ went from 2.62 to 2.92
# when changing from (40,39) -> (50,39) [40->50 on iter 3],
# WER in decode_sgmmb/ went from 2.62 to 2.66 [and test likelihood
# got worse].
# sgmmc is as sgmmb but with gender-dependent UBM, with 250
# Gaussians per gender instead of 400 Gaussians. Note: use
# gender info in test.
exp/decode_sgmmc/wer:Average WER is 2.784649 (349 / 12533)
exp/decode_sgmmc_fmllr/wer:Average WER is 2.688901 (337 / 12533)
# notes on timing of training with ATLAS vs. MKL:
# all below are with -O0.
# tested time taken with "time steps/train_tri2a.sh"
# [on svatava]: with "default" compile (which is 32-bit+ATLAS)
# real 14m19.458s
# user 15m38.695s
# 64-bit+ATLAS:
#
# 64-bit+MKL:
# real 12m45.664s
# user 13m53.770s
# + removed -O0 -DKALDI_PARANOID:
# [made almost no difference to training]:
# real 12m31.829s
# user 13m48.967s
# sys 0m28.146s
# 64-bit but ATLAS instead of MKL
# [and with default options, which includes: -O0 -DKALI_PARANOID].
#real 10m50.088s
#user 12m6.914s
#sys 0m17.419s
# Did this again:
#real 10m17.891s
#user 11m28.695s
#sys 0m14.087s
# But when I tested "fmllr-diag-gmm-test", all after removing
# the options -O0 -DKALDI_PARANOID, the ordering of timing was
# different:
# 64-bit+ATLAS was 0.361s
# 64-bit+MKL was 0.307s
# 32-bit+ATLAS was 0.571s
# Testing /homes/eva/q/qpovey/sourceforge/kaldi/trunk/src/gmm/am-diag-gmm-test:
# 64-bit+ATLAS was 0.171s
# 32-bit+ATLAS was 0.205s
# 64-bit+MKL was 0.291s