Releases: wenet-e2e/wenet
v3.1.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
What's Changed
- [ctc] Update search.py by @pengzhendong in #2398
- fix mask to bias by @Mddct in #2401
- [ssl/w2vbert] weight copy from meta w2vbert-2.0 by @Mddct in #2392
- [lint] fix linter version by @xingchensong in #2405
- [search] Update search.py by @xingchensong in #2406
- fix mask bias dtype in sdpa by @Mddct in #2407
- Fix ckpt conversion bug by @zhr1201 in #2399
- [dataset] restrict batch type by @Mddct in #2410
- [wenet/bin/recognize.py] modify args to be consistent with train by @Mddct in #2411
- [transformer] remove pe to device by @Mddct in #2413
- add timer for steps by @Mddct in #2416
- [dataset] support repeat by @Mddct in #2415
- (!! breaking changes, we recommend `step_save` instead of `epoch_save` !!) 🚀🚀🚀
- [transformer] fix sdpa u2pp training nan by @Mddct in #2419
- (!! important bug fix, enjoy flash attention without pain !!) 🚀🚀🚀
- [transformer] fix sdpa mask for ShowRelAttention by @xingchensong in #2420
- [runtime/libtorch] fix jit issue by @xingchensong in #2421
- [dataset] add shuffle at shards tar/raw file level by @kakashidan in #2424
- [dataset] fix cycle in recognize.py by @Mddct in #2426
- [dataset] unify shuf conf by @Mddct in #2427
- fix order by @Mddct in #2428
- [runtime] upgrade libtorch version to 2.1.0 by @xingchensong in #2418
- [torchaudio] Fix torchaudio interface error (#2352) by @lsrami in #2429
- [paraformer] fsdp fix submodule call by @Mddct in #2431
- fix modify by @Mddct in #2436
- [deprecated dataset] small fix by @kakashidan in #2440
- [dataset] add single channel conf & processor by @kakashidan in #2439
- fix list shuffle in recognize.py by @Mddct in #2446
- fix list_shuffle in cv_conf by @Mddct in #2447
- [runtime] Fixed failed compilation without ITN. Now, compiling ITN is mandatory. by @roney123 in #2444
- [runtime] add blank_scale in ctc_endpoint by @jia-jidong in #2374
- fix step in continue training in steps mode by @Mddct in #2453
- fix export_jit.py by @Mddct in #2455
- [fix] fix copyright by @robin1001 in #2456
- [fix] fix copyright by @xingchensong in #2457
- fix llama rope by @Mddct in #2459
- [train_engine] support fsdp by @Mddct in #2412
- (!! breaking changes, enjoy both fsdp & deepspeed !!) 🚀🚀🚀
- [env] update python version and deepspeed version by @xingchensong in #2462
- (!! breaking changes, you may need to update your env !!) ❤❤❤
- fix rope pos embedding by @Mddct in #2463
- [transformer] add multi warmup and learning rate for different modules by @Mddct in #2449
- (!! Significant improvement on results of whisper !!) 💯💯💯
- [whisper] limit language to Chinese by @xingchensong in #2470
- [train] convert tensor to scalar by @xingchensong in #2471
- [workflow] upgrade python version to 3.10 by @xingchensong in #2472
- (!! breaking changes, you may need to update your env !!) ❤❤❤
- refactor cache behaviour in training mode (reduce compute cost and memory) by @Mddct in #2473
- fix ut by @Mddct in #2477
- [transformer] Make MoE runnable by @xingchensong in #2474
- [transformer] fix mqa by @Mddct in #2478
- enable mmap in torch.load by @Mddct in #2479
- [example] Add deepspeed configs of different stages for illustrative purposes by @xingchensong in #2485
- [example] Fix prefetch and step_save by @xingchensong in #2486
- (!! Significant decrease on cpu ram !!) 💯💯💯
- [ctl] simplified ctl by @Mddct in #2483
- [branchformer] simplified branchformer by @Mddct in #2482
- [e_branchformer] simplified e_branchformer by @Mddct in #2484
- [transformer] refactor cache by @Mddct in #2481
- fix gradient ckpt in branchformer/e_branchformer by @Mddct in #2488
- [transformer] fix search after refactor cache by @Mddct in #2490
- [transformer] set use_reentrant=False for gradient ckpt by @xingchensong in #2491
- [transformer] fix warning: ignore(True) has been deprecated by @xingchensong in #2492
- [log] avoid redundant logging by @xingchensong in #2493
- [transformer] refactor mqa repeat by @Mddct in #2497
- [transformer] fix mqa in cross att by @Mddct in #2498
- [deepspeed] update json config by @xingchensong in #2499
- [onnx] clone weight for whisper by @xingchensong in #2501
- [wenet/utils/train_utils.py] fix log by @Mddct in #2504
- [transformer] keep high precision in softmax by @Mddct in #2508
- [websocket] 8k and 16k support by @Sang-Hoon-Pakr in #2505
- [Fix #2506] Specify multiprocessing context in DataLoader by @MengqingCao in #2507
- [mask] set max_chunk_size according to subsample rate by @xingchensong in #2520
- Revert "[Fix #2506] Specify multiprocessing context in DataLoader" by @xingchensong in #2521
- [transformer] try to fix mqa in onnxruntime by @Mddct in #2519
- [utils] update precision of speed metric by @xingchensong in #2524
- fix segmentfault in (#2506) by @MengqingCao in #2530
New modules and methods (from LLM community) by @Mddct & @fclearner 🤩🤩🤩
- [transformer] support multi query attention && multi grouped query attention by @Mddct in #2403
- [transformer] add rope for transformer/conformer by @Mddct in #2458
- LoRA support by @fclearner in #2049
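The rope entry above (#2458) adds rotary position embeddings to transformer/conformer. A minimal, self-contained RoPE sketch; the helper name and shapes are illustrative, not WeNet's actual module API:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of x (batch, time, dim) by position-dependent angles."""
    _, t, d = x.shape  # d must be even
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    ang = torch.outer(torch.arange(t).float(), inv_freq)  # (time, dim/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Each (x1, x2) pair is a 2-D vector rotated by its position's angle,
    # so relative offsets show up as angle differences in the dot product.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Since rotation preserves vector length, RoPE changes phases but not feature magnitudes, and position 0 is left untouched.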
New Contributors
- @lsrami made their first contribution in #2429
- @jia-jidong made their first contribution in #2374
- @MengqingCao made their first contribution in #2507
Full Changelog: v3.0.1...v3.1.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
WeNet 3.0.1
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
What's Changed
- Fix loss returned by CTC model in RNNT by @kobenaxie in #2327
- [dataset] new io for code reuse for many speech tasks by @Mddct in #2316
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
- Fix eot by @Qiaochu-Song in #2330
- [decode] support length penalty by @xingchensong in #2331
- [bin] limit step when averaging model by @xingchensong in #2332
- fix 'th_accuracy' not in transducer by @DaobinZhu in #2337
- [dataset] support bucket by seq length by @Mddct in #2333
- [examples] remove useless yaml by @xingchensong in #2343
- [whisper] support arbitrary language and task by @xingchensong in #2342
- (!! breaking changes, happy whisper happy life !!) 💯💯💯
- Minor fix decode_wav by @kobenaxie in #2340
- fix comment by @Mddct in #2344
- [w2vbert] support w2vbert fbank by @Mddct in #2346
- [dataset] fix typo by @Mddct in #2347
- [wenet] fix args.enc by @Mddct in #2354
- [examples] Initial whisper results on wenetspeech by @xingchensong in #2356
- [examples] fix --penalty by @xingchensong in #2358
- [paraformer] add decoding args by @xingchensong in #2359
- [transformer] support flash att by 'torch scaled dot attention' by @Mddct in #2351
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
- [conformer] support flash att by torch sdpa by @Mddct in #2360
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
- [conformer] sdpa default to false by @Mddct in #2362
- [transformer] fix bidecoder sdpa by @Mddct in #2368
- [runtime] Configurable blank token idx by @zhr1201 in #2366
- [wenet] make runtime/core/decoder faster by @Sang-Hoon-Pakr in #2367
- (!! Significant improvement on warmup when using libtorch !!) 🚀🚀🚀
- [lint] fix lint by @cdliang11 in #2373
- [examples] better results on wenetspeech using revised transcripts by @xingchensong in #2371
- (!! Significant improvement on results of whisper !!) 💯💯💯
- [dataset] support pad or trim for whisper decoding by @Mddct in #2378
- [bin/recognize.py] support num_workers and compute dtype by @Mddct in #2379
- (!! Significant improvement on inference speed when using fp16 !!) 🚀🚀🚀
- [whisper] fix decoding maxlen by @Mddct in #2380
- fix whisper ckpt modify error by @fclearner in #2381
- Update recognize.py by @Mddct in #2383
- [transformer] add cross attention by @Mddct in #2388
- (!! Significant improvement on inference speed of attention_beam_search !!) 🚀🚀🚀
- [paraformer] fix some bugs by @Mddct in #2389
- New modules and methods by @Mddct 🤩🤩🤩
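The sdpa entries above (#2351, #2360) route attention through `torch.nn.functional.scaled_dot_product_attention`, which can dispatch to fused flash / memory-efficient kernels when the backend supports them. A minimal sketch with illustrative shapes (not WeNet's actual attention module):

```python
import torch
import torch.nn.functional as F

# (batch, heads, time, head_dim) -- shapes chosen for illustration only.
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# Boolean mask, True = may attend; the singleton head dim broadcasts.
# Several fixes in these releases are about building this mask (and its
# dtype) correctly so sdpa matches the hand-written attention path.
mask = torch.ones(2, 1, 16, 16, dtype=torch.bool)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

The fused kernels avoid materializing the full (time × time) attention matrix, which is the source of the speed and memory wins.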
New Contributors
- @Qiaochu-Song made their first contribution in #2330
- @Sang-Hoon-Pakr made their first contribution in #2367
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Full Changelog: v3.0.0...v3.0.1
WeNet 3.0.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
New Features
- Fully support bestrq #1869, #2060
- Support GPU Tlg streaming #1878
- Support streaming ASR web demo #1888
- Support k2 rnnt loss and delay penalty #1909
- Support context biasing #1931, #1936
- Support ZeroPrompt (not merged) #1943
- Support M1 Mac onnxruntime #1953
- Support ITN runtime #2001, #2042, #2246
- Support wav2vec2 #2034, #2035
- Support part of w2vbert training #2039
- wenet cli #2047, #2054, #2075, #2082, #2088, #2087, #2098, #2101, #2122 (!! simple and fast !!) 🛫
- Support E-Branchformer module #2013
- Support deepspeed #1849, #2168, #2123 (!! big big big !!) 💯
- LoRA support (not merged) #2049
- support batch decoding for ctc_prefix_beam_search & attention_rescoring #2059 (!! simple and fast !!) 🛫
- support ali-paraformer #2067, #2078, #2093, #2096, #2099, #2124, #2139, #2140, #2155, #2219, #2222, #2277, #2282, #2289, #2314, #2324
- support Contrastive learning for unified models #2100
- support context biasing with ac automaton #2128, #2136
- support whisper arch #2141, #2157, #2196, #2313, #2322, #2323
- Support gradient checkpointing for Conformer & Transformer (whisper) #2173, #2275
- ssh-launcher for multi-node multi-gpu training #2180, #2265
- u2++-lite training support #2202
- support blank penalty #2278
- support speaker in dataset #2292
- Whisper inference support in cpp runtime #2320
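The gradient-checkpointing entries above (#2173, #2275) trade compute for memory: activations inside a checkpointed layer are not stored during the forward pass and are recomputed during backward. A minimal sketch using PyTorch's `torch.utils.checkpoint`; the toy module and shapes are illustrative:

```python
import torch
from torch.utils.checkpoint import checkpoint

# A stand-in for one encoder layer; WeNet wraps its Conformer/Transformer
# layers the same way when gradient checkpointing is enabled.
layer = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 8),
)

x = torch.randn(4, 8, requires_grad=True)
# use_reentrant=False selects the non-reentrant implementation that
# PyTorch now recommends.
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()  # activations inside `layer` are recomputed here
```

Gradients flow exactly as without checkpointing; only peak activation memory changes.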
What's Changed
- Upgrade libtorch CPU runtime with IPEX version #1893
- Refine ctc alignment #1966
- Use torchrun for distributed training #2020, #2021
- Refine training code #2055, #2103, #2123, #2248, #2252, #2253, #2270, #2286, #2288, #2312 (!! big changes !!) 🚀
- mv all ctc functions to ctc_utils.py #2057 (!! big changes !!) 🚀
- move search methods to search.py #2056 (!! big changes !!) 🚀
- move all k2 related functions to k2 #2058
- refactor and simplify decoding methods #2061, #2062
- unify decode results of all decoding methods #2063
- refactor(dataset): return dict instead of tuple #2106, #2111
- init_model API changed #2116, #2216 (!! big changes !!) 🚀
- move yaml saving to save_model() #2156
- refine tokenizer #2165, #2186 (!! big changes !!) 🚀
- deprecate wenetruntime #2194 (!! big changes !!) 🚀
- use pre-commit to auto check and lint #2195
- refactor(yaml): Config ctc/cmvn/tokenizer in train.yaml #2205, #2229, #2230, #2227, #2232 (!! big changes !!) 🚀
- train with dict input #2242, #2243 (!! big changes !!) 🚀
- [dataset] keep pcm for other task #2268
- Upgrade torch to 2.x #2301 (!! big changes !!) 🚀
- log everything to tensorboard #2307
New Bug Fixes
- Fix NST recipe #1863
- Fix Librispeech fst dict #1929
- Fix bug when make shard.list for *.flac #1933
- Fix bug of transducer #1940
- Avoid problem during model averaging when there is parameter-tying. #2113
- [loss] set zero_infinity=True to ignore NaN or inf ctc_loss #2299
- fix android #2303
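The `zero_infinity` fix above (#2299) can be illustrated with a toy batch in which one target is longer than its input and therefore has no valid CTC alignment; the shapes and values are illustrative:

```python
import torch

# zero_infinity=True replaces infinite CTC losses (targets that cannot be
# aligned to the input) with zero, instead of letting NaN/inf gradients
# poison every sample in the batch.
ctc = torch.nn.CTCLoss(blank=0, zero_infinity=True)

log_probs = torch.randn(50, 2, 10).log_softmax(-1).requires_grad_()
targets = torch.randint(1, 10, (2, 20))
input_lengths = torch.tensor([50, 5])   # 2nd sample: 5 frames vs 20 labels -> inf
target_lengths = torch.tensor([20, 20])

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

Without `zero_infinity=True`, the second sample would make the batch loss infinite and the backward pass would propagate NaN/inf into the model.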
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Many thanks to all the contributors !!!!! I love u all.
WeNet 2.2.1
What's Changed
- Add http server/client @aluminumbox #1670
- Add Trt (Myelin) support for streaming ASR @yuekaizhang #1679
- Support OpenVino @FionaZZ92 #1700
- Support ONNX GPU export, add librispeech results, and fix V2 streaming decode issue for efficient conformer @zwglory #1701
- Support ort backend in wenetruntime @xingchensong #1708
- Support LFMMI @aluminumbox #1725
- Support Paraformer @MrSupW & @robin1001 #1738 & #1749 & #1791 & #1795
- Support part of bestrq @Mddct #1750 & #1754 & #1824
- Remove concat_after to simplify the code flow #1762 & #1763 & #1764
- Add riva cuda tlg decoder @yuekaizhang #1773
- Add CUDA TLG nbest and mbr decoding @yuekaizhang #1804
- Support IPEX @ZailiWang #1816
- Support Branchformer @kli017 #1845
- Support GPU hotword @zwglory #1860
WeNet 2.2.0
What's Changed
- support exporting squeezeformer to onnx (CPU & GPU) by @yygle in #1593 and #1634
- support horizon x3 pi by @xingchensong in #1597
- support noisy student training by @NevermoreCY in #1600
- support efficient conformer by @zwglory in #1636
- add blank scale for wfst decoding by @simonwang517 in #1646
WeNet 2.1.0
What's Changed
WeNet Python Binding Models
This release is for hosting the wenet python binding models.
WeNet 2.0.0
The following features are stable.
- U2++ framework for better accuracy
- n-gram + WFST language model solution
- Context biasing(hotword) solution
- Very big data training support with UIO
- More dataset support, including WenetSpeech, GigaSpeech, HKUST and so on.
WeNet 1.0.0
Model
- propose and support U2++, which uses both forward and backward information at training and decoding.
- support dynamic left chunk training and decoding, so we can limit history chunk at decoding to save memory and computation.
- support distributed training.
Dataset
Now we support the following five standard speech datasets, and we achieve SOTA or near-SOTA results.
Dataset | Language | Data (h) | Test set | CER/WER | SOTA |
---|---|---|---|---|---|
aishell-1 | Chinese | 200 | test | 4.36 | 4.36 (WeNet) |
aishell-2 | Chinese | 1000 | test_ios | 5.39 | 5.39 (WeNet) |
multi-cn | Chinese | 2385 | / | / | / |
librispeech | English | 1000 | test_clean | 2.66 | 2.10 (ESPnet) |
gigaspeech | English | 10000 | test | 11.0 | 10.80 (ESPnet) |
Productivity
Here are some features related to productivity.
- LM support. WeNet can work with/without LM according to your applications/scenarios.
- timestamp support.
- n-best support.
- endpoint support.
- gRPC support
- further refine x86 server and on-device android recipe.
WeNet 0.1.0
Major Features
- Joint CTC/AED model structure
- U2, dynamic chunk training support
- Torchaudio support
- Runtime x86 and android support