Releases: wenet-e2e/wenet
v3.1.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
What's Changed
- [ctc] Update search.py by @pengzhendong in #2398
- fix mask to bias by @Mddct in #2401
- [ssl/w2vbert] weight copy from meta w2vbert-2.0 by @Mddct in #2392
- [lint] fix linter version by @xingchensong in #2405
- [search] Update search.py by @xingchensong in #2406
- fix mask bias dtype in sdpa by @Mddct in #2407
- Fix ckpt conversion bug by @zhr1201 in #2399
- [dataset] restrict batch type by @Mddct in #2410
- [wenet/bin/recognize.py] modify args to be consistent with train by @Mddct in #2411
- [transformer] remove pe to device by @Mddct in #2413
- add timer for steps by @Mddct in #2416
- [dataset] support repeat by @Mddct in #2415
- (!! breaking changes, we recommend `step_save` instead of `epoch_save` !!) 🚀🚀🚀
- [transformer] fix sdpa u2pp training nan by @Mddct in #2419
- (!! important bug fix, enjoy flash attention without pain !!) 🚀🚀🚀
- [transformer] fix sdpa mask for ShowRelAttention by @xingchensong in #2420
- [runtime/libtorch] fix jit issue by @xingchensong in #2421
- [dataset] add shuffle at shards tar/raw file level by @kakashidan in #2424
- [dataset] fix cycle in recognize.py by @Mddct in #2426
- [dataset] unify shuf conf by @Mddct in #2427
- fix order by @Mddct in #2428
- [runtime] upgrade libtorch version to 2.1.0 by @xingchensong in #2418
- [torchaudio] Fix torchaudio interface error (#2352) by @lsrami in #2429
- [paraformer] fsdp fix submodule call by @Mddct in #2431
- fix modify by @Mddct in #2436
- [deprecated dataset] small fix by @kakashidan in #2440
- [dataset] add single channel conf & processor by @kakashidan in #2439
- fix list shuffle in recognize.py by @Mddct in #2446
- fix list_shuffle in cv_conf by @Mddct in #2447
- [runtime] Fixed failed compilation without ITN. Now, compiling ITN is mandatory. by @roney123 in #2444
- [runtime] add blank_scale in ctc_endpoint by @jia-jidong in #2374
- fix step in continue training in steps mode by @Mddct in #2453
- fix export_jit.py by @Mddct in #2455
- [fix] fix copyright by @robin1001 in #2456
- [fix] fix copyright by @xingchensong in #2457
- fix llama rope by @Mddct in #2459
- [train_engine] support fsdp by @Mddct in #2412
- (!! breaking changes, enjoy both fsdp & deepspeed !!) 🚀🚀🚀
- [env] update python version and deepspeed version by @xingchensong in #2462
- (!! breaking changes, you may need to update your env !!) ❤❤❤
- fix rope pos embedding by @Mddct in #2463
- [transformer] add multi warmup and learning rate for different modules by @Mddct in #2449
- (!! Significant improvement on results of whisper !!) 💯💯💯
- [whisper] limit language to Chinese by @xingchensong in #2470
- [train] convert tensor to scalar by @xingchensong in #2471
- [workflow] upgrade python version to 3.10 by @xingchensong in #2472
- (!! breaking changes, you may need to update your env !!) ❤❤❤
- refactor cache behaviour in training mode (reduce compute cost and memory) by @Mddct in #2473
- fix ut by @Mddct in #2477
- [transformer] Make MoE runnable by @xingchensong in #2474
- [transformer] fix mqa by @Mddct in #2478
- enable mmap in torch.load by @Mddct in #2479
- [example] Add deepspeed configs of different stages for illustrative purposes by @xingchensong in #2485
- [example] Fix prefetch and step_save by @xingchensong in #2486
- (!! Significant decrease on cpu ram !!) 💯💯💯
- [ctl] simplified ctl by @Mddct in #2483
- [branchformer] simplified branchformer by @Mddct in #2482
- [e_branchformer] simplified e_branchformer by @Mddct in #2484
- [transformer] refactor cache by @Mddct in #2481
- fix gradient ckpt in branchformer/e_branchformer by @Mddct in #2488
- [transformer] fix search after refactor cache by @Mddct in #2490
- [transformer] set use_reentrant=False for gradient ckpt by @xingchensong in #2491
- [transformer] fix warning: ignore(True) has been deprecated by @xingchensong in #2492
- [log] avoid redundant logging by @xingchensong in #2493
- [transformer] refactor mqa repeat by @Mddct in #2497
- [transformer] fix mqa in cross att by @Mddct in #2498
- [deepspeed] update json config by @xingchensong in #2499
- [onnx] clone weight for whisper by @xingchensong in #2501
- [wenet/utils/train_utils.py] fix log by @Mddct in #2504
- [transformer] keep high precision in softmax by @Mddct in #2508
- [websocket] 8k and 16k support by @Sang-Hoon-Pakr in #2505
- [Fix #2506] Specify multiprocessing context in DataLoader by @MengqingCao in #2507
- [mask] set max_chunk_size according to subsample rate by @xingchensong in #2520
- Revert "[Fix #2506] Specify multiprocessing context in DataLoader" by @xingchensong in #2521
- [transformer] try to fix mqa in onnxruntime by @Mddct in #2519
- [utils] update precision of speed metric by @xingchensong in #2524
- fix segmentfault in (#2506) by @MengqingCao in #2530
New modules and methods (from LLM community) by @Mddct & @fclearner 🤩🤩🤩
- [transformer] support multi query attention && multi grouped query attention by @Mddct in #2403
- [transformer] add rope for transformer/conformer by @Mddct in #2458
- LoRA support by @fclearner in #2049
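The rope entry above (#2458) adds rotary position embeddings to transformer/conformer. A minimal, self-contained RoPE sketch; the helper name and shapes are illustrative, not WeNet's actual module API:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate feature pairs of x (batch, time, dim) by position-dependent angles."""
    _, t, d = x.shape  # d must be even
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    ang = torch.outer(torch.arange(t).float(), inv_freq)  # (time, dim/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Each (x1, x2) pair is a 2-D vector rotated by its position's angle,
    # so relative offsets show up as angle differences in the dot product.
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Since rotation preserves vector length, RoPE changes phases but not feature magnitudes, and position 0 is left untouched.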
New Contributors
- @lsrami made their first contribution in #2429
- @jia-jidong made their first contribution in #2374
- @MengqingCao made their first contribution in #2507
Full Changelog: v3.0.1...v3.1.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
WeNet 3.0.1
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
What's Changed
- Fix loss returned by CTC model in RNNT by @kobenaxie in #2327
- [dataset] new io for code reuse for many speech tasks by @Mddct in #2316
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
- Fix eot by @Qiaochu-Song in #2330
- [decode] support length penalty by @xingchensong in #2331
- [bin] limit step when averaging model by @xingchensong in #2332
- fix 'th_accuracy' not in transducer by @DaobinZhu in #2337
- [dataset] support bucket by seq length by @Mddct in #2333
- [examples] remove useless yaml by @xingchensong in #2343
- [whisper] support arbitrary language and task by @xingchensong in #2342
- (!! breaking changes, happy whisper happy life !!) 💯💯💯
- Minor fix decode_wav by @kobenaxie in #2340
- fix comment by @Mddct in #2344
- [w2vbert] support w2vbert fbank by @Mddct in #2346
- [dataset] fix typo by @Mddct in #2347
- [wenet] fix args.enc by @Mddct in #2354
- [examples] Initial whisper results on wenetspeech by @xingchensong in #2356
- [examples] fix --penalty by @xingchensong in #2358
- [paraformer] add decoding args by @xingchensong in #2359
- [transformer] support flash att by 'torch scaled dot attention' by @Mddct in #2351
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
- [conformer] support flash att by torch sdpa by @Mddct in #2360
- (!! breaking changes, please update to torch2.x torchaudio2.x !!) 🚀🚀🚀
- [conformer] sdpa default to false by @Mddct in #2362
- [transformer] fix bidecoder sdpa by @Mddct in #2368
- [runtime] Configurable blank token idx by @zhr1201 in #2366
- [wenet] make runtime/core/decoder faster by @Sang-Hoon-Pakr in #2367
- (!! Significant improvement on warmup when using libtorch !!) 🚀🚀🚀
- [lint] fix lint by @cdliang11 in #2373
- [examples] better results on wenetspeech using revised transcripts by @xingchensong in #2371
- (!! Significant improvement on results of whisper !!) 💯💯💯
- [dataset] support pad or trim for whisper decoding by @Mddct in #2378
- [bin/recognize.py] support num_workers and compute dtype by @Mddct in #2379
- (!! Significant improvement on inference speed when using fp16 !!) 🚀🚀🚀
- [whisper] fix decoding maxlen by @Mddct in #2380
- fix whisper ckpt modify error by @fclearner in #2381
- Update recognize.py by @Mddct in #2383
- [transformer] add cross attention by @Mddct in #2388
- (!! Significant improvement on inference speed of attention_beam_search !!) 🚀🚀🚀
- [paraformer] fix some bugs by @Mddct in #2389
- New modules and methods by @Mddct 🤩🤩🤩
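The sdpa entries above (#2351, #2360) route attention through `torch.nn.functional.scaled_dot_product_attention`, which can dispatch to fused flash / memory-efficient kernels when the backend supports them. A minimal sketch with illustrative shapes (not WeNet's actual attention module):

```python
import torch
import torch.nn.functional as F

# (batch, heads, time, head_dim) -- shapes chosen for illustration only.
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# Boolean mask, True = may attend; the singleton head dim broadcasts.
# Several fixes in these releases are about building this mask (and its
# dtype) correctly so sdpa matches the hand-written attention path.
mask = torch.ones(2, 1, 16, 16, dtype=torch.bool)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

The fused kernels avoid materializing the full (time × time) attention matrix, which is the source of the speed and memory wins.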
New Contributors
- @Qiaochu-Song made their first contribution in #2330
- @Sang-Hoon-Pakr made their first contribution in #2367
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Full Changelog: v3.0.0...v3.0.1
WeNet 3.0.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
New Features
- Fully support bestrq #1869, #2060
- Support GPU Tlg streaming #1878
- Support streaming ASR web demo #1888
- Support k2 rnnt loss and delay penalty #1909
- Support context biasing #1931, #1936
- Support ZeroPrompt (not merged) #1943
- Support M1 Mac onnxruntime #1953
- Support ITN runtime #2001, #2042, #2246
- Support wav2vec2 #2034, #2035
- Support part of w2vbert training #2039
- wenet cli #2047, #2054, #2075, #2082, #2088, #2087, #2098, #2101, #2122 (!! simple and fast !!) 🛫
- Support E-Branchformer module #2013
- Support deepspeed #1849, #2168, #2123 (!! big big big !!) 💯
- LoRA support (not merged) #2049
- support batch decoding for ctc_prefix_beam_search & attention_rescoring #2059 (!! simple and fast !!) 🛫
- support ali-paraformer #2067, #2078, #2093, #2096, #2099, #2124, #2139, #2140, #2155, #2219, #2222, #2277, #2282, #2289, #2314, #2324
- support Contrastive learning for unified models #2100
- support context biasing with ac automaton #2128, #2136
- support whisper arch #2141, #2157, #2196, #2313, #2322, #2323
- Support gradient checkpointing for Conformer & Transformer (whisper) #2173, #2275
- ssh-launcher for multi-node multi-gpu training #2180, #2265
- u2++-lite training support #2202
- support blank penalty #2278
- support speaker in dataset #2292
- Whisper inference support in cpp runtime #2320
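The gradient-checkpointing entries above (#2173, #2275) trade compute for memory: activations inside a checkpointed layer are not stored during the forward pass and are recomputed during backward. A minimal sketch using PyTorch's `torch.utils.checkpoint`; the toy module and shapes are illustrative:

```python
import torch
from torch.utils.checkpoint import checkpoint

# A stand-in for one encoder layer; WeNet wraps its Conformer/Transformer
# layers the same way when gradient checkpointing is enabled.
layer = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 8),
)

x = torch.randn(4, 8, requires_grad=True)
# use_reentrant=False selects the non-reentrant implementation that
# PyTorch now recommends.
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()  # activations inside `layer` are recomputed here
```

Gradients flow exactly as without checkpointing; only peak activation memory changes.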
What's Changed
- Upgrade libtorch CPU runtime with IPEX version #1893
- Refine ctc alignment #1966
- Use torchrun for distributed training #2020, #2021
- Refine training code #2055, #2103, #2123, #2248, #2252, #2253, #2270, #2286, #2288, #2312 (!! big changes !!) 🚀
- mv all ctc functions to ctc_utils.py #2057 (!! big changes !!) 🚀
- move search methods to search.py #2056 (!! big changes !!) 🚀
- move all k2 related functions to k2 #2058
- refactor and simplify decoding methods #2061, #2062
- unify decode results of all decoding methods #2063
- refactor(dataset): return dict instead of tuple #2106, #2111
- init_model API changed #2116, #2216 (!! big changes !!) 🚀
- move yaml saving to save_model() #2156
- refine tokenizer #2165, #2186 (!! big changes !!) 🚀
- deprecate wenetruntime #2194 (!! big changes !!) 🚀
- use pre-commit to auto check and lint #2195
- refactor(yaml): Config ctc/cmvn/tokenizer in train.yaml #2205, #2229, #2230, #2227, #2232 (!! big changes !!) 🚀
- train with dict input #2242, #2243 (!! big changes !!) 🚀
- [dataset] keep pcm for other task #2268
- Upgrade torch to 2.x #2301 (!! big changes !!) 🚀
- log everything to tensorboard #2307
New Bug Fixes
- Fix NST recipe #1863
- Fix Librispeech fst dict #1929
- Fix bug when make shard.list for *.flac #1933
- Fix bug of transducer #1940
- Avoid problem during model averaging when there is parameter-tying. #2113
- [loss] set zero_infinity=True to ignore NaN or inf ctc_loss #2299
- fix android #2303
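The `zero_infinity` fix above (#2299) can be illustrated with a toy batch in which one target is longer than its input and therefore has no valid CTC alignment; the shapes and values are illustrative:

```python
import torch

# zero_infinity=True replaces infinite CTC losses (targets that cannot be
# aligned to the input) with zero, instead of letting NaN/inf gradients
# poison every sample in the batch.
ctc = torch.nn.CTCLoss(blank=0, zero_infinity=True)

log_probs = torch.randn(50, 2, 10).log_softmax(-1).requires_grad_()
targets = torch.randint(1, 10, (2, 20))
input_lengths = torch.tensor([50, 5])   # 2nd sample: 5 frames vs 20 labels -> inf
target_lengths = torch.tensor([20, 20])

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

Without `zero_infinity=True`, the second sample would make the batch loss infinite and the backward pass would propagate NaN/inf into the model.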
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Many thanks to all the contributors !!!!! I love u all.
WeNet 2.2.1
What's Changed
- Add http server/client @aluminumbox #1670
- Add Trt (Myelin) support for streaming ASR @yuekaizhang #1679
- Support OpenVino @FionaZZ92 #1700
- Support ONNX GPU export, add librispeech results, and fix V2 streaming decode issue for efficient conformer @zwglory #1701
- Support ort backend in wenetruntime @xingchensong #1708
- Support LFMMI @aluminumbox #1725
- Support Paraformer @MrSupW & @robin1001 #1738 & #1749 & #1791 & #1795
- Support part of bestrq @Mddct #1750 & #1754 & #1824
- Remove concat_after to simplify the code flow #1762 & #1763 & #1764
- Add riva cuda tlg decoder @yuekaizhang #1773
- Add CUDA TLG nbest and mbr decoding @yuekaizhang #1804
- Support IPEX @ZailiWang #1816
- Support Branchformer @kli017 #1845
- Support GPU hotword @zwglory #1860
WeNet 2.2.0
What's Changed
- support exporting squeezeformer to onnx (CPU & GPU) by @yygle in #1593 and #1634
- support horizon x3 pi by @xingchensong in #1597
- support noisy student training by @NevermoreCY in #1600
- support efficient conformer by @zwglory in #1636
- add blank scale for wfst decoding by @simonwang517 in #1646
WeNet 2.1.0
What's Changed
WeNet Python Binding Models
This release is for hosting the wenet python binding models.
WeNet 2.0.0
The following features are stable.
- U2++ framework for better accuracy
- n-gram + WFST language model solution
- Context biasing(hotword) solution
- Very big data training support with UIO
- More dataset support, including WenetSpeech, GigaSpeech, HKUST and so on.
WeNet 1.0.0
Model
- propose and support U2++, which uses both forward and backward information at training and decoding.
- support dynamic left chunk training and decoding, so we can limit history chunk at decoding to save memory and computation.
- support distributed training.
Dataset
Now we support the following five standard speech datasets, and we achieve SOTA or near-SOTA results.
Dataset | Language | Data (h) | Test set | CER/WER | SOTA |
---|---|---|---|---|---|
aishell-1 | Chinese | 200 | test | 4.36 | 4.36 (WeNet) |
aishell-2 | Chinese | 1000 | test_ios | 5.39 | 5.39 (WeNet) |
multi-cn | Chinese | 2385 | / | / | / |
librispeech | English | 1000 | test_clean | 2.66 | 2.10 (ESPnet) |
gigaspeech | English | 10000 | test | 11.0 | 10.80 (ESPnet) |
Productivity
Here are some features related to productivity.
- LM support. WeNet can work with/without LM according to your applications/scenarios.
- timestamp support.
- n-best support.
- endpoint support.
- gRPC support
- further refine x86 server and on-device android recipe.
WeNet 0.1.0
Major Features
- Joint CTC/AED model structure
- U2, dynamic chunk training support
- Torchaudio support
- Runtime x86 and android support