WeNet 3.0.0
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
New Features
- Support the full BEST-RQ implementation #1869, #2060
- Support GPU TLG streaming #1878
- Support streaming ASR web demo #1888
- Support k2 RNN-T loss and delay penalty #1909
- Support context biasing #1931, #1936
- Support ZeroPrompt (not merged) #1943
- Support onnxruntime on M1 Macs #1953
- Support ITN runtime #2001, #2042, #2246
- Support wav2vec2 #2034, #2035
- Support part of w2v-BERT training #2039
- WeNet CLI #2047, #2054, #2075, #2082, #2088, #2087, #2098, #2101, #2122 (!! simple and fast !!) 🛫 (usage sketch after this list)
- Support E-Branchformer module #2013
- Support DeepSpeed #1849, #2168, #2123 (!! big big big !!) 💯
- LoRA support (not merged) #2049
- Support batch decoding for ctc_prefix_beam_search & attention_rescoring #2059 (!! simple and fast !!) 🛫
- Support Ali Paraformer #2067, #2078, #2093, #2096, #2099, #2124, #2139, #2140, #2155, #2219, #2222, #2277, #2282, #2289, #2314, #2324
- Support contrastive learning for unified models #2100
- Support context biasing with an Aho-Corasick automaton #2128, #2136
- Support the Whisper architecture #2141, #2157, #2196, #2313, #2322, #2323
- Support gradient checkpointing for Conformer & Transformer (Whisper) #2173, #2275 (see the sketch after this list)
- SSH launcher for multi-node, multi-GPU training #2180, #2265
- U2++-lite training support #2202
- Support blank penalty #2278 (see the sketch after this list)
- Support speaker info in the dataset #2292
- Whisper inference support in the C++ runtime #2320
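For the new CLI, here is a minimal usage sketch of the Python API, assuming the `load_model`/`transcribe` interface shown in the project README; the model alias and audio path are placeholders:

```python
# Minimal sketch of the Python API behind the new CLI (assumes `pip install wenet`).
import wenet

# "chinese" is a pretrained-model alias from the README; swap in your own.
model = wenet.load_model('chinese')
result = model.transcribe('audio.wav')  # path to a 16 kHz mono wav
print(result['text'])
```

From the shell, the equivalent is roughly `wenet --language chinese audio.wav`.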
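Gradient checkpointing (#2173, #2275) trades compute for memory by recomputing encoder activations during the backward pass instead of storing them. A minimal sketch of the pattern using plain `torch.utils.checkpoint`; the `grad_ckpt` flag and the encoder wrapper here are illustrative, not WeNet's actual classes:

```python
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedEncoder(torch.nn.Module):
    """Illustrative encoder wrapper, not WeNet's actual class."""

    def __init__(self, num_layers: int = 4, dim: int = 256, grad_ckpt: bool = True):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            torch.nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_layers))
        self.grad_ckpt = grad_ckpt

    def forward(self, xs: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if self.grad_ckpt and self.training:
                # Recompute this layer's activations in backward to save memory.
                xs = checkpoint(layer, xs, use_reentrant=False)
            else:
                xs = layer(xs)
        return xs
```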
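The blank penalty (#2278) biases CTC search away from the blank token, which tends to reduce deletions. A sketch of the core idea; the function name and `blank_id=0` are assumptions for illustration, not WeNet's exact API:

```python
import torch

def apply_blank_penalty(ctc_logits: torch.Tensor,
                        blank_penalty: float,
                        blank_id: int = 0) -> torch.Tensor:
    """Subtract a constant from the blank logit before beam search.

    ctc_logits: (batch, time, vocab) raw outputs of the CTC head.
    A larger penalty makes the search emit more non-blank tokens.
    """
    logits = ctc_logits.clone()
    logits[:, :, blank_id] -= blank_penalty
    return logits

# Usage: penalized = apply_blank_penalty(logits, blank_penalty=2.0)
```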
What's Changed
- Upgrade the libtorch CPU runtime to an IPEX-enabled version #1893
- Refine CTC alignment #1966
- Use torchrun for distributed training #2020, #2021
- Refine training code #2055, #2103, #2123, #2248, #2252, #2253, #2270, #2286, #2288, #2312 (!! big changes !!) 🚀
- Move all CTC functions to ctc_utils.py #2057 (!! big changes !!) 🚀
- Move search methods to search.py #2056 (!! big changes !!) 🚀
- Move all k2-related functions to the k2 module #2058
- Refactor and simplify decoding methods #2061, #2062
- Unify the decoding results of all decoding methods #2063
- Refactor dataset to return a dict instead of a tuple #2106, #2111 (see the sketch after this list)
- init_model API changed #2116, #2216 (!! big changes !!) 🚀
- Move YAML saving to save_model() #2156
- Refine the tokenizer #2165, #2186 (!! big changes !!) 🚀
- Deprecate wenetruntime #2194 (!! big changes !!) 🚀
- Use pre-commit to auto-check and lint #2195
- Refactor YAML config: configure ctc/cmvn/tokenizer in train.yaml #2205, #2229, #2230, #2227, #2232 (!! big changes !!) 🚀
- Train with dict input #2242, #2243 (!! big changes !!) 🚀
- [dataset] Keep PCM for other tasks #2268
- Upgrade torch to 2.x #2301 (!! big changes !!) 🚀
- Log everything to TensorBoard #2307
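The dataset and training refactors (#2106, #2111, #2242, #2243) move batches from positional tuples to dicts, so new fields (speaker info, raw PCM, etc.) can be added without breaking every call site. A sketch of what a batch might look like; the exact field names are assumptions for illustration:

```python
import torch

# Illustrative batch after the dict refactor; field names are assumed, not exact.
batch = {
    'keys': ['utt_001', 'utt_002'],                    # utterance ids
    'feats': torch.randn(2, 100, 80),                  # (batch, frames, mel bins)
    'feats_lengths': torch.tensor([100, 73]),          # valid frames per utterance
    'target': torch.tensor([[5, 9, 2], [7, 3, -1]]),   # padded label ids (-1 = pad)
    'target_lengths': torch.tensor([3, 2]),
}

# Models and processors read named fields instead of unpacking a tuple,
# so adding e.g. batch['speaker'] (#2292) does not break existing code.
feats = batch['feats']
```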
Bug Fixes
- Fix NST recipe #1863
- Fix the LibriSpeech FST dict #1929
- Fix bug when making shard.list for *.flac files #1933
- Fix transducer bug #1940
- Avoid problems during model averaging when there is parameter tying #2113
- [loss] Set zero_infinity=True to ignore NaN or inf ctc_loss #2299
- Fix Android #2303
❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤
Many thanks to all the contributors!!!!! I love you all.