This is the roadmap for WeNet. WeNet is a community-driven project, and we love your feedback and proposals on where we should be heading.
Please open an issue or discussion on GitHub to write your proposal. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).
- ONNX support, see wenet-e2e#1103
- RNN-T support, see wenet-e2e#1261
- Self training, streaming
- Lightweight, low-latency, on-device model exploration
  - TrimTail, see wenet-e2e#1487, paper link
- Audio-Visual speech recognition
- OS or Hardware Platforms
  - iOS, see wenet-e2e#1549
  - Raspberry Pi, see wenet-e2e#1477
  - Harmony OS
  - ASIC XPU
  - Horizon X3 pi, BPU, see wenet-e2e#1597
  - Kunlun XPU, see wenet-e2e#1455
- Public Model Hub Support
  - HuggingFace, see https://huggingface.co/spaces/wenet/wenet_demo
  - ModelScope, see https://modelscope.cn/models/wenet/u2pp_conformer-asr-cn-16k-online/summary
- Vosk-like models and API for developers
  - Models (Chinese/English/Japanese/Korean/French/German/Spanish/Portuguese)
    - Chinese
    - English
  - API (Python/C/C++/Go/Java)
    - Python
- U2++ framework for better accuracy
- n-gram + WFST language model solution
- Context biasing (hotword) solution
- Very large-scale data training support with UIO
- More dataset support, including WenetSpeech, GigaSpeech, HKUST and so on.
- Streaming solution (U2 framework)
- Production runtime solution with `TorchScript` training and `LibTorch` inference.
- Unified streaming and non-streaming model (U2 framework)