This is the roadmap for WeNet. WeNet is a community-driven project, and we love your feedback and proposals on where we should be heading.
Please open an issue or discussion on GitHub to write your proposal. Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).
- ONNX support, see #1103
- RNN-T support, see #1261
- Self training, streaming
- Lightweight, low-latency, on-device model exploration
- TrimTail, see #1487, paper link
- Audio-Visual speech recognition
- OS or Hardware Platforms
  - ASIC XPU
- Public Model Hub Support
  - HuggingFace, see https://huggingface.co/spaces/wenet/wenet_demo
  - ModelScope, see https://modelscope.cn/models/wenet/u2pp_conformer-asr-cn-16k-online/summary
- Vosk-like models and API for developers.
  - Models (Chinese/English/Japanese/Korean/French/German/Spanish/Portuguese)
    - Chinese
    - English
  - API (Python/C/C++/Go/Java)
    - Python
- U2++ framework for better accuracy
- n-gram + WFST language model solution
- Context biasing (hotword) solution
- Very large-scale data training support with UIO
- More dataset support, including WenetSpeech, GigaSpeech, HKUST and so on.
- Streaming solution (U2 framework)
- Production runtime solution with TorchScript training and LibTorch inference.
- Unified streaming and non-streaming model (U2 framework)