Production First and Production Ready End-to-End Text-to-Speech Toolkit
Note: This project is in its early stage. Its design and implementation are subject to change.
We suggest installing WeTTS with Anaconda or Miniconda.

Clone this repo:

```bash
git clone https://github.com/wenet-e2e/wetts.git
```
Create environment:

```bash
conda create -n wetts python=3.8 -y
conda activate wetts
pip install -r requirements.txt
conda install -n wetts pytorch=1.11 torchaudio cudatoolkit=10.2 -c pytorch -y
```

Please note you should use `cudatoolkit=11.3` for CUDA 11.3.
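After installation, a quick sanity check such as the following can confirm that PyTorch and torchaudio are importable and whether CUDA is visible. This is a minimal sketch; the versions printed and GPU availability depend on your environment.

```python
# Quick sanity check for the wetts conda environment.
import torch
import torchaudio

print("PyTorch version:", torch.__version__)          # expected: 1.11.x
print("torchaudio version:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())   # True only if cudatoolkit matches your driver
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```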
We mainly focus on end-to-end, production, and on-device TTS. We are going to use:

- backend: end-to-end model, such as VITS
- frontend:
  - Text Normalization: WeTextProcessing (see the sketch after this list)
  - Prosody & Polyphones: Unified Mandarin TTS Front-end Based on Distilled BERT Model
We plan to support a variety of open source TTS datasets, including but not limited to:
- baker, Chinese Standard Mandarin Speech corpus open sourced by Data Baker.
- AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus.
- Opencpop, Mandarin singing voice synthesis (SVS) corpus open sourced by Netease Fuxi.
We plan to support a variety of hardware and platforms, including:
- x86
- Android
- Raspberry Pi
- Other on-device platforms
For Chinese users, you can also scan the QR code on the left to follow the official account of WeNet. We created a WeChat group for better discussion and quicker responses. Please scan the personal QR code on the right, and its owner will invite you to the chat group.
Or you can discuss directly on GitHub Issues.
- We borrow a lot of code from vits for VITS implementation.
- We refer to PaddleSpeech for pinyin lexicon generation.