A PyTorch implementation of Tacotron, as described in the paper *Tacotron: Towards End-to-End Speech Synthesis* (published at INTERSPEECH 2017).
- torch 1.3.0
- falcon 1.2.0
- inflect 0.2.5
- librosa 0.5.1
- numpy 1.13.3
- scipy 1.0.0
- Unidecode 0.4.21
- pandas 0.21.0
- LJ-Speech (English)
- KSS-dataset (Korean)
```
python train.py
```
- Change the options in hyperparams.py:
  - cleaners option (line 26): change 'english_cleaners' to 'korean_cleaners'
  - dataset option (line 29): change 'LJSpeech' to 'KSS'
  - data_path option (line 30): set it to the path of your dataset
- Change the sample sentences used to generate TTS wav files during training from English to Korean (xx-th line in train.py).
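Put together, the Korean-specific changes above might look like the following sketch of hyperparams.py. The variable names, the data path, and the sample sentences are assumptions for illustration, not copied from the repository:

```python
# Hypothetical excerpt of hyperparams.py edited for the Korean KSS dataset.
# All names and paths below are illustrative assumptions.
cleaners = 'korean_cleaners'   # line 26: was 'english_cleaners'
dataset = 'KSS'                # line 29: was 'LJSpeech'
data_path = './data/kss'       # line 30: wherever the KSS dataset was extracted

# Sample sentences used to synthesize wav files during training
# (these would replace the English defaults in train.py).
sample_sentences = [
    '안녕하세요.',
    '음성 합성 테스트 문장입니다.',
]
```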
```
python train.py
```

- You can monitor the training loss graph.
- You can also listen to the wav files generated during training.
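One way such intermediate wav files could be written to disk during training is with scipy.io.wavfile (scipy is already in the requirements). The sample rate, file name, and the synthetic waveform below are assumptions standing in for the model's actual output:

```python
import numpy as np
from scipy.io import wavfile

# Stand-in for a waveform reconstructed from the model's predicted
# spectrogram; a real run would invert the network output instead.
sr = 22050                                   # assumed sample rate
t = np.linspace(0, 1, sr, endpoint=False)
wav = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # 1 s, 440 Hz test tone

# Scale to 16-bit PCM and write a listening sample for this checkpoint.
wavfile.write('sample_step_1000.wav', sr, (wav * 32767).astype(np.int16))
```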
Loss | wav_files
---|---
![]() | ![]()
```
tensorboard --logdir=runs
```
- Download the pre-trained model.
- Change the options in hyperparams.py:
  - To generate English wav files, the cleaners option (line 26) should be 'english_cleaners'.
  - To generate Korean wav files, the cleaners option (line 26) should be 'korean_cleaners'.
- Generate the TTS wav files:

```
python eval.py --checkpoint_path ./pre_trained_model_path
```
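Loading the downloaded checkpoint inside eval.py presumably boils down to `torch.load` plus `load_state_dict`. The sketch below uses a small stand-in module and a hypothetical checkpoint layout, since the repo's actual model class and checkpoint keys are not shown here:

```python
import torch

# Stand-in for the Tacotron network; the real model class lives in this repo.
model = torch.nn.Linear(8, 8)

# Hypothetical checkpoint layout: model weights plus the training step.
torch.save({'model': model.state_dict(), 'step': 1000}, 'checkpoint.pth')

# What the --checkpoint_path handling likely amounts to:
state = torch.load('checkpoint.pth', map_location='cpu')
model.load_state_dict(state['model'])
model.eval()  # disable dropout etc. before synthesis
```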
LJ-Speech | KSS
---|---
![]() | ![]()
If you have any questions or comments about the code, please email me at [email protected].
[1] https://github.com/soobinseo/Tacotron-pytorch
[2] https://github.com/hccho2/Tacotron-Wavenet-Vocoder-Korean