Use piper-phonemize to convert text to token IDs #453

csukuangfj · 2023-11-28T13:47:24Z

The wave generated with piper-phonemize for tokenization sounds much better.

For the following text

  The sun shone bleakly in the sky, its meager light struggling 
to penetrate the thick foliage of the  forest. Birds sang their 
songs up in the crowns of the trees, fluttering from one branch to the other.

with the following scripts:

#!/usr/bin/env bash

python3 ./python-api-examples/offline-tts.py \
  --vits-model=./vits-piper-en_US-lessac-medium/en_US-lessac-medium.onnx \
  --vits-data-dir=./build-shared/install/share/espeak-ng-data \
  --output-filename=./with-piper-phonemize.wav \
  "The sun shone bleakly in the sky, its meager light struggling to penetrate the thick foliage of the forest. Birds sang their songs up in the crowns of the trees, fluttering from one branch to the other."

python3 ./python-api-examples/offline-tts.py \
  --vits-model=./vits-piper-en_US-lessac-medium/en_US-lessac-medium.onnx \
  --vits-lexicon=./vits-piper-en_US-lessac-medium/lexicon.txt \
  --vits-tokens=./vits-piper-en_US-lessac-medium/tokens.txt \
  --output-filename=./with-lexicon.wav \
  "The sun shone bleakly in the sky, its meager light struggling to penetrate the thick foliage of the forest. Birds sang their songs up in the crowns of the trees, fluttering from one branch to the other."

You can find the generated waves below.
(I have converted *.wav to *.mov since GitHub does not allow us to upload .wav)

with lexicon	with piper-phonemize
https://github.com/k2-fsa/sherpa-onnx/assets/5284924/39bf86af-4dde-4ce5-9ef1-f2df0a421c75	https://github.com/k2-fsa/sherpa-onnx/assets/5284924/7fcccc17-fa30-4178-8cea-112a760780b4

Notice

piper-phonemize is able to split a long text into sentences.
For each sentence, piper-phonemize adds BOS and EOS to it, so there is an audible pause between sentences.
The frontend code for piper-phonemize is super simple. We don't need a lexicon.txt or a tokens.txt anymore.

There should be no OOVs any longer.

The pronunciation of a word is not fixed in the lexicon; rather, it is determined by its surrounding words.
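
The sentence-level behaviour described in the notes above can be illustrated with a minimal Python sketch. It is not the code in this PR and does not call piper-phonemize: the symbol table `TOKEN2ID`, the helper names `split_sentences` and `phonemes_to_ids`, and the naive regex splitter are all hypothetical, and the phonemes are supplied by hand instead of coming from espeak-ng. It only shows the idea of splitting text into sentences and wrapping each sentence's token IDs in BOS and EOS.

```python
import re
from typing import Dict, List

# Hypothetical toy symbol table, for illustration only.
# A real model ships its own symbol-to-ID mapping.
BOS, EOS = "^", "$"
TOKEN2ID: Dict[str, int] = {
    "_": 0, BOS: 1, EOS: 2, " ": 3,
    "h": 4, "a": 5, "ɪ": 6, "!": 7, ".": 8,
}


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for sentence splitting: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def phonemes_to_ids(phonemes: List[str], token2id: Dict[str, int]) -> List[int]:
    """Map one sentence's phonemes to IDs, wrapped in BOS and EOS."""
    ids = [token2id[BOS]]
    for p in phonemes:
        if p in token2id:  # assumption: unknown symbols are simply skipped
            ids.append(token2id[p])
    ids.append(token2id[EOS])
    return ids


if __name__ == "__main__":
    print(split_sentences("Hi. Hi!"))
    # Hand-written phonemes for "Hi."; a real frontend would produce these.
    print(phonemes_to_ids(["h", "a", "ɪ", "."], TOKEN2ID))
```

Because every sentence gets its own BOS/EOS pair, each sentence can be synthesized as a separate ID sequence, which is what makes the pause between sentences audible.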

TODO

Refactor the Lexicon class
Support very long text by splitting it and processing the pieces separately
Update the metadata for exported models

CC @anita-smith1 @synesthesiam @MXC48 @rmcpantoja @beqabeqa473

It should fix the following issues:

though there is still some difference (a slight loss in pronunciation compared to the original coqui model)

FYI: Download links about Android APKs for piper models rhasspy/piper#257 (comment)

Why not make an Android port of piper_phonemize and use it in next gen TTS instead of a lexicon? These voices could be used in a screen reader in the future, and there will be many words it will try to read that may not be in that lexicon.

FYI: Download links about Android APKs for piper models rhasspy/piper#257 (comment)

I am not sure you can cover everything.

It might be better to make a condition and use piper_phonemize for piper models.

Yes, this will result in adding espeak-ng, but it will be much better
than adding words manually

FYI: Download links about Android APKs for piper models rhasspy/piper#257 (comment)

Single words seem to have poor pronunciations compared to the same words in phrases.

rmcpantoja · 2023-11-28T14:01:15Z

That's OK👍🏻

Use piper-phonemize to convert text to token IDs

d212ac0

use tokens.txt to convert phonemes to IDs

ad5dada

This was referenced Nov 29, 2023

don't make connections #435

Closed

Language #454

Closed

csukuangfj added 17 commits November 29, 2023 19:03

fix building wheels

0f87339

Fix pip install on Windows

812fce8

test building wheels

397c32f

refactor lexicon

826a6d1

fix ci for c api

0a19b9c

Fix CI error about wstring_convert not found

3233c5d

Fix nodejs test

1183555

fix nodejs CI

c6ea934

remove extra files for pip install

81e916a

Fix swift api

3e18fec

Fix pkg-config

2b530da

Fix iOS

fb799f6

Fix kotlin

f43d4f3

Fix Android

4818962

fix building apk

d9dd0d7

fix android warnings

e3bf431

add more data

a8c68fb

csukuangfj changed the title ~~WIP: Use piper-phonemize to convert text to token IDs~~ Use piper-phonemize to convert text to token IDs Nov 30, 2023

csukuangfj merged commit 62dc3c3 into k2-fsa:master Nov 30, 2023
2 of 161 checks passed

csukuangfj deleted the english-piper-phonemize branch November 30, 2023 15:57

This was referenced Nov 30, 2023

Install from source, wrong path to libonnxruntime_providers_cuda.so #392

Closed

Update lexicon.cc for new Language #455

Closed
