Hello!

It seems I am lacking some general understanding of the embedding encoder and the synthesizer. Please allow me to post three questions here:
If I only want to use and optimize Angela Merkel's voice, wouldn't it make sense to delete all other voice inputs and leave only Merkel's? Or is at least one other (female) voice needed to make it easier for the model to train itself? What I did was delete all male voices and leave only Merkel's voice and the one from eva_k. Or would it have been more time-efficient to use only the one target voice? And how about the parameter settings? In my case of two voices I changed the model settings to

speakers_per_batch = 2
utterances_per_speaker = 1000

But I have absolutely no idea whether that makes sense. I have a GeForce 2070 with 8 GB of RAM, and I chose the parameters that way to use nearly 100% of it. But I could also have set

speakers_per_batch = 20
utterances_per_speaker = 100

to take the same amount of RAM.
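To illustrate my confusion with a quick sketch (the helper below is my own, not from the repo): both settings give the same total number of utterances per batch, but if the encoder is trained with a GE2E-style loss that contrasts each utterance embedding against every speaker centroid in the batch, then the number of speakers changes the shape of that comparison.

```python
# Sketch (my assumption, not toolbox code): a GE2E-style batch holds
# speakers_per_batch * utterances_per_speaker utterances, and the loss builds
# a similarity matrix with one row per utterance and one column per speaker.
def batch_shape(speakers_per_batch, utterances_per_speaker):
    total_utterances = speakers_per_batch * utterances_per_speaker
    sim_matrix_shape = (total_utterances, speakers_per_batch)
    return total_utterances, sim_matrix_shape

print(batch_shape(2, 1000))  # (2000, (2000, 2))  -> only 2 classes to contrast
print(batch_shape(20, 100))  # (2000, (2000, 20)) -> 20 classes to contrast
```

So RAM usage may be the same in both cases, while the learning signal presumably is not.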
If I only want to train and optimize my own voice, am I right that I should train the model only with the (only existing) male voice plus my own audio samples, imported via your cool Wikipedia read-and-record tool? And the same parameter question as above.
Today I used the toolbox for the very first time. My Angela voice has been trained for about 12 hours so far, and the result was impressive! Not perfect, of course; 12 hours are not enough, I know, but impressive. What I do not understand is this: if I enter a German text into the text field and press the synthesize & vocode button again and again, the audio output quality changes every time. But why? I thought the same models (embedding and synthesizer) are always used. Same input = same output. So why does it change every time?
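A toy illustration of the kind of randomness I suspect (purely my assumption, not taken from the toolbox code): if the vocoder samples each output value from a predicted distribution, as WaveRNN-style autoregressive vocoders do, then identical inputs give different outputs unless the random seed is fixed.

```python
# Toy sketch: sampling from the same predicted distribution on every call
# still yields different draws per call unless the RNG is seeded.
import numpy as np

def sample_output(probs, seed=None):
    rng = np.random.default_rng(seed)
    # Draw 5 "audio values" from the same categorical distribution.
    return [rng.choice(len(probs), p=probs) for _ in range(5)]

probs = [0.1, 0.2, 0.3, 0.4]         # the same "model prediction" every call
print(sample_output(probs))          # typically varies run to run
print(sample_output(probs, seed=0))  # fixed seed -> reproducible
```

If that guess is right, "same input = same output" would only hold with a fixed seed.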
Best regards
Marc