
How much data should I use to train this model? #19

Open · ArtemisZGL opened this issue Apr 6, 2020 · 4 comments

@ArtemisZGL

Thanks for your work first. I want to know how much data I should use for training with your repo. I want to use the CMU ARCTIC dataset to train English TTS, but there is only about one hour per speaker; can that work with your repo? I ask because I trained with the NVIDIA repo and the result was bad. Also, the result differs a lot across batch sizes. I also used some of the LibriTTS data to train on the NVIDIA repo, about 3 hours across 10 speakers, but the result was quite bad too. Do you have any ideas about how to train on a small dataset?

@begeekmyfriend
Owner

Multi-speaker training is supported as well. For instance, you might collect 8 speakers with one hour of corpus each and record the directories in scripts/train_tacotron2.sh. The total amount of data might then be enough to help.
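Since the discussion here comes down to total corpus duration, a quick sanity check before training is to sum the hours of audio per speaker. Below is a minimal Python sketch using only the standard library; the directory layout (`data/<speaker>/*.wav`) is a hypothetical example for illustration, not the layout this repo requires.

```python
import wave
from pathlib import Path

# Hypothetical layout: data/<speaker>/<utterance>.wav -- adjust to your corpus.
DATA_ROOT = Path("data")

def wav_seconds(path: Path) -> float:
    """Duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

total = 0.0
for speaker_dir in sorted(p for p in DATA_ROOT.iterdir() if p.is_dir()):
    seconds = sum(wav_seconds(f) for f in speaker_dir.glob("*.wav"))
    total += seconds
    print(f"{speaker_dir.name}: {seconds / 3600:.2f} h")

print(f"total: {total / 3600:.2f} h across all speakers")
```

With the 8 speakers × 1 hour suggested above, this should report roughly 8 hours in total; the LibriTTS subset mentioned below (10 speakers × 10–20 minutes) would come to only about 2–3 hours.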

@ArtemisZGL
Author

@begeekmyfriend Thanks for your reply, but one speaker in LibriTTS has just 10–20 minutes. If I use about 10 speakers like this, will it work?

@begeekmyfriend
Owner

I have no idea about your circumstances, but that seems too little for each speaker's corpus. You might try it yourself, but the quality cannot be guaranteed.

@hassanShabbir1960

Thank you so much, sir, for such amazing work. @begeekmyfriend
