Replies: 10 comments (participants: lissyx, reuben, stergro, ftyers, SamahZaro)
>>> stergro
[August 12, 2019, 11:45am]
For some smaller languages, the 10 000-hour aim of the Common Voice project is very ambitious. How usable are smaller datasets for machine learning? And how much does the number of irregularities in a language matter in this context?

Let's take an extreme example and look at the Esperanto dataset with 20 hours. The language is completely regular, has no exceptions, and the pronunciation is always clear. How big must a dataset be to produce useful results in this case? And apart from constructed languages: are there differences between natural languages, or do they all need the same amount of data?

Edit: 10 000 hours instead of 100 000, and mentioned the Common Voice project.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/are-10-000-hours-of-recordings-necessary-for-every-language]