Need some clarification on training on already pretrained model #602

JRMeyer · 2021-03-08T03:03:58Z

JRMeyer
Mar 8, 2021
Maintainer

>>> Sushantmkarande
[April 19, 2019, 7:35am]

Hello,

I have successfully trained the DeepSpeech 0.4.1 model on my own
dataset. These are the steps I followed:
- downloaded the Mozilla Common Voice English corpus (22 GB);
- I was going to create my own TSV, but could not figure out what the
client ID in the corpus TSV is;
- so I just overwrote the sentences in the corpus TSV with my own and
replaced the MP3 files at the corresponding TSV paths with mine, for 15 samples.

1. This time I am going to create a bigger dataset of around 600 samples, so my
question is: what is the client ID in the Mozilla corpus, or how do I create a
large dataset for this model? Is there any script available for this?

2. Accuracy on Indian-accented speech is very low. Will it help if I retrain the
model using only the Mozilla Indian-accent samples, which have already been
used to train the released 0.4.1 model?

3. Is there any preprocessing I need to do to minimize noise when
giving a WAV file as input to the model for prediction? I am using
PyAudio with these settings:

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
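A minimal sketch that is not from the thread: it shows how chunks captured with the settings above could be saved and checked as a 16 kHz, mono, 16-bit WAV, which is the input format the released DeepSpeech English models expect. It uses only Python's standard `wave` module. The `frames` list stands in for what `stream.read(CHUNK)` would return from a PyAudio stream opened with these settings, and the helper names `write_wav` and `check_wav` are hypothetical. It does not do any noise reduction.

```python
import io
import wave

# Settings from the post; pyaudio.paInt16 means 2-byte (16-bit) signed samples.
CHUNK = 1024
CHANNELS = 1
RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample for paInt16


def write_wav(target, frames):
    """Write raw int16 chunks (e.g. from stream.read(CHUNK)) as a WAV file.

    `target` may be a path or a writable binary file-like object.
    """
    with wave.open(target, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))


def check_wav(source):
    """Return (ok, (rate, channels, sample_width)); ok means 16 kHz mono 16-bit."""
    with wave.open(source, "rb") as wf:
        params = (wf.getframerate(), wf.getnchannels(), wf.getsampwidth())
    return params == (RATE, CHANNELS, SAMPLE_WIDTH), params


# Usage: five silent chunks standing in for real microphone input.
buf = io.BytesIO()
write_wav(buf, [b"\x00\x00" * CHUNK] * 5)
buf.seek(0)
print(check_wav(buf))
```

Running `check_wav` on a file before passing it to the model is a cheap way to rule out a format mismatch before looking into noise.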

[This is an archived TTS discussion thread from discourse.mozilla.org/t/need-some-clarification-on-training-on-already-pretrained-model]

JRMeyer · 2021-03-08T03:04:01Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> lissyx
[April 19, 2019, 9:33am]

> accuracy on indian accent is very low. will it help if i retrain the
> model using mozilla indian accent samples only which is already been
> used to train the actual 0.4.1 model.

That sounds like you would be training several times on the same data:
bad idea.

> is there any preprocessing need to be done to minimize noise while
> giving input as wav file to model to get prediction. I am using
> pyaudio with this setting

That's up to you.

> I was going to create my new tsv. but could not able to figure out
> what is client id in corpus tsv
> so i just overwrite my own sentence in corpus tsv and replaced my mp3
> file with corresponding tsv path name for 15 samples.

I'm not sure what you did there... You should just use import_cv2.py
to import the released Common Voice dataset.

If you have your own dataset, why go through that complicated process?

[Archived Post]


JRMeyer · 2021-03-08T03:04:03Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> Sushantmkarande
[April 20, 2019, 5:33am]

Quick question: what should the epoch number be when training on top of
the pretrained 0.4.1 model? Mine is starting at 1341. Is that right?

Preprocessing ['/home/sush/Desktop/git_lfs_deepspeech/DeepSpeech/own_data/train1.csv']
Preprocessing done
Preprocessing ['/home/sush/Desktop/git_lfs_deepspeech/DeepSpeech/own_data/dev1.csv']
Preprocessing done
W Parameter --validation_step needs to be >0 for early stopping to work
I STARTING Optimization
I Training epoch 1341...
10% (17 of 168) |## | Elapsed Time: 0:03:48 ETA: 0:35:08

[Archived Post]


JRMeyer · 2021-03-08T03:04:06Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> reuben
[April 22, 2019, 12:03am]

It'll change depending on your train set and batch size, so just ignore
that and always use negative values for the --epochs flag when
fine-tuning. This has been updated in master to always be relative, to
avoid confusion, FWIW.
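To illustrate why the starting epoch depends on the train set and batch size, here is a sketch that assumes the displayed epoch is the restored checkpoint's global step integer-divided by the number of batches per epoch in the current train set (168 in the log above). The helper names and the global step value are hypothetical. The real computation lives in `DeepSpeech.py` and may differ in detail, for example when several GPUs are used.

```python
def displayed_epoch(global_step: int, batches_per_epoch: int) -> int:
    """Epoch index implied by a restored global step for the current train set."""
    return global_step // batches_per_epoch


def target_epoch(epochs_flag: int, current_epoch: int) -> int:
    """Negative flag: train |epochs_flag| more epochs; otherwise an absolute target."""
    if epochs_flag < 0:
        return current_epoch + abs(epochs_flag)
    return epochs_flag


# Hypothetical checkpoint step: the same step maps to different epochs
# depending on how many batches one pass over the train set takes.
STEP = 225_300
print(displayed_epoch(STEP, 168))  # 1341
print(displayed_epoch(STEP, 336))  # 670: a pass now takes twice as many batches
print(target_epoch(-3, displayed_epoch(STEP, 168)))  # 1344
```

With a relative (negative) value, the number of epochs actually trained stays the same no matter what the epoch counter happens to show.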

[Archived Post]


JRMeyer · 2021-03-08T03:04:09Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> Sushantmkarande
[April 22, 2019, 6:48am]

Thanks for the help, appreciated.

[Archived Post]

