Generate voice command dataset #814

JRMeyer · 2021-03-08T04:07:33Z

JRMeyer
Mar 8, 2021
Maintainer

>>> subhash
[September 2, 2019, 9:55am]

I want to create a voice command dataset for English keywords and train
my model with it. I don't want to crowdsource it; is there any way to
generate it?

This is what I plan to do to generate it:

From the web I found that we can use existing audio-transcript data and
locate each word in its respective audio file. Exploring further, I
found a technique called forced alignment (FA). Using FA, one can find
the timestamps of individual words in an audio file and then extract
them using sox or something else.
Two weeks ago Mozilla released an FA tool using DeepSpeech. I am not
sure whether it can do word-level forced alignment.
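
The extraction step described above can be sketched in a few lines of Python. This is a minimal sketch, not the output format of any particular alignment tool: it assumes the aligner's result has already been converted to `(word, start_seconds, end_seconds)` tuples, and the function name `extract_word_clips` and the `pad_s` parameter are hypothetical. It uses only the standard-library `wave` module in place of sox.

```python
import wave


def extract_word_clips(wav_path, alignments, out_prefix, pad_s=0.05):
    """Cut one WAV clip per aligned word.

    alignments: iterable of (word, start_s, end_s) tuples (assumed format).
    pad_s: seconds of context kept on each side of the word (hypothetical knob).
    Returns the list of clip paths that were written.
    """
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (word, start_s, end_s) in enumerate(alignments):
            # Convert timestamps to frame indices, clamped to the file bounds.
            first = max(0, int((start_s - pad_s) * rate))
            last = min(total, int((end_s + pad_s) * rate))
            if last <= first:
                continue  # skip empty or inverted spans
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = f"{out_prefix}_{i:04d}_{word}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source file;
                # the frame count in the header is corrected on write.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Calling `extract_word_clips("utterance.wav", [("yes", 0.25, 0.5)], "clips/utt1")` would write `clips/utt1_0000_yes.wav`, provided the `clips` directory exists. A little padding around each word tends to help because aligner boundaries are rarely exact.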

Before doing all this, I want to ask if anyone knows how to generate a
voice command dataset programmatically.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/generate-voice-command-dataset]

JRMeyer · 2021-03-08T04:07:36Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> lissyx
[September 2, 2019, 11:22am]

> Two weeks ago Mozilla released an FA tool using DeepSpeech. I am not
> sure whether it can do word-level forced alignment.

Likely that Tilman_Kamp can help :)

> I want to create a voice command dataset for English keywords and
> train my model with it. I don't want to crowdsource it; is there any
> way to generate it?

The better question is: what do you want to achieve? Voice commands? If
so, you can try re-using the English model as-is and setting up a
command-specific language model. In our tests that gave very good
results.
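
A command-specific language model starts from a text corpus that contains only the phrases the application should recognise. The sketch below shows one way such a corpus could be generated; the function names `expand_commands` and `write_corpus`, and the template syntax, are hypothetical and not part of DeepSpeech. It produces one lowercased command per line, which is the kind of plain-text input an n-gram toolkit such as KenLM takes. Building and packaging the language model itself is not shown here.

```python
from itertools import product


def expand_commands(templates, slots):
    """Expand templates such as 'turn {state} the {device}' into sentences.

    templates: list of strings with optional {slot} placeholders.
    slots: dict mapping slot name -> list of possible values.
    Returns a sorted list of unique, lowercased sentences.
    """
    sentences = set()
    for template in templates:
        # Only the slots that actually appear in this template are expanded.
        names = [name for name in slots if "{" + name + "}" in template]
        for values in product(*(slots[name] for name in names)):
            filled = template.format(**dict(zip(names, values)))
            sentences.add(filled.lower())
    return sorted(sentences)


def write_corpus(sentences, path):
    """Write one sentence per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence + "\n")
```

For example, `expand_commands(["turn {state} the {device}", "stop"], {"state": ["on", "off"], "device": ["light"]})` yields `stop`, `turn off the light` and `turn on the light`. Swapping in a new template list per application changes the command set without retraining the acoustic model.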

[Archived Post]


JRMeyer · 2021-03-08T04:07:38Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> subhash
[September 2, 2019, 12:06pm]

Let me elaborate on what I am after. I want to build a voice command use
case in an Android app using DeepSpeech. The default Android model works
fine for the use case but is slow: it takes 4 seconds to process a
2-second voice command. That will not work for me, so I want to reduce
the latency. A similar discussion happened in this thread, where you
suggested reducing the complexity of the model (by lowering n_hidden
from 2048; I plan to use 256) and retraining it. I believe the latency
should drop with this new model. Now I need data to train it. I think I
cannot use the large dataset that the main DeepSpeech model is trained
on (correct me if I am wrong). Hence I thought of generating the voice
command dataset from a voice corpus.
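
The latency figures above can be expressed as a real-time factor (processing time divided by audio duration), which makes it easy to compare models of different sizes. This is a small sketch with hypothetical helper names; the only numbers taken from the post are 4 seconds of processing for 2 seconds of audio.

```python
def real_time_factor(processing_s, audio_s):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def required_speedup(current_rtf, target_rtf):
    """How many times faster inference must become to reach target_rtf."""
    if target_rtf <= 0:
        raise ValueError("target real-time factor must be positive")
    return current_rtf / target_rtf


rtf = real_time_factor(4.0, 2.0)        # figures quoted in the post
speedup = required_speedup(rtf, 0.5)    # 0.5 is an arbitrary example target
```

Here `rtf` is 2.0, so inference would need to be 4 times faster to reach the example target of 0.5. Measuring the same ratio on the retrained smaller model shows directly whether lowering n_hidden bought enough.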

As for the voice command use case, I will be using this in various
applications, so I cannot have a fixed command set. Each application
might have a new command set. Hence I am looking for a way to generate
my training voice command dataset.

[Archived Post]
