Resample training input samples to align with inference restraints #742

JRMeyer · 2021-03-08T03:48:33Z

JRMeyer
Mar 8, 2021
Maintainer

>>> alchemi5t
[July 23, 2019, 11:54am]

Continuing the discussion from:

github.com/mozilla/DeepSpeech

#### Resampling to 16khz

by alchemi5t on 09:04AM - 23 Jul 19 UTC

1 commit changed 3 files with 19 additions and 5 deletions.

> There could be value in ensuring that training can be done at other
> sample rates than 16kHz, but I'm unsure that resampling is the proper
> solution, to be honest.

I have added a variable (it can be changed to a flag) which can be set to
the desired target SR. But won't having a single sample rate (the same for
training and inference) improve convergence?

I've tested training models at different SRs and running inference at
16 kHz, and as expected, the model produces unacceptable results (WER, CER,
loss and output taken together). But the same model infers much better on
test files with the same SR as the training data (test results after
training).

The audio does retain its original characteristics after resampling.

Do you think this would not be useful for training?
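
For illustration, resampling to a fixed target rate can be sketched in plain Python as below. This is a minimal sketch, not the code from the pull request: `resample_linear` is a hypothetical helper, it assumes mono samples in a list, and it uses plain linear interpolation with no low-pass filter, so downsampling (for example 44.1 kHz to 16 kHz) can alias.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (no anti-alias filter).

    Hypothetical helper for illustration only; not part of DeepSpeech.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    # Keep the duration constant: n_out / dst_rate == n_in / src_rate.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # source index at or before pos
        hi = min(lo + 1, last)     # next source index, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of silence at 44.1 kHz becomes 16000 samples at 16 kHz.
one_second = resample_linear([0.0] * 44100, 44100, 16000)
```

For real training data, a band-limited resampler such as SoX or `scipy.signal.resample_poly` would be the safer choice; the sketch only shows how the output length and sample positions follow from the two rates.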

[This is an archived TTS discussion thread from discourse.mozilla.org/t/resample-training-input-samples-to-align-with-inference-restraints]

JRMeyer · 2021-03-08T03:48:35Z

>>> alchemi5t
[July 23, 2019, 11:57am]
Maybe resample it to a bracket near 16 kHz (e.g., 14-18 kHz), if you
think other SRs might be worth preserving (if only to have different
SRs in the dataset to help the model generalize).

JRMeyer · 2021-03-08T03:48:38Z

>>> lissyx
[July 23, 2019, 11:57am]

> But won't having a single sample rate (the same for training and
> inference) improve convergence?

I don't get your point: this is exactly the current situation, training
at 16kHz, inference at 16kHz.

JRMeyer · 2021-03-08T03:48:41Z

>>> alchemi5t
[July 23, 2019, 12:00pm]

The point being I can train at 44 kHz and not infer on it.

Quoting lucifera678 in https://discourse.mozilla.org/t/trained-model-on-my-own-data/41810/42:

> ok ok listen here. Go to util/flags.py and change audio_sample_rate to
> 16000 (you set it as 44100), and then you'll see that you can export
> your model. Do I know if that screws up your model? I do not. But can
> you export it? Yes. Good luck.

Like this guy.

If the code had resampled his 44 kHz data to 16 kHz, he would not have
had that issue or needed that workaround.
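
A cheap way to catch this kind of mismatch before training is to compare each file's header rate with the configured rate. The sketch below assumes WAV input and uses only Python's standard `wave` module; `TARGET_SAMPLE_RATE`, `wav_sample_rate`, and `find_mismatched` are hypothetical names, not part of DeepSpeech.

```python
import wave

# Assumed to match the rate the model is configured and exported for.
TARGET_SAMPLE_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def find_mismatched(paths, target_rate=TARGET_SAMPLE_RATE):
    """Return (path, rate) pairs for files whose rate differs from target_rate."""
    mismatched = []
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != target_rate:
            mismatched.append((path, rate))
    return mismatched
```

Running `find_mismatched` over the training, dev, and test file lists would flag 44.1 kHz recordings up front, so they can be resampled or the configured rate can be changed deliberately rather than after a failed export.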

JRMeyer · 2021-03-08T03:48:44Z

>>> lissyx
[July 23, 2019, 12:01pm]

> The point being I can train at 44 kHz and not infer on it.

I don't understand a single word of what you say. Each sentence
contradicts the next one.

JRMeyer · 2021-03-08T03:48:46Z

>>> alchemi5t
[July 23, 2019, 12:02pm]

I meant that I can train a model on 44 kHz data, but inference will
require 16 kHz data, which would make it hard for the model to predict
accurately, if that makes sense.

JRMeyer · 2021-03-08T03:48:49Z

>>> lissyx
[July 23, 2019, 12:04pm]

I still don't get what you are trying to achieve. Can you describe
your problem precisely?

JRMeyer · 2021-03-08T03:48:52Z

>>> alchemi5t
[July 23, 2019, 12:11pm]

Let me simplify the issue. I want to know whether, at the moment, I can
run inference at any SR other than 16 kHz without code modification.

JRMeyer · 2021-03-08T03:48:54Z

>>> reuben
[July 23, 2019, 12:14pm]

Yes, just specify the sample rate when passing in the audio. Every
function that takes audio samples in the API also takes a sample rate.
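
On the calling side, that means keeping the sample rate together with the samples when reading a file. Below is a minimal sketch assuming 16-bit mono WAV input; `load_pcm16` and `transcribe_file` are hypothetical helpers, and `transcribe` is a stand-in for whatever inference function is used, not the actual DeepSpeech API.

```python
import array
import sys
import wave


def load_pcm16(path):
    """Read a 16-bit mono WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def transcribe_file(path, transcribe):
    """Pass both the samples and their rate to `transcribe` (hypothetical callable)."""
    samples, rate = load_pcm16(path)
    return transcribe(samples, rate)
```

Reading the rate from the file header, rather than hard-coding 16000, avoids silently feeding 44.1 kHz audio to a function that was told it is 16 kHz.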

JRMeyer · 2021-03-08T03:48:57Z

>>> alchemi5t
[July 23, 2019, 12:25pm]

Got it. I thought only 16 kHz was possible (I misinterpreted the README).
Thanks for clearing things up.
