I appreciate your efforts, nice work.
However, your audio_toolkit is implemented with librosa and NumPy, which are not differentiable.
This may limit its applications. For example, if I have a TTS model that generates mel spectrograms, and your d-vector were fully differentiable, we could use it like a discriminator to force the TTS model's output to sound exactly like the target speaker.
From waveform to mel spectrogram, you can make the preprocessing fully differentiable with torchaudio, and it seems to stay consistent with librosa.
Hi, thanks for your suggestion. I'm actually considering ditching librosa for torchaudio, especially after I chose to do silence trimming with sox instead of webrtcvad.
Since I'd like to keep the preprocessing modules as simple as possible (importing as few packages as possible), I probably need some time to study the usage of sox effects in the most recent version of torchaudio.
I've developed completely new preprocessing toolkits which use torchaudio, can be compiled with TorchScript, and can be used anywhere without extra dependencies.