NOMAD was trained by fine-tuning the wav2vec 2.0 small model on degraded speech samples from Librispeech. We provide a script that degrades Librispeech or any other clean dataset. Since degrading the whole train-clean-100 partition of Librispeech generates 225 GB of data, we do not recommend doing so. Instead, we directly provide a link to download the subset of degraded Librispeech samples that we used to train and validate NOMAD. For the sake of completeness, we still provide instructions to generate the degraded samples as we did.
- Download the datasets:
  - Librispeech (clean data)
  - MS-SNSD (required for the background noise degradation)
- Modify two parameters in the YAML configuration file `src/config/config_audio_degrader.yaml`:
  - `root`: path to Librispeech
  - `root_noise`: path to MS-SNSD
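The relevant part of the configuration file might look like the fragment below. The two key names are those described above; the example paths are placeholders, and any other keys in the file can be left untouched:

```yaml
# src/config/config_audio_degrader.yaml (excerpt; paths are placeholders)
root: /data/LibriSpeech      # path to the Librispeech root
root_noise: /data/MS-SNSD    # path to the MS-SNSD root
```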
- Run `src/utils/audio_degrader_training.py`. This will generate:
  - Degraded data in `train-clean-100-degraded`, located in the same place as Librispeech.
  - Converted wav files of clean Librispeech, which are needed to calculate the NSIM with ViSQOL (see below for more details).
  - A file `degraded_data.csv` in your working directory, with information on the generated data.
  - A file `degraded_data_visqol_format.csv` in your working directory, formatted to run ViSQOL.

  The script can be time-consuming; we recommend running it in the background, e.g. using `nohup`.
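For example, a background run could look like this (the script path matches the repo layout above; the log file name is arbitrary):

```shell
# Launch the degradation script detached from the terminal;
# nohup keeps it running after logout, output is collected in a log file.
nohup python src/utils/audio_degrader_training.py > degrader.log 2>&1 &
echo "degrader started with PID $!"
```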
The NSIM is calculated using ViSQOL.

- Follow the instructions in the ViSQOL repo to install it.
- Set `--batch_input_csv degraded_data_visqol_format.csv`.

This will generate a csv file including patchwise NSIM scores. To train NOMAD, we simply average them.
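Averaging the patch-wise scores per degraded file takes only a few lines of stdlib Python. This is a sketch, not the repo's code, and the column names (`degraded`, `nsim`) are assumptions — check the header of the csv that ViSQOL actually produces:

```python
# Sketch: average patch-wise NSIM scores per degraded file.
# Column names "degraded" and "nsim" are assumptions about the ViSQOL output.
import csv
from collections import defaultdict

def average_nsim(csv_path):
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            sums[row["degraded"]] += float(row["nsim"])
            counts[row["degraded"]] += 1
    return {path: sums[path] / counts[path] for path in sums}

# Tiny example: two patches for a.wav, one for b.wav.
with open("nsim_patches.csv", "w", newline="") as f:
    f.write("degraded,nsim\na.wav,0.8\na.wav,0.6\nb.wav,0.9\n")
print(average_nsim("nsim_patches.csv"))
```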
Triplet sampling can be done by running `src/utils/nsim_triplet_sampling.py`. This script samples data from `train_nsim.csv` and `valid_nsim.csv` respectively and creates the triplets used to train and validate NOMAD, which are saved in `train.csv` and `valid.csv`.
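The core idea of NSIM-based triplet sampling can be sketched as follows. This is a hypothetical illustration, not the repository's script: the helper name, the input format (a dict of per-file averaged NSIM scores), and the acceptance criterion (the positive must be closer to the anchor in NSIM than the negative) are all assumptions made for this example:

```python
# Hypothetical sketch of NSIM-based triplet sampling.
import random

def sample_triplets(nsim, n_triplets=2, seed=0):
    """nsim: dict mapping file path -> averaged NSIM score."""
    rng = random.Random(seed)
    files = list(nsim)
    triplets = []
    while len(triplets) < n_triplets:
        a, p, n = rng.sample(files, 3)
        # keep the triplet only if the positive is closer to the
        # anchor in NSIM than the negative is
        if abs(nsim[a] - nsim[p]) < abs(nsim[a] - nsim[n]):
            triplets.append((a, p, n))
    return triplets
```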
Note that in the paper we did not use clean data in the triplets, but rather background noise at 40 dB or OPUS/MP3 at 128 kbps, which can be indistinguishable from clean data. We provide a modified script that also includes clean data, together with the files `train.csv` and `valid.csv` that were used to train NOMAD.