Utilities for preprocessing the Switchboard and WSJ corpora in Python3
- 2020.07.31 : wTIMIT support
Before using the utilities, the following requirements must be met:
- Install the Python packages: tqdm torchaudio
- Install the sph2pipe executable
- Convert .sph files in LDC2002S09-Hub5e_00 to .wav files.
  python3 sph2wav.py .sph <path to sph2pipe> <path to LDC2002S09-Hub5e_00/english> SWB
- Split the eval2000 set by the rules in hub5e_00.pem.
  python3 swb_eval_splitter.py <path to LDC2002S09-Hub5e_00/english>
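
To illustrate what splitting by the rules in hub5e_00.pem involves, here is a minimal sketch of reading PEM-style segment rules. It is not the code in swb_eval_splitter.py. It assumes each non-comment line holds five whitespace-separated fields (recording id, channel, speaker label, start time, end time, with times in seconds), that comment lines start with ";;", and that the audio is sampled at 8 kHz. The names Segment, parse_pem and to_sample_range are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    recording: str  # recording id, first field of the line
    channel: str    # channel label, second field
    speaker: str    # speaker label, third field
    start: float    # segment start in seconds
    end: float      # segment end in seconds


def parse_pem(lines: Iterable[str]) -> List[Segment]:
    """Parse PEM-style lines, skipping blank lines and ';;' comments."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) != 5:
            continue  # assumption: valid rules have exactly five fields
        rec, chan, spk, start, end = fields
        segments.append(Segment(rec, chan, spk, float(start), float(end)))
    return segments


def to_sample_range(seg: Segment, sample_rate: int = 8000) -> Tuple[int, int]:
    """Convert segment times to (first sample, one past the last sample).

    The 8000 Hz default is an assumption about the converted .wav files.
    """
    return int(round(seg.start * sample_rate)), int(round(seg.end * sample_rate))
```

With the sample range in hand, each segment could be sliced out of the channel it names in the converted .wav file and saved as its own utterance.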
Convert .wv1 files in WSJ0 and WSJ1 to .wav files.
python3 sph2wav.py .wv1 <path to sph2pipe> <path to WSJ0/WSJ1> WSJ
Convert .WAV files in wTIMIT to .wav files.
python3 sph2wav.py .WAV <path to sph2pipe> <path to wTIMIT> wTIMIT
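
The sph2wav.py commands above all take a file extension, the path to sph2pipe, and a corpus root. As a rough sketch of that kind of batch conversion (not the actual contents of sph2wav.py), the code below walks a directory tree, picks files by extension, and calls sph2pipe with its -f wav option. The function names are hypothetical, and writing each output next to its source file is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List


def find_audio(root: str, ext: str) -> List[Path]:
    """Recursively collect files under root whose suffix equals ext (case-sensitive)."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and p.suffix == ext)


def target_path(src: Path) -> Path:
    """Assumed layout: the output sits next to the source with a .wav suffix."""
    return src.with_suffix(".wav")


def build_command(sph2pipe: str, src: Path, dst: Path) -> List[str]:
    """Build 'sph2pipe -f wav <infile> <outfile>', which writes a RIFF/WAV file."""
    return [sph2pipe, "-f", "wav", str(src), str(dst)]


def convert_all(sph2pipe: str, root: str, ext: str) -> int:
    """Convert every matching file and return how many were converted."""
    count = 0
    for src in find_audio(root, ext):
        # check=True raises CalledProcessError if sph2pipe exits with an error
        subprocess.run(build_command(sph2pipe, src, target_path(src)), check=True)
        count += 1
    return count
```

On a case-insensitive filesystem, x.WAV and x.wav name the same file, so for the wTIMIT case this sketch would need a separate output directory to avoid overwriting its input.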