This repo collects some of the common publicly available audio datasets that you can download for ASR or other speech tasks.
Source | Link | Size (Hours) |
---|---|---|
Mozilla | Common Voice 10.0-tr | 79 |
Mozilla | Common Voice 10.0-en | 3050 |
Voxforge | Voxforge-en-16kHz | 130 |
Voxforge | Voxforge-tr-16kHz | 3 |
You can download the data and create a manifest.jsonl for each language the dataset supports.
- In the common_voice_dataset.ipynb notebook, change the version and language to the ones you wish to download from Mozilla:
version = 'cv-corpus-8.0-2022-01-19'
language = 'cv-corpus-8.0-2022-01-19-en'
- In the voxforge_dataset.ipynb notebook, change the language in the URL to the one you wish to download from Voxforge:
VOXFORGE_URL_16kHz = 'http://www.repository.voxforge1.org/downloads/en/Trunk/Audio/Main/16kHz_16bit/'
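
The notebooks themselves are not reproduced here, so the following is only a minimal sketch of how a manifest.jsonl could be written once the audio has been downloaded. It assumes the audio is available as PCM WAV files and that each manifest line carries the keys `audio_filepath`, `duration`, and `text`; those key names and the helper functions `wav_duration_seconds` and `write_manifest` are illustrative assumptions, not necessarily what the notebooks produce.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(entries, manifest_path):
    """Write one JSON object per line for each (wav_path, transcript) pair.

    The field names below are an assumed layout, not taken from the notebooks.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as out:
        for wav_path, transcript in entries:
            record = {
                "audio_filepath": str(wav_path),
                "duration": round(wav_duration_seconds(wav_path), 3),
                "text": transcript,
            }
            # ensure_ascii=False keeps non-ASCII transcripts (e.g. Turkish) readable
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return manifest_path
```

Calling `write_manifest([("clip.wav", "merhaba dünya")], "manifest.jsonl")` would write a single line for the hypothetical file clip.wav. Common Voice distributes MP3 clips, so they would need converting to WAV first, or the duration would have to be read with an audio library instead of the standard-library `wave` module.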
Reference GitHub link: ASR-Audio-Data-Links