Repository for training custom voice models for automatic speech recognition (ASR) using Kaldi.
Active members of the team working on this repo include:
- Dhruv Rajani (Arizona State University)
- Jim Schwoebel (Boston, MA)
We have a custom phoneme dataset that can be accessed here.
The goal of the ASR project is to build an ASR model from a database of phonemes that we have assembled. I have created a training dataset of about 40 phonemes to help recognize my own voice, which could serve as a starting point. Toolkits like Kaldi (available in Python through the PyKaldi wrapper, https://github.com/pykaldi/pykaldi) also let you build custom ASR models on top of pre-trained models, using GMMs and other acoustic and language models. These can correct for speaker-specific characteristics such as pitch and formant frequencies, which depend on the length of the speaker's vocal tract, so that the model is not overfitted to one speaker. The aim here is to look deeper into these technologies and build a minimum viable ASR with a tailored vocabulary (this may be important for some of the surveys we're working on, especially for medical words in transcription).
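As a rough illustration of what a tailored vocabulary could look like in practice, the sketch below builds a pronunciation lexicon that maps each word to a phoneme sequence, one `WORD PH1 PH2 ...` entry per line, which is the general shape Kaldi-style recipes expect for a lexicon file. The words, the ARPAbet-style phoneme spellings, and the function names are all assumptions made for this example; they are not taken from this repository's dataset or from the Kaldi API.

```python
# Hypothetical example vocabulary: the words and ARPAbet-style phoneme
# spellings below are illustrative, not taken from this repository's dataset.
EXAMPLE_PRONUNCIATIONS = {
    "insulin": ["IH", "N", "S", "AH", "L", "AH", "N"],
    "biopsy": ["B", "AY", "AA", "P", "S", "IY"],
}


def format_lexicon(pronunciations):
    """Return lexicon lines of the form 'WORD PH1 PH2 ...', sorted by word."""
    lines = []
    for word in sorted(pronunciations):
        phones = pronunciations[word]
        if not phones:
            # A word without phonemes cannot be modelled, so fail early.
            raise ValueError(f"no phonemes given for {word!r}")
        lines.append(f"{word} {' '.join(phones)}")
    return lines


def phoneme_inventory(pronunciations):
    """Collect the sorted list of distinct phonemes used across all words."""
    return sorted({p for phones in pronunciations.values() for p in phones})


if __name__ == "__main__":
    for line in format_lexicon(EXAMPLE_PRONUNCIATIONS):
        print(line)
```

Checking `phoneme_inventory` against the roughly 40 phonemes in the training dataset would show whether every word in the tailored vocabulary can be covered by the phonemes already recorded.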