This is a project to classify the voices of a person with articulation disorder using deep learning, and it was newly refactored based on my graduation project.
- Data
- 9 words, 89 each
- 뉴스(news), 리모컨(remote controller), 소리크게(volume up), 소리작게(volume down), 시간(time), 오늘일정(today schedule), 오늘날씨(today weather), 지니야(genie), 클로바(clova)
- why wav format?
- The wav format is uncompressed, being an exact copy of the source audio
- Train / Validation / Test = 55:17:17 = 0.6:0.2:0.2
- Data Preprocessing
- change files
- raw -> to_wav -> separated
- clover70.m4a -> clover70.wav: online converter(https://convertio.co/m4a-wav/)
- clover70.wav -> clover_0.wav ~ clover_69.wav: cut them one by one with 'WavePad'
Data Format Audio Channel Sample Rate Bit per Sample Encoding raw m4a mono 44.1 kHz 16 bit . to_wav wav mono 44.1 kHz 16 bit pcm separated wav mono 44.1 kHz 16 bit pcm - raw -> to_wav -> separated
- make train/test data
- RenameFiles.py
- rename separated raw data files on the local machine
- DivideTrainValTest.py
- divide files into trian files, validation files and test files on the local machine
- ExtractFeature.py
- extract MFCC features
- decide the length of input same
- normalize MFCC feature values to values between 0 and 1
- store features into a npz file
- MakeXFeatures.py
- combine all npz files into one npz file for train, validation and test.
- MakeLables.py
- make y labels for train,validation and test
- RenameFiles.py
- Data
- Data/test_X.npz
- Data/test_y.npz
- Data/train_X.npz
- Data/train_y.npz
- Data/val_X.npz
- Data/val_y.npz
- Train
- Train.ipynb
- Test
- Publication
- 2019 Korea Computer Congress (KCC), Jeju, Korea
- https://github.com/dlgur1994/Classifying-Words-of-a-Person-with-Articulation-Disorder-using-Deep-Learning/blob/main/Publication/조음%20장애인들의%20음성인식을%20위한%20AI%20어플리케이션.pdf