Classifying Words of a Person with Articulation Disorder

This project uses deep learning to classify words spoken by a person with an articulation disorder. It is a refactored version of my graduation project.

Data

9 words, 89 recordings each
- 뉴스(news), 리모컨(remote controller), 소리크게(volume up), 소리작게(volume down), 시간(time), 오늘일정(today schedule), 오늘날씨(today weather), 지니야(genie), 클로바(clova)
Why the wav format?
- The wav format is uncompressed, so it is an exact copy of the source audio
Train / Validation / Test = 55:17:17 recordings per word, roughly 0.6:0.2:0.2
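
The 55:17:17 split can be sketched as below. This is a minimal illustration, not the project's actual script: the random shuffle, the fixed seed, the function name `split_files`, and the example file names are all assumptions.

```python
import random

def split_files(files, n_train=55, n_val=17, seed=0):
    """Shuffle one word's recordings and split them into train/val/test lists."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains goes to test
    return train, val, test

# Hypothetical file names for one word; the real names are not shown here.
files = [f"news_{i}.wav" for i in range(89)]
train, val, test = split_files(files)
print(len(train), len(val), len(test))  # 55 17 17
```

With 89 recordings, 55/89 is about 0.62 and 17/89 is about 0.19, which is where the rounded 0.6:0.2:0.2 ratio comes from.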

Data Preprocessing

Convert files
- raw -> to_wav -> separated
  - clover70.m4a -> clover70.wav: converted with an online converter (https://convertio.co/m4a-wav/)
  - clover70.wav -> clover_0.wav ~ clover_69.wav: cut into individual clips one by one with 'WavePad'

| Data | Format | Audio Channel | Sample Rate | Bit per Sample | Encoding |
| --- | --- | --- | --- | --- | --- |
| raw | m4a | mono | 44.1 kHz | 16 bit | . |
| to_wav | wav | mono | 44.1 kHz | 16 bit | pcm |
| separated | wav | mono | 44.1 kHz | 16 bit | pcm |

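
One way to check that a converted file really has the listed properties (mono, 44.1 kHz, 16 bit) is to read its header with Python's standard `wave` module. This is only a sketch: the helper name `describe_wav` and the demo file are assumptions, and the project itself may not perform such a check.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample) of a PCM wav file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getsampwidth() * 8

# Demo: write 0.1 s of silence in the expected format, then read the header back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 2 bytes per sample = 16 bit
    wf.setframerate(44100)   # 44.1 kHz
    wf.writeframes(b"\x00\x00" * 4410)

print(describe_wav(demo_path))  # (1, 44100, 16)
```

Running `describe_wav` on each separated clip would flag any file that was exported with a different channel count or sample rate.
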
Make train/validation/test data
- RenameFiles.py
  - rename the separated raw data files on the local machine
- DivideTrainValTest.py
  - divide the files into train, validation and test files on the local machine
- ExtractFeature.py
  - extract MFCC features
  - make all inputs the same length
  - normalize MFCC feature values to the range 0 to 1
  - store the features in an npz file
- MakeXFeatures.py
  - combine all npz files into one npz file each for train, validation and test
- MakeLables.py
  - make y labels for train, validation and test
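
The length-fixing, normalization and storage steps can be sketched with NumPy as below. This is not the code from ExtractFeature.py: zero-padding, min-max scaling, the 200-frame target, the array key `X`, and the random stand-in for a real MFCC matrix are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

def fix_length(mfcc, n_frames):
    """Zero-pad or truncate an (n_mfcc, frames) array along the time axis."""
    if mfcc.shape[1] >= n_frames:
        return mfcc[:, :n_frames]
    pad = n_frames - mfcc.shape[1]
    return np.pad(mfcc, ((0, 0), (0, pad)), mode="constant")

def min_max_normalize(x):
    """Scale values linearly into [0, 1]; a constant array maps to zeros."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x, dtype=float)
    return (x - lo) / (hi - lo)

# Stand-in for a real MFCC matrix (13 coefficients x 120 frames), random values only.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 120))

features = min_max_normalize(fix_length(mfcc, n_frames=200))

# Store the result in an npz file and load it back.
out_path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(out_path, X=features)
loaded = np.load(out_path)["X"]
print(loaded.shape, loaded.min(), loaded.max())
```

Because padding happens before scaling here, the padded zeros are rescaled together with the real values; normalizing first and padding afterwards would be an equally plausible choice.
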
Data
- Data/test_X.npz
- Data/test_y.npz
- Data/train_X.npz
- Data/train_y.npz
- Data/val_X.npz
- Data/val_y.npz
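
The y files hold one label per recording. A minimal sketch of how such labels could be built is shown below, assuming integer class indices, recordings grouped by word in a fixed order, and the English identifiers in `WORDS`; none of these details are confirmed by the project.

```python
# English stand-ins for the nine Korean words, in an assumed class order.
WORDS = ["news", "remote_controller", "volume_up", "volume_down", "time",
         "today_schedule", "today_weather", "genie", "clova"]

def make_labels(n_per_word):
    """Return one integer class index per recording, grouped by word."""
    return [idx for idx in range(len(WORDS)) for _ in range(n_per_word)]

y_train = make_labels(55)
y_val = make_labels(17)
y_test = make_labels(17)
print(len(y_train), len(y_val), len(y_test))  # 495 153 153
```

The label order must match the order in which the feature arrays were stacked, otherwise features and labels no longer line up.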

Train

Train.ipynb
- Use a CNN to classify the data after it has been converted into images
- Version 1, a three-layer MLP, reached a classification accuracy of 75.56%
- Version 2, which uses a CNN, reaches a classification accuracy of 96.08%

Test

Test.ipynb
- Confirm the results by feeding the test speech data into the trained model
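
Checking a trained classifier on test data usually comes down to comparing the highest-scoring class with the true label. The sketch below shows that calculation in plain Python; the scores are made up for illustration and the helper names are assumptions, not the notebook's code.

```python
def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(score_rows, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    correct = sum(argmax(row) == y for row, y in zip(score_rows, labels))
    return correct / len(labels)

# Made-up class scores for 4 samples and 3 classes, for illustration only.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]
print(accuracy(scores, labels))  # 0.75
```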

Publication

2019 Korea Computer Congress (KCC), Jeju, Korea
https://github.com/dlgur1994/Classifying-Words-of-a-Person-with-Articulation-Disorder-using-Deep-Learning/blob/main/Publication/조음%20장애인들의%20음성인식을%20위한%20AI%20어플리케이션.pdf
