Error on first trial: Passing sample rate to mfcc_hires.conf and doing the decoding #1

Open
SvenST89 opened this issue May 12, 2022 · 3 comments


@SvenST89

SvenST89 commented May 12, 2022

Hey there!
First of all: Nice tutorial. Easy to follow and well-explained.
Yet, I encountered some errors on the first trials.
I have one question and two suggestions/comments for improvement.

1. What does line 61 in main.py actually do? I could not figure out its purpose, so I adjusted it for my needs (see point 2).

2. I adjusted the code section that passes the sample rate to mfcc_hires.conf. I added the strip() method to line 60 because the code threw an error on the first execution: my config line had trailing spaces. My suggestion looks as follows:

# Reformat the line to use the sample rate of the .wav file
line = line.strip().split("=")  # strip() removes trailing whitespace before splitting
print("list of line elements in mfcc_hires.conf file: ", line)
line[1] = str(sample_rate)  # overwrite the old sample rate at index 1; str() keeps join() from failing on an int
line = "=".join(line)

3. I created a Kaldi-like 'text' file as the decoding step did not work without this file.
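
For reference, the idea in point 2 can be sketched as a standalone function. This is only a minimal sketch, not the code from main.py: the function name `set_sample_frequency` is hypothetical, and it assumes the config holds one option per line in the form `--sample-frequency=<value>`, as Kaldi MFCC configs usually do. Anything after the `=` on that line (including a trailing comment) is dropped.

```python
def set_sample_frequency(conf_text: str, sample_rate: int) -> str:
    """Return conf_text with the --sample-frequency value replaced.

    Hypothetical helper; assumes one '--option=value' entry per line.
    """
    updated = []
    for raw in conf_text.splitlines():
        line = raw.strip()  # drop trailing spaces that would break the split
        if line.startswith("--sample-frequency"):
            key, _, _old_value = line.partition("=")
            line = f"{key.strip()}={sample_rate}"
        updated.append(line)
    # strip() also removed the newlines, so add them back when joining
    return "\n".join(updated) + "\n"


example = "--use-energy=false   \n--sample-frequency=16000  \n--num-mel-bins=40\n"
print(set_sample_frequency(example, 8000))
```

Using partition() instead of split("=") avoids an IndexError on lines without an "=" sign, since only the matching line is touched.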

@completelyboofyblitzed

Hey @SvenST89! Thank you for sharing. I ran into the missing text file problem too. What kind of text file does it need?

@SvenST89
Author

Hi @kak-to-tak, this text file contains the transcription of each utterance in the audio file. If speaker information is available in your project setup, each line of this 'text' file could have the following structure: <speaker_id>_<utterance_ID> <transcription of the sentence/segment, if you have segmented the audio file>. Check the Kaldi dummy tutorial here to get an idea of it. Usually, you have to prepare such a file manually and transcribe the audio yourself. Why? You need to train the algorithm. If you do not train the 'brain' by feeding it transcriptions, the algorithm will not learn how to transcribe.

Also check this Kaldi tutorial for a glimpse of how Kaldi works.
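
To make the 'text' file layout described above concrete, here is a small sketch that builds such lines from (speaker, utterance, transcription) tuples. The function name `kaldi_text_lines` and the sample data are made up for illustration. The lines are sorted because Kaldi's data-directory checks generally expect sorted utterance IDs; Python's default string sort is assumed to be good enough for plain ASCII IDs.

```python
def kaldi_text_lines(segments):
    """Build Kaldi-style 'text' lines: '<speaker_id>_<utterance_id> <transcription>'.

    segments: iterable of (speaker_id, utterance_id, transcription) tuples.
    Hypothetical helper for illustration only.
    """
    lines = []
    for speaker_id, utterance_id, transcription in segments:
        key = f"{speaker_id}_{utterance_id}"
        words = " ".join(transcription.split())  # collapse repeated whitespace
        lines.append(f"{key} {words}")
    return sorted(lines)


# Made-up example data
segments = [
    ("spk2", "utt1", "hello   world"),
    ("spk1", "utt2", "good morning"),
    ("spk1", "utt1", "hi there"),
]
for line in kaldi_text_lines(segments):
    print(line)
```

Writing the result to a file is then just a matter of joining the lines with newlines and saving them as 'text' in the data directory.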

@completelyboofyblitzed

Got it, thank you!
