Error While Running transcription = stt.transcribe("audio.wav") #36

Open
nayeem01 opened this issue Oct 17, 2024 · 10 comments

@nayeem01

ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as no_timestamps_token_id. For more details on how to generate the approtiate config

@shhossain
Owner

@nayeem01 Can you reproduce the error? If so, how?

@nayeem01
Author

[Screenshot 2024-10-19 090858: image not included]

@nayeem01
Author

@shhossain Whenever I try to run normally, it fails.

@aaqif-elo

Same issue as @nayeem01

Repro:

=> New python project (venv with Python 3.10.11)
=> pip install banglaspeech2text

from banglaspeech2text import Speech2Text
stt = Speech2Text()
transcript = stt.recognize(path_to_audio_file)

The error message directed here:

huggingface/transformers#21878 (comment)

Full Traceback:

Traceback (most recent call last):
  File "project\cli.py", line 41, in <module>
    transcript_file_path = transcribe.transcribe(audio_path)
  File "project\transcribe.py", line 70, in transcribe
    transcript = stt.recognize(temp_file_path)
  File "project.venv\lib\site-packages\banglaspeech2text\speech2text.py", line 451, in recognize
    return self.pipeline(data, *args, **kw)["text"] # type: ignore
  File "project.venv\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "project.venv\lib\site-packages\transformers\pipelines\base.py", line 1294, in __call__
    return next(
  File "project.venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "project.venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "project.venv\lib\site-packages\transformers\pipelines\base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "project.venv\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "project.venv\lib\site-packages\transformers\models\whisper\generation_whisper.py", line 533, in generate
    timestamp_begin = self._set_return_timestamps(
  File "project.venv\lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1216, in _set_return_timestamps
    raise ValueError(
ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as no_timestamps_token_id. For more details on how to generate the approtiate config, refer to huggingface/transformers#21878 (comment)
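
The message names `no_timestamps_token_id`, so one way to narrow this down is to check whether the generation config attached to the model actually carries that attribute. Below is a minimal sketch using only the standard library. `missing_timestamp_attrs` is a hypothetical helper, not part of banglaspeech2text or transformers, and the stand-in objects only imitate a generation config; the token id value is illustrative.

```python
from types import SimpleNamespace

# Attribute named in the ValueError above. Other attributes may also be
# required, so this tuple is an assumption, not a complete list.
REQUIRED_ATTRS = ("no_timestamps_token_id",)


def missing_timestamp_attrs(generation_config, required=REQUIRED_ATTRS):
    """Return the names in `required` that are absent (or None) on the config."""
    return [
        name
        for name in required
        if getattr(generation_config, name, None) is None
    ]


# Stand-ins for real generation configs (hypothetical values).
incomplete = SimpleNamespace(max_length=448)
complete = SimpleNamespace(max_length=448, no_timestamps_token_id=50363)

print(missing_timestamp_attrs(incomplete))  # ['no_timestamps_token_id']
print(missing_timestamp_attrs(complete))    # []
```

Against the real object this would presumably be called as `missing_timestamp_attrs(stt.pipeline.model.generation_config)`, assuming the `pipeline` attribute seen in the traceback is reachable from the `Speech2Text` instance.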

@shhossain
Owner

@nayeem01 @aaqif-elo Apologies for the delay; I've been busy. I’ll look into the issue and get it fixed today.

@shhossain
Owner

@nayeem01 @aaqif-elo Please update to the latest version and check whether the problem persists.

pip install BanglaSpeech2Text==1.0.9

@aaqif-elo

Unfortunately, the issue seems to still be there.

I have updated to v1.0.9, and I can see the config added to speech2text.py:

# set generation config
generation_config_model = get_generation_model(self.raw_name)
try:
    gen_config = transformers.GenerationConfig.from_pretrained(
        generation_config_model
    )
    self.pipeline.model.generation_config = gen_config
except Exception as e:
    pass

But I'm still getting the error

ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as no_timestamps_token_id. For more details on how to generate the approtiate config, refer to https://github.com/huggingface/transformers/issues/21878#issuecomment-1451902363
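
The quoted snippet swallows every exception with `pass`, so if loading the generation config fails, the pipeline would silently keep its old config and the same error would reappear. Whether that is what happens in v1.0.9 is an assumption, not something confirmed in this thread. A minimal sketch of surfacing such a failure follows; `apply_generation_config` and the `load_config` callable are hypothetical names used for illustration, and the stand-in objects replace the real model and loader.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("generation_config_debug")


def apply_generation_config(model, load_config):
    """Set model.generation_config from load_config(), logging any failure.

    Returns True if the config was replaced, False if loading failed.
    """
    try:
        model.generation_config = load_config()
    except Exception:
        # Unlike a bare `pass`, this records the full traceback.
        logger.exception("Could not load generation config; keeping the old one")
        return False
    return True


# Stand-in model whose config lacks the timestamp attribute.
model = SimpleNamespace(generation_config=SimpleNamespace())


def failing_loader():
    raise OSError("simulated download failure")


def working_loader():
    return SimpleNamespace(no_timestamps_token_id=50363)  # illustrative value


failed = apply_generation_config(model, failing_loader)
succeeded = apply_generation_config(model, working_loader)
```

In the real code the loader would presumably be something like `lambda: transformers.GenerationConfig.from_pretrained(generation_config_model)`, as in the snippet quoted above.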

@shhossain
Owner

shhossain commented Nov 8, 2024

@aaqif-elo Which model are you using?

@aaqif-elo

stt = Speech2Text()

I did not specify a model, so I assume it defaults to base:

model: str = "base",

@Mushfiq2002

Hello,
Did the issue get fixed? It does not seem so on my end.
