Error While Running transcription = stt.transcribe("audio.wav") #36

nayeem01 · 2024-10-17T07:04:01Z

ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as no_timestamps_token_id. For more details on how to generate the approtiate config

shhossain · 2024-10-18T17:55:48Z

@nayeem01 Can you reproduce the error? If so, how?

nayeem01 · 2024-10-19T03:09:39Z

nayeem01 · 2024-10-19T03:10:45Z

@shhossain Whenever I try to run normally, it fails.

aaqif-elo · 2024-11-08T00:45:04Z

Same issue as @nayeem01

Repro:

=> New python project (venv with Python 3.10.11)
=> pip install banglaspeech2text

from banglaspeech2text import Speech2Text
stt = Speech2Text()
transcript = stt.recognize(path_to_audio_file)

The error message directed here:

huggingface/transformers#21878 (comment)

Full Traceback:

Traceback (most recent call last):
  File "project\cli.py", line 41, in <module>
    transcript_file_path = transcribe.transcribe(audio_path)
  File "project\transcribe.py", line 70, in transcribe
    transcript = stt.recognize(temp_file_path)
  File "project.venv\lib\site-packages\banglaspeech2text\speech2text.py", line 451, in recognize
    return self.pipeline(data, *args, **kw)["text"] # type: ignore
  File "project.venv\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "project.venv\lib\site-packages\transformers\pipelines\base.py", line 1294, in __call__
    return next(
  File "project.venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "project.venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "project.venv\lib\site-packages\transformers\pipelines\base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "project.venv\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "project.venv\lib\site-packages\transformers\models\whisper\generation_whisper.py", line 533, in generate
    timestamp_begin = self._set_return_timestamps(
  File "project.venv\lib\site-packages\transformers\models\whisper\generation_whisper.py", line 1216, in _set_return_timestamps
    raise ValueError(
ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as no_timestamps_token_id. For more details on how to generate the approtiate config, refer to huggingface/transformers#21878 (comment)

shhossain · 2024-11-08T05:44:32Z

@nayeem01 @aaqif-elo Apologies for the delay; I've been busy. I’ll look into the issue and get it fixed today.

shhossain · 2024-11-08T14:14:13Z

@nayeem01 @aaqif-elo Please update to the latest version and check whether the problem persists.

pip install BanglaSpeech2Text==1.0.9

aaqif-elo · 2024-11-08T20:09:00Z

Unfortunately, the issue seems to still be there.

I have updated to v1.0.9 and I can see the config added to speech2text.py

            # set generation config
            generation_config_model = get_generation_model(self.raw_name)
            try:
                gen_config = transformers.GenerationConfig.from_pretrained(
                    generation_config_model
                )
                self.pipeline.model.generation_config = gen_config
            except Exception as e:
                pass
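
A side note on the snippet above: because the except branch is just `pass`, a failed config load looks exactly like a successful one from the outside. A minimal sketch of a variant that reports the failure instead is shown here. The function name `apply_generation_config` and the `load_config` parameter are hypothetical stand-ins (the latter for `transformers.GenerationConfig.from_pretrained`), not part of banglaspeech2text.

```python
import logging

logger = logging.getLogger(__name__)


def apply_generation_config(model, load_config, source):
    """Try to attach a generation config to `model` and report failures.

    `load_config` is a hypothetical callable standing in for
    transformers.GenerationConfig.from_pretrained.
    Returns True if the config was attached, False otherwise.
    """
    try:
        model.generation_config = load_config(source)
    except Exception as exc:
        # Log instead of `pass`, so a failed load is visible to the user.
        logger.warning("Could not load generation config from %r: %s", source, exc)
        return False
    return True
```

With something like this, a `False` return (or the warning in the log) would show whether the config was ever applied.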

But I'm still getting the error

ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as no_timestamps_token_id. For more details on how to generate the approtiate config, refer to https://github.com/huggingface/transformers/issues/21878#issuecomment-1451902363
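
For anyone else debugging this, a rough way to confirm whether the config that ended up on the model has what the error asks for is to check the attribute directly. The sketch below uses stand-in objects so it runs without the library. The helper name `missing_timestamp_attrs` is made up for illustration, and `no_timestamps_token_id` is the only attribute taken from the error message (others may also be required).

```python
from types import SimpleNamespace

# Attribute named in the error message; other attributes may also be required.
REQUIRED_ATTRS = ("no_timestamps_token_id",)


def missing_timestamp_attrs(generation_config, required=REQUIRED_ATTRS):
    """Return the names in `required` that are absent or None on the config."""
    return [name for name in required if getattr(generation_config, name, None) is None]


# Stand-in configs for illustration only; the token id is a placeholder value.
incomplete = SimpleNamespace(max_length=448)
complete = SimpleNamespace(max_length=448, no_timestamps_token_id=1)

print(missing_timestamp_attrs(incomplete))  # ['no_timestamps_token_id']
print(missing_timestamp_attrs(complete))    # []
```

In a real session you would pass the object at `stt.pipeline.model.generation_config` instead of the stand-ins. That attribute path is an assumption based on the `self.pipeline.model.generation_config` line in the snippet above.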

shhossain · 2024-11-08T20:11:16Z

@aaqif-elo Which model are you using?

aaqif-elo · 2024-11-08T21:44:10Z

stt = Speech2Text()

I did not specify a model, so I assume it defaults to base:

model: str = "base",

Mushfiq2002 · 2025-01-22T13:16:51Z

Hello,
Did the issue get fixed? It does not seem so on my end.
