refactor `multilingual` option #1148

MahmoudAshraf97 · 2024-11-16T16:08:59Z

Summary:

* Added a test for the `multilingual` option using English-German audio.
* Removed the `output_language` argument because it is redundant: `task="translate"` gives the same functionality, which I've verified with several model sizes.
* Use the correct `encoder_output` for language detection.
* Enabled the same functionality for batched inference.

Copilot reviewed 3 out of 3 changed files in this pull request and generated no suggestions.

Comments skipped due to low confidence (1)

faster_whisper/transcribe.py:219

This line assumes `tokenizer.language` is always in `prompt`, which might cause a `ValueError` if it is not found. Add a check to ensure `tokenizer.language` is in `prompt` before getting its index.

language_token_index = prompt.index(tokenizer.language)
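
The check suggested in the skipped comment above could look roughly like the sketch below. It assumes `prompt` is a sequence of integer token ids and that the language token is a single id; `find_language_token_index` is a hypothetical helper written for illustration, not a function from `faster_whisper`.

```python
from typing import Optional, Sequence


def find_language_token_index(
    prompt: Sequence[int], language_token: int
) -> Optional[int]:
    """Return the position of language_token in prompt, or None if absent.

    Hypothetical helper: it avoids the ValueError that .index() raises
    when the token is missing from the prompt.
    """
    if language_token not in prompt:
        return None
    return prompt.index(language_token)


# Illustrative token ids only; real ids depend on the tokenizer in use.
prompt = [50258, 50259, 50359]
print(find_language_token_index(prompt, 50259))  # 1
print(find_language_token_index(prompt, 12345))  # None
```

Returning `None` leaves the decision to the caller; whether the real code should skip language detection, fall back to a default, or raise a clearer error is not settled in this thread.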

* Added test for `multilingual` option with english-german audio
* removed `output_language` argument as it is redundant, you can get the same functionality with `task="translate"`
* use the correct `encoder_output` for language detection in sequential transcription
* enabled `multilingual` functionality for batched inference
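
As a rough illustration of the points above, the sketch below calls `transcribe` with `multilingual=True` and selects `task="translate"` in place of the removed `output_language` argument. `transcribe_code_switched` is a hypothetical wrapper; it assumes `model` exposes a `transcribe` method that returns `(segments, info)`, where each segment has a `text` attribute, as `WhisperModel` does in faster-whisper.

```python
from typing import Any, List, Tuple


def transcribe_code_switched(
    model: Any, audio_path: str, translate: bool = False
) -> Tuple[List[str], Any]:
    """Transcribe audio that may switch languages partway through.

    Hypothetical wrapper: `model` is assumed to provide
    transcribe(audio, **options) returning (segments, info).
    """
    # task="translate" is used where output_language was used before.
    task = "translate" if translate else "transcribe"
    segments, info = model.transcribe(audio_path, multilingual=True, task=task)
    # segments may be a lazy generator, so consume it here.
    texts = [segment.text for segment in segments]
    return texts, info
```

With faster-whisper installed, `model` would typically be a `WhisperModel` instance. According to this PR, a `BatchedInferencePipeline` should accept the same `multilingual` option, though that is not exercised here.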

MahmoudAshraf97 added 2 commits November 17, 2024 15:51

add tests

61723a3

initial commit

f4fd527

MahmoudAshraf97 force-pushed the multilingual branch from 8002d62 to f4fd527 Compare November 17, 2024 13:53

enable multilingual for batched inference

5277fae

MahmoudAshraf97 requested a review from Copilot November 19, 2024 19:57

Copilot AI reviewed Nov 19, 2024


MahmoudAshraf97 added 2 commits November 19, 2024 23:01

fix docstring

8762e8b

add warnings for invalid arguments combination

dc2a747

MahmoudAshraf97 changed the title ~~RFC: multilingual option~~ refactor multilingual option Nov 19, 2024

MahmoudAshraf97 marked this pull request as ready for review November 19, 2024 21:13

MahmoudAshraf97 merged commit bcd8ce0 into SYSTRAN:master Nov 19, 2024
3 checks passed

MahmoudAshraf97 deleted the multilingual branch November 19, 2024 21:51
