Release v1.0.5.1 · Sharrnah/whispering

Standalone Release File (2.30 GB):
Download Server:

Changelog:

[FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
[FEATURE] Added Audio Loopback support. (Should in theory make it easier to capture game audio, but I haven't gotten it to work myself yet.)
[FEATURE] Allow setting the speaker language, so the AI does not need to guess it. Should improve recognition quality.
[FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
[BUGFIX] Added missing OCR dependency in Standalone Release.

OCR Usage:

Select a window title either with the --ocr_window_name start argument
or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
Select OCR Language in the remote client.
Click on OCR transl..
If the OCR AI model is not already downloaded, it is downloaded first (this might take a while).
It then tries to focus the window with that title and takes a screenshot.
After that, the screenshot is sent to the OCR model, and the result is sent back to the remote client together with its translation into the selected target language.
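
The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: `run_ocr_translation`, `OcrResult`, and the four callables passed in are hypothetical names standing in for the focus, screenshot, OCR, and translation steps. Aborting when the window cannot be focused is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrResult:
    """Hypothetical shape of what is sent back to the remote client."""
    window_title: str
    recognized_text: str
    translated_text: str


def run_ocr_translation(
    window_title: str,
    ocr_language: str,
    target_language: str,
    focus_window: Callable[[str], bool],
    take_screenshot: Callable[[str], bytes],
    recognize_text: Callable[[bytes, str], str],
    translate_text: Callable[[str, str, str], str],
) -> OcrResult:
    # Step 1: try to bring the window with the given title to the front.
    # Assumption: if that fails, the run is aborted.
    if not focus_window(window_title):
        raise RuntimeError(f"window not found: {window_title!r}")

    # Step 2: capture the window contents.
    image = take_screenshot(window_title)

    # Step 3: run the OCR model for the selected OCR language.
    text = recognize_text(image, ocr_language)

    # Step 4: translate the recognized text into the target language.
    translated = translate_text(text, ocr_language, target_language)

    # Step 5: bundle both texts for the remote client.
    return OcrResult(window_title, text, translated)


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any real OCR model.
    result = run_ocr_translation(
        "My Game",
        "en",
        "de",
        focus_window=lambda title: True,
        take_screenshot=lambda title: b"fake-image-bytes",
        recognize_text=lambda image, lang: "Hello",
        translate_text=lambda text, src, dst: f"[{src}->{dst}] {text}",
    )
    print(result)
```

Passing the individual steps in as callables keeps the sketch independent of any particular OCR or translation backend.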
