
Releases: Sharrnah/whispering

v1.1.0.0

23 Nov 17:56

Standalone Release File (2.30 GB):

Download Server:

Changelog:

  • [FEATURE] Added TTS (text-to-speech) using Silero.
  • [FEATURE] Added model download retry, fallback and checksum check.
  • [FEATURE] Added FLAN-T5 conditioning.
  • [TASK] Code restructuring.
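The download retry, fallback, and checksum behavior can be sketched roughly like this (function names and structure are illustrative, not the project's actual code):

```python
import hashlib
import time

def download_with_retry(fetch, expected_sha256, retries=3, delay=0.0):
    """Try fetch() up to `retries` times; accept the data only if its
    SHA-256 digest matches the expected checksum."""
    last_error = None
    for _ in range(retries):
        try:
            data = fetch()
            digest = hashlib.sha256(data).hexdigest()
            if digest == expected_sha256:
                return data
            last_error = ValueError("checksum mismatch: " + digest)
        except OSError as e:
            last_error = e
        time.sleep(delay)
    raise RuntimeError("download failed after retries") from last_error

# Simulated download: fails once, then returns the correct payload.
payload = b"model-bytes"
attempts = {"n": 0}

def fake_fetch():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise OSError("connection reset")
    return payload

result = download_with_retry(fake_fetch, hashlib.sha256(payload).hexdigest())
```

A fallback mirror would fit the same shape: pass a second `fetch` callable to try when the first one exhausts its retries.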

Text-to-speech example (video attachment):

fvzyuMpe.mp4

v1.0.7.1

14 Nov 15:05
8e45875

Standalone Release File (2.30 GB):

Download Server:

Changelog:

  • [BUGFIX] Fixed translate-to-speaker when FLAN-T5 question processing is disabled.
  • [TASK] Added an OSC auto-processing option. (To toggle OSC temporarily while the app is running.)

About FLAN-T5:

flan_process_only_questions and flan_whisper_answer can be enabled to have FLAN-T5 answer only spoken questions.
That means the text recognized by Whisper must include a typical question word and end with a question mark.

Since FLAN-T5 can do much more, there may well be further ways to use this AI model in the future.
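The "question word plus question mark" rule could look something like the sketch below; the word list and function name are illustrative, not taken from the project's code:

```python
# Typical English question words; the project's actual list may differ.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how",
                  "which", "is", "are", "do", "does", "can"}

def looks_like_question(text: str) -> bool:
    """Return True if the recognized text ends with a question mark
    and contains at least one typical question word."""
    stripped = text.strip()
    if not stripped.endswith("?"):
        return False
    words = {w.strip(".,!?").lower() for w in stripped.split()}
    return bool(words & QUESTION_WORDS)

example_question = looks_like_question("What time is it?")    # True
example_statement = looks_like_question("Nice weather today.")  # False
```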

v1.0.7.0

13 Nov 14:35
3493c50

Standalone Release File (2.30 GB):

Download Server:

Changelog:

  • [FEATURE] Added experimental FLAN-T5 AI, supporting automatic answering of, and continuation to, questions or phrases, spoken or written. (See more at https://analyticsindiamag.com/google-ai-introduces-flan-t5-a-new-open-source-language-model/.)
  • [FEATURE] Added a LID language classifier for auto-detecting the language of text.
  • [FEATURE] Added the NLLB200 text translator, supporting around 200 languages in a single model.
  • [FEATURE] Added a config file. (To support more settings without adding many more command-line flags.)
  • [FEATURE] Added a bottom_align HTML parameter for websocket clients. (To make it easier to align streaming overlays at the bottom of the image.)
  • [TASK] Updated dependencies.
  • [CHANGE] (Breaking change if used as a command-line flag!) Renamed m2m100_size and m2m100_device to txt_translator_size and txt_translator_device, respectively.
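For anyone starting the app from the command line, the rename means an old invocation has to be updated; for example (the flag values here are placeholders):

```shell
# Before v1.0.7.0:
#   python audioWhisper.py --m2m100_size small --m2m100_device cpu
# From v1.0.7.0 on:
python audioWhisper.py --txt_translator_size small --txt_translator_device cpu
```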

About FLAN-T5:

flan_process_only_questions and flan_whisper_answer can be enabled to have FLAN-T5 answer only spoken questions.
That means the text recognized by Whisper must include a typical question word and end with a question mark.

Since FLAN-T5 can do much more, there may well be further ways to use this AI model in the future.

v1.0.5.1

01 Nov 00:56
00f4d50

Standalone Release File (2.30 GB):
Download Server:

Changelog:

  • [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use; I hope to improve on that later.)
  • [FEATURE] Added audio loopback support. (Should, in theory, make it easier to capture game audio, but I haven't had success with it myself yet.)
  • [FEATURE] Allow defining the speaker language so the AI does not need to guess it. Should improve recognition quality.
  • [FEATURE] Added the M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
  • [BUGFIX] Added a missing OCR dependency to the standalone release.

OCR Usage:

  • Select a window title, either with the --ocr_window_name start argument
    or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
  • Select the OCR language in the remote client.
  • Click on "OCR transl.".
    If the OCR AI model is not already downloaded, it is downloaded first (which might take a bit).
    The app then tries to focus the window with that title and takes a screenshot.
    The screenshot is sent to the OCR model, and the result, including the text translation
    into the selected target language, is sent back to the remote client.
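The flow above (focus window, screenshot, OCR, translate, reply to the remote client) can be sketched as a pipeline of pluggable steps. All parameter names here are hypothetical stand-ins, not the project's real API; the stubs only exercise the message flow:

```python
def ocr_translate(window_title, focus, screenshot, run_ocr, translate, send):
    """Orchestrate one OCR-translate round trip.

    Each step is passed in as a callable so the flow stays testable:
    focus/screenshot capture the window, run_ocr extracts the text,
    translate converts it to the target language, and send replies
    to the remote client.
    """
    focus(window_title)
    image = screenshot()
    text = run_ocr(image)
    translation = translate(text)
    send({"original": text, "translation": translation})

# Exercise the flow with stub steps.
messages = []
ocr_translate(
    "My Game",
    focus=lambda title: None,
    screenshot=lambda: b"raw-pixels",
    run_ocr=lambda img: "Hallo Welt",
    translate=lambda t: "Hello world",
    send=messages.append,
)
```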

v1.0.5

31 Oct 23:11
837e611

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.5_win.zip (2.29 GB)

The release file had a missing dependency needed for OCR to work. Fixed in v1.0.5.1.

Changelog:

  • [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use; I hope to improve on that later.)
  • [FEATURE] Added audio loopback support. (Should, in theory, make it easier to capture game audio, but I haven't had success with it myself yet.)
  • [FEATURE] Allow defining the speaker language so the AI does not need to guess it. Should improve recognition quality.
  • [FEATURE] Added the M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)

OCR Usage:

  • Select a window title, either with the --ocr_window_name start argument
    or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
  • Select the OCR language in the remote client.
  • Click on "OCR transl.".
    If the OCR AI model is not already downloaded, it is downloaded first (which might take a bit).
    The app then tries to focus the window with that title and takes a screenshot.
    The screenshot is sent to the OCR model, and the result, including the text translation
    into the selected target language, is sent back to the remote client.

v1.0.4

26 Oct 17:09
c39627c

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.4_win.zip (2.23 GB)

Changelog:

  • [TASK] Changed the default recording sample rate to 16000 Hz, since Whisper down-samples to it anyway.
  • [TASK] Added audio conversion using pydub. (Should remove the ffmpeg dependency and allow audio processing in RAM.)
  • [FEATURE] Added threaded queue handling for Whisper. This should speed up processing and eliminate delayed audio recordings.
  • [FEATURE] Added swapping of textual translation languages to the websocket client.
  • [FEATURE] Made "condition on previous text" configurable without requiring a restart.
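The threaded queue handling mentioned above follows the usual producer/consumer pattern: the recording loop pushes audio chunks onto a queue while a worker thread transcribes them, so recording never blocks on transcription. A minimal sketch, with the transcribe function standing in for the actual Whisper call:

```python
import queue
import threading

audio_queue = queue.Queue()
results = []

def transcribe(chunk):
    # Stand-in for the actual Whisper inference call.
    return f"text-for-{chunk}"

def worker():
    while True:
        chunk = audio_queue.get()
        if chunk is None:          # sentinel: stop the worker
            break
        results.append(transcribe(chunk))
        audio_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# The recording loop just enqueues chunks and keeps capturing.
for chunk in ["chunk1", "chunk2", "chunk3"]:
    audio_queue.put(chunk)

audio_queue.put(None)  # signal shutdown
t.join()
```

Because a single worker drains a FIFO queue, results come back in recording order even when transcription is slower than capture.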

v1.0.3

23 Oct 16:48
71a1f0b

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.3_win.zip (2.21 GB)

Changelog:

  • [BUGFIX] Attention-caching fix for Whisper, giving a speed improvement of 30% or more on CPU.
  • [BUGFIX] Fixed the open_browser argument using a wrong path.
  • [FEATURE] Option to disable OSC ASCII conversion. (So a new release is not needed once VRChat supports non-ASCII.)
  • [FEATURE] Activate the typing indicator on audio-processing start and send a processing-start event over the websocket.
  • [FEATURE] Show a processing indicator on websocket clients.
  • [FEATURE] Broadcast setting changes to all websocket clients.
  • [FEATURE] Added a show_transl_results argument to websocket clients to configure the display of translations/transcriptions.
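The ASCII conversion that the new option can disable amounts to transliterating accented characters to their base letters and dropping anything that remains outside ASCII. A generic sketch of that approach, not necessarily the project's exact implementation:

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Decompose accented characters (NFKD), then drop every
    byte that cannot be represented in ASCII."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")

converted = to_ascii("Héllo wörld")  # "Hello world"
```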

v1.0.2

21 Oct 10:27
5fe9a7a

v1.0.1

19 Oct 19:46
6f9db3a

v1.0.0

18 Oct 18:12
997f27b

Standalone Windows Version (Python + ffmpeg included)
Can be downloaded here:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2FWhispering_win32.zip (2.21 GB)

For GPU acceleration, only CUDA is recommended.

See the included start-*.bat files and get-device-list.bat for how to run it.
(Same as described in the README, except python audioWhisper.py is replaced with audioWhisper\audioWhisper.exe.)

Do not run audioWhisper.exe directly, or it will create a new .cache directory and download the Whisper model again.

websocket_remote/ and websocket_clients/* are included as well.

Read README.md for more info.