
Releases: Sharrnah/whispering

v1.1.0.0

23 Nov 17:56

Standalone Release File (2.30 GB):

Download Server:

Changelog:

  • [FEATURE] Added TTS (text-to-speech) using Silero.
  • [FEATURE] Added model download retry, fallback and checksum check.
  • [FEATURE] Added FLAN-T5 conditioning.
  • [TASK] Code restructuring.
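The download retry, fallback, and checksum behavior can be sketched roughly like this (function names and structure are illustrative, not the project's actual code):

```python
import hashlib
import time

def download_with_retry(fetch, expected_sha256, retries=3, delay=0.0):
    """Try fetch() up to `retries` times; accept the data only if its
    SHA-256 digest matches the expected checksum."""
    last_error = None
    for _ in range(retries):
        try:
            data = fetch()
            digest = hashlib.sha256(data).hexdigest()
            if digest == expected_sha256:
                return data
            last_error = ValueError("checksum mismatch: " + digest)
        except OSError as e:
            last_error = e
        time.sleep(delay)
    raise RuntimeError("download failed after retries") from last_error

# Simulated download: fails once, then returns the correct payload.
payload = b"model-bytes"
attempts = {"n": 0}

def fake_fetch():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise OSError("connection reset")
    return payload

result = download_with_retry(fake_fetch, hashlib.sha256(payload).hexdigest())
```

A fallback mirror would fit the same shape: pass a second `fetch` callable to try when the first one exhausts its retries.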

Text-to-speech example (video attachment):

fvzyuMpe.mp4

v1.0.7.1

14 Nov 15:05
8e45875

Standalone Release File (2.30 GB):

Download Server:

Changelog:

  • [BUGFIX] Fixed translate-to-speaker when FLAN-T5 question processing is disabled.
  • [TASK] Added an OSC auto-processing option. (To toggle OSC temporarily while the app is running.)

About FLAN-T5:

flan_process_only_questions and flan_whisper_answer can be enabled to have FLAN-T5 answer only spoken questions.
That means the text recognized by Whisper must include a typical question word and end with a question mark.

Since FLAN-T5 can do much more, there may well be further ways to use this AI model in the future.
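The "question word plus question mark" rule could look something like the sketch below; the word list and function name are illustrative, not taken from the project's code:

```python
# Typical English question words; the project's actual list may differ.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how",
                  "which", "is", "are", "do", "does", "can"}

def looks_like_question(text: str) -> bool:
    """Return True if the recognized text ends with a question mark
    and contains at least one typical question word."""
    stripped = text.strip()
    if not stripped.endswith("?"):
        return False
    words = {w.strip(".,!?").lower() for w in stripped.split()}
    return bool(words & QUESTION_WORDS)

example_question = looks_like_question("What time is it?")    # True
example_statement = looks_like_question("Nice weather today.")  # False
```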

v1.0.7.0

13 Nov 14:35
3493c50

Standalone Release File (2.30 GB):

Download Server:

Changelog:

  • [FEATURE] Added experimental FLAN-T5 AI, supporting automatic answering of, and continuation to, questions or phrases, spoken or written. (See more at https://analyticsindiamag.com/google-ai-introduces-flan-t5-a-new-open-source-language-model/.)
  • [FEATURE] Added a LID language classifier for auto-detecting the language of text.
  • [FEATURE] Added the NLLB200 text translator, supporting around 200 languages in a single model.
  • [FEATURE] Added a config file. (To support more settings without adding many more command-line flags.)
  • [FEATURE] Added a bottom_align HTML parameter for websocket clients. (To make it easier to align streaming overlays at the bottom of the image.)
  • [TASK] Updated dependencies.
  • [CHANGE] (Breaking change if used as a command-line flag!) Renamed m2m100_size and m2m100_device to txt_translator_size and txt_translator_device, respectively.
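For anyone starting the app from the command line, the rename means an old invocation has to be updated; for example (the flag values here are placeholders):

```shell
# Before v1.0.7.0:
#   python audioWhisper.py --m2m100_size small --m2m100_device cpu
# From v1.0.7.0 on:
python audioWhisper.py --txt_translator_size small --txt_translator_device cpu
```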

About FLAN-T5:

flan_process_only_questions and flan_whisper_answer can be enabled to have FLAN-T5 answer only spoken questions.
That means the text recognized by Whisper must include a typical question word and end with a question mark.

Since FLAN-T5 can do much more, there may well be further ways to use this AI model in the future.

v1.0.5.1

01 Nov 00:56
00f4d50

Standalone Release File (2.30 GB):
Download Server:

Changelog:

  • [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use; I hope to improve on that later.)
  • [FEATURE] Added audio loopback support. (Should, in theory, make it easier to capture game audio, but I haven't had success with it myself yet.)
  • [FEATURE] Allow defining the speaker language so the AI does not need to guess it. Should improve recognition quality.
  • [FEATURE] Added the M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
  • [BUGFIX] Added a missing OCR dependency to the standalone release.

OCR Usage:

  • Select a window title, either with the --ocr_window_name start argument
    or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
  • Select the OCR language in the remote client.
  • Click on "OCR transl.".
    If the OCR AI model is not already downloaded, it is downloaded first (which might take a bit).
    The app then tries to focus the window with that title and takes a screenshot.
    The screenshot is sent to the OCR model, and the result, including the text translation
    into the selected target language, is sent back to the remote client.
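The flow above (focus window, screenshot, OCR, translate, reply to the remote client) can be sketched as a pipeline of pluggable steps. All parameter names here are hypothetical stand-ins, not the project's real API; the stubs only exercise the message flow:

```python
def ocr_translate(window_title, focus, screenshot, run_ocr, translate, send):
    """Orchestrate one OCR-translate round trip.

    Each step is passed in as a callable so the flow stays testable:
    focus/screenshot capture the window, run_ocr extracts the text,
    translate converts it to the target language, and send replies
    to the remote client.
    """
    focus(window_title)
    image = screenshot()
    text = run_ocr(image)
    translation = translate(text)
    send({"original": text, "translation": translation})

# Exercise the flow with stub steps.
messages = []
ocr_translate(
    "My Game",
    focus=lambda title: None,
    screenshot=lambda: b"raw-pixels",
    run_ocr=lambda img: "Hallo Welt",
    translate=lambda t: "Hello world",
    send=messages.append,
)
```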

v1.0.5

31 Oct 23:11
837e611

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.5_win.zip (2.29 GB)

The release file had a missing dependency needed for OCR to work. Fixed in v1.0.5.1.

Changelog:

  • [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use; I hope to improve on that later.)
  • [FEATURE] Added audio loopback support. (Should, in theory, make it easier to capture game audio, but I haven't had success with it myself yet.)
  • [FEATURE] Allow defining the speaker language so the AI does not need to guess it. Should improve recognition quality.
  • [FEATURE] Added the M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)

OCR Usage:

  • Select a window title, either with the --ocr_window_name start argument
    or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
  • Select the OCR language in the remote client.
  • Click on "OCR transl.".
    If the OCR AI model is not already downloaded, it is downloaded first (which might take a bit).
    The app then tries to focus the window with that title and takes a screenshot.
    The screenshot is sent to the OCR model, and the result, including the text translation
    into the selected target language, is sent back to the remote client.

v1.0.4

26 Oct 17:09
c39627c

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.4_win.zip (2.23 GB)

Changelog:

  • [TASK] Changed the default recording sample rate to 16000 Hz, since Whisper down-samples to it anyway.
  • [TASK] Added audio conversion using pydub. (Should remove the ffmpeg dependency and allow audio processing in RAM.)
  • [FEATURE] Added threaded queue handling for Whisper. This should speed up processing and eliminate delayed audio recordings.
  • [FEATURE] Added swapping of textual translation languages to the websocket client.
  • [FEATURE] Made "condition on previous text" configurable without requiring a restart.
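The threaded queue handling mentioned above follows the usual producer/consumer pattern: the recording loop pushes audio chunks onto a queue while a worker thread transcribes them, so recording never blocks on transcription. A minimal sketch, with the transcribe function standing in for the actual Whisper call:

```python
import queue
import threading

audio_queue = queue.Queue()
results = []

def transcribe(chunk):
    # Stand-in for the actual Whisper inference call.
    return f"text-for-{chunk}"

def worker():
    while True:
        chunk = audio_queue.get()
        if chunk is None:          # sentinel: stop the worker
            break
        results.append(transcribe(chunk))
        audio_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# The recording loop just enqueues chunks and keeps capturing.
for chunk in ["chunk1", "chunk2", "chunk3"]:
    audio_queue.put(chunk)

audio_queue.put(None)  # signal shutdown
t.join()
```

Because a single worker drains a FIFO queue, results come back in recording order even when transcription is slower than capture.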

v1.0.3

23 Oct 16:48
71a1f0b

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.3_win.zip (2.21 GB)

Changelog:

  • [BUGFIX] Attention-caching fix for Whisper, giving a speed improvement of 30% or more on CPU.
  • [BUGFIX] Fixed the open_browser argument using a wrong path.
  • [FEATURE] Option to disable OSC ASCII conversion. (So a new release is not needed once VRChat supports non-ASCII.)
  • [FEATURE] Activate the typing indicator on audio-processing start and send a processing-start event over the websocket.
  • [FEATURE] Show a processing indicator on websocket clients.
  • [FEATURE] Broadcast setting changes to all websocket clients.
  • [FEATURE] Added a show_transl_results argument to websocket clients to configure the display of translations/transcriptions.
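The ASCII conversion that the new option can disable amounts to transliterating accented characters to their base letters and dropping anything that remains outside ASCII. A generic sketch of that approach, not necessarily the project's exact implementation:

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Decompose accented characters (NFKD), then drop every
    byte that cannot be represented in ASCII."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")

converted = to_ascii("Héllo wörld")  # "Hello world"
```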

v1.0.2

21 Oct 10:27
5fe9a7a

v1.0.1

19 Oct 19:46
6f9db3a

v1.0.0

18 Oct 18:12
997f27b

Standalone Windows Version (Python + ffmpeg included)
Can be downloaded here:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2FWhispering_win32.zip (2.21 GB)

For GPU acceleration, only CUDA is recommended.

See the included start-*.bat files and get-device-list.bat for how to run it.
(Same as described in the README, except python audioWhisper.py is replaced with audioWhisper\audioWhisper.exe.)

Do not run audioWhisper.exe directly, or it will create a new .cache directory and download the Whisper model again.

websocket_remote/ and websocket_clients/* are included as well.

Read README.md for more info.