Releases: Sharrnah/whispering

v1.3.15.4

02 Feb 00:21

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.15.3)

  • [FEATURE] Add get_last_generation methods for TTS
  • [TASK] Add Greek F5 TTS Model
  • [TASK] Add silence after segments of F5-TTS generation
  • [TASK] F5 processing estimate for multi-segments
  • [TASK] Update libraries + fix for nltk
  • [TASK] Add pyctcdecode library
  • [TASK] Add normalization to F5 TTS
  • [TASK] unified tts event call
  • [BUGFIX] Channel error on MME Audio API with Silero
  • [BUGFIX] websocket disconnect on receiving generated TTS raw audio
  • [BUGFIX] TTS Model download not starting on model change

Full Changelog: v1.3.15.2...v1.3.15.4

v1.3.15.2

12 Jan 21:56

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.15.2)

  • [TASK] Add setting only_no_speech_threshold_for_segments
  • [TASK] Select random download url on fallback code
  • [TASK] Add multilingual detection per segment for faster_whisper
  • [BUGFIX] typo in setting transcription_auto_save_continuous_text
  • [BUGFIX] Loading models with bfloat precisions
  • [BUGFIX] Interleave support for more than 2 channels
  • [BUGFIX] Custom model loading (faster whisper)
  • [BUGFIX] fixed distilled-large-v3 model for faster-whisper

Full Changelog: v1.3.15.1...v1.3.15.2

v1.3.15.1

05 Jan 23:30

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.15.1)

  • [FEATURE] Update F5-TTS + add new F5-TTS language models (English, Chinese, French, German, Italian, Japanese, Spanish, Russian, Vietnamese, Malaysian)
  • [FEATURE] Add option to change TTS volume
  • [BUGFIX] Fix order of model loading
  • [BUGFIX] Use configured F5-TTS compute device
  • [BUGFIX] BigVGAN loading for F5-TTS
  • [BUGFIX] module import order

Full Changelog: v1.3.14.8...v1.3.15.1

v1.3.14.8

24 Dec 20:20

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.14.8)

  • [FEATURE] Add F5 TTS
  • [FEATURE] Add option to translate to more than one target language
  • [FEATURE] Add OSC Server to synchronize with VRChat Mute state
  • [FEATURE] Add support to load a user custom model
  • [TASK] Add reload voices event
  • [TASK] Update dependencies
  • [TASK] Initialize TTS after UI connected
  • [TASK] Only send source + translation if both actually exist
  • [TASK] remove direct-ml for linux
  • [TASK] Add large-v3-turbo model for faster-whisper
  • [TASK] Open playback audio device directly with detected information instead of trying multiple options
  • [TASK] return audio segments in faster whisper
  • [TASK] additional translation improvements
  • [TASK] Update ctranslate library
  • [BUGFIX] use defined exclude_client for BroadcastMessage
  • [BUGFIX] Add possible stream playback fix
  • [BUGFIX] Add linux build portaudio dependency
  • [BUGFIX] Return correct download status on fallback download
  • [BUGFIX] Error if invalid F5/E5 model is requested

Full Changelog: v1.3.14.6...v1.3.14.8

v1.3.14.6

21 Aug 00:19

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.2 GB):

Download Server:

Changelog (v1.3.14.6)

  • [FEATURE] Add stt_processing plugin function
  • [TASK] Return detected language from transformer whisper
  • [TASK] Update transformers library
  • [BUGFIX] NLLB200 transformers based implementation
  • [BUGFIX] Calculate correct chunk size for VAD v3 model (Fixes #26)
  • [BUGFIX] vad_frames_per_buffer validity check
  • [BUGFIX] API of M4T and NLLB-200 models with newer transformers library version
  • [BUGFIX] disabled VAD error "KeyError: 'plugins'"

Full Changelog: v1.3.14.5...v1.3.14.6

v1.3.14.5

20 Jul 20:20

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.2 GB):

Download Server:

Changelog (v1.3.14.5)

  • [FEATURE] Direct-ML support. (Should allow running many AI models on any DirectX 12 compatible GPU, including Intel and AMD)
  • [TASK] Update dependencies
  • [BUGFIX] Return original text in case TranslateLanguage thinks a text-translator is active but fails.
  • [BUGFIX] voice markers reusing the initial audio data every time.

Full Changelog: v1.3.14.4...v1.3.14.5

v1.3.14.4

10 Jul 12:13

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.14.4)

  • [FEATURE] Write TTS result to file directly if path is provided.
  • [TASK] Enable 'thread_per_transcription' by default again.
  • [TASK] Show traceback on plugin error.
  • [TASK] play_audio supporting bytes, torch.Tensor or numpy array
  • [TASK] Add frozendict library (used by ChatTTS plugin)

Full Changelog: v1.3.14.2...v1.3.14.4

v1.3.14.2

05 Jul 15:12

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.14.2)

  • [FEATURE] Support for audio with more than 2 channels.
  • [FEATURE] Add MMS STT model
  • [FEATURE] clipboard image OCR support
  • [FEATURE] Add select_audio widget for Plugins
  • [FEATURE] Add textfield widget type
  • [FEATURE] Add Speaker diarization class (experimental)
  • [FEATURE] Add noisereduce algorithm
  • [TASK] Improve streamed audio playback
  • [TASK] use romaji setting for translation requests
  • [TASK] Update ignorelist
  • [TASK] add playback hook, simplify buffer size setting
  • [TASK] Separation of audio processing for recording
  • [TASK] Add get languages plugin method
  • [TASK] Update dependencies
  • [TASK] remove downloaded zip renaming
  • [TASK] Add multiple file hash check utility function
  • [TASK] Send loading message over stdout instead of websocket
  • [TASK] Add plugin name to plugin errors
  • [TASK] Upgrade dependencies + VAD model to v5
  • [BUGFIX] catch plugin exceptions to not break whole application
  • [BUGFIX] Fix possible process management error if process could not be run
  • [BUGFIX] error on modified value in websocket message
  • [BUGFIX] streamed playback of dynamic chunk size
  • [BUGFIX] tagged streamed playback
  • [BUGFIX] buffer element size calculation.
  • [BUGFIX] Wait for resampling until full chunk is ready for streamed playback
  • [BUGFIX] resample_audio function on gpu tensors, reshaping audio data
  • [BUGFIX] Faster whisper handling of unavailable precision model files
  • [BUGFIX] plugin on_*_call calls not returning anything.

Full Changelog: v1.3.13.1...v1.3.14.2

v1.3.13.1

28 Feb 12:26
a5c59a9

Important:

This requires a lot of configuration if run directly. The recommended way is to use the UI application (https://github.com/Sharrnah/whispering-ui), which downloads this automatically.


Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.13.1)

  • [FEATURE] Add seamlessM4T v2 model
  • [FEATURE] Add Wav2Vec Bert2.0 STT Models
  • [FEATURE] Add TextCorrection model (for Wav2Vec Bert2.0 models)
  • [FEATURE] Add Whisper using Transformer library
  • [FEATURE] Add NVIDIA NeMo Canary STT model
  • [FEATURE] Add streaming overlay03 with 2 columns
  • [FEATURE] plugin event methods
  • [TASK] Load settings from Profile folder
  • [TASK] Support more datatypes in audio processing methods
  • [TASK] use pyaudio pool for audio streamer playback
  • [TASK] Update libraries
  • [TASK] Add annotated-types lib to build
  • [TASK] Switch to CUDA 12.1
  • [TASK] split pytorch requirements
  • [TASK] Add triton for windows again
  • [TASK] Change default vad_frames_per_buffer value
  • [TASK] Add TTS playback over html to streaming overlay03
  • [TASK] Make Whisper Voice Marker class a singleton
  • [TASK] Process transcription in single thread by default
  • [BUGFIX] use filename from provided url instead of last redirect
  • [BUGFIX] Audio processing without VAD
  • [BUGFIX] Multiprocess tasks running main python code
  • [BUGFIX] loading whisper large-v3 in different precisions
  • [BUGFIX] download whisper large-v3 when set to float32
  • [BUGFIX] allow downloads without checksum

Full Changelog: v1.3.12.2...v1.3.13.1

v1.3.12.2

23 Nov 18:23

Standalone Release File (3.1 GB):

Download Server:

Changelog (v1.3.12.2)

  • [FEATURE] Add Whisper V3 Support
  • [FEATURE] Add Whisper Distilled Support
  • [FEATURE] Add option to continuously write transcriptions to file (transcription_auto_save_continous_text setting)
  • [FEATURE] Add option to write the audio file of each final transcription (transcription_save_audio_dir setting)
  • [FEATURE] Add buffered streamed audio playback
  • [TASK] Replaced Icon
  • [TASK] Replaced download library
  • [TASK] Add grpcio library
  • [TASK] Update omegaconf
  • [TASK] Made DeepFilterNet model class a singleton

Full Changelog: v1.3.12.1...v1.3.12.2