v1.6.0

@rotemdan rotemdan released this 04 Oct 06:00

New features

  • Initial support for text-to-text translation (Google Translate engine)
  • openai-cloud STT engine: Support for custom OpenAI API compatible speech-to-text providers, like Groq
  • Support for the new large-v3-turbo Whisper model in both integrated Whisper engine and whisper.cpp engine.
  • Add 6 new VITS voices

Enhancements

  • Whisper (integrated engine): hash the seed before using it, so that nearby seeds like 0, 1, 2, 3, 4 produce more distinct results
  • whisper.cpp: use updated builds
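
To illustrate the seed hashing enhancement: small consecutive seeds differ by only a few bits, so passing them through an integer hash first spreads them across the whole 32-bit range. The sketch below is only an illustration and assumes a MurmurHash3-style finalizer. The function name `hashSeed` and the offset constant are hypothetical, and the hash actually used by the integrated Whisper engine may be different.

```typescript
// Hypothetical helper, not the engine's actual code.
// Mixes a 32-bit integer seed so that nearby inputs map to distant outputs.
function hashSeed(seed: number): number {
  // Add an odd constant first so that a seed of 0 does not hash to 0
  let h = (seed + 0x9e3779b9) >>> 0;

  // MurmurHash3 32-bit finalizer (fmix32)
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;

  // Return as an unsigned 32-bit integer
  return h >>> 0;
}

// Print the mixed value for each of the seeds mentioned above
for (const seed of [0, 1, 2, 3, 4]) {
  console.log(seed, hashSeed(seed));
}
```

Because both steps are invertible on 32-bit integers, two different seeds never collapse to the same hashed value.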

Behavioral changes

  • Whisper (integrated engine): on Windows x64, may use GPU-accelerated decoding (decoderProvider=dml) for larger models (small*, medium* and large*)
  • alignTimelineTranslation / e5 engine: reduce default DTW window's token count to 20,000 tokens

Removed features

  • whisper.cpp: removed internal package support for cublas-1.8.0, due to build issues with the latest VS2022 and very long build times.
  • Removed the optional dependency on the unused package speaker, due to its security vulnerabilities and native module requirements.
  • whisper: removed option for large model keyword (large-v3-turbo is currently the only one supported).

Fixes

  • When deriving sentence / segment timeline from word timeline, ensure sentences never break within words by temporarily masking potential sentence ending characters in the body of the word. Attempts to resolve issues #67 and #58
  • dtw-ra: when producing an alignment reference for a set of fragments, process the fragments in chunks, rather than all at once (currently uses a maximum of 1000 fragments for each chunk). Should resolve issue #64
  • whisper.cpp: add workaround for rare whisper.cpp issue with missing time offsets by falling back to last known end offset when they are not included. Should resolve issue #65
  • Don't error when DTW length is less than 2 (fixes rare issue with Whisper's internal alignment)
  • Fix logging in timeline translation alignment
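
The first fix above (masking sentence-ending characters inside words) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the placeholder characters, the function names, and the naive regex-based splitter (standing in for a real sentence segmenter) are all hypothetical.

```typescript
// Hypothetical placeholders taken from the Unicode private use area
const maskFor = new Map<string, string>([
  ['.', '\uE000'],
  ['!', '\uE001'],
  ['?', '\uE002'],
]);
const unmaskFor = new Map<string, string>(
  Array.from(maskFor, ([ch, mask]): [string, string] => [mask, ch])
);

// Mask sentence enders everywhere except the word's final character
function maskWordBody(word: string): string {
  const chars = Array.from(word);
  return chars
    .map((ch, i) => (i < chars.length - 1 ? maskFor.get(ch) ?? ch : ch))
    .join('');
}

// Restore the original characters after segmentation
function unmask(text: string): string {
  return Array.from(text, (ch) => unmaskFor.get(ch) ?? ch).join('');
}

// Naive stand-in for a sentence segmenter: breaks after any run of '.', '!' or '?'
function naiveSplit(text: string): string[] {
  return text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
}

// Mask each word, segment the joined text, then unmask each sentence
function splitWordsIntoSentences(words: string[]): string[] {
  const maskedText = words.map(maskWordBody).join(' ');
  return naiveSplit(maskedText).map((sentence) => unmask(sentence).trim());
}

console.log(
  splitWordsIntoSentences(['Pi', 'is', '3.14', 'today.', 'Visit', 'example.com', 'now!'])
);
// prints [ 'Pi is 3.14 today.', 'Visit example.com now!' ]
```

Without the masking step, the same naive splitter would break after "3." and "example.", which is the kind of mid-word sentence boundary the fix is meant to prevent.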

Full Changelog: v1.5.0...v1.6.0