v1.6.0
New features
- Initial support for text-to-text translation (Google Translate engine)
- `openai-cloud` STT engine: Support for custom OpenAI API compatible speech-to-text providers, like Groq
- Support for the new `large-v3-turbo` Whisper model in both integrated Whisper engine and `whisper.cpp` engine.
- Add 6 new VITS voices
Enhancements
- Whisper (integrated engine): hash seed before using it (ensures seeds like 0, 1, 2, 3, 4 would produce more distinct results)
- `whisper.cpp`: use updated builds
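The seed hashing mentioned under Enhancements can be illustrated with a minimal sketch. This is not the project's actual implementation: the helper names `hash_seed` and `make_rng` and the choice of SHA-256 are assumptions made for illustration only. The idea is that small consecutive seeds (0, 1, 2, ...) are mapped to widely separated values before they are used to seed a random number generator.

```python
import hashlib
import random


def hash_seed(seed: int) -> int:
    """Map an integer seed to a well-mixed 32-bit value (hypothetical helper)."""
    digest = hashlib.sha256(str(seed).encode("utf-8")).digest()
    # Interpret the first 4 bytes of the digest as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")


def make_rng(seed: int) -> random.Random:
    """Create a random generator seeded with the hashed seed, not the raw one."""
    return random.Random(hash_seed(seed))


if __name__ == "__main__":
    # Consecutive raw seeds yield unrelated hashed seeds.
    for raw_seed in range(5):
        print(raw_seed, hash_seed(raw_seed))
```

The mapping stays deterministic, so a given seed still reproduces the same result across runs.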
Behavioral changes
- Whisper (integrated engine): on Windows x64, will possibly use GPU accelerated decoding (`decoderProvider=dml`) for larger models (`small*`, `medium*` and `large*`)
- `alignTimelineTranslation`/`e5` engine: reduce default DTW window's token count to 20,000 tokens
Removed features
- `whisper.cpp`: removed internal package support for `cublas-1.8.0`, due to build issues with latest VS2022, and very long build times.
- Removed optional dependency on unused package `speaker` due to security vulnerabilities, and its native module requirements.
- `whisper`: removed option for `large` model keyword (`large-v3-turbo` is currently the only one supported).
Fixes
- When deriving sentence / segment timeline from word timeline, ensure sentences never break within words by temporarily masking potential sentence ending characters in the body of the word. Attempts to resolve issues #67 and #58
- `dtw-ra`: when producing an alignment reference for a set of fragments, process the fragments in chunks, rather than all at once (currently uses a maximum of 1000 fragments for each chunk). Should resolve issue #64
- `whisper.cpp`: add workaround for rare `whisper.cpp` issue with missing time offsets by falling back to last known end offset when they are not included. Should resolve issue #65
- Don't error when DTW length is less than 2 (fixes rare issue with Whisper's internal alignment)
- Fix logging in timeline translation alignment
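The masking approach described in the first fix above can be sketched roughly as follows. This is an illustrative approximation, not the project's code: the function names, the private-use placeholder characters, and the deliberately naive sentence splitter are all assumptions. Sentence-ending characters that sit inside a word (as in `1.6.0` or `example.com`) are temporarily replaced with placeholders, the text is split into sentences, and the original characters are then restored.

```python
import re

SENTENCE_ENDERS = ".!?"
# Hypothetical placeholders: private-use code points assumed absent from the input.
MASKS = {ch: chr(0xE000 + i) for i, ch in enumerate(SENTENCE_ENDERS)}
UNMASKS = {mask: ch for ch, mask in MASKS.items()}

# An ender is treated as "inside a word" when it is preceded by a
# non-space character and immediately followed by a word character.
INNER_ENDER = re.compile(r"(?<=\S)[.!?](?=\w)")


def mask_inner_enders(text: str) -> str:
    """Replace sentence enders found inside words with placeholders."""
    return INNER_ENDER.sub(lambda m: MASKS[m.group()], text)


def unmask(text: str) -> str:
    """Restore the original characters for any placeholders."""
    return "".join(UNMASKS.get(ch, ch) for ch in text)


def split_sentences(text: str) -> list[str]:
    """Split after every remaining ender, then restore the masked characters."""
    masked = mask_inner_enders(text)
    # Naive splitter: a run of non-enders followed by any trailing enders.
    parts = re.findall(r"[^.!?]+[.!?]*", masked)
    return [unmask(part.strip()) for part in parts if part.strip()]


if __name__ == "__main__":
    print(split_sentences("Version 1.6.0 is out. Try example.com today!"))
```

Without the masking step, this naive splitter would break `1.6.0` and `example.com` at every period. Abbreviations that end in a period followed by a space (such as `e.g.`) are not handled by this sketch.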
Full Changelog: v1.5.0...v1.6.0