Releases: echogarden-project/echogarden
v1.4.2
Fixes
- eSpeak: bring back the workaround for the second eSpeak bug, where `'` at the beginning of an utterance causes it to omit both start and end markers for the first word
Full Changelog: v1.4.1...v1.4.2
v1.4.1
Fixes
- Use a different workaround for the eSpeak marker bug, replacing the recent one
- Specify the exact required Node version in `package.json`
Full Changelog: v1.4.0...v1.4.1
v1.4.0
Enhancements
- DTW speech-to-transcript alignment: rework the auto-selection of granularities and window durations. By default, two-pass DTW is now used for audio that is 30 minutes or longer. The first pass uses granularity `xx-low` with a window duration of 20% of the audio duration, up to 2.5 hours of audio; beyond that, the first-pass window stays at 30 minutes. The second pass uses granularity `low` with a window duration of 15 seconds. This significantly reduces memory usage and processing time for multi-hour audio.
- Reword some log messages
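The window-duration rule above can be written as a small function. This is a minimal sketch assuming durations in seconds; the `DtwPass` type and `twoPassDtwPlan` name are hypothetical, not Echogarden's API. The defaults for audio shorter than 30 minutes aren't described here, so the sketch returns `null` in that case.

```typescript
// Hypothetical types: Echogarden's real option names and internals may differ.
// This only mirrors the rule stated in the release notes.
interface DtwPass {
  granularity: string;
  windowDurationSeconds: number;
}

const TWO_PASS_THRESHOLD_SECONDS = 30 * 60; // two-pass DTW from 30 minutes of audio
const FIRST_PASS_WINDOW_CAP_SECONDS = 30 * 60; // cap reached at 2.5 hours of audio

// Returns the two-pass plan for long audio, or null for shorter audio,
// whose defaults this release note doesn't describe.
function twoPassDtwPlan(audioDurationSeconds: number): DtwPass[] | null {
  if (audioDurationSeconds < TWO_PASS_THRESHOLD_SECONDS) {
    return null;
  }
  // 20% of the audio duration, but never more than 30 minutes.
  const firstPassWindow = Math.min(0.2 * audioDurationSeconds, FIRST_PASS_WINDOW_CAP_SECONDS);
  return [
    { granularity: "xx-low", windowDurationSeconds: firstPassWindow },
    { granularity: "low", windowDurationSeconds: 15 },
  ];
}
```

For a 1-hour recording this gives a 12-minute first-pass window; for anything longer than 2.5 hours it stays at 30 minutes.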
Fixes
- DTW sequence alignment: try to handle "jump transitions" between windows more effectively. These occur when consecutive columns have windows with no overlapping rows, so it's not possible to directly "step" from one to the next. This removes warnings like `all cost directions are equal to infinity` that were previously emitted in these situations
- eSpeak-NG: work around an eSpeak issue where isolated `'` characters (not part of a word) at the start of the utterance cause markers to be omitted and errors or crashes to occur
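To make the "jump transition" condition concrete, here is a sketch that finds consecutive columns whose row windows share no rows. It assumes each column's window is a half-open row range; `ColumnWindow` and `findJumpTransitions` are hypothetical names, and how Echogarden actually bridges such jumps isn't described in these notes.

```typescript
// Hypothetical representation of one DTW column's window:
// rows in [startRow, endRow) are evaluated for that column.
interface ColumnWindow {
  startRow: number;
  endRow: number; // exclusive
}

// True when two half-open row ranges share at least one row.
function windowsOverlap(a: ColumnWindow, b: ColumnWindow): boolean {
  return a.startRow < b.endRow && b.startRow < a.endRow;
}

// Returns every index j where column j and column j + 1 have no rows in common,
// i.e. the situation the release notes call a "jump transition".
function findJumpTransitions(windows: ColumnWindow[]): number[] {
  const jumps: number[] = [];
  for (let j = 0; j + 1 < windows.length; j++) {
    if (!windowsOverlap(windows[j], windows[j + 1])) {
      jumps.push(j);
    }
  }
  return jumps;
}
```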
Behavioral changes
- DTW speech alignment: the `auto` granularity identifier has been removed.
Full Changelog: v1.3.3...v1.4.0
v1.3.3
Enhancements
- Attempt to reduce memory requirements of large wave file conversions by allowing the GC to collect unused buffers where possible
- Whisper: prevent auto-prompting the next part when the current part is repetitive. Adds a new option, `whisper.repetitionThreshold`
- Reword some log messages for better uniformity
Full Changelog: v1.3.2...v1.3.3
v1.3.2
Enhancements
- Improve audio format conversion speed by using faster operations like `Buffer.copyBytesFrom` (supported in Node `v18.16.0` or newer). The faster conversions assume the underlying architecture is little-endian; otherwise they wouldn't work correctly. That's acceptable, since big-endian architectures aren't supported anyway.
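As an illustration of the kind of bulk copy the note refers to, the sketch below encodes float samples as 16-bit little-endian PCM by filling an `Int16Array` and copying its bytes with a single `Buffer.copyBytesFrom` call. The choice of 16-bit PCM and the `encodeInt16LE` name are assumptions for the example, not Echogarden's actual code; the explicit `os.endianness()` check makes the little-endian assumption visible.

```typescript
import { endianness } from "node:os";

// Hypothetical helper (not Echogarden's code): encodes float samples in [-1, 1]
// as 16-bit signed little-endian PCM bytes.
// Assumes a little-endian host, so an Int16Array's in-memory layout already
// matches the byte order used by wave files.
function encodeInt16LE(samples: Float32Array): Buffer {
  if (endianness() !== "LE") {
    throw new Error("This fast path assumes a little-endian host");
  }
  const int16 = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    int16[i] = Math.round(s < 0 ? s * 32768 : s * 32767);
  }
  // Copy all of the typed array's bytes into a new Buffer in one call
  // (Buffer.copyBytesFrom requires Node v18.16.0 or newer).
  return Buffer.copyBytesFrom(int16);
}
```

Because an `Int16Array` stores values in the host's byte order, the bulk copy only yields valid wave data on little-endian machines, which is the limitation the release note accepts.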
Full Changelog: v1.3.1...v1.3.2
v1.3.1
Fixes
- Apply the Speex resampler in 1 MiB chunks instead of copying the entire audio into WASM memory (which is limited to between 1 GiB and 4 GiB). The resampler can now process audio of any length
- Add support for decoding Wave files or buffers larger than 4 GiB, by ignoring the length field of the `RIFF` and `data` chunks when its value is exactly `4294967295`. Similarly, the wave encoder can now create such large files or buffers by clipping the overflowing chunks' length fields to `4294967295`
- Add some missing log messages, and remove / rewrite some others
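The size handling for large wave files can be sketched as two helpers: one for interpreting a chunk's length field when decoding, and one for producing it when encoding. The helper names are hypothetical, and clamping a regular declared size to the available bytes is a defensive assumption of this sketch, not something stated in the release notes.

```typescript
// Largest value a 32-bit chunk length field can hold.
const MAX_CHUNK_SIZE_FIELD = 0xffffffff; // 4294967295

// Decoding (hypothetical helper): number of payload bytes to read for a chunk.
// A declared size of exactly 4294967295 is ignored, and the chunk is assumed
// to extend to the end of the available data.
function effectiveChunkSize(declaredSize: number, bytesAvailable: number): number {
  if (declaredSize === MAX_CHUNK_SIZE_FIELD) {
    return bytesAvailable;
  }
  // Defensive clamp for truncated input (an assumption of this sketch).
  return Math.min(declaredSize, bytesAvailable);
}

// Encoding (hypothetical helper): value written to a chunk's length field,
// clipped so that sizes of 4 GiB or more don't overflow 32 bits.
function chunkSizeFieldValue(actualSize: number): number {
  return Math.min(actualSize, MAX_CHUNK_SIZE_FIELD);
}

// Example: the 8-byte header of a 5 GiB "data" chunk.
const dataChunkHeader = Buffer.alloc(8);
dataChunkHeader.write("data", 0, "ascii");
dataChunkHeader.writeUInt32LE(chunkSizeFieldValue(5 * 1024 ** 3), 4);
```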
Behavioral changes
- Remove the `--no-experimental-fetch` Node flag, since `fetch` seems to work correctly on Node `v18.20.0` and newer
Full Changelog: v1.3.0...v1.3.1
v1.3.0
Enhancements
- Accept language codes in multiple formats. Currently supports ISO 639-1 (examples: `es`, `es-MX`), ISO 639-2 (example: `spa`), and full English language names (example: `spanish`)
- Whisper: when no matching language is found, include the exact language identifier that was provided, to reduce confusion about language support
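A minimal sketch of accepting the three formats listed above, assuming deliberately tiny hypothetical lookup tables (a real implementation would cover all supported languages). `normalizeLanguageCode` is an illustrative name, not Echogarden's API; any subtag after the first `-` (such as `MX`) is passed through unchanged.

```typescript
// Hypothetical, tiny tables for illustration only.
const knownIso639_1 = new Set(["en", "es", "fr"]);
const iso639_2ToIso639_1 = new Map([
  ["eng", "en"],
  ["spa", "es"],
  ["fra", "fr"],
]);
const englishNameToIso639_1 = new Map([
  ["english", "en"],
  ["spanish", "es"],
  ["french", "fr"],
]);

// Normalizes "es", "es-MX", "spa" or "Spanish" to an ISO 639-1 based code,
// keeping any subtags after the first "-". Returns undefined if unrecognized.
function normalizeLanguageCode(input: string): string | undefined {
  const [primary, ...subtags] = input.trim().split("-");
  const key = primary.toLowerCase();
  const base = knownIso639_1.has(key)
    ? key // ISO 639-1, e.g. "es"
    : iso639_2ToIso639_1.get(key) ?? englishNameToIso639_1.get(key); // "spa" or "spanish"
  if (base === undefined) {
    return undefined;
  }
  return [base, ...subtags].join("-");
}
```

Using `Map` rather than plain objects keeps lookups like `"constructor"` from accidentally matching inherited object properties.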
Fixes
- Recognition / alignment / translation: timing for words that overlap with non-speech regions is now truncated based on the voice-activity region where the overlap is greatest. When processing is done over audio that has been cropped using VAD, an upcoming word could appear too early, or extend too far, before or after a non-speech region, making timings inaccurate near region boundaries. The fix ensures, while uncropping the timeline, that each word can only span a single active voice region (selected by maximum overlap), which prevents its time range from being over-extended
- Fix `whisper.cpp` speech-to-text translation not including word offsets
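The maximum-overlap rule described above can be sketched as follows: for each word, find the voice-activity region it overlaps most and clamp the word's start and end to that region. Times are assumed to be in seconds, and the names are hypothetical; leaving words that overlap no active region unchanged is a choice of this sketch, not something the release notes specify.

```typescript
interface TimeRange {
  start: number; // seconds
  end: number; // seconds
}

// Length of the intersection of two time ranges (0 if they don't intersect).
function overlapDuration(a: TimeRange, b: TimeRange): number {
  return Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
}

// Clamps a word's time range to the single voice-activity region it overlaps most.
// Words overlapping no region are returned unchanged (an assumption of this sketch).
function truncateToBestRegion(word: TimeRange, activeRegions: TimeRange[]): TimeRange {
  let best: TimeRange | undefined;
  let bestOverlap = 0;
  for (const region of activeRegions) {
    const overlap = overlapDuration(word, region);
    if (overlap > bestOverlap) {
      bestOverlap = overlap;
      best = region;
    }
  }
  if (best === undefined) {
    return word;
  }
  return {
    start: Math.max(word.start, best.start),
    end: Math.min(word.end, best.end),
  };
}
```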
Behavioral changes
- Remove `pico` and `flite` from being used as default synthesis engines in some languages (`pico` is never actually selected, and `flite` uses WASI, which appears to have segmentation fault issues in Node versions `20`, `21`, and `22`)
Full Changelog: v1.2.1...v1.3.0
v1.2.1
Fixes
- Fix regression with CLI not starting without a configuration file.
Full Changelog: v1.2.0...v1.2.1
v1.2.0
New features
- Add a global option to adjust the amount of log messages during runtime (`logLevel`)
- Add a global option to set a custom remote package repository base URL (`packageBaseURL`). The default base URL is currently `https://huggingface.co/echogarden/echogarden-packages/resolve/main/`. If Hugging Face isn't accessible in your location, you can replace `huggingface.co` with a mirror domain like `hf-mirror.com`, or use any other custom address
- Allow setting global options via the CLI
- Add support for global and common CLI options to the configuration file
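Swapping the host of the default base URL, as suggested above, can be done with the standard `URL` class. This sketch only builds the string; `withMirrorHost` is a hypothetical helper, and how the value is then passed to Echogarden (CLI or configuration file) isn't shown because the exact syntax isn't given in these notes.

```typescript
// Default packageBaseURL as stated in the release notes.
const DEFAULT_PACKAGE_BASE_URL =
  "https://huggingface.co/echogarden/echogarden-packages/resolve/main/";

// Hypothetical helper: returns the same URL with its host replaced,
// e.g. huggingface.co -> hf-mirror.com, keeping the path unchanged.
function withMirrorHost(baseURL: string, mirrorHost: string): string {
  const url = new URL(baseURL);
  url.hostname = mirrorHost;
  return url.toString();
}

const mirroredBaseURL = withMirrorHost(DEFAULT_PACKAGE_BASE_URL, "hf-mirror.com");
```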
Fixes
- Whisper: more tuning to try to avoid repetitive and cut-off recognitions
Full Changelog: v1.1.1...v1.2.0
v1.1.1
Fixes
- Whisper alignment engine: fix the target language not being correctly passed when performing word segmentation. The `whisper` alignment engine should now apply correct word segmentation for Chinese and Japanese
- Fix missing log messages on alignment operations
Full Changelog: v1.1.0...v1.1.1