Releases: echogarden-project/echogarden

v1.4.2

11 May 21:03

Fixes

  • eSpeak: bring back the workaround for the second eSpeak bug, where a ' at the beginning of an utterance causes it to omit both the start and end markers for the first word

Full Changelog: v1.4.1...v1.4.2

v1.4.1

11 May 20:34

Fixes

  • Use a different workaround for the eSpeak marker bug and remove the recent one
  • Specify the exact required Node version in package.json

Full Changelog: v1.4.0...v1.4.1

v1.4.0

11 May 12:44

Enhancements

  • DTW speech-to-transcript alignment: rework auto-selection of granularities and window durations. By default, it now uses two-pass DTW for audio durations of 30 minutes or more. The first pass uses a granularity of xx-low and a window duration of 20% of the audio duration, capped at 30 minutes (the cap is reached once the audio is 2.5 hours long). The second pass uses a granularity of low and a window duration of 15 seconds. This significantly reduces memory usage and processing time for multi-hour audio.
  • Reword some log messages
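
The two-pass selection rule described above can be sketched as follows. This is only an illustration of the stated thresholds, not the project's actual code: the names `DtwPass` and `planTwoPassDtw` are hypothetical, and the behavior for audio shorter than 30 minutes is not covered by the note, so the sketch returns `undefined` there.

```typescript
// Hypothetical types and names; only the numbers come from the release note.
interface DtwPass {
  granularity: string
  windowDurationSeconds: number
}

const twoPassThresholdSeconds = 30 * 60 // two-pass DTW starts at 30 minutes
const maxFirstPassWindowSeconds = 30 * 60 // first-pass window is capped at 30 minutes

function planTwoPassDtw(audioDurationSeconds: number): DtwPass[] | undefined {
  if (audioDurationSeconds < twoPassThresholdSeconds) {
    // The note only describes the two-pass case; shorter audio is out of scope here.
    return undefined
  }

  // 20% of the audio duration, capped at 30 minutes (reached at 2.5 hours of audio).
  const firstPassWindow = Math.min(audioDurationSeconds * 0.2, maxFirstPassWindowSeconds)

  return [
    { granularity: 'xx-low', windowDurationSeconds: firstPassWindow },
    { granularity: 'low', windowDurationSeconds: 15 },
  ]
}
```

For example, under these assumptions a 1-hour recording gets a 12-minute first-pass window, while a 3-hour recording stays at the 30-minute cap.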

Fixes

  • DTW sequence alignment: try to handle "jump transitions" between windows more effectively. These occur when consecutive columns have windows with no overlapping rows, so it's not possible to directly "step" between them. Remove warnings like "all cost directions are equal to infinity" that previously appeared in these situations
  • eSpeak-NG: work around an eSpeak issue where isolated ' characters (not part of a word) at the start of the utterance cause markers to be omitted and errors or crashes to occur
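
To make the "jump transition" condition concrete, here is a minimal sketch that finds columns whose row window shares no rows with the previous column's window. The `RowWindow` shape (inclusive row bounds) and the function name are assumptions for illustration; how the library actually bridges such gaps is not described in the note and is not shown here.

```typescript
// Hypothetical representation: each column has an inclusive range of allowed rows.
interface RowWindow {
  startRow: number
  endRow: number
}

// Returns the indices of columns whose window has no rows in common
// with the window of the column immediately before it.
function findJumpTransitions(windows: RowWindow[]): number[] {
  const jumpColumns: number[] = []

  for (let column = 1; column < windows.length; column++) {
    const previous = windows[column - 1]
    const current = windows[column]

    const overlaps = current.startRow <= previous.endRow && previous.startRow <= current.endRow

    if (!overlaps) {
      jumpColumns.push(column)
    }
  }

  return jumpColumns
}
```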

Behavioral changes

  • DTW speech alignment: the auto granularity identifier has been removed.

Full Changelog: v1.3.3...v1.4.0

v1.3.3

09 May 09:44

Enhancements

  • Attempt to reduce memory requirements of large wave file conversions by allowing the GC to collect unused buffers where possible
  • Whisper: prevent auto-prompting next part when current part is repetitive. Adds new option whisper.repetitionThreshold
  • Reword some log messages for better uniformity

Full Changelog: v1.3.2...v1.3.3

v1.3.2

08 May 10:28

Enhancements

  • Improve audio format conversion speed by using faster operations like Buffer.copyBytesFrom (supported in Node v18.16.0 or newer). The faster conversions assume the underlying architecture is little-endian - otherwise they wouldn't work correctly. That's okay, since big-endian architectures aren't supported anyway.
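
The little-endian assumption can be illustrated with a small sketch. It deliberately uses plain typed arrays rather than Buffer.copyBytesFrom, so it stays self-contained; all function names are hypothetical. A raw byte copy of 16-bit samples yields little-endian PCM only when the host itself is little-endian, whereas a DataView-based encoder is slower but independent of host byte order.

```typescript
// Detects host byte order by inspecting how a 16-bit value is laid out in memory.
function isLittleEndian(): boolean {
  const probe = new Uint16Array([0x0102])
  return new Uint8Array(probe.buffer)[0] === 0x02
}

// Fast path: copies the samples' raw bytes as they are stored in memory.
// The result is little-endian PCM only on a little-endian host.
function int16SamplesToBytesFast(samples: Int16Array): Uint8Array {
  return new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength).slice()
}

// Portable path: writes each sample explicitly as little-endian.
function int16SamplesToBytesPortable(samples: Int16Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2)
  const view = new DataView(bytes.buffer)

  for (let i = 0; i < samples.length; i++) {
    view.setInt16(i * 2, samples[i], true) // true = little-endian
  }

  return bytes
}
```

On a little-endian host the two functions return identical bytes, which is why the faster copy is safe there.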

Full Changelog: v1.3.1...v1.3.2

v1.3.1

08 May 06:57

Fixes

  • Apply the Speex resampler in 1 MiB chunks instead of copying the entire audio to WASM memory (which has a limit of 1 GiB to 4 GiB). The resampler can now process any length of audio
  • Add support for decoding Wave files or buffers that are larger than 4 GiB, by ignoring the length field for RIFF and data chunks when their size is exactly 4294967295. Similarly, the wave encoder can now create these large files or buffers by clipping the overflowing chunk's length fields to 4294967295
  • Add some missing log messages, and remove / rewrite some others
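
The handling of oversized Wave chunks can be sketched roughly as below. The function names are hypothetical and this is not the project's decoder or encoder; it only shows the rule stated above, where a length field of exactly 4294967295 (the largest unsigned 32-bit value) is treated as "read to the end", and the encoder clips larger lengths to that value.

```typescript
const maxUint32 = 0xffffffff // 4294967295, the largest value a 32-bit length field can hold

// Decoder side: decide how many bytes a chunk actually spans.
// `bytesRemaining` is the number of bytes left in the file or buffer after the chunk header.
function resolveChunkLength(lengthField: number, bytesRemaining: number): number {
  if (lengthField === maxUint32) {
    // The field cannot be trusted for data this large, so ignore it and read to the end.
    return bytesRemaining
  }

  return lengthField
}

// Encoder side: clip a chunk length that doesn't fit in 32 bits.
function encodeChunkLengthField(actualLength: number): number {
  return Math.min(actualLength, maxUint32)
}
```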

Behavioral changes

  • Remove --no-experimental-fetch Node flag since fetch seems to be working correctly on Node v18.20.0+

Full Changelog: v1.3.0...v1.3.1

v1.3.0

02 May 11:19

Enhancements

  • Accept language codes in multiple formats. Currently supports ISO 639-1 (example: es, es-MX), ISO 639-2 (example: spa), and full English language names (spanish)
  • Whisper: when no matching language was found, include the exact provided language identifier to reduce confusion about language support
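
As an illustration of accepting several language identifier formats, the sketch below maps two-letter codes, three-letter codes, and English names to one form. The lookup table, the function name, and the choice of a two-letter code as the normalized output are all assumptions made for this example; the library's real tables and return format may differ.

```typescript
// Hypothetical, deliberately tiny alias table. A real one would cover many more languages.
const languageAliases = new Map<string, string>([
  ['es', 'es'], ['spa', 'es'], ['spanish', 'es'],
  ['en', 'en'], ['eng', 'en'], ['english', 'en'],
])

// Returns a normalized code such as "es" or "es-MX", or undefined if the language is unknown.
function normalizeLanguageCode(input: string): string | undefined {
  const parts = input.trim().toLowerCase().split(/[-_]/)
  const language = languageAliases.get(parts[0])

  if (language === undefined) {
    return undefined
  }

  // Keep a region subtag, if one was given, in upper case (for example "MX").
  if (parts.length > 1 && parts[1].length > 0) {
    return `${language}-${parts[1].toUpperCase()}`
  }

  return language
}
```

Returning `undefined` for unknown input leaves it to the caller to report the exact identifier that was provided, in the spirit of the Whisper change above.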

Fixes

  • Recognition / alignment / translation: timing for words that overlap with non-speech regions is now truncated to the voice-activity region where the overlap is greatest. When processing is done over audio that has been cropped using VAD, an upcoming word can appear too early, or extend too far, before or after a non-speech region, making the timing inaccurate near the region boundaries. The fix ensures, during the uncropping of the timeline, that each word spans only a single active voice region (selected by maximum overlap), which prevents time ranges from being over-extended
  • Fix whisper.cpp speech-to-text translation not including word offsets
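
The "single voice region by maximum overlap" rule can be sketched as follows. This is a simplified illustration that works directly on uncropped times; the interface and function names are hypothetical, and the real fix is applied while uncropping the timeline, which is not modeled here.

```typescript
interface TimeRange {
  startTime: number
  endTime: number
}

// Length of the intersection of two ranges, or 0 if they don't intersect.
function overlapDuration(a: TimeRange, b: TimeRange): number {
  return Math.max(0, Math.min(a.endTime, b.endTime) - Math.max(a.startTime, b.startTime))
}

// Truncates a word's time range to the voice region it overlaps with the most.
// If the word overlaps no region at all, it is returned unchanged.
function clampWordToBestRegion(word: TimeRange, voiceRegions: TimeRange[]): TimeRange {
  let bestRegion: TimeRange | undefined = undefined
  let bestOverlap = 0

  for (const region of voiceRegions) {
    const overlap = overlapDuration(word, region)

    if (overlap > bestOverlap) {
      bestRegion = region
      bestOverlap = overlap
    }
  }

  if (bestRegion === undefined) {
    return { ...word }
  }

  return {
    startTime: Math.max(word.startTime, bestRegion.startTime),
    endTime: Math.min(word.endTime, bestRegion.endTime),
  }
}
```

For instance, a word spanning 1.0 to 3.0 seconds, with voice regions at 0 to 1.5 and 2.0 to 5.0, overlaps the second region more and is truncated to 2.0 to 3.0.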

Behavioral changes

  • Remove pico and flite being used as default synthesis engines in some languages (pico is never actually selected, and flite uses WASI, which appears to have segmentation fault issues in Node versions 20, 21, and 22)

Full Changelog: v1.2.1...v1.3.0

v1.2.1

28 Apr 19:01

Fixes

  • Fix regression with CLI not starting without a configuration file.

Full Changelog: v1.2.0...v1.2.1

v1.2.0

25 Apr 10:21

New features

  • Add global option to adjust the amount of log messages during runtime (logLevel)
  • Add global option to set a custom remote package repository base URL (packageBaseURL). The default base URL is currently https://huggingface.co/echogarden/echogarden-packages/resolve/main/. If Hugging Face isn't accessible in your location, you can replace huggingface.co with a mirror domain like hf-mirror.com, or use any other custom address
  • Allow setting global options via the CLI
  • Add support for global and common CLI options to the configuration file
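
As a small illustration of the mirror substitution described above, the sketch below derives a mirror base URL from the default one. The helper `withMirrorHost` is hypothetical, and the final object only shows the option name from this release; how the option is actually passed (API call, CLI flag, or configuration file) is not shown here.

```typescript
// Default base URL as stated in the release note.
const defaultPackageBaseURL = 'https://huggingface.co/echogarden/echogarden-packages/resolve/main/'

// Hypothetical helper: swaps the Hugging Face host for a mirror host, keeping the path.
function withMirrorHost(baseURL: string, mirrorHost: string): string {
  return baseURL.replace('://huggingface.co/', `://${mirrorHost}/`)
}

// A value that could be supplied for the packageBaseURL global option.
const globalOptions = {
  packageBaseURL: withMirrorHost(defaultPackageBaseURL, 'hf-mirror.com'),
}
```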

Fixes

  • Whisper: more tuning to try to avoid repetitive and cut-off recognitions

Full Changelog: v1.1.1...v1.2.0

v1.1.1

23 Apr 08:07

Fixes

  • Whisper alignment engine: fix an issue where the target language was not correctly passed when performing word segmentation. The whisper alignment engine should now apply correct word segmentation for Chinese and Japanese
  • Fix missing log messages on alignment operations

Full Changelog: v1.1.0...v1.1.1