echogarden-project / echogarden Public

Releases: echogarden-project/echogarden


v1.4.2

11 May 21:03

rotemdan


Fixes

eSpeak: bring back the workaround for the second eSpeak bug, where a ' character at the beginning of an utterance causes eSpeak to omit both the start and end markers for the first word

Full Changelog: v1.4.1...v1.4.2


v1.4.1

11 May 20:34

rotemdan


Fixes

Use a different workaround for the eSpeak marker bug, and remove the recent workaround
Specify the exact required Node version in package.json

Full Changelog: v1.4.0...v1.4.1


v1.4.0

11 May 12:44

rotemdan


Enhancements

DTW speech-to-transcript alignment: rework auto-selection of granularities and window durations. By default, audio that is 30 minutes or longer now uses two-pass DTW. The first pass uses a granularity of xx-low and a window duration of 20% of the audio duration, capped at 30 minutes (the cap is reached at 2.5 hours of audio). The second pass uses a granularity of low and a window duration of 15 seconds. This significantly reduces memory usage and processing time for multi-hour audio.
Reword some log messages
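
The two-pass selection described above can be sketched as a small function. This is only an illustration of the stated rule, not Echogarden's actual code: the names `DtwPass` and `selectLongAudioPasses` are hypothetical, and the release note does not say what is selected for audio shorter than 30 minutes, so that case returns `undefined` here.

```typescript
// Hypothetical sketch of the pass selection rule from the release note.
type DtwPass = { granularity: string; windowDurationSeconds: number }

const MINUTE = 60

function selectLongAudioPasses(audioDurationSeconds: number): DtwPass[] | undefined {
  // The note only specifies behavior for audio of 30 minutes or more.
  if (audioDurationSeconds < 30 * MINUTE) {
    return undefined
  }

  // First pass: 20% of the audio duration, held at 30 minutes from 2.5 hours onward.
  const firstPassWindow = Math.min(audioDurationSeconds / 5, 30 * MINUTE)

  return [
    { granularity: 'xx-low', windowDurationSeconds: firstPassWindow },
    { granularity: 'low', windowDurationSeconds: 15 },
  ]
}
```

For one hour of audio this yields a first-pass window of 12 minutes. For three hours, the first-pass window stays at 30 minutes.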

Fixes

DTW sequence alignment: try to handle "jump transitions" between windows more effectively. A jump transition occurs when the windows of consecutive columns have no overlapping rows, so it's not possible to "step" directly between them. Remove warnings like "all cost directions are equal to infinity" that previously appeared in these situations
eSpeak-NG: work around an eSpeak issue where isolated ' characters (not part of a word) at the start of the utterance cause markers to be omitted, producing errors or crashes
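
To make the "jump transition" condition from the DTW fix concrete, here is a minimal sketch that flags columns whose window shares no rows with the previous column's window. The `RowWindow` shape and the function name are assumptions made for illustration. How Echogarden actually represents windows, and how it recovers from a jump, is not described in the release note.

```typescript
// Inclusive row range covered by one column's window (hypothetical shape).
type RowWindow = { startRow: number; endRow: number }

// Returns the indices of columns whose window has no rows in common
// with the window of the column immediately before it.
function findJumpTransitions(windows: RowWindow[]): number[] {
  const jumps: number[] = []

  for (let col = 1; col < windows.length; col++) {
    const prev = windows[col - 1]
    const curr = windows[col]

    // Two inclusive ranges overlap when each starts no later than the other ends.
    const overlaps = curr.startRow <= prev.endRow && prev.startRow <= curr.endRow

    if (!overlaps) {
      jumps.push(col)
    }
  }

  return jumps
}
```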

Behavioral changes

DTW speech alignment: the auto granularity identifier has been removed.

Full Changelog: v1.3.3...v1.4.0


v1.3.3

09 May 09:44

rotemdan


Enhancements

Attempt to reduce memory requirements of large wave file conversions by allowing the GC to collect unused buffers where possible
Whisper: prevent auto-prompting next part when current part is repetitive. Adds new option whisper.repetitionThreshold
Reword some log messages for better uniformity

Full Changelog: v1.3.2...v1.3.3


v1.3.2

08 May 10:28

rotemdan


Enhancements

Improve audio format conversion speed by using faster operations like Buffer.copyBytesFrom (supported in Node v18.16.0 or newer). The faster conversions assume the underlying architecture is little-endian - otherwise they wouldn't work correctly. That's okay, since big-endian architectures aren't supported anyway.
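
The little-endian assumption can be illustrated with a short sketch. It uses standard typed arrays rather than Buffer.copyBytesFrom so that it does not depend on Node, and it is not Echogarden's conversion code. The function names are hypothetical. The fast path copies the samples' raw memory, which only matches little-endian encoding on a little-endian host. The fallback writes each sample with an explicit byte order.

```typescript
// Detects host byte order by storing a 16-bit value and inspecting its first byte.
function isLittleEndianHost(): boolean {
  return new Uint8Array(new Uint16Array([1]).buffer)[0] === 1
}

// Encodes 16-bit samples as little-endian bytes.
function int16ToLittleEndianBytes(samples: Int16Array): Uint8Array {
  if (isLittleEndianHost()) {
    // Fast path: the in-memory layout is already little-endian, so copy the raw bytes.
    return new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength).slice()
  }

  // Portable fallback: write every sample with an explicit little-endian byte order.
  const bytes = new Uint8Array(samples.length * 2)
  const view = new DataView(bytes.buffer)

  samples.forEach((sample, i) => view.setInt16(i * 2, sample, true))

  return bytes
}
```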

Full Changelog: v1.3.1...v1.3.2


v1.3.1

08 May 06:57

rotemdan


Fixes

Apply the Speex resampler in 1 MiB chunks instead of copying the entire audio to WASM memory (which has a limit of 1GiB - 4GiB). The resampler can now process any length of audio
Add support for decoding Wave files or buffers that are larger than 4GiB, by ignoring the length field for RIFF and data chunks when their size is exactly 4294967295. Similarly, the wave encoder can now create these large files or buffers by clipping the overflowing chunk's length fields to 4294967295
Add some missing log messages, and remove / rewrite some others
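
The handling of oversized Wave chunk lengths described above could look roughly like this. It is a sketch of the stated rule only: the helper names are hypothetical, and the real encoder and decoder deal with many details (chunk headers, padding, streaming) that are omitted here.

```typescript
// Largest value that fits in the 32-bit length field of a RIFF or data chunk.
const MAX_UINT32 = 4294967295

// Encoder side: clip a length that would overflow the 32-bit field.
function clipChunkLength(actualLength: number): number {
  return Math.min(actualLength, MAX_UINT32)
}

// Decoder side: a field holding exactly 4294967295 is treated as unreliable,
// so the chunk is assumed to extend over all remaining bytes.
function resolveChunkLength(declaredLength: number, bytesRemaining: number): number {
  if (declaredLength === MAX_UINT32) {
    return bytesRemaining
  }

  return declaredLength
}
```

With these helpers, a data chunk of 5,000,000,000 bytes would be written with a length field of 4294967295, and a decoder reading that field would fall back to the number of bytes actually remaining.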

Behavioral changes

Remove --no-experimental-fetch Node flag since fetch seems to be working correctly on Node v18.20.0+

Full Changelog: v1.3.0...v1.3.1


v1.3.0

02 May 11:19

rotemdan


Enhancements

Accept language codes in multiple formats. Currently supports ISO 639-1 (example: es, es-MX), ISO 639-2 (example: spa), and full English language names (spanish)
Whisper: when no matching language was found, include the exact provided language identifier to reduce confusion about language support
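
As an illustration of accepting several identifier formats, the sketch below maps ISO 639-1 codes, ISO 639-2 codes, and English names onto one form. The two-entry table, the function name, and the choice of ISO 639-1 as the normalized form are all assumptions made for this example. Echogarden's own lookup is not shown in the release note.

```typescript
// Tiny illustrative table. A real implementation would need a complete language list.
const languages = [
  { iso6391: 'es', iso6392: 'spa', englishName: 'spanish' },
  { iso6391: 'en', iso6392: 'eng', englishName: 'english' },
]

// Returns an ISO 639-1 based code (keeping any region suffix), or undefined if unknown.
function normalizeLanguage(input: string): string | undefined {
  const lower = input.trim().toLowerCase()
  const [primary, region] = lower.split('-')

  const match = languages.find(
    (l) => l.iso6391 === primary || l.iso6392 === primary || l.englishName === primary
  )

  if (!match) {
    return undefined
  }

  return region ? `${match.iso6391}-${region.toUpperCase()}` : match.iso6391
}
```

Returning `undefined` for an unknown identifier leaves the caller free to report the exact string the user provided, in line with the Whisper change listed above.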

Fixes

Recognition / alignment / translation: timing for words that overlap with non-speech regions is now truncated based on the voice-activity region where the overlap is greatest. When processing is done over audio that has been cropped using VAD, an upcoming word can appear too early, or extend too far, before or after a non-speech region, making the timing inaccurate near the region boundaries. The fix ensures, while uncropping the timeline, that each word spans only a single active voice region (selected by maximum overlap), which prevents time ranges from being over-extended
Fix whisper.cpp speech-to-text translation not including word offsets
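
The maximum-overlap rule from the word timing fix can be sketched as follows. The `TimeRange` shape and function names are hypothetical, and leaving a word unchanged when it overlaps no voice region is an assumption, since the release note does not cover that case.

```typescript
// Times are in seconds (hypothetical shape for both words and voice regions).
type TimeRange = { startTime: number; endTime: number }

// Length of the intersection of two ranges, or 0 if they don't intersect.
function overlapDuration(a: TimeRange, b: TimeRange): number {
  return Math.max(0, Math.min(a.endTime, b.endTime) - Math.max(a.startTime, b.startTime))
}

// Truncates a word so it lies within the voice region it overlaps the most.
function clampWordToBestRegion(word: TimeRange, voiceRegions: TimeRange[]): TimeRange {
  let best: TimeRange | undefined
  let bestOverlap = 0

  for (const region of voiceRegions) {
    const overlap = overlapDuration(word, region)

    if (overlap > bestOverlap) {
      best = region
      bestOverlap = overlap
    }
  }

  // Assumption: a word that overlaps no voice region is returned as-is.
  if (!best) {
    return word
  }

  return {
    startTime: Math.max(word.startTime, best.startTime),
    endTime: Math.min(word.endTime, best.endTime),
  }
}
```

For example, a word spanning 1 to 3 seconds with voice regions at 0 to 1.5 and 2 to 4 seconds overlaps the second region more, so it would be truncated to 2 to 3 seconds.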

Behavioral changes

Remove pico and flite as default synthesis engines in some languages (pico is never actually selected, and flite uses WASI, which appears to have segmentation fault issues in Node versions 20, 21, and 22)

Full Changelog: v1.2.1...v1.3.0


v1.2.1

28 Apr 19:01

rotemdan


Fixes

Fix regression with CLI not starting without a configuration file.

Full Changelog: v1.2.0...v1.2.1


v1.2.0

25 Apr 10:21

rotemdan


New features

Add global option to adjust the amount of log messages during runtime (logLevel)
Add global option to set a custom remote package repository base URL (packageBaseURL). The default base URL is currently https://huggingface.co/echogarden/echogarden-packages/resolve/main/. If Hugging Face isn't accessible in your location, you can replace huggingface.co by a mirror domain like hf-mirror.com, or use any other custom address
Allow setting global options via the CLI
Add support for global and common CLI options to the configuration file
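
Deriving a mirror address for packageBaseURL, as suggested above, amounts to swapping the host in the default base URL. The helper below is a hypothetical sketch of that substitution. It only builds the string and does not show how the option is passed to Echogarden.

```typescript
// Default base URL as given in the release note.
const defaultPackageBaseURL = 'https://huggingface.co/echogarden/echogarden-packages/resolve/main/'

// Replaces the host of an https URL, leaving the path untouched.
// Returns the input unchanged if it doesn't start with the expected host.
function withMirrorHost(baseURL: string, originalHost: string, mirrorHost: string): string {
  const prefix = `https://${originalHost}/`

  if (!baseURL.startsWith(prefix)) {
    return baseURL
  }

  return `https://${mirrorHost}/` + baseURL.slice(prefix.length)
}

const mirrorBaseURL = withMirrorHost(defaultPackageBaseURL, 'huggingface.co', 'hf-mirror.com')
```

The resulting value, `https://hf-mirror.com/echogarden/echogarden-packages/resolve/main/`, would then be supplied as the packageBaseURL global option.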

Fixes

Whisper: more tuning to try to avoid repetitive and cut-off recognitions

Full Changelog: v1.1.1...v1.2.0


v1.1.1

23 Apr 08:07

rotemdan


Fixes

Whisper alignment engine: fix issue where the target language was not correctly passed when performing word segmentation. The whisper alignment engine should now apply correct word segmentation for Chinese and Japanese
Fix missing log messages on alignment operations

Full Changelog: v1.1.0...v1.1.1
