v1.5.0
New features
- Speech-to-transcript-and-translation alignment aligns a translated transcript to the spoken audio, with the assistance of the transcript in the original language. Supports 100 source and target languages. It uses a two-stage approach: first, conventional alignment is performed between the spoken audio and its native-language transcript. Then, the resulting timeline is aligned to the translated text using cross-language semantic text-to-text alignment (see the sketch after this list)
- Timeline-to-translation alignment accepts a timeline and a translated transcript, and performs the second stage independently. This allows reusing a previously aligned transcript with multiple translations, or applying the operation to the timeline output of speech synthesis or recognition
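For illustration, here is a minimal TypeScript sketch of the two-stage flow. The function names, option keys, and result shape are assumptions made for the example, not the confirmed API surface:

```ts
// A minimal sketch of the two-stage flow, assuming hypothetical
// function names and option shapes (not the confirmed API surface).
import * as Echogarden from 'echogarden'

async function alignAudioToTranslation(
	audioFile: string,            // path to the spoken audio
	nativeTranscript: string,     // transcript in the original language
	translatedTranscript: string, // transcript in the target language
	sourceLanguage: string,       // e.g. 'en'
	targetLanguage: string        // e.g. 'de'
) {
	// Stage 1: conventional alignment between the spoken audio and its
	// native-language transcript, producing a timeline.
	const { timeline } = await Echogarden.align(audioFile, nativeTranscript, {
		language: sourceLanguage
	})

	// Stage 2: align the resulting timeline to the translated text using
	// cross-language semantic text-to-text alignment.
	// 'alignTimelineTranslation' is an assumed name for this operation.
	const result = await Echogarden.alignTimelineTranslation(
		timeline, translatedTranscript, { sourceLanguage, targetLanguage })

	return result.timeline
}
```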
Enhancements
- Add support for passing `cuda` as ONNX provider. Latest `onnxruntime-node` now supports it, but only on Linux (for Windows, use `dml` - DirectML)
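As a hedged example, the provider could be selected per platform like this; the exact option key used to pass the provider is an assumption here, so check the options documentation for the real name:

```ts
// A hedged sketch of per-platform ONNX provider selection. The option
// shape ('whisper: { provider }') is an assumption for illustration.
import * as Echogarden from 'echogarden'

const provider = process.platform === 'win32'
	? 'dml'  // DirectML on Windows
	: 'cuda' // supported by the latest onnxruntime-node, Linux only

const { transcript } = await Echogarden.recognize('speech.wav', {
	engine: 'whisper',
	whisper: { provider } // assumed option key
})

console.log(transcript)
```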
Behavioral changes
- Passing a subtitle file to synthesis operations now ignores the cues and splits the text into sentences based on punctuation alone
- API operations for speech-to-translation now include separate properties for source and target languages
Fixes
- Timeline uncropping now correctly handles the edge case where a timestamp is greater than the audio duration (this can occur due to rounding or numerical-stability errors); a minimal clamping sketch follows this list
- Mel spectrogram conversion now handles the case where a filterbank is wider than the maximum frequency
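The uncropping fix implies a guard of roughly this shape, sketched below with a simplified stand-in type rather than the library's own timeline definition:

```ts
// A minimal sketch of the guard the uncropping fix implies: clamp
// timestamps to the audio duration so rounding or numerical-stability
// overshoots can't produce out-of-range times. 'TimelineEntry' is a
// simplified stand-in type, not the library's own definition.
interface TimelineEntry {
	startTime: number
	endTime: number
}

function clampTimelineToDuration(timeline: TimelineEntry[], audioDuration: number) {
	return timeline.map(entry => ({
		...entry,
		startTime: Math.min(entry.startTime, audioDuration),
		endTime: Math.min(entry.endTime, audioDuration)
	}))
}
```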
Full Changelog: v1.4.4...v1.5.0