v1.5.0

@rotemdan rotemdan released this 26 May 15:04
· 188 commits to main since this release

New features

  • Speech-to-transcript-and-translation alignment aligns a translated transcript to the spoken audio, with the assistance of the transcript in the original language. Supports 100 source and target languages. It uses a two-stage approach: first, conventional alignment is performed between the spoken audio and its native-language transcript. Then, the resulting timeline is aligned to the translated text using cross-language semantic text-to-text alignment
  • Timeline-to-translation alignment accepts a timeline and a translated transcript, and performs the second stage independently. This makes it possible to reuse a previously aligned transcript with multiple translations, or to apply the second stage to the timeline output of speech synthesis or recognition
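
The two-stage flow described above can be pictured with a minimal TypeScript sketch. Everything in it is hypothetical and for illustration only: the `TimelineEntry` shape, the `AlignSpeech` and `AlignTranslation` signatures, and the `alignTranslations` helper are assumptions, not the package's actual API. It only shows why separating the stages lets one speech alignment be reused for several translations.

```typescript
// Hypothetical timeline entry shape (an assumption, not the real type).
interface TimelineEntry {
  text: string
  startTime: number
  endTime: number
}
type Timeline = TimelineEntry[]

// Stage 1 (hypothetical signature): audio + native-language transcript -> timeline.
type AlignSpeech<A> = (audio: A, transcript: string) => Timeline

// Stage 2 (hypothetical signature): existing timeline + translated text -> translated timeline.
type AlignTranslation = (timeline: Timeline, translation: string) => Timeline

// Runs stage 1 once, then applies stage 2 to each translation.
function alignTranslations<A>(
  audio: A,
  transcript: string,
  translations: Record<string, string>,
  stage1: AlignSpeech<A>,
  stage2: AlignTranslation
): Record<string, Timeline> {
  const sourceTimeline = stage1(audio, transcript)
  const result: Record<string, Timeline> = {}

  for (const language of Object.keys(translations)) {
    result[language] = stage2(sourceTimeline, translations[language])
  }

  return result
}
```

In this sketch the two stage functions are passed in as arguments, so the more expensive speech alignment runs a single time regardless of how many translations are supplied.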

Enhancements

  • Add support for passing cuda as the ONNX execution provider. The latest onnxruntime-node now supports it, but only on Linux (on Windows, use dml, the DirectML provider)
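
As a rough illustration of the platform constraint above, here is a small TypeScript sketch that picks a provider name from a platform string. The `pickOnnxProvider` function and the `OnnxProvider` type are hypothetical helpers, not part of the package. The `cuda` and `dml` names come from the note above; the `cpu` fallback and the use of Node.js-style platform strings (`linux`, `win32`) are assumptions.

```typescript
// Provider names: 'cuda' and 'dml' are from the release note; 'cpu' is an assumed fallback.
type OnnxProvider = 'cpu' | 'dml' | 'cuda'

// Hypothetical helper: choose a provider for a Node.js-style platform string.
function pickOnnxProvider(platform: string, preferGpu: boolean): OnnxProvider {
  if (!preferGpu) {
    return 'cpu'
  }

  if (platform === 'linux') {
    return 'cuda' // CUDA is noted as supported only on Linux
  }

  if (platform === 'win32') {
    return 'dml' // DirectML is the suggested option on Windows
  }

  return 'cpu' // any other platform falls back to CPU in this sketch
}
```

In a Node.js program the first argument would typically be `process.platform`.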

Behavioral changes

  • Passing a subtitle file to synthesis operations now ignores the cues and splits the text into sentences based on punctuation alone
  • API operations for speech-to-translation now include separate properties for source and target languages

Fixes

  • Timeline uncropping now correctly handles the edge case where a timestamp is higher than the audio duration (this can occur due to rounding or numerical instability)
  • Mel spectrogram conversion now handles the case where a filterbank is wider than the maximum frequency
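
The timestamp edge case in the first fix can be illustrated with a minimal TypeScript sketch that clamps timeline entries to the audio duration. This is not the package's actual fix: the `TimedEntry` shape and the `clampTimelineToDuration` function are hypothetical, and clamping is just one plausible way to handle a timestamp that slightly overshoots the end of the audio.

```typescript
// Hypothetical entry shape, with times in seconds (an assumption).
interface TimedEntry {
  text: string
  startTime: number
  endTime: number
}

// Clamp every entry into [0, duration], keeping endTime >= startTime.
function clampTimelineToDuration(timeline: TimedEntry[], duration: number): TimedEntry[] {
  return timeline.map((entry) => {
    const startTime = Math.min(Math.max(entry.startTime, 0), duration)
    const endTime = Math.min(Math.max(entry.endTime, startTime), duration)

    return { text: entry.text, startTime, endTime }
  })
}
```

With a 10 second clip, an entry ending at 10.0000001 would have its end time pulled back to exactly 10, and an entry lying entirely past the end would collapse to a zero-length entry at 10.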

Full Changelog: v1.4.4...v1.5.0