Skip to content

A Speech-To-Text (with translation) library and tools; currently based on Whisper (runs locally if needed; no need in any API keys); initially is written to add auto-translation subtitles to OBS

License

Notifications You must be signed in to change notification settings

xaionaro-go/speech

Repository files navigation

speech

This is a library for Speech-To-Text operations in Go, and 2 tools to demonstrate how to use it:

  • ./cmd/stt -- just converts a PCM_F32_LE stream into transcription (by default translated to English).
  • ./cmd/subtitleswindow/main.go -- is a window that dynamically displays subtitles to be used in OBS, given a link to the audio stream (e.g. your RTMP link).

Currently, we provide API for using Whisper directly and/or for using whisper via HTTP API.

Quick start

Install dependencies

apt install -y libavcodec-dev libavformat-dev libavfilter-dev libavdevice-dev libswscale-dev libsrt-openssl-dev libssl-dev libasound2-dev libxxf86vm-dev make cmake nvidia-cuda-toolkit cuda-toolkit-12-2 libxcursor-dev libxrandr-dev libxinerama-dev libxi-dev

stt

If you use Linux:

WHISPER_MODEL=medium ENABLE_CUDA=true make example-stt

(keep in mind: the larger model is the more time it takes to warm up)

It will start listening the microphone, and you can start speaking. It should print the translation of your speech to English.

For example in my case:

WHISPER_MODEL=medium ENABLE_CUDA=true make example-stt
[...a lot of log...]
   23.3s -    26.3s:  Hello.
     28s -    30.6s:  This is just a demonstration that the thing works properly.
   30.7s -    37.9s:  And somehow it does work properly, which is weird.

subtitleswindow

Run:

ENABLE_CUDA=true make example-subtitleswindow

It should start a window that automatically translates your speech (it listens the microphone):

Subtitles Window screenshot

If you need to translate a speech given an RTMP/RTSP/SRT/whatever link, then:

Build:

ENABLE_CUDA=true make subtitleswindow-linux-amd64

Run:

(cd ./thirdparty/whisper.cpp && ./models/download-ggml-model.sh medium)
./build/subtitleswindow-linux-amd64 thirdparty/whisper.cpp/models/ggml-medium.bin rtmp://my.server:1935/myapp/mystream/

A window will pop up, and you'll see that it displays the most recent transcriptions. You can add this window in OBS to have live translation of your speech on your stream screen:

subtitleswindow with computing on a remote server

On the remote server run:

ENABLE_CUDA=true make sttd-linux-amd64
(cd ./thirdparty/whisper.cpp && ./models/download-ggml-model.sh large-v3)
./build/sttd-linux-amd64 --log-level trace 0.0.0.0:1234 --default-model-file thirdparty/whisper.cpp/models/ggml-large-v3.bin

Now on your local computer run:

make subtitleswindow-linux-amd64
./build/subtitleswindow-linux-amd64 --remote-addr address-of-my-remote-server:1234 --translate=true ''

About

A Speech-To-Text (with translation) library and tools; currently based on Whisper (runs locally if needed; no need in any API keys); initially is written to add auto-translation subtitles to OBS

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published