Porting OpenAI Whisper speech recognition to edge devices with hardware ML accelerators, enabling always-on live voice transcription. Current work includes Jetson Nano and Coral Edge TPU.
| Part | Price (2023) |
| --- | --- |
| NVIDIA Jetson Nano Developer Kit (4GB) | $149.00 |
| ChanGeek CGS-M1 USB Microphone | $16.99 |
| Noctua NF-A4x10 5V Fan (or similar, recommended) | $13.95 |
| D-Link DWA-181 Wi-Fi Adapter (or similar, optional) | $21.94 |
The `base.en` version of Whisper seems to work best for the Jetson Nano:
- `base` is the largest model size that fits into the 4GB of memory without modification.
- Inference performance with `base` is ~10x real-time in isolation and ~1x real-time while recording concurrently (see the timing sketch below).
- Using the English-only `.en` version further improves WER (<5% on LibriSpeech test-clean).
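As a sanity check of those numbers, you can time a one-off transcription with the openai-whisper Python API. This is a minimal sketch, assuming the `openai-whisper` package is installed and that `sample.wav` stands in for any speech recording you have on hand:

```python
import time

import whisper  # openai-whisper package

# Load the English-only base model, as used on the Jetson Nano.
model = whisper.load_model("base.en")

# "sample.wav" is a placeholder; whisper.load_audio resamples it to 16 kHz mono.
audio = whisper.load_audio("sample.wav")
audio_seconds = len(audio) / 16000

start = time.perf_counter()
result = model.transcribe(audio, language="en")
elapsed = time.perf_counter() - start

# A real-time factor above 1 means faster than real time.
print(f"Transcript: {result['text']!r}")
print(f"Real-time factor: {audio_seconds / elapsed:.1f}x")
```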
Dilemma:
- Whisper and some of its dependencies require Python 3.8.
- The latest supported JetPack release for the Jetson Nano is 4.6.3, which ships Python 3.6.
- There is no easy way to move to Python 3.8 without losing CUDA support for PyTorch (see the quick check below).
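You can see the conflict on a stock JetPack image with a quick check. This is a small sketch, assuming the NVIDIA-provided PyTorch wheel for JetPack is installed:

```python
import sys

import torch

# JetPack 4.6.x ships Python 3.6, but Whisper's dependencies want >= 3.8.
print("Python:", sys.version.split()[0])

# The CUDA-enabled PyTorch wheels for this JetPack release target Python 3.6,
# so a hand-rolled Python 3.8 environment typically reports False here.
print("PyTorch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
```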
Workaround: run inference inside a custom NVIDIA Docker container, which bundles a compatible Python and a CUDA-enabled PyTorch build (see the build step below).

First, follow the developer kit setup instructions, connect the Wi-Fi adapter and the microphone to USB, and ideally install a fan. (Plugging in an Ethernet cable also speeds up the downloads.) Then, get a shell on the Jetson Nano: attach it to your computer via USB and open a serial console, e.g. with `screen` on Linux:

```
screen /dev/ttyUSB0 115200
```

or with PuTTY on Windows. You'll be prompted to log in with the default credentials:

```
login: alex
password: arribada
```
For the demo, the container should already be built; you can skip this step and go straight to launching inference below.
We will use NVIDIA Docker containers to run inference. Get the source code and build the custom container:
```bash
git clone https://github.com/arribada/whisper-edge-demo.git whisper-edge-arribada
bash whisper-edge-arribada/build.sh
```
Launch inference:
```bash
bash whisper-edge-arribada/run.sh
```
You should see console output similar to this:
```
I0317 00:42:23.979984 547488051216 stream.py:75] Loading model "base.en"...
100%|#######################################| 139M/139M [00:30<00:00, 4.71MiB/s]
I0317 00:43:14.232425 547488051216 stream.py:79] Warming model up...
I0317 00:43:55.164070 547488051216 stream.py:86] Starting stream...
I0317 00:44:19.775566 547488051216 stream.py:51]
I0317 00:44:22.046195 547488051216 stream.py:51]
I0317 00:44:49.219501 547488051216 stream.py:51] Start speaking now to see the transcription!
```
Below is a short script you can read aloud to demo the transcription in real time:
> As the sun set, I couldn't help but admire the dolphins jumping out of the water, with seagulls flying overhead.
> It's a beautiful scene, but there's a problem on my mind: bycatch.
> You see, I'm a fisherman, and my family depends on our daily catch.
> But sometimes, our nets unintentionally trap dolphins, whales, and other creatures, instead of the sharks and seals we're targeting.
The demo will highlight the keywords programmed into it in green as they appear in the transcription.
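The keyword list and highlighting logic live in the demo code itself; the sketch below only illustrates the general idea of ANSI green highlighting, with an illustrative keyword list rather than the one actually programmed into the demo:

```python
import re

# Illustrative keywords only; the demo ships its own list.
KEYWORDS = ["bycatch", "dolphins", "whales", "sharks", "seals"]

GREEN = "\033[92m"
RESET = "\033[0m"

def highlight(text: str) -> str:
    """Wrap any keyword occurrences in ANSI green escape codes."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{GREEN}{m.group(0)}{RESET}", text)

print(highlight("Our nets unintentionally trap dolphins, whales, and other creatures."))
```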
The `stream.py` script run in the container accepts flags for different configurations (the default flags should work for the demo):
```bash
bash whisper-edge-arribada/run.sh --help
```

```
USAGE: stream.py [flags]

flags:

stream.py:
  --channel_index: The index of the channel to use for transcription.
    (default: '0')
    (an integer)
  --chunk_seconds: The length in seconds of each recorded chunk of audio.
    (default: '10')
    (an integer)
  --input_device: The input device used to record audio.
    (default: 'plughw:2,0')
  --language: The language to use or empty to auto-detect.
    (default: 'en')
  --latency: The latency of the recording stream.
    (default: 'low')
  --model_name: The version of the OpenAI Whisper model to use.
    (default: 'base.en')
  --num_channels: The number of channels of the recorded audio.
    (default: '1')
    (an integer)
  --sample_rate: The sample rate of the recorded audio.
    (default: '16000')
    (an integer)

Try --helpfull to get a list of all flags.
```
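These flags map onto a straightforward record-then-transcribe loop. The sketch below is not the actual stream.py, just a minimal approximation of the same idea using arecord (matching the ALSA device naming above) and the openai-whisper API:

```python
import subprocess

import numpy as np
import whisper

MODEL_NAME = "base.en"
INPUT_DEVICE = "plughw:2,0"  # see `arecord -l` for your device
SAMPLE_RATE = 16000
CHUNK_SECONDS = 10

model = whisper.load_model(MODEL_NAME)

while True:
    # Record one chunk of raw signed 16-bit PCM from the USB microphone.
    pcm = subprocess.run(
        ["arecord", f"--device={INPUT_DEVICE}", "--format=S16_LE",
         f"--rate={SAMPLE_RATE}", "--channels=1",
         f"--duration={CHUNK_SECONDS}", "--file-type=raw", "--quiet"],
        check=True, stdout=subprocess.PIPE,
    ).stdout

    # Convert to the float32 waveform that Whisper expects.
    audio = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0

    text = model.transcribe(audio, language="en")["text"].strip()
    if text:
        print(text)
```

Note that back-to-back arecord calls leave small gaps between chunks, which is one reason a dedicated streaming script is preferable to this sketch.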
To see if the microphone is working properly, use `alsa-utils`:
```bash
sudo apt-get -y install alsa-utils

# Is the USB device connected?
lsusb

# Is the correct recording device selected?
arecord -l

# Is the gain set properly?
alsamixer

# Does a test recording work?
arecord --format=S16_LE --duration=5 --rate=16000 --channels=1 --device=plughw:2,0 test.wav
```
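For a more objective check of the gain, you can inspect the peak level of test.wav. This is a quick sketch, assuming the S16_LE mono format used above and that numpy is available:

```python
import wave

import numpy as np

# Read the raw 16-bit samples from the test recording made with arecord.
with wave.open("test.wav", "rb") as wav:
    pcm = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

peak = np.abs(pcm).max() / 32768.0
print(f"Peak level: {peak:.0%} of full scale")
# Near 0% suggests the mic or gain is off; near 100% suggests clipping.
```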