Merge pull request #41 from tokk-nv/dev-whisper
Add Whisper tutorial
tokk-nv authored Nov 7, 2023
2 parents 4b2e136 + 1e00648 commit 988bb70
Showing 9 changed files with 95 additions and 0 deletions.
Binary file added docs/images/Chrome_ERR_CERT.png
Binary file added docs/images/Chrome_ERR_CERT_after_advanced.png
Binary file added docs/images/whisper_ipywebrtc_widget.png
Binary file added docs/images/whisper_jupyterlab_notebooks.png
Binary file added docs/images/whisper_microphone_access.png
Binary file added docs/images/whisper_transcribe_result.png
Binary file added docs/images/whisper_web_setting.png
94 changes: 94 additions & 0 deletions docs/tutorial_whisper.md
@@ -0,0 +1,94 @@
# Tutorial - Whisper

Let's run OpenAI's [Whisper](https://github.com/openai/whisper), a pre-trained model for automatic speech recognition, on Jetson!

!!! abstract "What you need"

1. One of the following Jetson:

<span class="blobDarkGreen4">Jetson AGX Orin 64GB</span>
<span class="blobDarkGreen5">Jetson AGX Orin (32GB)</span>
<span class="blobLightGreen4">Jetson Orin Nano (8GB)</span>

2. Running one of the following versions of [JetPack 5.x](https://developer.nvidia.com/embedded/jetpack)

<span class="blobPink1">JetPack 5.1.2 (L4T r35.4.1)</span>
<span class="blobPink2">JetPack 5.1.1 (L4T r35.3.1)</span>
<span class="blobPink3">JetPack 5.1 (L4T r35.2.1)</span>

3. Sufficient storage space (preferably with NVMe SSD).

- `6.1 GB` for the `whisper` container image
- Space for checkpoints

## Clone and set up `jetson-containers`

```
git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers
sudo apt update; sudo apt install -y python3-pip
pip3 install -r requirements.txt
```

## How to start

Use the `run.sh` and `autotag` scripts to automatically pull or build a compatible container image.

```
cd jetson-containers
./run.sh $(./autotag whisper)
```

The container has a default run command (`CMD`) that will automatically start the Jupyter Lab server, with SSL enabled.

Open your browser and access `https://<IP_ADDRESS>:8888`.

!!! attention

Note it is **`https`** (not `http`).

An HTTPS (SSL) connection is needed so that the `ipywebrtc` widget can access your microphone (for `record-and-transcribe.ipynb`).


You will see a warning message like this.

![](./images/Chrome_ERR_CERT.png){: style="width:50%"}

Press the "**Advanced**" button and then click the "**Proceed to <IP_ADDRESS> (unsafe)**" link to proceed to the Jupyter Lab web interface.

![](./images/Chrome_ERR_CERT_after_advanced.png){: style="width:50%"}

> The default password for Jupyter Lab is `nvidia`.

## Run Jupyter notebooks

The Whisper repo comes with demo Jupyter notebooks, which you can find under the `/notebooks/` directory.

`jetson-containers` also adds a convenient notebook (`record-and-transcribe.ipynb`) that lets you record an audio sample in a Jupyter notebook and run Whisper transcription on it.

![](./images/whisper_jupyterlab_notebooks.png)

### `record-and-transcribe.ipynb`

This notebook lets you record your own audio sample using your PC's microphone and applies Whisper's `medium` model to transcribe it.

It uses the `ipywebrtc` extension for Jupyter Notebook/Lab to record an audio sample in your web browser.

![](./images/whisper_ipywebrtc_widget.png)
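
In case you are curious, the recording cell follows roughly the `ipywebrtc` pattern sketched below. The variable and file names are hypothetical, and the import is deferred into a function so the sketch can be read outside the container:

```python
def make_audio_recorder():
    """Create an ipywebrtc widget that records audio from the browser's microphone."""
    from ipywebrtc import AudioRecorder, CameraStream  # available in the container's JupyterLab

    # Request access to the microphone only (no video) from the browser
    stream = CameraStream(constraints={"audio": True, "video": False})
    return AudioRecorder(stream=stream)

# In a notebook cell you would then do something like:
#   recorder = make_audio_recorder()
#   recorder                         # displays the record widget shown above
#   recorder.save("recording.webm")  # after recording, save the captured audio
```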

!!! attention

When you click the ⏺ button, your web browser may show a pop-up asking you to allow it to use your microphone. Be sure to allow access.

![](./images/whisper_microphone_access.png)

??? info "Final check"

Once done, if you click on the "**⚠ Not secure**" part in the URL bar, you should see something like this.

![](./images/whisper_web_setting.png)

#### Result

Once you go through all the steps, you should see the transcription result as text, like this.

![](./images/whisper_transcribe_result.png)
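
Under the hood, the transcription step boils down to a few calls to the Whisper Python API. Below is a minimal sketch: the file name `recording.wav` and the helper functions are hypothetical, and the `whisper` import is kept inside the function so the snippet can be read without the package installed:

```python
def transcribe_file(path, model_name="medium"):
    """Load a Whisper model and transcribe an audio file."""
    import whisper  # provided inside the whisper container

    model = whisper.load_model(model_name)  # downloads the checkpoint on first use
    return model.transcribe(path)           # dict with "text", "segments", "language"

def extract_text(result):
    """Pull the plain transcription text out of a Whisper result dict."""
    return result["text"].strip()

# model.transcribe() returns a structure like this (illustrative values):
sample_result = {
    "text": " Hello, Jetson!",
    "segments": [{"id": 0, "start": 0.0, "end": 1.2, "text": " Hello, Jetson!"}],
    "language": "en",
}
print(extract_text(sample_result))  # prints: Hello, Jetson!

# Inside the container you would call:
#   result = transcribe_file("recording.wav")
```
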
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -89,6 +89,7 @@ nav:
- NanoDB: tutorial_nanodb.md
- Audio:
- Audiocraft: tutorial_audiocraft.md
- Whisper: tutorial_whisper.md
# - Tools:
# - LangChain: tutorial_distillation.md
- Tips:
