# Benchmarks

For running LLM benchmarks, see the MLC container documentation.

For running VLM benchmarks, see the MLC and MiniGPT-4 container documentation.

ViT performance data from [1] [2] [3]

For running Riva benchmarks, see ASR Performance and TTS Performance.

For running vector database benchmarks, see the NanoDB container documentation.
# Community Articles

Below, you'll find a collection of community articles showcasing the implementation of generative AI on the Jetson platform.

## JetsonHacks Jetson AI Labs – Generative AI Playground (10-31-2023)

JetsonHacks publishes an insightful video that walks developers through the typical steps for running generative AI models on Jetson following this site's tutorials. The video shows the interaction with the LLaVA model.

## Hackster Vision2Audio - Giving the blind an understanding through AI (10-15-2023)

Nurgaliyev Shakhizat demonstrates Vision2Audio running on Jetson AGX Orin 64GB Developer Kit to harness the power of LLaVA to help visually impaired people.

## NVIDIA Bringing Generative AI to Life with NVIDIA Jetson (10-19-2023)

Follow this walkthrough of the Jetson AI Lab tutorials along with coverage of the latest features and advances coming to JetPack 6 and beyond:

> https://developer.nvidia.com/blog/bringing-generative-ai-to-life-with-jetson/

## Medium How to set up your Jetson device for LLM inference and fine-tuning (10-02-2023)

Michael Yuan's guide demonstrates how to set up the Jetson AGX Orin 64GB Developer Kit specifically for large language model (LLM) inference, highlighting the crucial role of GPUs and the cost-effectiveness of the Jetson AGX Orin for LLM tasks.

> https://medium.com/@michaelyuan_88928/how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-682e36444d43

## Hackster Getting Started with AI on Nvidia Jetson AGX Orin Dev Kit (09-16-2023)

Nurgaliyev Shakhizat demonstrates llamaspeak on Jetson AGX Orin 64GB Developer Kit.

> https://www.hackster.io/shahizat/getting-started-with-ai-on-nvidia-jetson-agx-orin-dev-kit-5a55b5

## Hackster New AI Tool Is Generating a Lot of Buzz (09-13-2023)

Nick Bild provides an insightful introduction to the Jetson Generative AI Playground.

> https://www.hackster.io/news/new-ai-tool-is-generating-a-lot-of-buzz-3cc5f23a3598

## JetsonHacks Use These! Jetson Docker Containers Tutorial (09-04-2023)

JetsonHacks has an in-depth tutorial on how to use jetson-containers and even shows the text-generation-webui and stable-diffusion-webui containers in action!

> https://jetsonhacks.com/2023/09/04/use-these-jetson-docker-containers-tutorial/

## Hackster LLaMa 2 LLMs w/ NVIDIA Jetson and textgeneration-web-ui (08-17-2023)

Paul DeCarlo demonstrates 13B and 70B parameter Llama 2 models running locally on Jetson AGX Orin 64GB Developer Kit.

> https://www.hackster.io/pjdecarlo/llama-2-llms-w-nvidia-jetson-and-textgeneration-web-ui-96b070

## Hackster Running a ChatGPT-Like LLM-LLaMA2 on a Nvidia Jetson Cluster (08-14-2023)

Discover how to run a LLaMA-2 7B model on an NVIDIA Jetson cluster in this insightful tutorial by Nurgaliyev Shakhizat.

> https://www.hackster.io/shahizat/running-a-chatgpt-like-llm-llama2-on-a-nvidia-jetson-cluster-cbc7d4

## JetsonHacks Speech AI on NVIDIA Jetson Tutorial (08-07-2023)

JetsonHacks gives a nice introduction to the NVIDIA Riva SDK and demonstrates its automatic speech recognition (ASR) capability on Jetson Orin Nano Developer Kit.

> https://jetsonhacks.com/2023/08/07/speech-ai-on-nvidia-jetson-tutorial/

## Hackster LLM based Multimodal AI w/ Azure Open AI & NVIDIA Jetson (07-12-2023)

Learn how to harness the power of multimodal AI by running Microsoft JARVIS on a Jetson AGX Orin 64GB Developer Kit, enabling a wide range of AI tasks with ChatGPT-like capabilities, image generation, and more, in this comprehensive guide by Paul DeCarlo.

> https://www.hackster.io/pjdecarlo/llm-based-multimodal-ai-w-azure-open-ai-nvidia-jetson-135ff2

## Hackster How to Run a ChatGPT-Like LLM on NVIDIA Jetson board (06-13-2023)

Nurgaliyev Shakhizat explores a voice AI assistant on Jetson using FastChat and VoskAPI.

> https://www.hackster.io/shahizat/how-to-run-a-chatgpt-like-llm-on-nvidia-jetson-board-41fd79
"},{"location":"getting-started.html","title":"Getting started","text":""},{"location":"tips_ram-optimization.html","title":"RAM Optimization","text":"Running a LLM requires a huge RAM space.
Especially if you are on Jetson Orin Nano that only has 8GB of RAM, it is crucial to leave as much RAM space available for models.
Here we share a couple of ways to optimize the system RAM usage.
"},{"location":"tips_ram-optimization.html#disabling-the-desktop-gui","title":"Disabling the Desktop GUI","text":"If you use your Jetson remotely through SSH, you can disable the Ubuntu desktop GUI. This will free up extra memory that the window manager and desktop uses (around ~800MB for Unity/GNOME).
You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:
$ sudo init 3 # stop the desktop\n# log your user back into the console (Ctrl+Alt+F1, F2, ect)\n$ sudo init 5 # restart the desktop\n
If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:
To disable desktop on boot
sudo systemctl set-default multi-user.target\n
To enable desktop on boot
sudo systemctl set-default graphical.target\n
sudo systemctl disable nvargus-daemon.service\n
"},{"location":"tips_ram-optimization.html#mounting-swap","title":"Mounting Swap","text":"If you're building containers or working with large models, it's advisable to mount SWAP (typically correlated with the amount of memory in the board). Run these commands to disable ZRAM and create a swap file:
If you have NVMe SSD storage available, it's preferred to allocate the swap file on the NVMe SSD.
sudo systemctl disable nvzramconfig\nsudo fallocate -l 16G /ssd/16GB.swap\nsudo mkswap /ssd/16GB.swap\nsudo swapon /ssd/16GB.swap\n
Then add the following line to the end of /etc/fstab to make the change persistent:
/ssd/16GB.swap none swap sw 0 0\n
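As a quick sanity check (an extra step, not part of the original guide), you can confirm the swap file is active:

```
swapon --show    # should list /ssd/16GB.swap
free -h          # the Swap row should now include the extra 16GB
```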
"},{"location":"tips_ssd-docker.html","title":"Tips - SSD + Docker","text":"Once you have your Jetson set up by flashing the latest Jetson Linux (L4T) BSP on it or by flashing the SD card with the whole JetPack image, before embarking on testing out all the great generative AI application using jetson-containers
, you want to make sure you have a huge storage space for all the containers and the models you will download.
We are going to show how you can install SSD on your Jetson, and set it up for Docker.
"},{"location":"tips_ssd-docker.html#ssd","title":"SSD","text":""},{"location":"tips_ssd-docker.html#physical-installation","title":"Physical installation","text":"Once the system is up, verify that your Jetson identifies a new memory controller on PCI bus:
lspci\n
The output should look like the following:
0007:01:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1322 (rev 02)\n
Run lsblk
to find the device name.
lsblk\n
The output should look like the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT\nloop0 7:0 0 16M 1 loop \nmmcblk1 179:0 0 59.5G 0 disk \n\u251c\u2500mmcblk1p1 179:1 0 58G 0 part /\n\u251c\u2500mmcblk1p2 179:2 0 128M 0 part \n\u251c\u2500mmcblk1p3 179:3 0 768K 0 part \n\u251c\u2500mmcblk1p4 179:4 0 31.6M 0 part \n\u251c\u2500mmcblk1p5 179:5 0 128M 0 part \n\u251c\u2500mmcblk1p6 179:6 0 768K 0 part \n\u251c\u2500mmcblk1p7 179:7 0 31.6M 0 part \n\u251c\u2500mmcblk1p8 179:8 0 80M 0 part \n\u251c\u2500mmcblk1p9 179:9 0 512K 0 part \n\u251c\u2500mmcblk1p10 179:10 0 64M 0 part \n\u251c\u2500mmcblk1p11 179:11 0 80M 0 part \n\u251c\u2500mmcblk1p12 179:12 0 512K 0 part \n\u251c\u2500mmcblk1p13 179:13 0 64M 0 part \n\u2514\u2500mmcblk1p14 179:14 0 879.5M 0 part \nzram0 251:0 0 1.8G 0 disk [SWAP]\nzram1 251:1 0 1.8G 0 disk [SWAP]\nzram2 251:2 0 1.8G 0 disk [SWAP]\nzram3 251:3 0 1.8G 0 disk [SWAP]\nnvme0n1 259:0 0 238.5G 0 disk \n
Identify the device corresponding to your SSD. In this case, it is nvme0n1
.
Format the SSD, create a mount point, and mount it to the filesystem.
sudo mkfs.ext4 /dev/nvme0n1\n
You can choose any name for the mount point directory. We use /ssd
here, but in jetson-containers
' setup.md documentation, /mnt
is used.
sudo mkdir /ssd\n
sudo mount /dev/nvme0n1 /ssd\n
In order to ensure that the mount persists after boot, add an entry to the fstab
file:
First, identify the UUID for your SSD:
lsblk -f\n
Then, add a new entry to the fstab
file:
sudo vi /etc/fstab\n
Insert the following line, replacing the UUID with the value found from lsblk -f
:
UUID=************-****-****-****-******** /ssd/ ext4 defaults 0 2\n
Finally, change the ownership of the /ssd
directory.
sudo chown ${USER}:${USER} /ssd\n
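Before rebooting, you can verify the new fstab entry with a quick check (an extra step, not from the original guide):

```
sudo umount /ssd
sudo mount -a    # re-mounts everything in /etc/fstab; an error here means a bad entry
df -h /ssd       # confirm the SSD is mounted back at /ssd
```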
## Docker

Install the `nvidia-container` package.

> Note: If you used an NVIDIA-supplied SD card image to flash your SD card, all necessary JetPack components (including `nvidia-containers`) and Docker are already pre-installed, so this step can be skipped.

```
sudo apt update
sudo apt install -y nvidia-container
```

> **JetPack 6.0 DP users**
>
> If you flash Jetson Linux (L4T) R36.2 (JetPack 6.0 DP) on your Jetson using SDK Manager and install `nvidia-container` using `apt`, note that on JetPack 6.0 it no longer automatically installs Docker.
> Therefore, you need to run the following to manually install Docker and set it up.
>
> ```
> sudo apt update
> sudo apt install -y nvidia-container curl
> curl https://get.docker.com | sh && sudo systemctl --now enable docker
> sudo nvidia-ctk runtime configure --runtime=docker
> ```

Restart the Docker service and add your user to the `docker` group, so that you don't need to use the command with `sudo`.

```
sudo systemctl restart docker
sudo usermod -aG docker $USER
newgrp docker
```

Add the default runtime in `/etc/docker/daemon.json`:

```
sudo vi /etc/docker/daemon.json
```

Insert the `"default-runtime": "nvidia"` line as follows:

```
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```

Restart Docker:

```
sudo systemctl daemon-reload && sudo systemctl restart docker
```
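To confirm the default runtime took effect, a quick check (not one of the original steps):

```
docker info | grep 'Default Runtime'    # expect: Default Runtime: nvidia
```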
## Migrating Docker directory to SSD

Now that the SSD is installed and available on your device, you can use the extra storage capacity to hold the storage-demanding Docker directory.

Stop the Docker service.

```
sudo systemctl stop docker
```

Move the existing Docker folder:

```
sudo du -csh /var/lib/docker/ && \
    sudo mkdir /ssd/docker && \
    sudo rsync -axPS /var/lib/docker/ /ssd/docker/ && \
    sudo du -csh /ssd/docker/
```

Edit `/etc/docker/daemon.json`:

```
sudo vi /etc/docker/daemon.json
```

Insert the `"data-root"` line like the following:

```
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia",
    "data-root": "/ssd/docker"
}
```

Rename the old Docker data directory:

```
sudo mv /var/lib/docker /var/lib/docker.old
```

Restart the docker daemon:

```
sudo systemctl daemon-reload && \
    sudo systemctl restart docker && \
    sudo journalctl -u docker
```
[Terminal 1] First, open a terminal to monitor the disk usage while pulling a Docker image.

```
watch -n1 df
```

[Terminal 2] Next, open a new terminal and start Docker pull.

```
docker pull nvcr.io/nvidia/l4t-base:r35.2.1
```

[Terminal 1] Observe that the disk usage on /ssd goes up as the container image is downloaded and extracted.

```
~$ docker image ls
REPOSITORY                TAG       IMAGE ID       CREATED        SIZE
nvcr.io/nvidia/l4t-base   r35.2.1   dc07eb476a1d   7 months ago   713MB
```

Reboot your Jetson, and verify that you observe the following:

```
~$ sudo blkid | grep nvme
/dev/nvme0n1: UUID="9fc06de1-7cf3-43e2-928a-53a9c03fc5d8" TYPE="ext4"

~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk1p1  116G   18G   94G  16% /
none            3.5G     0  3.5G   0% /dev
tmpfs           3.6G  108K  3.6G   1% /dev/shm
tmpfs           734M   35M  699M   5% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           3.6G     0  3.6G   0% /sys/fs/cgroup
tmpfs           734M   88K  734M   1% /run/user/1000
/dev/nvme0n1    458G  824M  434G   1% /ssd

~$ docker info | grep Root
 Docker Root Dir: /ssd/docker

~$ sudo ls -l /ssd/docker/
total 44
drwx--x--x  4 root root 4096 Mar 22 11:44 buildkit
drwx--x---  2 root root 4096 Mar 22 11:44 containers
drwx------  3 root root 4096 Mar 22 11:44 image
drwxr-x---  3 root root 4096 Mar 22 11:44 network
drwx--x--- 13 root root 4096 Mar 22 16:20 overlay2
drwx------  4 root root 4096 Mar 22 11:44 plugins
drwx------  2 root root 4096 Mar 22 16:19 runtimes
drwx------  2 root root 4096 Mar 22 11:44 swarm
drwx------  2 root root 4096 Mar 22 16:20 tmp
drwx------  2 root root 4096 Mar 22 11:44 trust
drwx-----x  2 root root 4096 Mar 22 16:19 volumes

~$ sudo du -chs /ssd/docker/
752M    /ssd/docker/
752M    total

~$ docker info | grep -e "Runtime" -e "Root"
 Runtimes: io.containerd.runtime.v1.linux nvidia runc io.containerd.runc.v2
 Default Runtime: nvidia
 Docker Root Dir: /ssd/docker
```

Your Jetson is now set up with the SSD!
"},{"location":"try.html","title":"Try","text":"Jump to NVIDIA Jetson Store.
"},{"location":"tutorial-intro.html","title":"Tutorial - Introduction","text":""},{"location":"tutorial-intro.html#overview","title":"Overview","text":"Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
"},{"location":"tutorial-intro.html#text-llm","title":"Text (LLM)","text":"text-generation-webui Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui llamaspeak Talk live with Llama using Riva ASR/TTS, and chat about images with Llava!"},{"location":"tutorial-intro.html#text-vision-vlm","title":"Text + Vision (VLM)","text":"Give your locally running LLM an access to vision!
Mini-GPT4 Mini-GPT4, an open-source model that demonstrate vision-language capabilities. LLaVA Large Language and Vision Assistant, multimodal model that combines a vision encoder and Vicuna LLM for general-purpose visual and language understanding."},{"location":"tutorial-intro.html#image-generation","title":"Image Generation","text":"Stable Diffusion Run AUTOMATIC1111'sstable-diffusion-webui
to generate images from prompts Stable Diffusion XL A newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities."},{"location":"tutorial-intro.html#vision-transformers-vit","title":"Vision Transformers (ViT)","text":"EfficientVIT MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction NanoSAM NanoSAM, SAM model variant capable of running in real-time on Jetson NanoOWL OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT SAM Meta's SAM, Segment Anything model TAM TAM, Track-Anything model, is an interactive tool for video object tracking and segmentation"},{"location":"tutorial-intro.html#vector-database","title":"Vector Database","text":"NanoDB Interactive demo to witness the impact of Vector Database that handles multimodal data"},{"location":"tutorial-intro.html#audio","title":"Audio","text":"AudioCraft Meta's AudioCraft, to produce high-quality audio and music Whisper OpenAI's Whisper, pre-trained model for automatic speech recognition (ASR)"},{"location":"tutorial-intro.html#tips","title":"Tips","text":"Knowledge Distillation SSD + Docker Memory optimization"},{"location":"tutorial-intro.html#about-nvidia-jetson","title":"About NVIDIA Jetson","text":"Note
We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin Developer Kit Jetson Orin Nano Developer Kit GPU 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores RAM(CPU+GPU) 64GB 32GB 8GB Storage 64GB eMMC (+ NVMe SSD) microSD card (+ NVMe SSD)"},{"location":"tutorial_audiocraft.html","title":"Tutorial - AudioCraft","text":"Let's run Meta's AudioCraft, to produce high-quality audio and music on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
10.7 GB
for audiocraft
container imagejetson-containers
","text":"git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_audiocraft.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag audiocraft)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
AudioCraft repo comes with demo Jupyter notebooks.
On Jupyter Lab navigation pane on the left, double-click demos
folder.
For \"Text-conditional Generation\", you should get something like this.
Your browser does not support the audio element.
Info
You may encounter an error message like the following when executing the first cell, but you can keep going.
A matching Triton is not available, some optimizations will not be enabled.\nError caught was: No module named 'triton'\n
Warning
When running the 5-th cell of audiogen_demo.ipynb
, you may run into \"Failed to load audio\" RuntimeError.
For \"Text-conditional Generation\", you should get something like this.
Your browser does not support the audio element.
Warning
When running the 5-th cell of musicgen_demo.ipynb
, you may run into \"Failed to load audio\" RuntimeError.
See \"Jetson Introduction to Knowledge Distillation\" repo's README.md.
https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation
"},{"location":"tutorial_efficientvit.html","title":"Tutorial - EfficientViT","text":"Let's run MIT Han Lab's EfficientViT on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
10.9 GB
for efficientvit
container imagejetson-containers
","text":"git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_efficientvit.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag efficientvit)\n
"},{"location":"tutorial_efficientvit.html#usage-of-efficientvit","title":"Usage of EfficientViT","text":"The official EfficientViT repo shows the complete usage information.
https://github.com/mit-han-lab/efficientvit#usage
"},{"location":"tutorial_efficientvit.html#run-examplebenchmark","title":"Run example/benchmark","text":"Inside the container, a small benchmark script benchmark.py
is added under /opt/efficientvit
directory by the jetson-container build process.
It is to test EfficientViT-L2-SAM in bounding box mode, so we can use this as an example and verify the output.
"},{"location":"tutorial_efficientvit.html#download-l2pt-model","title":"Downloadl2.pt
model","text":"mkdir -p /data/models/efficientvit/sam/\ncd /data/models/efficientvit/sam/\nwget https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt\n
The downloaded checkpoint file is stored on the /data/
directory that is mounted from the Docker host.
cd /opt/efficientvit\npython3 ./benchmark.py\n
At the end you should see a summary like the following.
AVERAGE of 2 runs:\n encoder --- 0.062 sec\n latency --- 0.083 sec\nMemory consumption : 3419.68 MB\n
"},{"location":"tutorial_efficientvit.html#check-the-outputresult","title":"Check the output/result","text":"The output image file (of the last inference result) is stored as /data/benchmarks/efficientvit_sam_demo.png
.
It is stored under /data/
directory that is mounted from the Docker host. So you can go back to your host machine, and check ./jetson-containers/data/benchmark/
directory.
You should find the output like this.
"},{"location":"tutorial_llamaspeak.html","title":"Tutorial - llamaspeak","text":"Talk live with Llama using Riva ASR/TTS, and chat about images with Llava!
llamaspeak:v1
- uses text-generation-webui loaders for LLM models (llama.cpp, exllama, AutoGPTQ, Transformers)llamaspeak:v2
- uses AWQ/MLC from local_llm
package, web chat voice agent llamaspeak v2 has multimodal support for chatting about images with quantized Llava-1.5:
Multimodal Voice Chat with LLaVA-1.5 13B on NVIDIA Jetson AGX Orin (container: local_llm
)
See the Voice Chat
section of the local_llm
documentation to run llamaspeak v2.
LLaVA is a leading multimodal vision/language model that you can run locally on Jetson to answer questions about image prompts and queries. Internally, it uses the CLIP vision encoder to transform images into a common embedding space that the LLM (which is the same as Llama architecture) can understand with text. Below we will cover a few methods to Llava on Jetson, some with quantization for improved performance:
text-generation-webui
llava.serve.cli
jetson-containers
","text":"git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_llava.html#1-chat-with-llava-using-text-generation-webui","title":"1. Chat with Llava using text-generation-webui
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for text-generation-webui
container image1.7GB
7.25GB
The oobabooga chat UI from the LLM tutorial has a multimodal extension for Llava, and it supports 4-bit quantization using AutoGPTQ. If you already used text-generation-webui before 12/2023, do sudo docker pull $(./autotag text-generation-webui)
to update to the latest container.
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) \\\n python3 download-model.py --output=/data/models/text-generation-webui \\\n TheBloke/llava-v1.5-13B-GPTQ\n
"},{"location":"tutorial_llava.html#start-web-ui-with-multimodal-extension","title":"Start Web UI with Multimodal Extension","text":"./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) \\\n python3 server.py --listen \\\n --model-dir /data/models/text-generation-webui \\\n --model TheBloke_llava-v1.5-13B-GPTQ \\\n --multimodal-pipeline llava-v1.5-13b \\\n --loader autogptq \\\n --disable_exllama \\\n --verbose\n
Go to Chat tab, drag and drop an image of your choice into the Drop Image Here area, and your question in the text area above and hit Generate.
"},{"location":"tutorial_llava.html#result","title":"Result","text":""},{"location":"tutorial_llava.html#2-run-from-the-terminal-with-llavaservecli","title":"2. Run from the terminal withllava.serve.cli
","text":"What you need
One of the following Jetson:
Jetson AGX Orin 64GB Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1GB
for llava
container image14GB
, or26GB
This example uses the upstream Llava codebase to run the original, unquantized Llava models from the command-line. As such, it uses more memory due to using FP16 precision, and is provided mostly as a reference for debugging. See the Llava container readme for more infomation.
"},{"location":"tutorial_llava.html#llava-v15-7b","title":"llava-v1.5-7b","text":"./run.sh $(./autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-7b \\\n --image-file /data/images/hoover.jpg\n
"},{"location":"tutorial_llava.html#llava-v15-13b","title":"llava-v1.5-13b","text":"./run.sh $(./autotag llava) \\\npython3 -m llava.serve.cli \\\n--model-path liuhaotian/llava-v1.5-13b \\\n--image-file /data/images/hoover.jpg\n
This may run only on Jetson AGX Orin 64GB due to memory requirements."},{"location":"tutorial_minigpt4.html","title":"Tutorial - MiniGPT-4","text":"Give your locally running LLM an access to vision, by running MiniGPT-4 on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.8GB
for container imageMiniGPT-4
","text":"See jetson-containers
' minigpt4
package README for more infomation**
jetson-containers
","text":"git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_minigpt4.html#start-minigpt4-container-with-models","title":"Start minigpt4
container with models","text":"To start the MiniGPT4 container and webserver with the recommended models, run this command:
cd jetson-containers\n./run.sh $(./autotag minigpt4) /bin/bash -c 'cd /opt/minigpt4.cpp/minigpt4 && python3 webui.py \\\n $(huggingface-downloader --type=dataset maknee/minigpt4-13b-ggml/minigpt4-13B-f16.bin) \\\n $(huggingface-downloader --type=dataset maknee/ggml-vicuna-v0-quantized/ggml-vicuna-13B-v0-q5_k.bin)'\n
Then, open your web browser and access http://<IP_ADDRESS>:7860
.
Let's run NanoDB's interactive demo to witness the impact of Vector Database that handles multimodal data.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7.0GB
for container imagenanodb
","text":""},{"location":"tutorial_nanodb.html#clone-jetson-containers","title":"Clone jetson-containers
","text":"See jetson-containers
' nanodb
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_nanodb.html#how-to-start","title":"How to start","text":""},{"location":"tutorial_nanodb.html#download-your-data","title":"Download your data","text":"Just for an example, let's just use MS COCO dataset.
cd jetson-containers\nmkdir data/datasets/coco/\ncd data/datasets/coco\nwget http://images.cocodataset.org/zips/train2017.zip\nunzip train2017.zip\n
"},{"location":"tutorial_nanodb.html#indexing-data","title":"Indexing Data","text":"First, we need to build the index by scanning your dataset directory.
cd jetson-containers\n./run.sh -v ${PWD}/data/datasets/coco:/my_dataset $(./autotag nanodb) \\\n python3 -m nanodb \\\n --scan /my_dataset \\\n --path /my_dataset/nanodb \\\n --autosave --validate \n
This will take about 2 hours.
Once the database has loaded and completed any start-up operations , it will drop down to a >
prompt from which the user can run search queries. You can quickly check the operation by typing your query on this prompt.
> a girl riding a horse\n\n* index=80110 /data/datasets/coco/2017/train2017/000000393735.jpg similarity=0.29991915822029114\n* index=158747 /data/datasets/coco/2017/unlabeled2017/000000189708.jpg similarity=0.29254037141799927\n* index=123846 /data/datasets/coco/2017/unlabeled2017/000000026239.jpg similarity=0.292171448469162\n* index=127338 /data/datasets/coco/2017/unlabeled2017/000000042508.jpg similarity=0.29118549823760986\n* index=77416 /data/datasets/coco/2017/train2017/000000380634.jpg similarity=0.28964102268218994\n* index=51992 /data/datasets/coco/2017/train2017/000000256290.jpg similarity=0.28929752111434937\n* index=228640 /data/datasets/coco/2017/unlabeled2017/000000520381.jpg similarity=0.28642547130584717\n* index=104819 /data/datasets/coco/2017/train2017/000000515895.jpg similarity=0.285491943359375\n
You can press Ctrl+C to exit from the app and the container.
"},{"location":"tutorial_nanodb.html#interactive-web-ui","title":"Interactive web UI","text":"Spin up the Gradio server.
cd jetson-containers\n./run.sh -v ${PWD}/data/datasets/coco:/my_dataset $(./autotag nanodb) \\\n python3 -m nanodb \\\n --path /my_dataset/nanodb \\\n --server --port=7860\n
You can use your PC (or any machine) that can access your Jetson via a network, and navigate your browser to http://<IP_ADDRESS>:7860
You can enter text search queries as well as drag/upload images.
"},{"location":"tutorial_nanoowl.html","title":"Tutorial - NanoOWL","text":"Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7.2 GB
for container imagejetson-containers
","text":"git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_nanoowl.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag nanoowl)\n
"},{"location":"tutorial_nanoowl.html#how-to-run-the-tree-prediction-live-camera-example","title":"How to run the tree prediction (live camera) example","text":"Ensure you have a camera device connected
ls /dev/video*\n
If no video device is found, exit from the container and check if you can see a video device on the host side.
Launch the demo
cd examples/tree_demo\npython3 tree_demo.py ../../data/owl_image_encoder_patch32.engine\n
Info
If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.
python3 -m nanoowl.build_image_encoder_engine \\\ndata/owl_image_encoder_patch32.engine\n
Second, open your browser to http://<ip address>:7860
Type whatever prompt you like to see what works!
Here are some examples
[a face [a nose, an eye, a mouth]]
[a face (interested, yawning / bored)]
(indoors, outdoors)
Let's run NVIDIA's NanoSAM to check out the performance gain by distillation.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.3GB
for container imagenanosam
","text":""},{"location":"tutorial_nanosam.html#clone-jetson-containers","title":"Clone jetson-containers
","text":"See jetson-containers
' nanosam
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_nanosam.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag nanosam)\n
"},{"location":"tutorial_nanosam.html#run-examples","title":"Run examples","text":"Inside the container, you can move to /opt/nanosam
directory, to go through all the examples demonstrated on the repo.
cd /opt/nanosam\n
To run the \"Example 1 - Segment with bounding box\":
python3 examples/basic_usage.py \\\n --image_encoder=\"data/resnet18_image_encoder.engine\" \\\n --mask_decoder=\"data/mobile_sam_mask_decoder.engine\"\n
The result is saved under /opt/nanosam/data/basic_usage_out.jpg
.
To check on your host machine, you can copy that into /data
directory of the container where that is mounted from the host.
cp data/basic_usage_out.jpg /data/\n
Then you can go to your host system, and find the file under the jetson_containers
' data
directory, like jetson_containers/data/basic_usage_out.jpg
.
Let's run Meta's SAM
on NVIDIA Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container imagesam
","text":""},{"location":"tutorial_sam.html#clone-jetson-containers","title":"Clone jetson-containers
","text":"See jetson-containers
' sam
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_sam.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag sam)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
In Jupyter Lab, navigate to notebooks
and open automatic_mask_generator_example.py
notebook.
Create a new cell at the top, insert the model download command below and run the cell.
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\n
Then go through executing all the cells below Set-up.
"},{"location":"tutorial_sam.html#results","title":"Results","text":"The biggest vit_h
(2.4GB) model may not ran due to OOM, but vit_l
(1.1GB) runs on Jetson Orin Nano.\u00a0\u21a9
Stable Diffusion XL is a newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. All told, SDXL 1.0 has 6.6 billion model parameters, in comparison to 0.98 billion for the original SD 1.5 model.
What you need
One of the following Jetson devices (SDXL requires >= ~13GB memory)
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image12.4GB
for SDXL modelsHave followed the previous stable-diffusion-webui
tutorial and have the webserver container running.
Stable Diffusion XL is supported through AUTOMATIC1111's stable-diffusion-webui
with some additional settings. First you need to download the SDXL models to your jetson-containers
data directory (which is automatically mounted into the container)
# run these outside of container, and replace CONTAINERS_DIR with the path to the jetson-containers repo on your device\nCONTAINERS_DIR=/path/to/your/jetson-containers\nMODEL_DIR=$CONTAINERS_DIR/data/models/stable-diffusion/models/Stable-diffusion/\n\nsudo chown -R $USER $MODEL_DIR\n\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors\n
It's assumed that you already have the stable-diffusion-webui
container and webserver running from the previous tutorial.
After the models have finished downloading, click the 🔄 button to refresh the model list in the web UI. Select `sd_xl_base_1.0.safetensors` from the **Stable Diffusion checkpoint** drop-down.

Then under the **Generation** tab, expand the **Refiner** section, and select `sd_xl_refiner_1.0.safetensors` from the drop-down.

Guidance on other relevant settings:

- The **Switch at** setting dictates the step at which the refiner takes over from the base model. At this point, additional subject content will no longer be added to the scene, but rather its details will be further refined in the image.
- Typical **Sampling steps** are between 20-40 and **Switch at** is between 0.7-0.9. This takes experimentation to find the best combination for the characteristics of your desired output.
- Extensive negative prompts are not as necessary as they were for SD 1.5 (e.g. `low quality, jpeg artifacts, blurry, etc.`)
- **CFG Scale** controls how closely the model conforms to your prompt versus how creative it is.

When you get a good image, remember to save your random seed and settings so you can re-generate it later!

### Results
a girl and a boy building a friendly robot in their basement workshop (seed 642273464, steps 25, switch @ 0.9, cfg scale 7)
small friendly robots playing games with people, in a futuristic Tokyo central park gardens with cherry blossoms and water, coy fish swimming in the water, sunshine (seed 642273464, steps 40, switch @ 0.8, cfg scale 7)
small friendly robots playing games with people in a futuristic New York City Central Park in autumn, water (seed 642273464, steps 25, switch @ 0.8, cfg scale 7)
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
Let's run AUTOMATIC1111's stable-diffusion-webui
on NVIDIA Jetson to generate images from our prompts!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image4.1GB
for SD 1.5 modelThe jetson-containers project provides pre-built Docker images for stable-diffusion-webui
. You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers
, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
containers. You can find the complete article with detailed instructions here.
If you are running this for the first time, go through the pre-setup and see the jetson-containers/stable-diffusion-webui
readme.
Use run.sh
and autotag
script to automatically pull or build a compatible container image:
cd jetson-containers\n./run.sh $(./autotag stable-diffusion-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/stable-diffusion-webui && python3 launch.py \\\n --data=/data/models/stable-diffusion \\\n --enable-insecure-extension-access \\\n --xformers \\\n --listen \\\n --port=7860\n
You should see it downloading the model checkpoint on the first run.
Open your browser and access http://<IP_ADDRESS>:7860
To generate even higher-quality and detailed images, check out the next part of the tutorial that uses the latest Stable Diffusion XL models!
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
Let's run TAM
to perform Segment Anything on videos on NVIDIA Jetson.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container imagetam
","text":""},{"location":"tutorial_tam.html#clone-jetson-containers","title":"Clone jetson-containers
","text":"See jetson-containers
' tam
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_tam.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag tam)\n
The container has a default run command (CMD
) that will automatically start TAM's web server.
Open your browser and access http://<IP_ADDRESS>:12212
.
Check out the official tutorial to learn how to operate the web UI.
"},{"location":"tutorial_tam.html#results","title":"Results","text":""},{"location":"tutorial_text-generation.html","title":"Tutorial - text-generation-webui","text":"Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui
on NVIDIA Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for container imageThe jetson-containers project provides pre-built Docker images for text-generation-webui
along with all of the loader API's built with CUDA enabled (llama.cpp, ExLlama, AutoGPTQ, Transformers, ect). You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone --depth=1 https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers
, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
containers. You can find the complete article with detailed instructions here.
If you are running this for the first time, go through the pre-setup and see the jetson-containers/text-generation-webui
container readme.
Use run.sh
and autotag
script to automatically pull or build a compatible container image:
cd jetson-containers\n./run.sh $(./autotag text-generation-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/text-generation-webui && python3 server.py \\\n --model-dir=/data/models/text-generation-webui \\\n --chat \\\n --listen\n
Open your browser and access http://<IP_ADDRESS>:7860
.
See the oobabooga documentation for instructions for downloading models - either from within the web UI, or using download-model.py
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \\\n'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'\n
From within the web UI, select Model tab and navigate to \"Download model or LoRA\" section.
You can find text generation models on Hugging Face Hub, then enter the Hugging Face username/model path (which you can have copied to your clipboard from the Hub). Then click the Download button.
"},{"location":"tutorial_text-generation.html#gguf-models","title":"GGUF models","text":"The fastest oobabooga model loader to use is currently llama.cpp with 4-bit quantized GGUF models.
You can download a single model file for a particular quantization, like *.Q4_K_M.bin
. Input the file name and hit Download button.
TheBloke/Llama-2-7b-Chat-GGUF
llama-2-7b-chat.Q4_K_M.gguf
5,268 TheBloke/Llama-2-13B-chat-GGUF
llama-2-13b-chat.Q4_K_M.gguf
8,609 TheBloke/LLaMA-30b-GGUF
llama-30b.Q4_K_S.gguf
19,045 TheBloke/Llama-2-70B-chat-GGUF
llama-2-70b-chat.Q4_K_M.gguf
37,655 Info
"},{"location":"tutorial_text-generation.html#model-selection-for-jetson-orin-nano","title":"Model selection for Jetson Orin Nano","text":"Jetson Orin Nano Developer Kit has only 8GB RAM for both CPU (system) and GPU, so you need to pick a model that fits in the RAM size - see the Model Size section below. The 7B models with 4-bit quantization are the ones to use on Jetson Orin Nano. Make sure you go through the RAM optimization steps before attempting to load such model on Jetson Orin Nano.
"},{"location":"tutorial_text-generation.html#load-a-model","title":"Load a model","text":"After you have downloaded a model, click the \ud83d\udd04 button to refresh your model list, and select the model you want to use.
For a GGUF model, remember to
n-gpu-layers
to 128
n_gqa
to 8
if you using Llama-2-70B (on Jetson AGX Orin 64GB)Then click the Load button.
"},{"location":"tutorial_text-generation.html#chat-template","title":"Chat Template","text":"If you're using a Llama model fine-tuned for chat, like the models listed above (except for LLaMA-30b
), you need to use the oobabooga Instruct mode and set the template. On the Parameters tab, go to the Instruction Template sub-tab, then select Llama-v2
from the Instruction Template drop-down (or Vicuna, Guanaco, ect if you are using one of those models)
For the base text completion models (like LLaMA-30b
), use the Default or Notebook tab.
Selecting the right chat template will make sure the model is being prompted correctly - you can also change the system prompt in the Context box to alter the agent's personality and behavior. There are a lot of other settings under the Generation tab, like the maximum length it should output per reply, and token sampling parameters like temperature
and top_p
for controlling randomness.
Then change back to the Chat tab, and under the mode section, make sure Instruct is selected (confusingly, not chat mode). Then you can start chatting with the LLM!
"},{"location":"tutorial_text-generation.html#results","title":"Results","text":""},{"location":"tutorial_text-generation.html#things-to-do-with-your-llm","title":"Things to do with your LLM","text":"Here are some common test prompts for coding, math, history ect. You can also ask it about geography, travel, nature, recipies, fixing things, general life advice, and practically everything else. Also Llama-2 is quite playful and likes to play games to test its logic abilities!
>> What games do you like to play?\n\nI'm a large language model, so I can play text-based games and answer questions on a wide variety of topics. Here are some game ideas:\n\n- 20 Questions: You or I can think of an object, and the other player tries to guess what it is by asking up to 20 yes-or-no questions. The goal is to guess the object in as few questions as possible.\n- Hangman: I can think of a word or phrase and you try to guess it by suggesting letters. Each incorrect guess will bring the hangman one step closer to being complete!\n- Word association: I'll give you a word and you have to come up with a word that is related to it. We can keep playing back and forth as long as you'd like!\n- Trivia: I can provide questions on a wide range of topics, from history to science to pop culture. See how many questions you can answer correctly!\n- Storytelling: I can start telling a story and then stop at a cliffhanger. You can then try to guess what happens next or even take over the storytelling and continue it in your own\n- Jokes: I love to tell jokes and make people laugh with my \"Dad humor\"! Knock knock! *giggles*\n
"},{"location":"tutorial_text-generation.html#model-size-tested","title":"Model size tested","text":"With llama.cpp, GGUF model, 4-bit quantization:
Model size Jetson AGX Orin 64GB Jetson AGX Orin 32GB Jetson Orin Nano 8GB 70B model \u2705 30B model \u2705 \u2705 13B model \u2705 \u2705 7B model \u2705 \u2705 \u2705Want to explore using Python APIs to run LLMs directly? See jetson-containers
for its LLM related packages and containers.
Limited to 7B model (4-bit quantized).\u00a0\u21a9
Let's run OpenAI's Whisper, pre-trained model for automatic speech recognition on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1 GB
for whisper
container imagejetson-containers
","text":"git clone https://github.com/dusty-nv/jetson-containers\ncd jetson-containers\nsudo apt update; sudo apt install -y python3-pip\npip3 install -r requirements.txt\n
"},{"location":"tutorial_whisper.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers\n./run.sh $(./autotag whisper)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server, with SSL enabled.
Open your browser and access https://<IP_ADDRESS>:8888
.
Attention
Note it is https
(not http
).
HTTPS (SSL) connection is needed to allow ipywebrtc
widget to have access to your microphone (for record-and-transcribe.ipynb
).
You will see a warning message like this.
Press \"Advanced\" button and then click on \"Proceed to (unsafe)\" link to proceed to the Jupyter Lab web interface.
The default password for Jupyter Lab is nvidia
.
Whisper repo comes with demo Jupyter notebooks, which you can find under /notebooks/
directory.
jetson-containers
also adds one convenient notebook (record-and-transcribe.ipynb
) to record your audio sample on Jupyter notebook in order to run transcribe on your recorded audio.
record-and-transcribe.ipynb
","text":"This notebook is to let you record your own audio sample using your PC's microphone and apply Whisper's medium
model to transcribe the audio sample.
It uses Jupyter notebook/lab's ipywebrtc
extension to record an audio sample on your web browser.
Attention
When you click the \u23fa botton, your web browser may show a pop-up to ask you to allow it to use your microphone. Be sure to allow the access.
Final checkOnce done, if you click on the \"\u26a0 Not secure\" part in the URL bar, you should see something like this.
"},{"location":"tutorial_whisper.html#result","title":"Result","text":"Once you go through all the steps, you should see the transcribe result in text like this.
"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 00000000..0f8724ef --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,3 @@ + +Running a LLM requires a huge RAM space.
+Especially if you are on Jetson Orin Nano that only has 8GB of RAM, it is crucial to leave as much RAM space available for models.
+Here we share a couple of ways to optimize the system RAM usage.
+If you use your Jetson remotely through SSH, you can disable the Ubuntu desktop GUI.
+This will free up extra memory that the window manager and desktop uses (around ~800MB for Unity/GNOME).
You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:
+$ sudo init 3 # stop the desktop
+# log your user back into the console (Ctrl+Alt+F1, F2, ect)
+$ sudo init 5 # restart the desktop
+
If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:
+To disable desktop on boot
+sudo systemctl set-default multi-user.target
+
To enable desktop on boot
+sudo systemctl set-default graphical.target
+
You can also disable the camera daemon (nvargus) if you are not using cameras:
sudo systemctl disable nvargus-daemon.service
+
If you're building containers or working with large models, it's advisable to allocate swap space (typically sized in proportion to the amount of RAM on the board). Run these commands to disable ZRAM and create a swap file:
+++If you have NVMe SSD storage available, it's preferred to allocate the swap file on the NVMe SSD.
+
sudo systemctl disable nvzramconfig
+sudo fallocate -l 16G /ssd/16GB.swap
+sudo mkswap /ssd/16GB.swap
+sudo swapon /ssd/16GB.swap
+
Then add the following line to the end of /etc/fstab to make the change persistent:
+/ssd/16GB.swap none swap sw 0 0
+
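To double-check that the swap file is active and ZRAM is disabled, you can use swapon and free (a quick verification sketch; the 16GB size assumes the swap file created above):
swapon --show    # should list /ssd/16GB.swap as the active swap device
free -h          # the Swap: row should now show roughly 16G total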
Once you have your Jetson set up by flashing the latest Jetson Linux (L4T) BSP on it or by flashing the SD card with the whole JetPack image, and before embarking on testing out all the great generative AI applications using jetson-containers
, you want to make sure you have ample storage space for all the containers and the models you will download.
We are going to show how you can install an SSD on your Jetson and set it up for Docker.
+Once the system is up, verify that your Jetson identifies a new memory controller on the PCI bus:
+lspci
+
The output should look like the following:
+0007:01:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1322 (rev 02)
+
Run lsblk
to find the device name.
lsblk
+
The output should look like the following:
+NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
+loop0 7:0 0 16M 1 loop
+mmcblk1 179:0 0 59.5G 0 disk
+├─mmcblk1p1 179:1 0 58G 0 part /
+├─mmcblk1p2 179:2 0 128M 0 part
+├─mmcblk1p3 179:3 0 768K 0 part
+├─mmcblk1p4 179:4 0 31.6M 0 part
+├─mmcblk1p5 179:5 0 128M 0 part
+├─mmcblk1p6 179:6 0 768K 0 part
+├─mmcblk1p7 179:7 0 31.6M 0 part
+├─mmcblk1p8 179:8 0 80M 0 part
+├─mmcblk1p9 179:9 0 512K 0 part
+├─mmcblk1p10 179:10 0 64M 0 part
+├─mmcblk1p11 179:11 0 80M 0 part
+├─mmcblk1p12 179:12 0 512K 0 part
+├─mmcblk1p13 179:13 0 64M 0 part
+└─mmcblk1p14 179:14 0 879.5M 0 part
+zram0 251:0 0 1.8G 0 disk [SWAP]
+zram1 251:1 0 1.8G 0 disk [SWAP]
+zram2 251:2 0 1.8G 0 disk [SWAP]
+zram3 251:3 0 1.8G 0 disk [SWAP]
+nvme0n1 259:0 0 238.5G 0 disk
+
Identify the device corresponding to your SSD. In this case, it is nvme0n1
.
Format the SSD, create a mount point, and mount it to the filesystem.
+sudo mkfs.ext4 /dev/nvme0n1
+
++You can choose any name for the mount point directory. We use
+/ssd
here, but injetson-containers
' setup.md documentation,/mnt
is used.
sudo mkdir /ssd
+
sudo mount /dev/nvme0n1 /ssd
+
In order to ensure that the mount persists after boot, add an entry to the fstab
file:
First, identify the UUID for your SSD:
+lsblk -f
+
Then, add a new entry to the fstab
file:
sudo vi /etc/fstab
+
Insert the following line, replacing the UUID with the value found from lsblk -f
:
UUID=************-****-****-****-******** /ssd/ ext4 defaults 0 2
+
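Before rebooting, you can verify that the new fstab entry mounts cleanly (this assumes the SSD is currently mounted at /ssd as above):
sudo umount /ssd
sudo mount -a     # mounts everything listed in /etc/fstab; no output means success
df -h /ssd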
Finally, change the ownership of the /ssd
directory.
sudo chown ${USER}:${USER} /ssd
+
Install nvidia-container
package.
++Note: If you used an NVIDIA-supplied SD card image to flash your SD card, all necessary JetPack components (including
+nvidia-containers
) and Docker are already pre-installed, so this step can be skipped.
sudo apt update
+sudo apt install -y nvidia-container
+
JetPack 6.0 DP users
+If you flashed Jetson Linux (L4T) R36.2 (JetPack 6.0 DP) on your Jetson using SDK Manager and installed nvidia-container
using apt
, note that on JetPack 6.0 this package no longer automatically installs Docker.
Therefore, you need to run the following to manually install Docker and set it up.
+sudo apt update
+sudo apt install -y nvidia-container curl
+curl https://get.docker.com | sh && sudo systemctl --now enable docker
+sudo nvidia-ctk runtime configure --runtime=docker
+
Restart the Docker service and add your user to the docker
group, so that you don't need to use the command with sudo
.
sudo systemctl restart docker
+sudo usermod -aG docker $USER
+newgrp docker
+
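As a quick sanity check, you should now be able to talk to the Docker daemon without sudo:
docker ps    # should print an (empty) container list rather than a permission error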
Add the default runtime in /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json
+
Insert the "default-runtime": "nvidia"
line as following:
{
+ "runtimes": {
+ "nvidia": {
+ "path": "nvidia-container-runtime",
+ "runtimeArgs": []
+ }
+ },
+ "default-runtime": "nvidia"
+}
+
Restart Docker
+sudo systemctl daemon-reload && sudo systemctl restart docker
+
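You can confirm that Docker picked up the new default runtime:
docker info | grep 'Default Runtime'    # expected output: Default Runtime: nvidia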
Now that the SSD is installed and available on your device, you can use the extra storage capacity to hold the storage-demanding Docker directory.
+Stop the Docker service.
+sudo systemctl stop docker
+
Move the existing Docker folder
+sudo du -csh /var/lib/docker/ && \
+ sudo mkdir /ssd/docker && \
+ sudo rsync -axPS /var/lib/docker/ /ssd/docker/ && \
+ sudo du -csh /ssd/docker/
+
Edit /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json
+
Insert "data-root"
line as follows.
{
+ "runtimes": {
+ "nvidia": {
+ "path": "nvidia-container-runtime",
+ "runtimeArgs": []
+ }
+ },
+ "default-runtime": "nvidia",
+ "data-root": "/ssd/docker"
+}
+
Rename the old Docker data directory
+sudo mv /var/lib/docker /var/lib/docker.old
+
Restart the Docker daemon
+sudo systemctl daemon-reload && \
+ sudo systemctl restart docker && \
+ sudo journalctl -u docker
+
[Terminal 1] First, open a terminal to monitor the disk usage while pulling a Docker image.
+watch -n1 df
+
[Terminal 2] Next, open a new terminal and start Docker pull.
+docker pull nvcr.io/nvidia/l4t-base:r35.2.1
+
[Terminal 1] Observe that the disk usage on /ssd
goes up as the container image is downloaded and extracted.
~$ docker image ls
+REPOSITORY TAG IMAGE ID CREATED SIZE
+nvcr.io/nvidia/l4t-base r35.2.1 dc07eb476a1d 7 months ago 713MB
+
Reboot your Jetson, and verify that you observe the following:
+~$ sudo blkid | grep nvme
+/dev/nvme0n1: UUID="9fc06de1-7cf3-43e2-928a-53a9c03fc5d8" TYPE="ext4"
+
+~$ df -h
+Filesystem Size Used Avail Use% Mounted on
+/dev/mmcblk1p1 116G 18G 94G 16% /
+none 3.5G 0 3.5G 0% /dev
+tmpfs 3.6G 108K 3.6G 1% /dev/shm
+tmpfs 734M 35M 699M 5% /run
+tmpfs 5.0M 4.0K 5.0M 1% /run/lock
+tmpfs 3.6G 0 3.6G 0% /sys/fs/cgroup
+tmpfs 734M 88K 734M 1% /run/user/1000
+/dev/nvme0n1 458G 824M 434G 1% /ssd
+
+~$ docker info | grep Root
+ Docker Root Dir: /ssd/docker
+
+~$ sudo ls -l /ssd/docker/
+total 44
+drwx--x--x 4 root root 4096 Mar 22 11:44 buildkit
+drwx--x--- 2 root root 4096 Mar 22 11:44 containers
+drwx------ 3 root root 4096 Mar 22 11:44 image
+drwxr-x--- 3 root root 4096 Mar 22 11:44 network
+drwx--x--- 13 root root 4096 Mar 22 16:20 overlay2
+drwx------ 4 root root 4096 Mar 22 11:44 plugins
+drwx------ 2 root root 4096 Mar 22 16:19 runtimes
+drwx------ 2 root root 4096 Mar 22 11:44 swarm
+drwx------ 2 root root 4096 Mar 22 16:20 tmp
+drwx------ 2 root root 4096 Mar 22 11:44 trust
+drwx-----x 2 root root 4096 Mar 22 16:19 volumes
+
+~$ sudo du -chs /ssd/docker/
+752M /ssd/docker/
+752M total
+
+~$ docker info | grep -e "Runtime" -e "Root"
+ Runtimes: io.containerd.runtime.v1.linux nvidia runc io.containerd.runc.v2
+ Default Runtime: nvidia
+ Docker Root Dir: /ssd/docker
+
Your Jetson is now set up with the SSD!
+ + + + + + + + +Jump to NVIDIA Jetson Store.
+ + + + + + + + +Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
++ | + |
---|---|
text-generation-webui | +Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui | +
llamaspeak | +Talk live with Llama using Riva ASR/TTS, and chat about images with Llava! | +
Give your locally running LLM an access to vision!
++ | + |
---|---|
Mini-GPT4 | +Mini-GPT4, an open-source model that demonstrate vision-language capabilities. | +
LLaVA | +Large Language and Vision Assistant, multimodal model that combines a vision encoder and Vicuna LLM for general-purpose visual and language understanding. | +
+ | + |
---|---|
Stable Diffusion | +Run AUTOMATIC1111's stable-diffusion-webui to generate images from prompts |
+
Stable Diffusion XL | +A newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. | +
+ | + |
---|---|
EfficientVIT | +MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction | +
NanoSAM | +NanoSAM, SAM model variant capable of running in real-time on Jetson | +
NanoOWL | +OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT | +
SAM | +Meta's SAM, Segment Anything model | +
TAM | +TAM, Track-Anything model, is an interactive tool for video object tracking and segmentation | +
+ | + |
---|---|
NanoDB | +Interactive demo to witness the impact of Vector Database that handles multimodal data | +
+ | + |
---|---|
AudioCraft | +Meta's AudioCraft, to produce high-quality audio and music | +
Whisper | +OpenAI's Whisper, pre-trained model for automatic speech recognition (ASR) | +
+ | + |
---|---|
Knowledge Distillation | ++ |
SSD + Docker | ++ |
Memory optimization | ++ |
Note
+We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.
 | Jetson AGX Orin 64GB Developer Kit | Jetson AGX Orin Developer Kit | Jetson Orin Nano Developer Kit
---|---|---|---
GPU | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores | 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores
RAM (CPU+GPU) | 64GB | 32GB | 8GB
Storage | 64GB eMMC (+ NVMe SSD) | 64GB eMMC (+ NVMe SSD) | microSD card (+ NVMe SSD)
Let's run Meta's AudioCraft to produce high-quality audio and music on Jetson!
+What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x)
+Sufficient storage space (preferably with NVMe SSD).
+10.7 GB
for audiocraft
container imagejetson-containers
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use the run.sh and autotag scripts to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag audiocraft)
+
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
++The default password for Jupyter Lab is
+nvidia
.
The AudioCraft repo comes with demo Jupyter notebooks.
+On the Jupyter Lab navigation pane on the left, double-click the demos folder.
For "Text-conditional Generation", you should get something like this.
+ +Info
+You may encounter an error message like the following when executing the first cell, but you can keep going. +
A matching Triton is not available, some optimizations will not be enabled.
+Error caught was: No module named 'triton'
+
Warning
+When running the 5th cell of audiogen_demo.ipynb, you may run into a "Failed to load audio" RuntimeError.
For "Text-conditional Generation", you should get something like this.
+ +Warning
+When running the 5th cell of musicgen_demo.ipynb, you may run into a "Failed to load audio" RuntimeError.
See "Jetson Introduction to Knowledge Distillation" repo's README.md.
+https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation
+ + + + + + + + +Let's run MIT Han Lab's EfficientViT on Jetson!
+What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x)
+Sufficient storage space (preferably with NVMe SSD).
+10.9 GB
for efficientvit
container imagejetson-containers
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag efficientvit)
+
The official EfficientViT repo shows the complete usage information.
+https://github.com/mit-han-lab/efficientvit#usage
+Inside the container, a small benchmark script benchmark.py
is added under /opt/efficientvit
directory by the jetson-container build process.
It is to test EfficientViT-L2-SAM in bounding box mode, so we can use this as an example and verify the output.
+l2.pt
modelmkdir -p /data/models/efficientvit/sam/
+cd /data/models/efficientvit/sam/
+wget https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt
+
++The downloaded checkpoint file is stored on the
+/data/
directory that is mounted from the Docker host.
cd /opt/efficientvit
+python3 ./benchmark.py
+
At the end you should see a summary like the following.
+AVERAGE of 2 runs:
+ encoder --- 0.062 sec
+ latency --- 0.083 sec
+Memory consumption : 3419.68 MB
+
The output image file (of the last inference result) is stored as /data/benchmarks/efficientvit_sam_demo.png
.
+Since the /data/ directory is mounted from the Docker host, you can go back to your host machine and check the ./jetson-containers/data/benchmarks/ directory.
You should find an output like this.
+ + + + + + + + + +Talk live with Llama using Riva ASR/TTS, and chat about images with Llava!
+ +llamaspeak:v1
- uses text-generation-webui loaders for LLM models (llama.cpp, exllama, AutoGPTQ, Transformers)llamaspeak:v2
- uses AWQ/MLC from local_llm
package, web chat voice agent llamaspeak v2 has multimodal support for chatting about images with quantized Llava-1.5:
+ +++Multimodal Voice Chat with LLaVA-1.5 13B on NVIDIA Jetson AGX Orin (container:
+local_llm
)
See the Voice Chat
section of the local_llm
documentation to run llamaspeak v2.
LLaVA is a leading multimodal vision/language model that you can run locally on Jetson to answer questions about image prompts and queries. Internally, it uses the CLIP vision encoder to transform images into a common embedding space that the LLM (which is the same as Llama architecture) can understand with text. Below we will cover a few methods to Llava on Jetson, some with quantization for improved performance:
+ + +jetson-containers
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
text-generation-webui
What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.2GB
for text-generation-webui
container image1.7GB
7.25GB
The oobabooga chat UI from the LLM tutorial has a multimodal extension for Llava, and it supports 4-bit quantization using AutoGPTQ. If you already used text-generation-webui before 12/2023, do sudo docker pull $(./autotag text-generation-webui)
to update to the latest container.
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) \
+ python3 download-model.py --output=/data/models/text-generation-webui \
+ TheBloke/llava-v1.5-13B-GPTQ
+
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) \
+ python3 server.py --listen \
+ --model-dir /data/models/text-generation-webui \
+ --model TheBloke_llava-v1.5-13B-GPTQ \
+ --multimodal-pipeline llava-v1.5-13b \
+ --loader autogptq \
+ --disable_exllama \
+ --verbose
+
Go to the Chat tab, drag and drop an image of your choice into the Drop Image Here area, type your question in the text area above, and hit Generate.
+ +llava.serve.cli
What you need
+One of the following Jetson:
+Jetson AGX Orin 64GB +Jetson AGX Orin (32GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.1GB
for llava
container image14GB
, or26GB
This example uses the upstream Llava codebase to run the original, unquantized Llava models from the command-line. As such, it uses more memory due to using FP16 precision, and is provided mostly as a reference for debugging. See the Llava container readme for more infomation.
+./run.sh $(./autotag llava) \
+ python3 -m llava.serve.cli \
+ --model-path liuhaotian/llava-v1.5-7b \
+ --image-file /data/images/hoover.jpg
+
./run.sh $(./autotag llava) \
+ python3 -m llava.serve.cli \
+ --model-path liuhaotian/llava-v1.5-13b \
+ --image-file /data/images/hoover.jpg
+
Give your locally running LLM an access to vision, by running MiniGPT-4 on Jetson!
+ +What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+5.8GB
for container imageMiniGPT-4
See jetson-containers
' minigpt4
package README for more infomation**
jetson-containers
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
minigpt4
container with modelsTo start the MiniGPT4 container and webserver with the recommended models, run this command:
+cd jetson-containers
+./run.sh $(./autotag minigpt4) /bin/bash -c 'cd /opt/minigpt4.cpp/minigpt4 && python3 webui.py \
+ $(huggingface-downloader --type=dataset maknee/minigpt4-13b-ggml/minigpt4-13B-f16.bin) \
+ $(huggingface-downloader --type=dataset maknee/ggml-vicuna-v0-quantized/ggml-vicuna-13B-v0-q5_k.bin)'
+
Then, open your web browser and access http://<IP_ADDRESS>:7860
.
Let's run NanoDB's interactive demo to witness the impact of Vector Database that handles multimodal data.
+ +What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+7.0GB
for container imagenanodb
jetson-containers
See jetson-containers
' nanodb
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Just for an example, let's just use MS COCO dataset.
+cd jetson-containers
+mkdir data/datasets/coco/
+cd data/datasets/coco
+wget http://images.cocodataset.org/zips/train2017.zip
+unzip train2017.zip
+
First, we need to build the index by scanning your dataset directory.
+cd jetson-containers
+./run.sh -v ${PWD}/data/datasets/coco:/my_dataset $(./autotag nanodb) \
+ python3 -m nanodb \
+ --scan /my_dataset \
+ --path /my_dataset/nanodb \
+ --autosave --validate
+
This will take about 2 hours.
Once the database has loaded and completed any start-up operations, it will drop down to a > prompt from which the user can run search queries.
+You can quickly check the operation by typing a query at this prompt.
> a girl riding a horse
+
+* index=80110 /data/datasets/coco/2017/train2017/000000393735.jpg similarity=0.29991915822029114
+* index=158747 /data/datasets/coco/2017/unlabeled2017/000000189708.jpg similarity=0.29254037141799927
+* index=123846 /data/datasets/coco/2017/unlabeled2017/000000026239.jpg similarity=0.292171448469162
+* index=127338 /data/datasets/coco/2017/unlabeled2017/000000042508.jpg similarity=0.29118549823760986
+* index=77416 /data/datasets/coco/2017/train2017/000000380634.jpg similarity=0.28964102268218994
+* index=51992 /data/datasets/coco/2017/train2017/000000256290.jpg similarity=0.28929752111434937
+* index=228640 /data/datasets/coco/2017/unlabeled2017/000000520381.jpg similarity=0.28642547130584717
+* index=104819 /data/datasets/coco/2017/train2017/000000515895.jpg similarity=0.285491943359375
+
You can press Ctrl+C to exit from the app and the container.
+Spin up the Gradio server.
+cd jetson-containers
+./run.sh -v ${PWD}/data/datasets/coco:/my_dataset $(./autotag nanodb) \
+ python3 -m nanodb \
+ --path /my_dataset/nanodb \
+ --server --port=7860
+
You can use your PC (or any machine that can access your Jetson over the network) and navigate your browser to http://<IP_ADDRESS>:7860
You can enter text search queries as well as drag/upload images.
+ + + + + + + + + +Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.
+ +What you need
+One of the following Jetson:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+7.2 GB
for container imagejetson-containers
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag nanoowl)
+
Ensure you have a camera device connected
+ls /dev/video*
+
++If no video device is found, exit from the container and check if you can see a video device on the host side.
+
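If v4l-utils is installed, v4l2-ctl can also list the detected cameras by name (an optional extra check; v4l2-ctl ships in the v4l-utils package):
sudo apt install -y v4l-utils
v4l2-ctl --list-devices    # prints each camera with its /dev/video* node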
Launch the demo:
cd examples/tree_demo
+python3 tree_demo.py ../../data/owl_image_encoder_patch32.engine
+
Info
+If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.
+python3 -m nanoowl.build_image_encoder_engine \
+ data/owl_image_encoder_patch32.engine
+
Second, open your browser to http://<ip address>:7860
Type whatever prompt you like to see what works!
+Here are some examples
+[a face [a nose, an eye, a mouth]]
[a face (interested, yawning / bored)]
(indoors, outdoors)
Let's run NVIDIA's NanoSAM to check out the performance gain by distillation.
+ +What you need
+One of the following Jetson:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.3GB
for container imagenanosam
jetson-containers
See jetson-containers
' nanosam
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag nanosam)
+
Inside the container, you can move to /opt/nanosam
directory, to go through all the examples demonstrated on the repo.
cd /opt/nanosam
+
To run the "Example 1 - Segment with bounding box":
+python3 examples/basic_usage.py \
+ --image_encoder="data/resnet18_image_encoder.engine" \
+ --mask_decoder="data/mobile_sam_mask_decoder.engine"
+
The result is saved under /opt/nanosam/data/basic_usage_out.jpg
.
To check on your host machine, you can copy that into /data
directory of the container where that is mounted from the host.
cp data/basic_usage_out.jpg /data/
+
Then you can go to your host system, and find the file under the jetson_containers
' data
directory, like jetson_containers/data/basic_usage_out.jpg
.
Let's run Meta's SAM
on NVIDIA Jetson.
What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)⚠️1
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.8GB
for container imagesam
jetson-containers
See jetson-containers
' sam
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag sam)
+
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
++The default password for Jupyter Lab is
+nvidia
.
In Jupyter Lab, navigate to notebooks
and open automatic_mask_generator_example.py
notebook.
Create a new cell at the top, insert the model download command below and run the cell.
+!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
+
Then go through executing all the cells below Set-up.
+The biggest vit_h
(2.4GB) model may not ran due to OOM, but vit_l
(1.1GB) runs on Jetson Orin Nano. ↩
Stable Diffusion XL is a newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. All told, SDXL 1.0 has 6.6 billion model parameters, in comparison to 0.98 billion for the original SD 1.5 model.
+ +What you need
+One of the following Jetson devices (SDXL requires >= ~13GB memory)
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.8GB
for container image12.4GB
for SDXL modelsHave followed the previous stable-diffusion-webui
tutorial and have the webserver container running.
Stable Diffusion XL is supported through AUTOMATIC1111's stable-diffusion-webui
with some additional settings. First you need to download the SDXL models to your jetson-containers
data directory (which is automatically mounted into the container)
# run these outside of container, and replace CONTAINERS_DIR with the path to the jetson-containers repo on your device
+CONTAINERS_DIR=/path/to/your/jetson-containers
+MODEL_DIR=$CONTAINERS_DIR/data/models/stable-diffusion/models/Stable-diffusion/
+
+sudo chown -R $USER $MODEL_DIR
+
+wget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
+wget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors
+
It's assumed that you already have the stable-diffusion-webui
container and webserver running from the previous tutorial.
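If the webserver is not still running, you can start it again with the same commands as in the previous tutorial:
cd jetson-containers
./run.sh $(./autotag stable-diffusion-webui)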
After the models have finished downloading, click the 🔄 button to refresh the model list in the web UI. Select sd_xl_base_1.0.safetensors
from the Stable Diffusion checkpoint drop-down:
Then under the Generation tab, expand the Refiner section, and select sd_xl_refiner_1.0.safetensors
from the drop-down:
Guidance on other relevant settings:
+Switch at
setting dictates the step at which the refiner takes over from the base model. At this point, additional subject content will no longer be added to the scene, but rather its details further refined in the image.Sampling steps
are between 20-40 and Switch at
is between 0.7-0.9. This takes experimentation to find the best combination for the characteristics of your desired output.low quality, jpeg artifacts, blurry, ect
)CFG Scale
controls how closely the model conforms to your prompt versus how creative it is.When you get a good image, remember to save your random seed and settings so you can re-generate it later!
+
+photograph of a friendly robot alongside a person climbing a mountain (seed 1576166644, steps 25, switch @ 0.8, cfg scale 15)
+a girl and a boy building a friendly robot in their basement workshop (seed 642273464, steps 25, switch @ 0.9, cfg scale 7)
+small friendly robots playing games with people, in a futuristic Tokyo central park gardens with cherry blossoms and water, coy fish swimming in the water, sunshine (seed 642273464, steps 40, switch @ 0.8, cfg scale 7)
+small friendly robots playing games with people in a futuristic New York City Central Park in autumn, water (seed 642273464, steps 25, switch @ 0.8, cfg scale 7)
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
Let's run AUTOMATIC1111's stable-diffusion-webui
on NVIDIA Jetson to generate images from our prompts!
What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.8GB
for container image4.1GB
for SD 1.5 modelThe jetson-containers project provides pre-built Docker images for stable-diffusion-webui
. You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Info
+JetsonHacks provides an informative walkthrough video on jetson-containers
, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
containers. You can find the complete article with detailed instructions here.
++If you are running this for the first time, go through the pre-setup and see the
+jetson-containers/stable-diffusion-webui
readme.
Use run.sh
and autotag
script to automatically pull or build a compatible container image:
cd jetson-containers
+./run.sh $(./autotag stable-diffusion-webui)
+
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/stable-diffusion-webui && python3 launch.py \
+ --data=/data/models/stable-diffusion \
+ --enable-insecure-extension-access \
+ --xformers \
+ --listen \
+ --port=7860
+
You should see it downloading the model checkpoint on the first run.
+Open your browser and access http://<IP_ADDRESS>:7860
To generate even higher-quality and detailed images, check out the next part of the tutorial that uses the latest Stable Diffusion XL models!
+Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
Let's run TAM
to perform Segment Anything on videos on NVIDIA Jetson.
What you need
+One of the following Jetson:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.8GB
for container imagetam
jetson-containers
See jetson-containers
' tam
package README for more infomation**
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag tam)
+
The container has a default run command (CMD
) that will automatically start TAM's web server.
Open your browser and access http://<IP_ADDRESS>:12212
.
Check out the official tutorial to learn how to operate the web UI.
+ + +Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui
on NVIDIA Jetson!
What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)⚠️1
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.2GB
for container imageThe jetson-containers project provides pre-built Docker images for text-generation-webui
along with all of the loader API's built with CUDA enabled (llama.cpp, ExLlama, AutoGPTQ, Transformers, ect). You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone --depth=1 https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Info
+JetsonHacks provides an informative walkthrough video on jetson-containers
, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
containers. You can find the complete article with detailed instructions here.
++If you are running this for the first time, go through the pre-setup and see the
+jetson-containers/text-generation-webui
container readme.
Use run.sh
and autotag
script to automatically pull or build a compatible container image:
cd jetson-containers
+./run.sh $(./autotag text-generation-webui)
+
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/text-generation-webui && python3 server.py \
+ --model-dir=/data/models/text-generation-webui \
+ --chat \
+ --listen
+
Open your browser and access http://<IP_ADDRESS>:7860
.
See the oobabooga documentation for instructions for downloading models - either from within the web UI, or using download-model.py
./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \
+ 'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'
+
From within the web UI, select Model tab and navigate to "Download model or LoRA" section.
+You can find text generation models on Hugging Face Hub, then enter the Hugging Face username/model path (which you can have copied to your clipboard from the Hub). Then click the Download button.
+The fastest oobabooga model loader to use is currently llama.cpp with 4-bit quantized GGUF models.
+You can download a single model file for a particular quantization, like *.Q4_K_M.bin
. Input the file name and hit Download button.
Model | +Quantization | +Memory (MB) | +
---|---|---|
TheBloke/Llama-2-7b-Chat-GGUF |
+llama-2-7b-chat.Q4_K_M.gguf |
+5,268 | +
TheBloke/Llama-2-13B-chat-GGUF |
+llama-2-13b-chat.Q4_K_M.gguf |
+8,609 | +
TheBloke/LLaMA-30b-GGUF |
+llama-30b.Q4_K_S.gguf |
+19,045 | +
TheBloke/Llama-2-70B-chat-GGUF |
+llama-2-70b-chat.Q4_K_M.gguf |
+37,655 | +
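If you prefer the command line over the web UI, you can also fetch a single quantized model file directly from Hugging Face Hub with wget (a sketch using the repo and filename from the table above; run it on the host from the directory containing your jetson-containers checkout):
wget -P jetson-containers/data/models/text-generation-webui \
  https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_K_M.gguf
The data/models/text-generation-webui directory is mounted into the container as /data/models/text-generation-webui, so the web UI will pick the file up after you refresh the model list.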
Info
+Jetson Orin Nano Developer Kit has only 8GB RAM for both CPU (system) and GPU, so you need to pick a model that fits in the RAM size - see the Model Size section below. The 7B models with 4-bit quantization are the ones to use on Jetson Orin Nano. Make sure you go through the RAM optimization steps before attempting to load such model on Jetson Orin Nano.
+After you have downloaded a model, click the 🔄 button to refresh your model list, and select the model you want to use.
+For a GGUF model, remember to
+n-gpu-layers
to 128
n_gqa
to 8
if you using Llama-2-70B (on Jetson AGX Orin 64GB)Then click the Load button.
+If you're using a Llama model fine-tuned for chat, like the models listed above (except for LLaMA-30b
), you need to use the oobabooga Instruct mode and set the template. On the Parameters tab, go to the Instruction Template sub-tab, then select Llama-v2
from the Instruction Template drop-down (or Vicuna, Guanaco, ect if you are using one of those models)
For the base text completion models (like LLaMA-30b
), use the Default or Notebook tab.
Selecting the right chat template will make sure the model is being prompted correctly - you can also change the system prompt in the Context box to alter the agent's personality and behavior. There are a lot of other settings under the Generation tab, like the maximum length it should output per reply, and token sampling parameters like temperature
and top_p
for controlling randomness.
Then change back to the Chat tab, and under the mode section, make sure Instruct is selected (confusingly, not chat mode). Then you can start chatting with the LLM!
+Here are some common test prompts for coding, math, history ect. You can also ask it about geography, travel, nature, recipies, fixing things, general life advice, and practically everything else. Also Llama-2 is quite playful and likes to play games to test its logic abilities!
+>> What games do you like to play?
+
+I'm a large language model, so I can play text-based games and answer questions on a wide variety of topics. Here are some game ideas:
+
+- 20 Questions: You or I can think of an object, and the other player tries to guess what it is by asking up to 20 yes-or-no questions. The goal is to guess the object in as few questions as possible.
+- Hangman: I can think of a word or phrase and you try to guess it by suggesting letters. Each incorrect guess will bring the hangman one step closer to being complete!
+- Word association: I'll give you a word and you have to come up with a word that is related to it. We can keep playing back and forth as long as you'd like!
+- Trivia: I can provide questions on a wide range of topics, from history to science to pop culture. See how many questions you can answer correctly!
+- Storytelling: I can start telling a story and then stop at a cliffhanger. You can then try to guess what happens next or even take over the storytelling and continue it in your own
+- Jokes: I love to tell jokes and make people laugh with my "Dad humor"! Knock knock! *giggles*
+
With llama.cpp, GGUF model, 4-bit quantization:
+Model size | +Jetson AGX Orin 64GB | +Jetson AGX Orin 32GB | +Jetson Orin Nano 8GB | +
---|---|---|---|
70B model | +✅ | ++ | + |
30B model | +✅ | +✅ | ++ |
13B model | +✅ | +✅ | ++ |
7B model | +✅ | +✅ | +✅ | +
Want to explore using Python APIs to run LLMs directly?
+See jetson-containers
for its LLM-related packages and containers.
Limited to 7B model (4-bit quantized). ↩
+Let's run OpenAI's Whisper, pre-trained model for automatic speech recognition on Jetson!
+What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB) +Jetson AGX Orin (32GB) +Jetson Orin NX (16GB) +Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 5 (L4T r35.x) +JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD).
+6.1 GB
for whisper
container imagejetson-containers
git clone https://github.com/dusty-nv/jetson-containers
+cd jetson-containers
+sudo apt update; sudo apt install -y python3-pip
+pip3 install -r requirements.txt
+
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
cd jetson-containers
+./run.sh $(./autotag whisper)
+
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server, with SSL enabled.
Open your browser and access https://<IP_ADDRESS>:8888
.
Attention
+Note it is https
(not http
).
HTTPS (SSL) connection is needed to allow ipywebrtc
widget to have access to your microphone (for record-and-transcribe.ipynb
).
You will see a warning message like this.
+ +Press "Advanced" button and then click on "Proceed to
++The default password for Jupyter Lab is
+nvidia
.
Whisper repo comes with demo Jupyter notebooks, which you can find under /notebooks/
directory.
jetson-containers
also adds one convenient notebook (record-and-transcribe.ipynb
) to record your audio sample on Jupyter notebook in order to run transcribe on your recorded audio.
record-and-transcribe.ipynb
This notebook is to let you record your own audio sample using your PC's microphone and apply Whisper's medium
model to transcribe the audio sample.
It uses Jupyter notebook/lab's ipywebrtc
extension to record an audio sample on your web browser.
Attention
+When you click the ⏺ botton, your web browser may show a pop-up to ask you to allow it to use your microphone. Be sure to allow the access.
+ +Once done, if you click on the "⚠ Not secure" part in the URL bar, you should see something like this.
+ +Once you go through all the steps, you should see the transcribe result in text like this.
+ + + + + + + + + +