Rapidly design and experiment with creating your own automation agents, personal assistants, and edge AI systems in an interactive sandbox for connecting multimodal LLMs, speech and vision transformers, vector databases, prompt templating, and function calling to live sensors and I/O. Optimized for deployment onboard Jetson with on-device compute, low-latency streaming, and unified memory.
• Edge LLM inference with quantization and KV caching (NanoLLM) • Realtime vision/language models (a la Live Llava and Video VILA) • Speech recognition and synthesis (Whisper ASR, Piper TTS, Riva) • Multimodal vector database from NanoDB • Audio and video streaming (WebRTC, RTP, RTSP, V4L2) • Performance monitoring and profiling • Native bot-callable functions and agent tools • Extensible plugins with auto-generated UI controls • Save, load, and export pipeline presets"},{"location":"agent_studio.html#running-agent-studio","title":"Running Agent Studio","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD): 22GB for the nano_llm container image, plus space for models (>5GB).
Clone and setup jetson-containers:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
This will start the server running on your device. You can then navigate your browser to https://IP_ADDRESS:8050
You can load a pipeline preset at startup with the --load flag (like --load /data/nano_llm/presets/xyz.json), and change the web port with --web-port (and --ws-port for the websocket port).
To make code changes without rebuilding the container, clone the NanoLLM sources and then mount them over /opt/NanoLLM
git clone https://github.com/dusty-nv/NanoLLM\njetson-containers run \\\n -v ${PWD}/NanoLLM:/opt/NanoLLM \\\n $(autotag nano_llm)\n
You can then edit the source from outside the container, and in the terminal that starts you can install other packages from apt/pip/etc.
"},{"location":"agent_studio.html#plugins","title":"Plugins","text":"The models, I/O, and logic blocks share a lightweight plugin interface, exposing a process()
function and any runtime properties:
from nano_llm import Plugin\n\nclass TextFilter(Plugin):\n def __init__(self, search: str='bad word', replace: str='censored', **kwargs):\n \"\"\"\n A simple plugin performing text replacement. Write docs because they're extracted for the UI,\n and for the tool descriptions that go to the bot for function calling (and use the type hints) \n\n Args:\n search: The string to look for and replace in incoming text.\n replace: What to replace occurrences of the search string with.\n \"\"\"\n super().__init__(inputs=['text'], outputs=['text'])\n\n # set these attributes and register them as configurable\n self.add_parameters(search=search, replace=replace)\n\n def process(self, input, **kwargs):\n \"\"\" Substitute the text, and the return value will be sent to the plugins connected to this. \"\"\"\n return input.replace(self.search, self.replace)\n
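As a quick sanity check outside of Agent Studio, a plugin's process() function can also be exercised directly. The sketch below relies only on the TextFilter class defined above (the module name in the import is a hypothetical placeholder for wherever you save the class):

```python
# Minimal, standalone exercise of the TextFilter plugin defined above.
# Assumes the class was saved to a module named text_filter.py (hypothetical name).
from text_filter import TextFilter

filter = TextFilter(search='bad word', replace='censored')

# process() receives the incoming text and returns the value that would be
# forwarded to any plugins connected to this one's 'text' output channel
print(filter.process('this sentence has a bad word in it'))
# -> 'this sentence has a censored in it'

# parameters registered with add_parameters() become runtime-configurable attributes
# (these are what show up as grid-widget controls in the Agent Studio UI)
filter.replace = '***'
print(filter.process('another bad word'))
# -> 'another ***'
```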
Typically each plugin runs asynchronously in its own thread with a queue of incoming data, although some lightweight routines are executed inline. Plugins usually run in the same process, but can also make requests to microservices. Websockets are used for clients and the web UI.
Below are descriptions of commonly-used components. Help text for these is extracted from their Python docs and shown in the UI, along with settings that you can control. In the node editor, click on a plugin to open its grid widget or configuration window.
"},{"location":"agent_studio.html#llm","title":"LLM","text":"NanoLLMAutoPromptUserPromptTextStream >nano_llm.plugins.NanoLLM
(source)
Load quantized LLM/VLM with MLC (speed), AWQ (quality), or HF Transformers (compatibility)\n\n - Inputs\n * str, list[str], np.ndarray, torch.Tensor, cudaImage, PIL.Image \n\n - Commands \n * /reset /refresh /generate (these inputs can control the chat)\n\n - Outputs\n * delta The latest detokenized text to be generated\n * partial All of the text generated in the response so far\n * final The complete response sent at the end of the request\n * words Similar to 'delta', but with at least one word at a time\n * history A list of the messages in the chat history (as dictionaries)\n * tools Enable the bot to call functions in plugins connected on this channel\n
> nano_llm.plugins.AutoPrompt
(source)
Apply a template each time new data comes in, for example '<image> Describe the image' \n would tag each incoming image along with that text prompt (with images, usually for VLM)\n\n AutoPrompt can be used to compose structured messages that reference prior inputs, \n like `Image 1: <image> Image 2: <image>` or using <text> for the last text received: \n\n The three most recent replies were:\n * <text>\n * <text>\n * <text> \n\nThese most recent inputs are used in newest to oldest order from a LIFO queue.\n
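To illustrate the templating idea, the simplified stand-in below (not the actual nano_llm.plugins.AutoPrompt implementation) fills the <text> slots from a LIFO history of recent inputs, newest first:

```python
from collections import deque

class AutoPromptSketch:
    """Toy illustration of AutoPrompt-style templating: <text> slots are filled
    from the most recent inputs in newest-to-oldest (LIFO) order."""
    def __init__(self, template: str, history: int = 3):
        self.template = template
        self.history = deque(maxlen=history)   # newest entries kept at the front

    def __call__(self, new_text: str) -> str:
        self.history.appendleft(new_text)
        out = self.template
        for text in self.history:              # newest first
            out = out.replace('<text>', text, 1)
        return out

prompt = AutoPromptSketch('The three most recent replies were:\n * <text>\n * <text>\n * <text>')
for reply in ['first reply', 'second reply', 'third reply', 'fourth reply']:
    filled = prompt(reply)
print(filled)   # 'fourth reply' fills the first slot, 'third reply' the second, ...
```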
> nano_llm.plugins.UserPrompt
(source)
This plugin is for inputting text on the keyboard, either from the terminal or text box in the UI.\n\n It can also load prompts from a text or json file, and those files can reference other files to load.\n
> nano_llm.plugins.TextStream
(source)
Simple plugin for viewing any stream of text from the system in a text box in the UI. \n\n For sources emanating from ASR or LLM, can apply color highlighting to partial/final responses.\n
"},{"location":"agent_studio.html#speech","title":"Speech","text":"Whisper ASRPiper TTSVAD FilterRiva > nano_llm.plugins.WhisperASR
(source)
Whisper streaming speech-to-text with TensorRT (github.com/NVIDIA-AI-IOT/whisper_trt)\n\n Supports these Whisper models: 'tiny' (39M), 'base' (74M), 'small' (244M)\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * final The text transcript of the entire segment spoken in the input\n * partial Rolling partial transcript of the words spoken so far in the audio segment\n
The Whisper plugin is designed to be used in a pipeline with the VAD Filter to reduce erroneous output from background noise:
>nano_llm.plugins.PiperTTS
(source)
Piper text-to-speech using CUDA and onnxruntime (github.com/rhasspy/piper)\n\n Populates a list of the available Piper models that it can download, along with the speaker voices \n for multi-speaker models. The default model is 'en_US-libritts-high'. Has controls for voice speed.\n\n - Inputs\n * str either individual words, sentences, or an entire passage to synthesize\n\n - Outputs\n * audio np.ndarray (int16) at 16KHz sample rate (for low-quality models) or 22.5KHz (for medium/high)\n
This PiperTTS preset is available, which uses RateLimit to stream the output at 1x realtime factor (so it can be muted later)
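The idea behind that preset is simply to pace the audio chunks so downstream plugins receive them at playback speed; here is a rough sketch of 1x realtime pacing (an illustration only, not the actual RateLimit plugin):

```python
import time
import numpy as np

def stream_realtime(audio: np.ndarray, sample_rate: int = 22050, chunk: int = 4096):
    """Yield audio chunks no faster than 1x realtime by sleeping for each chunk's duration."""
    for i in range(0, len(audio), chunk):
        piece = audio[i:i + chunk]
        yield piece
        time.sleep(len(piece) / sample_rate)   # pace output to the audio's own duration

# example: pace 2 seconds of silence (int16, like the PiperTTS output format)
for _ in stream_realtime(np.zeros(2 * 22050, dtype=np.int16)):
    pass
```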
>nano_llm.plugins.VADFilter
(source)
Voice activity detection model using Silero. Drops incoming audio unless it exceeds the VAD threshold.\n\n Use this on incoming audio before ASR plugins to reduce spurious transcripts produced from background noise.\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * audio np.ndarray or torch.Tensor (only when VAD confidence > threshold)\n
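Conceptually, the filter just gates audio chunks on a confidence score; the sketch below uses a simple energy threshold purely as a stand-in for the Silero model (illustration only):

```python
import numpy as np

def vad_gate(audio_chunks, threshold: float = 0.5, confidence=None):
    """Yield only the chunks whose voice-activity confidence exceeds the threshold.
    In the real VADFilter plugin, the confidence comes from the Silero VAD model;
    here a normalized short-term energy is used as a placeholder."""
    if confidence is None:
        confidence = lambda x: min(1.0, float(np.abs(x.astype(np.float32)).mean()) / 3000.0)
    for chunk in audio_chunks:
        if confidence(chunk) > threshold:
            yield chunk    # passed through to ASR
        # otherwise the chunk is dropped, suppressing transcripts from background noise

# example: quiet noise is dropped, a louder burst passes through
quiet = [np.random.randint(-200, 200, 1600, dtype=np.int16) for _ in range(5)]
loud  = [np.random.randint(-8000, 8000, 1600, dtype=np.int16)]
print(len(list(vad_gate(quiet + loud))))   # -> 1
```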
Riva ASR / TTS services - these run in another container over GRPC. See these resources for setup:
NGC - riva-embedded quickstart guide
jetson-containers - riva container setup
JetsonHacks - Speech AI on NVIDIA Jetson Tutorial
AudioInputDevice
USB/I2S soundcard input via PulseAudio/PyAudio (for microphone directly attached to system) AudioOutputDevice
USB/I2S soundcard output via PulseAudio/PyAudio (for speakers directly attached to system) AudioRecorder
Save an audio stream to WAV file on the server WebAudioIn
Receive audio samples streamed from the client over websockets WebAudioOut
Transmit audio samples to the client over websockets"},{"location":"agent_studio.html#video","title":"Video","text":"VideoSource
Capture images from camera device (V4L2/CSI), network stream (RTP, RTSP), video file (MP4, MKV, AVI, FLV) VideoOutput
Output H264/H265-encoded video to network stream (RTP, RTSP, WebRTC), display, or file (MP4, MKV, AVI, FLV) VideoOverlay
Draw text on top of video streams for HUD or OSD-style display RateLimit
Throttle transmission to a specified rate (can also be used with audio)"},{"location":"agent_studio.html#database","title":"Database","text":"NanoDB
Optimized in-memory multimodal vectorDB for txt2img/img2img similarity search and image tagging [[↗]](tutorial_nanodb.md) Deduplicate
Filter incoming text against previous entries with embedding model and cosine similarity over a timeout period EventFilter
Look for keywords or search strings in streams of text, keeping track of the begin/end times that these occur."},{"location":"agent_studio.html#tips-tricks","title":"Tips & Tricks","text":"Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recreated in the dynamic environment of Agent Studio without needing to touch any code. Here are some practical tips as you go about trying different combinations of model pipelines:
"},{"location":"agent_studio.html#ui","title":"UI","text":"jetson-containers/data/nano_llm/presets
--load
flag at startup.Clear Cache
button in the Agent menu. RateLimiter
plugin to throttle data sources and balance resources.drop_inputs
setting that some plugins expose will always keep them up with the latest request.${DATE}
and ${TIME}
${HOMEASSISTANT.DEVICES}
Image 1: <image> Image 2: <image>
or: The three most recent replies were:\n * <text>\n * <text>\n * <text>\n
🤖 Have fun bot building! If you need help, reach out on the Jetson Forums or GitHub Issues.
"},{"location":"benchmarks.html","title":"Benchmarks","text":""},{"location":"benchmarks.html#large-language-models-llm","title":"Large Language Models (LLM)","text":"For running LLM benchmarks, see the MLC
container documentation.
Small language models are generally defined as having fewer than 7B parameters (Llama-7B shown for reference). For more data and info about running these models, see the SLM
tutorial and MLC
container documentation.
This measures the end-to-end pipeline performance for continuous streaming like with Live Llava. For more data and info about running these models, see the NanoVLM
tutorial.
ViT performance data from [1] [2] [3]
"},{"location":"benchmarks.html#stable-diffusion","title":"Stable Diffusion","text":""},{"location":"benchmarks.html#riva","title":"Riva","text":"For running Riva benchmarks, see ASR Performance and TTS Performance.
"},{"location":"benchmarks.html#vector-database","title":"Vector Database","text":"For running vector database benchmarks, see the NanoDB
container documentation.
Below, you'll find a collection of guides, tutorials, and articles contributed by the community showcasing the implementation of generative AI on the Jetson platform.
"},{"location":"community_articles.html#github-jetbot-voice-activated-copilot-tools-with-ros2-riva-and-nanollm-9-21-2024","title":"GitHub Jetbot Voice-Activated Copilot Tools with ROS2, RIVA, and NanoLLM (9-21-2024)","text":"Jen Hung Ho created ROS2 nodes for ASR/TTS and LLM/VLM on Jetson that can can be used to control JetBot, including customizable voice commands and the execution of advanced actions. Check it out on GitHub under Jen-Hung-Ho/ros2_jetbot_tools
and Jen-Hung-Ho/ros2_jetbot_voice
and on the forums here.
Nurgaliyev Shakhizat creates an interactive talking avatar using ASR, TTS, LLM, and Audio2Face with NVIDIA Omniverse and Unreal Engine:
"},{"location":"community_articles.html#hackster-an-eye-for-an-item-4-26-2024","title":"Hackster An Eye for an Item (4-26-2024)","text":"Allen Tao builds an indoor robot with Isaac ROS that maps your home and remembers where things are using SLAM and NanoDB!
"},{"location":"community_articles.html#hackster-escalator-people-tracker-4-2-2024","title":"Hackster Escalator People Tracker (4-2-2024)","text":"George Profenza & team install local CV solution for detecting and tracking people in large retail spaces to drive live Generative AI graphics:
"},{"location":"community_articles.html#hackster-edge-style-fashion-preview-at-the-edge-4-1-2024","title":"Hackster Edge Style: Fashion Preview at the Edge (4-1-2024)","text":"AI-powered app from Andrei Ciobanu shows virtual try-ons with customer images, enhancing retail shopping using Jetson Orin for speed and privacy:
"},{"location":"community_articles.html#hackster-cooking-meals-with-a-local-ai-assistant-on-jetson-axg-orin-4-1-2024","title":"Hackster Cooking meals with a local AI assistant on Jetson AXG Orin (4-1-2024)","text":"Dimiter Kendri builds a multimodal, multi AI agent, fully local, conversational chatbot with multi agent research capabilities via speech queries:
"},{"location":"community_articles.html#hackster-realtime-language-segment-anything-on-jetson-orin-3-4-2024","title":"Hackster Realtime Language-Segment-Anything on Jetson Orin (3-4-2024)","text":"Huy Mai enables Segment Anything (SAM) with natural language prompting using GroundingDINO for object detection:
"},{"location":"community_articles.html#github-japanese-nmt-translation-for-stable-diffusion-2-23-2024","title":"GitHub Japanese NMT Translation for Stable Diffusion (2-23-2024)","text":"Toshihiko Aoki has created a prompt generator for stable-diffusion-webui
that translates Japanese queries into English using a fine-tuned GPT-2 NMT model before feeding them into Stable Diffusion. Check out the full guide on GitHub under to-aoki/ja-tiny-sd-webui
, including the training dataset and LoRA building!
Vy Pham has created a novel denoising pipeline using a custom trained Transformer-based diffusion model and GAN upscaler for image enhancement, running on Jetson AGX Orin. It runs interactively in a Streamlit web UI for photo capturing and the processing of images and videos. Great work!
"},{"location":"community_articles.html#hackster-ai-powered-application-for-the-blind-and-visually-impaired-12-13-2023","title":"Hackster AI-Powered Application for the Blind and Visually Impaired (12-13-2023)","text":"Nurgaliyev Shakhizat demonstrates a locally-hosted Blind Assistant Device running on Jetson AGX Orin 64GB Developer Kit for realtime image-to-speech translation:
Find more resources about this project here: [Hackster] [GitHub]
"},{"location":"community_articles.html#daves-armoury-bringing-glados-to-life-with-robotics-and-ai-2-8-2024","title":"Dave's Armoury Bringing GLaDOS to life with Robotics and AI (2-8-2024)","text":"See how DIY robotics legend Dave Niewinski from davesarmoury.com brings GLaDOS to life using Jetson AGX Orin, running LLMs onboard alongside object + depth tracking, and RIVA ASR/TTS with a custom-trained voice model for speech recognition and synthesis! Using Unitree Z1 arm with 3D printing and StereoLabs ZED2.
Find more resources about this project here: [Forums] [GitHub]
"},{"location":"community_articles.html#hackster-seeed-studios-local-voice-chatbot-puts-a-speech-recognizing-llama-2-llm-on-your-jetson-2-7-2024","title":"Hackster Seeed Studio's Local Voice Chatbot Puts a Speech-Recognizing LLaMa-2 LLM on Your Jetson (2-7-2024)","text":"Seeed Studio has announced the launch of the Local Voice Chatbot, an NVIDIA Riva- and LLaMa-2-based large language model (LLM) chatbot with voice recognition capabilities \u2014 running entirely locally on NVIDIA Jetson devices, including the company's own reComputer range. Follow the step-by-step guide on the Seeed Studio wiki.
"},{"location":"community_articles.html#youtube-genai-nerds-react-insider-look-at-nvidias-newest-generative-ai-2-6-2024","title":"YouTube GenAI Nerds React - Insider Look at NVIDIA's Newest Generative AI (2-6-2024)","text":"Watch this panel about the latest trends & tech in edge AI, featuring Kerry Shih from OStream, Jim Benson from JetsonHacks, and Dusty from NVIDIA.
"},{"location":"community_articles.html#nvidia-bringing-generative-ai-to-life-with-nvidia-jetson-11-7-2023","title":"NVIDIA Bringing Generative AI to Life with NVIDIA Jetson (11-7-2023)","text":"Watch this webinar about deploying LLMs, VLMs, ViTs, and vector databases onboard Jetson Orin for building next-generation applications using Generative AI:
"},{"location":"community_articles.html#jetsonhacks-jetson-ai-labs-generative-ai-playground-10-31-2023","title":"JetsonHacks Jetson AI Labs \u2013 Generative AI Playground (10-31-2023)","text":"JetsonHacks publishes an insightful video that walks developers through the typical steps for running generative AI models on Jetson following this site's tutorials. The video shows the interaction with the LLaVA model.
"},{"location":"community_articles.html#hackster-vision2audio-giving-the-blind-an-understanding-through-ai-10-15-2023","title":"Hackster Vision2Audio - Giving the blind an understanding through AI (10-15-2023)","text":"Nurgaliyev Shakhizat demonstrates Vision2Audio running on Jetson AGX Orin 64GB Developer Kit to harness the power of LLaVA to help visually impaired people:
"},{"location":"community_articles.html#nvidia-generative-ai-models-at-the-edge-10-19-2023","title":"NVIDIA Generative AI Models at the Edge (10-19-2023)","text":"Follow this walkthrough of the Jetson AI Lab tutorials along with coverage of the latest features and advances coming to JetPack 6 and beyond:
Technical Blog - https://developer.nvidia.com/blog/bringing-generative-ai-to-life-with-jetson/
"},{"location":"community_articles.html#medium-how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-10-02-2023","title":"Medium How to set up your Jetson device for LLM inference and fine-tuning (10-02-2023)","text":"Michael Yuan's guide demonstrating how to set up the Jetson AGX Orin 64GB Developer Kit specifically for large language model (LLM) inference, highlighting the crucial role of GPUs and the cost-effectiveness of the Jetson AGX Orin for LLM tasks.
https://medium.com/@michaelyuan_88928/how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-682e36444d43
"},{"location":"community_articles.html#hackster-getting-started-with-ai-on-nvidia-jetson-agx-orin-dev-kit-09-16-2023","title":"Hackster Getting Started with AI on Nvidia Jetson AGX Orin Dev Kit (09-16-2023)","text":"Nurgaliyev Shakhizat demonstrates llamaspeak
on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
Nick Bild provides an insightful introduction to the Jetson Generative AI Playground:
https://www.hackster.io/news/new-ai-tool-is-generating-a-lot-of-buzz-3cc5f23a3598
"},{"location":"community_articles.html#jetsonhacks-use-these-jetson-docker-containers-tutorial-09-04-2023","title":"JetsonHacks Use These! Jetson Docker Containers Tutorial (09-04-2023)","text":"JetsonHacks has a in-depth tutorial on how to use jetson-containers
and even shows text-generation-webui
and stable-diffusion-webui
containers in action!
Paul DeCarlo demonstrates 13B and 70B parameter LLama 2 models running locally on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
"},{"location":"community_articles.html#hackster-running-a-chatgpt-like-llm-llama2-on-a-nvidia-jetson-cluster-08-14-2023","title":"Hackster Running a ChatGPT-Like LLM-LLaMA2 on a Nvidia Jetson Cluster (08-14-2023)","text":"Discover how to run a LLaMA-2 7B model on an NVIDIA Jetson cluster in this insightful tutorial by Nurgaliyev Shakhizat:
"},{"location":"community_articles.html#jetsonhacks-speech-ai-on-nvidia-jetson-tutorial-08-07-2023","title":"JetsonHacks Speech AI on NVIDIA Jetson Tutorial (08-07-2023)","text":"JetsonHacks gives a nice introduction to NVIDIA RIVA SDK and demonstrate its automated speech recognition (ASR) capability on Jetson Orin Nano Developer Kit.
"},{"location":"community_articles.html#hackster-llm-based-multimodal-ai-w-azure-open-ai-nvidia-jetson-07-12-2023","title":"Hackster LLM based Multimodal AI w/ Azure Open AI & NVIDIA Jetson (07-12-2023)","text":"Learn how to harness the power of Multimodal AI by running Microsoft JARVIS on an Jetson AGX Orin 64GB Developer Kit, enabling a wide range of AI tasks with ChatGPT-like capabilities, image generation, and more, in this comprehensive guide by Paul DeCarlo.
"},{"location":"community_articles.html#hackster-how-to-run-a-chatgpt-like-llm-on-nvidia-jetson-board-06-13-2023","title":"Hackster How to Run a ChatGPT-Like LLM on NVIDIA Jetson board (06-13-2023)","text":"Nurgaliyev Shakhizat explores voice AI assistant on Jetson using FastChat and VoskAPI.
"},{"location":"getting-started.html","title":"Getting started","text":""},{"location":"hello_ai_world.html","title":"Hello AI World","text":"Hello AI World is an in-depth tutorial series for DNN-based inference and training of image classification, object detection, semantic segmentation, and more. It is built on the jetson-inference
library using TensorRT for optimized performance on Jetson.
It's highly recommended to familiarize yourself with the concepts of machine learning and computer vision before diving into the more advanced topics of generative AI here on the Jetson AI Lab. Many of these models will prove useful to have during your development.
HELLO AI WORLD >> https://github.com/dusty-nv/jetson-inference
"},{"location":"initial_setup_jon.html","title":"Initial Setup Guide for Jetson Orin Nano Developer Kit","text":"Note
This guide is to supplement the official Jetson Orin Nano Developer Kit Getting Started Guide.
The NVIDIA\u00ae Jetson Orin Nano\u2122 Developer Kit is a perfect kit to start your journey of local generative AI evaluation and development.
This guide explains the complete flow from opening the box, updating the firmware if needed, flashing the latest JetPack 6.0 GA image on SD card, and the initial software setup, so that you will be ready for tutorials listed on this site and other AI projects.
"},{"location":"initial_setup_jon.html#check-your-inventory","title":"Check your inventory","text":"Following item is needed or highly desired to set up your Jetson Orin Nano Developer Kit. If you don't have them in your inventory, you want to arrange them and come back to this guide once they are available.
What does not come in the box - what you need/want to prepare
"},{"location":"initial_setup_jon.html#storage","title":"Storage","text":"You need either of the following set:
What you find in the box
"},{"location":"initial_setup_jon.html#jetson-orin-nano-developer-kit","title":"Jetson Orin Nano Developer Kit","text":"The Jetson Orin Nano Developer Kit consists of Jetson Orin Nano module (enlarged SO-DIMM form factor), and the reference carrier board.
It is designed to use a microSD card as the primary storage, so the module (the one with the big black heat sink and fan) has a microSD card slot on its underside.
"},{"location":"initial_setup_jon.html#19v-dc-power-supply","title":"19V DC power supply","text":""},{"location":"initial_setup_jon.html#overall-flow","title":"Overall flow","text":"Jetson Orin Nano Initial Setup Flowchart (Click to expand)
flowchart\n A(start) --> B{1. Check<br>Manufactured after May 2024?<br>i.e. Jetson UEFI Firmware<br>newer than version 36.0}\n B --[YES] --> O[6. Flash JetPack 6.0 GA image on microSD card]\n B --[No] --> C[2. Flash JetPack 5.1.3 image on microSD card]\n C --> D[3. Reboot] --> E{{Firmware update during reboot}}\n E --> F[4. Run QSPI updater] --> G[5. Reboot] --> H{{Firmware update during reboot}}\n H --> O\n O --> P(7. Start developing on JetPack 6.0) \n\n style C fill:#fee\n style D fill:#DEE,stroke:#333\n style G fill:#DEE,stroke:#333\n style F stroke-width:4px\n style E stroke-width:2px,stroke-dasharray: 5 5\n style H stroke-width:2px,stroke-dasharray: 5 5\n style O fill:#fee
"},{"location":"initial_setup_jon.html#1-check-if-jetson-uefi-firmware-version-363","title":"1. Check if Jetson UEFI Firmware version > 36.3
","text":"Your Jetson Orin Nano Developer Kit may have the latest firmware (\"Jetson UEFI firmware\" on QSPI-NOR flash memory) flashed at the factory.
If not, we need to go through a set of procedures to upgrade to the latest firmware. (Luckily, we can now do this all just on Jetson, meaning we don't need to use a host Ubuntu PC any more!)
So let's first check the version of your Jetson UEFI Firmware. You can take one of the following methods.
Monitor-attached Headless\ud83d\ude01I'm feeling luckyJ14
\"button\" header of carrier board located under the Jetson module. RXD
(Pin 3) Adafruit adaptor cable Green TXD
(Pin 4) Adafruit adaptor cable White GND
(Pin 7) Adafruit adaptor cable Black For the detail, refer to Jetson Orin Nano Developer Kit Carrier Board Specification.
You could skip to 6. Flash JetPack 6.0 GA image onto your microSD card, and try your luck to see if your Jetson just boots your Jetson Orin Nano Developer Kit up to the initial software set up (OEM-config).
"},{"location":"initial_setup_jon.html#360-upgrade-the-jetson-uefi-firmware-to-36x","title":"[< 36.0] Upgrade the Jetson UEFI firmware to 36.x","text":"Attention
Select the appropriate tab below based on your firmware version you found in the above step.
If you found your Jetson Orin Nano needs its firmware updated to run JetPack 6, click \" Firmware < 36.0\" tab, and then additional step 2 to 5 will appear for you to follow.
If you know your Jetson Orin Nano has the latest firmware, stay on \" Firmware 36.x\" tab, and skip to the next section (6. Flash JetPack 6.0 GA image onto your microSD card)
Firmware 36.x Firmware < 36.0Your Jetson Orin Nano has the latest firmware that is ready for JetPack 6 SD card.
Skip to the next section (6. Flash JetPack 6.0 GA image onto your microSD card)
Your Jetson Orin Nano needs its firmware updated in order to make JetPack 6 SD card work.
Perform the following steps (2 to 5).
"},{"location":"initial_setup_jon.html#2-flash-jetpack-513-image-onto-your-microsd-card","title":"2. Flash JetPack 5.1.3 image onto your microSD card","text":"First we need to run JetPack 5.1.3 in order let its nvidia-l4t-bootloader
package to get its bootloader/firmware updater activated, so that the firmware update automatically runs the next time it reboots.
Download SD card image on to your PC
On your PC, download JetPack 5.1.3 image for Jetson Orin Nano Developer Kit from the official JetPack 5.1.3 page or from the below direct link button.
Warning
NVIDIA had updated the JetPack 5.1.3 image on 5/28/2024, as the old version had some issue and the following process did not work.So please download and use the latest image (the new file name is JP513-orin-nano-sd-card-image_b29.zip
).
Jetson Orin Nano Developer KitJetPack 5.1.3 image
Use Balena Etcher to flash image to SD card
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the flashed microSD card into the slot on Jetson module
Power-on
Turn on the Jetson Orin Nano Developer Kit with JetPack 5.1.3 SD card inserted by plugging in the DC power supply.
Complete the initial software setup (oem-config
)
Ensure firmware update is scheduled.
Once Jetson boots into Jetson Linux system, a background service automatically runs to schedule a firmware update (if needed) to be performed during the next boot-up process.
Once you see the following, or just wait about 5 minutes after powering on to ensure the scheduling is done, reboot.
GUI CUI$ sudo systemctl status nv-l4t-bootloader-config\n[sudo] password for jetson: \n\u25cf nv-l4t-bootloader-config.service - Configure bootloader service\n Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Fri 2024-05-03 13:36:13 PDT; 1min 57s ago\n Process: 11439 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)\nMain PID: 11439 (code=exited, status=0/SUCCESS)\n
Reboot
Reboot your Jetson Orin Nano Developer Kit.
You should see the following during the boot up process.
With monitor Headless (serial)Once done, you will boot into JetPack 5.1.3 (again), with underlying firmware updated to 5.0-35550185
.
Double-check your firmware version is up to date (35.5.0
= JetPack 5.1.3)
Once it reboots back into Jetson Linux system, on Jetson terminal, run the following:
sudo nvbootctrl dump-slots-info\n
You should see something like the following, with the Current version indicating 35.5.0
.
Current version: 35.5.0\nCapsule update status: 0\nCurrent bootloader slot: A\nActive bootloader slot: A\nnum_slots: 2\nslot: 0, status: normal\nslot: 1, status: normal\n
Install QSPI Updater Debian package to trigger another (final) firmware update
On Jetson terminal, run the following:
sudo apt-get install nvidia-l4t-jetson-orin-nano-qspi-updater\n
Installing the nvidia-l4t-jetson-orin-nano-qspi-updater
automatically runs its script to schedule another (final) firmware update to be performed during the next boot process, so that the firmware is ready for JetPack 6.
Reboot
Once the QSPI update is scheduled, reboot your Jetson Orin Nano Developer Kit.
Observe update
You can observe the update during the boot up process.
Power off
Once the update is done, it reboots and tries to boot, however it will get stuck UNLESS you change the SD card to JetPack 6 one.
Therefore you should just power off the developer kit by disconnecting the DC power supply.
Attention
This part may look very confusing as neither the attached monitor nor the debug UART shows any explicit message on what action to take next.
What is going on here is that the Jetson's firmware (inside the QSPI-NOR flash memory) is now updated, ready for the JetPack 6 SD card, however it is now incompatible with JetPack 5.1.3 SD card left in the Jetson module's slot, so after the reboot it gets stuck in the boot process.
So there is nothing problematic with this boot halt (or endless rebooting). We just need to power-off and insert a new SD card.
Once we know the onboard firmware is up-to-date and ready for JetPack 6, we can boot Jetson Orin Nano Developer Kit with a microSD card for JetPack 6.
Download SD card image on to your PC
On your PC, download JetPack 6.0 GA image for Jetson Orin Nano Developer Kit from the official JetPack 6.0 page or from the below direct link button.
Jetson Orin Nano Developer KitJetPack 6.0 GA image
Use Balena Etcher to flash image to SD card
Insert your microSD card into your PC's SD card slot, and use Balena Etcher to flash the SD card with the image you just downloaded.
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the JetPack 6 microSD card into the slot on Jetson module
Power-on by plugging the DC power supply
Complete the initial software setup (oem-config
)
\ud83c\udf8a Congratulations! Your Jetson Orin Nano Developer Kit is set up with JetPack 6 SD card and you are ready to develop on JetPack 6.
"},{"location":"initial_setup_jon.html#next-step","title":"Next step","text":""},{"location":"initial_setup_jon.html#nvme-ssd-installation","title":"NVMe SSD installation","text":"Take a look at this page for installing NVMe SSD and setting up Docker with it.
"},{"location":"lerobot.html","title":"HuggingFace LeRobot","text":"Let's run HuggingFace LeRobot
to train Transformer-based action diffusion policies and ACT onboard NVIDIA Jetson. These models learn to predict actions for a particular task from visual inputs and prior trajectories, typically collected during teleoperation or in simulation.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD): 16.5GB for the lerobot container image, plus space for models and data (>2GB).
Clone and setup jetson-containers:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
This section gives the guide on how you can work through the LeRobot official example of Getting Started with Real-World Robots (7_get_started_with_real_robot.md
) on your Jetson.
Tip
It's recommended to work on your Jetson in monitor-attached mode.
lerobot
is designed to show camera view in windows and playback TTS audio while capturing dataset, so it is more convenient to setup your Jetson with its monitor (and speakers) attached to Jetson.d
jetson-container
's location","text":"Through out the course of all the workflows of lerobot
, we will be generating a lot of data, especially for capturing dataset.
We will clone the lerobot
directory on host and mount the directory in the container to keep all the data persistant, but first make sure your jetson-containers
directory is placed on your SSD, not on your eMMC or microSD card.
If you have created the jetson-containers
directory on eMMC or microSD card (likely the case if you first set up your Jetson device without SSD first and later added SSD), then use the rsync
command to move the entire directory under SSD mount point.
rsync -aHAX /home/jetson/jetson-containers/ /ssd/jetson-containers/\n
Then run the installer again.
bash /ssd/jetson-containers/install.sh\n
"},{"location":"lerobot.html#b-create-lerobot-dir-on-host","title":"b. Create lerobot
dir on host","text":"As described above, we will setup the lerobot
directory under data
directory of jetson-containers
for mounting it inside the container so that generated data persists.
cd jetson-containers\n./packages/robots/lerobot/clone_lerobot_dir_under_data.sh\n./packages/robots/lerobot/copy_overlay_files_in_data_lerobot.sh\n
"},{"location":"lerobot.html#c-pulseaudio-setup","title":"c. PulseAudio setup","text":"LeRobot's dataset capture flow (control_robot.py
) utilizes Speech Dispatcher to use espeak TTS, in order to give operators audio cues about the status and to signal the next operation. It's actually very helpful.
Speech Dispatcher utilizes Pulse Audio, so rather than just sharing the /dev/snd
device when docker run
(which is good for ALSA), we need to add the following arguments.
--device /dev/snd \\\n -e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native \\\n -v ${XDG_RUNTIME_DIR}/pulse:${XDG_RUNTIME_DIR}/pulse \\\n
This is already added to run.sh
of jetson-containers
, however, we need to edit /etc/pulse/default.pa
in order to allow the root user access to the socket file.
sudo vi /etc/pulse/default.pa\n
Find the section loading module-native-protomocl-unix
and add auth-anonymous=1
### Load several protocols\n.ifexists module-esound-protocol-unix.so\nload-module module-esound-protocol-unix auth-anonymous=1\n.endif\nload-module module-nativ\n
Then restart PulseAudio service to make the config take effect.
pulseaudio --kill\npulseaudio --start\n
For troubleshootings or details, please check the docs.md
of speech-dispatcher
package.
It is more convenient if the lerobot programs can always find the device of leader and follower arm with unique names.
For that, we set an udev rule so that arms always get assigned the same device name as following. This is first done on Jetson host side.
/dev/ttyACM_kochleader
: Leader arm/dev/ttyACM_kochfollower
: Follower armFirst only connect the leader arm to Jetson and record the serial ID by running the following:
ll /dev/serial/by-id/\n
The output should look like this.
lrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_BA98C8C350304A46462E3120FF121B06-if00 -> ../../ttyACM1\n
Then edit the first line of ./99-usb-serial.rules
like the following.
You can find the template of this file under ./packages/robots/lerobot
directory.
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"00000000000000000000000000000000\", SYMLINK+=\"ttyACM_kochfollower\"\n
Now disconnect the leader arm, and then only connect the follower arm to Jetson.
Repeat the same steps to record the serial to edit the second line of 99-usb-serial.rules
file.
$ ll /dev/serial/by-id/\nlrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_483F88DC50304A46462E3120FF0C081A-if00 -> ../../ttyACM0\n$ vi ./packages/robots/lerobot\n
You should have ./99-usb-serial.rules
now looking like this:
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"483F88DC50304A46462E3120FF0C081A\", SYMLINK+=\"ttyACM_kochfollower\"\n
Finally copy this under /etc/udev/rules.d/
(of host), and restart Jetson.
sudo cp ./99-usb-serial.rules /etc/udev/rules.d/\nsudo reboot\n
After reboot, check if we now have achieved the desired fixed simlinks names for the arms.
ls -l /dev/ttyACM*\n
You should get something like this:
crw-rw---- 1 root dialout 166, 0 Sep 24 17:20 /dev/ttyACM0\ncrw-rw---- 1 root dialout 166, 1 Sep 24 16:13 /dev/ttyACM1\nlrwxrwxrwx 1 root root 7 Sep 24 17:20 /dev/ttyACM_kochfollower -> ttyACM0\nlrwxrwxrwx 1 root root 7 Sep 24 16:13 /dev/ttyACM_kochleader -> ttyACM1\n
"},{"location":"lerobot.html#e-optional-csi-cameras","title":"e. (Optional) CSI cameras","text":"If you plan to use CSI cameras (not USB webcams) for data capture, you will use the new --csi2webcam
options of jetson-containers
, which exposes V4L2loopback devices that performs like USB webcams (MJPEG) for CSI cameras using Jetson's hardware JPEG encoder.
This feature require some packages to be installed.
sudo apt update && sudo apt install v4l2loopback-dkms v4l-utils\n
"},{"location":"lerobot.html#f-increse-the-swap-file-size","title":"f. Increse the swap file size","text":"You may ran out of memory when are setting up to perform ACT model training.
swapoff -a -v\nsudo rm /swfile\nsudo systemctl disable nvzramconfig\nsudo fallocate -l 8G /ssd/8GB.swap\nsudo chmod 600 /ssd/8GB.swap\nsudo mkswap /ssd/8GB.swap\necho \"/ssd/8GB.swap swap swap defaults 0 0\" | sudo tee -a /etc/fstab\nsudo reboot\n
"},{"location":"lerobot.html#g-starting-the-lerobot-container","title":"g. Starting the lerobot
container","text":"USB webcamsCSI cameras cd jetson-containers\n./run.sh \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
cd jetson-containers\n./run.sh \\\n --csi2webcam --csi-capture-res='1640x1232@30' --csi-output-res='640x480@30' \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
"},{"location":"lerobot.html#work-with-real-world-robots-once-in-container","title":"Work with Real-World Robots - Once in container","text":"JupyerLab tip
Inside the lerobot
container, JupyterLab server process starts.
You can access with http://localhost:8888/
(or http://<IP_ADDRESS>:8888/
from other PC on the same network).
In the notebooks
, there are some Jupyter notebooks for each segment of the official tutorial Getting Started with Real-World Robots (7_get_started_with_real_robot.md
).
Please note that some of them (like notebooks/7-2_real-robot_configure-motors.ipynb
) can be used as working notebooks to execute Python code and scripts conveniently inside the notebook along with instructions (rather than switching to the console).
However, keep in mind that you are encouraged to always check the original official tutorial, and some operation like training is much better executed on console.
Bash history tip
Inside the container, on the console, you can press Up key to scroll through some of the frequently used commands pre-registered in bash history.
"},{"location":"lerobot.html#q-setup-audio","title":"q. Setup audio","text":"Check if PulseAudio is available.
pactl info\n
If you need to set the default audio output device, use set-default-sink
.
pactl list short sinks\npactl set-default-sink [SINK_NAME_OR_INDEX]\n
"},{"location":"lerobot.html#1-order-and-assemble-your-koch-v11","title":"1. Order and Assemble your Koch v1.1","text":"You can order the Koch v1.1 kits from ROBOTIS. (Note: they don't come with 3d printed parts)
TODO:
Follow the Jupyter notebook 7-2_real-robot_configure-motors.ipynb
.
You should mostly operate on the container's terminal.
Follow the official document's section.
Camera config tip
The official document demonstrates the two camera positions, one at the top (\"phone\") and the other at directly in front facing the arm (\"laptop\").
In our trials, this camera placement worked, but we needed to zoom the cameras in on the scene so that they capture better spatial resolution.
Another thing worth experimenting is the wrist cam. More to come later.
Tip
If you plan to perform training on a different machine, scp
the dataset directory.
scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/data/${HF_USER}/\n
scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/home/<USER>/lerobot/data/${HF_USER}/\n
"},{"location":"lerobot.html#4-train-a-policy-on-your-data","title":"4. Train a policy on your data","text":"You should operate on ther container's terminal.
Follow the official document's section.
Tip
Following commands are registered in Bash history inside the lerobot
container.
wandb login\nexport HF_USER=\npython lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/koch_test_$(date +%Y%m%d_%H%M%S) \\\n --tags tutorial \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10\n
Tip
If you perform the training on other Jetson or PC, scp
the outputs directory content back to the original Jetson that has the leader and follower arm attached.
scp -r outputs/train/act_koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/outputs/train/ \n
"},{"location":"lerobot.html#5-evaluate-your-policy","title":"5. Evaluate your policy","text":"You should operate on the container's terminal.
Follow the official document's section.
Tip for a. Use koch.yaml
and our record
function
Modify the command in the bash history to add -p
argument to point to the policy checkpoint.
python lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/koch_test_01 \\\n --tags tutorial \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10 \\\n -p outputs/train/act_koch_test/checkpoints/last/pretrained_model\n
Tip for Visualize evaluation afterwards
python lerobot/scripts/visualize_dataset.py \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test\n
If everything goes well, you should see
"},{"location":"lerobot.html#basic-walkthrough","title":"Basic Walkthrough","text":"This is from the lerobot top README.md.
"},{"location":"lerobot.html#visualize-datasets","title":"Visualize Datasets","text":"Outside of container, first launch the rerun.io visualization tool that LeRobot uses [\u2197]
pip3 install rerun-sdk\nrerun\n
Then, start the docker container to playback one of these LeRobot datasets.
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/visualize_dataset.py \\\n --repo-id lerobot/pusht \\\n --episode-index 0\n
"},{"location":"lerobot.html#evaluate-a-pretrained-diffusion-policy","title":"Evaluate a Pretrained Diffusion Policy","text":"This will download and run a pre-trained diffusion model on the PushT environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/eval.py \\\n -p lerobot/diffusion_pusht \\\n eval.n_episodes=10 \\\n eval.batch_size=10\n
"},{"location":"lerobot.html#train-your-own-act-policy","title":"Train your own ACT Policy","text":"Next, train ACT on the Aloha manipulation environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/train.py \\\n policy=act \\\n env=aloha \\\n env.task=AlohaInsertion-v0 \\\n dataset_repo_id=lerobot/aloha_sim_insertion_human \n
See Trossen Robotics for dual-arm ALOHA kits, and Robotis for the low-cost Koch v1.1 kit used in the LeRobot tutorial:
HuggingFace LeRobot - Get Started with Real-World Robots (YouTube Playlist)
"},{"location":"llama_vlm.html","title":"Llama 3.2 Vision","text":"The latest additions to Meta's family of foundation LLMs include multimodal vision/language models (VLMs) in 11B and 90B sizes with high-resolution image inputs (1120x1120) and cross-attention with base completion and instruction-tuned chat variants:
Llama-3.2-11B-Vision
Llama-3.2-11B-Vision-Instruct
Llama-3.2-90B-Vision
Llama-3.2-90B-Vision-Instruct
While quantization and optimization efforts are underway, we have started with running the unquantized 11B model in a container based on HuggingFace Transformers that has been updated with the latest support for Llama-3.2-Vision a jump start on trying out these exciting new multimodal models - thanks to Meta for continuing to release open Llama models!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
12.8GB
for llama-vision
container image>25GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Request access to the gated models here with your HuggingFace API key.
Today Llama-3.2-11B-Vision is able to be run on Jetson AGX Orin in FP16 via HuggingFace Transformers. Here's a simple code example from the model card for using it:
import time\nimport requests\nimport torch\n\nfrom PIL import Image\nfrom transformers import MllamaForConditionalGeneration, AutoProcessor\n\nmodel_id = \"meta-llama/Llama-3.2-11B-Vision\"\nmodel = MllamaForConditionalGeneration.from_pretrained(model_id, device_map=\"auto\", torch_dtype=torch.bfloat16)\nprocessor = AutoProcessor.from_pretrained(model_id)\n\nprompt = \"<|image|><|begin_of_text|>If I had to write a haiku for this one\"\nurl = \"https://llava-vl.github.io/static/images/view.jpg\"\nraw_image = Image.open(requests.get(url, stream=True).raw)\n\ninputs = processor(text=prompt, images=raw_image, return_tensors=\"pt\").to(model.device)\noutput = model.generate(**inputs, do_sample=False, max_new_tokens=32)\nprint(processor.decode(output[0], skip_special_tokens=True))  # decode and print the generated completion\n
If I had to write a haiku for this one, it would be:\n\nA dock on a lake.\nA mountain in the distance.\nA long exposure.\n
Initial testing seems that Llama-3.2-Vision has more conversational abilities than VLMs typically retain after VQA alignment. This llama_vision.py script has interactive completion and image loading to avoid re-loading the model. It can be launched from the container like this:
jetson-containers run \\\n -e HUGGINGFACE_TOKEN=YOUR_API_KEY \\\n $(autotag llama-vision) \\\n python3 /opt/llama_vision.py \\\n --model \"meta-llama/Llama-3.2-11B-Vision\" \\\n --image \"/data/images/hoover.jpg\" \\\n --prompt \"I'm out in the\" \\\n --max-new-tokens 32 \\\n --interactive\n
After processing the initial image, it will ask you to submit another prompt or image:
total 4.8346s (39 tokens, 8.07 tokens/sec)\n\nEnter prompt or image path/URL:\n\n>> \n
We will update this page and container as support for the Llama-3.2-Vision architecture is added to quantization APIs like MLC and llama.cpp for GGUF, which will reduce the memory and latency.
"},{"location":"nerf.html","title":"Nerfstudio - Neural Reconstruction","text":"Nerfstudio is a tool for training and visualizing Neural Radiance Fields (NeRF) models for 3D volumetric reconstruction from a collection of images taken of a scene or environment. It's designed to be easy to use and starts generating imagery at the outset, which becomes further refined as training progresses. Neural reconstruction is being applied to edge devices and robotics like with FruitNeRF. This tutorial will guide you through the process of training high-quality NeRF models using nerfstudio on Jetson.
Special thanks to Johnny N\u00fa\u00f1ez Cano for porting the NeRF stack to Jetson! See NeRF in 2023: Theory and Practice by It-Jim for useful background info.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
17.6GB
for nerfstudio
container image>5GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use this command to automatically run, build, or pull a compatible container image for nerfstudio:
jetson-containers run $(autotag nerfstudio)\n
To mount your own directories into the container, use the -v
or --volume
flags: jetson-containers run -v /path/on/host:/path/in/container $(autotag nerfstudio)\n
Later a webserver will be reachable at http://localhost:7007
. The official docs include command-line usage and models [\u2197]
It's recommended to train a test scene first by following these steps from the getting started tutorial:
# Download some test data\nns-download-data nerfstudio --capture-name=poster\n\n# Train model\nns-train nerfacto --data /data/nerfstudio/models\n
After it loads, you can navigate your browser to http://localhost:7007
(or substitute your Jetson's IP address)
The 3D visualizations will begin rendering after the first training step, and progressively become more refined. This process typically takes an hour on Jetson AGX Orin to complete the default setting of 30,000 training steps. The model checkpoints are saved under the mounted jetson-containers/data/nerfstudio
directory, and generate the same output quality as before.
The creators behind FruitNeRF fused NeRFs with segmentation for more accurate counting of fruits in 3D. By training a semantic neural radiance field on both RGB and segmentation labels from SAM, the model is robust to occlusions, double-counting, and mistaken or undesirable fruits.
Extended sampling of the volumetric grid can then blob objects for pose estimation, occupancy mapping, and navigation. Such approaches are promising for combining the strengths of NeRFs for 3D reconstruction, segmentation VITs for open-vocabulary classification, and Jetson's onboard compute for high-definition environmental scanning and perception while in the field.
"},{"location":"nerf.html#data-preparation","title":"Data Preparation","text":"We'll use the FruitNeRF datasets as an example of training NeRFs on custom data with nerfstudio. To train a NeRF model on a different scene, you first need to capture a set of images and corresponding camera poses (which can be estimated with the included photogrammetry and image registration tools like COLMAP and OpenSFM as shown here)
The images should be taken from different viewpoints of the scene you want to model, structured in the nerfstudio dataset format. For FruitNeRF, you can just download and extract pre-recorded real or synthetic data from here:
Real DataSynthetic datacd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Real.zip\nunzip FruitNeRF_Real.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
cd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Synthetic.zip\nunzip FruitNeRF_Synthetic.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
"},{"location":"nerf.html#training-a-fruitnerf-model","title":"Training a FruitNeRF Model","text":"This command will generate a NeRF on the first tree (there are multiple trees to try from each dataset). You can find the complete usage information and models on the official GitHub repo: github.com/meyerls/FruitNeRF
ns-train nerfacto \\\n --data /data/nerfstudio/FruitNeRF_Dataset/tree_01 \\\n --output-dir /data/nerfstudio/models \\\n --pipeline.datamanager.camera-res-scale-factor 0.5\n
The memory usage depends on the number of images in the dataset and their resolution, so if you're running low on memory try adjusting the scaling factors and image sampling parameters. The simulated data has a lower resolution and can run at full size.
"},{"location":"openvla.html","title":"OpenVLA - Vision/Language Action Models for Embodied Robotics","text":"Fine Tuning and Deployment Guide
The tutorials's goal is to provide optimized quantization and inference for deploying VLA models, along with reference fine-tuning workflows for adapting models for new robots, tasks, and environments. Rigorous performance and accuracy validation is applied in a self-contained sim environment with scenario generation and domain randomization (MimicGen). Future phases will include sim2real with Isaac Lab and ROS2 integration, study of related models like CrossFormer and optimizations to the neural architecture for realtime performance.
\u2705 Quantization and inference optimizations for VLA models \u2705 Accuracy validation of the original OpenVLA-7B weights \u2705 Reference fine-tuning workflow with synthetic data generation \u2705 On-device training with LoRA's on Jetson AGX Orin and full fine-tuning on A100/H100 instances \u2705 85% accuracy on an example block-stacking task with domain randomization \u2705 Sample datasets and test models for reproducing results
Thank you to OpenVLA, Open X-Embodiment, MimicGen, Robosuite and many others with related work for sharing their promising research, models, and tools for advancing physical AI and robotics.
"},{"location":"openvla.html#vla-architecture","title":"VLA Architecture","text":"OpenVLA is a vision/language action model for embodied robotics and behavioral learning built on LLM/VLMs (this base model is a Prismatic VLM using Llama-7B, DINOv2, and SigLIP). Instead of image captioning or visual question/answering, VLA models generate action tokens from camera images and natural language instructions that are used for controlling the robot. Action tokens are discrete token ID's reserved from the text tokenizer's vocabulary that map to continuous values, normalized against the range of motion of each robot. These real-valued tokens are more efficient and accurate than the model outputting numerical data as text in JSON or Pydantic formats, where each digit, decimal point, separator, and whitespace takes an additional token to generate. Other hybrid vision/language models like Florence-2 have adopted similar approaches for continuous-domain prediction using Transformers.
Each action token generated by the model represents a degree-of-freedom of the output coordinate space (i.e. xyz, rotation pose), or a component of the robot that can be controlled (like the gripper). OpenVLA-7B was trained on the Open X-Embodiment dataset for manipulation, with a 7-DoF action space consisting of (delta xyz, delta roll/pitch/yaw, gripper)
. The position and rotation are relative changes to the end-effector (EEF) pose, with an external inverse kinematics (IK) solution like cuMotion solving joint constraints specific to each robotic arm. The gripper dimension is an absolute control between 0 (open) and 1 (closed) that does not recieve further scaling/normalization.
OpenVLA reserves 256 of the least-frequently used tokens out of the Llama-7B vocabulary for action values, which gives it 8-bit resolution over the controls. It has an input image resolution of 224x224 to stacked DINOv2/SigLIP vision encoders that are projected to ~275 input tokens (plus the text prompt), and outputs 7 tokens mapped to (\u0394pos, \u0394rotation, gripper)
coordinates.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>15GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Support for OpenVLA has been added to NanoLLM on top of its streaming VLM pipeline with INT4/FP8 quantization using MLC and vision encoders in FP16 with TensorRT. First we'll test the model on BridgeData V2, one of the top weighted datasets from the Open X-Embodiment collection. The model was trained on this data and is used to confirm that the quantization and inference are working correctly during deployment. This is what the dataset looks like, courtesy of their website:
The following command starts the container, downloads the dataset and model (if needed), quantizes it on the first run, and measures the accuracy of the action values against the groundtruth from the dataset using normalized mean-squared error (NRMSE) to unbias the varying ranges each dimension of the action space can have. We extracted a 100-episode subset of the original Bridge data here on HuggingFace Hub, so you don't need to download the entire ~400GB dataset just for these tests.
INT4FP8FP16jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model openvla/openvla-7b \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp16.json\n
Quantization Accuracy Latency FPS FP16 95.3% 840 ms 1.19 FP8 95.2% 471 ms 2.12 INT4 90.1% 336 ms 2.97 These results were run on Jetson AGX Orin 64GB with JetPack 6, and later we will see the INT4 accuracy of our fine-tuned model match FP8/FP16.
Each frame, the 7D action vector predicted by the model is printed next to the groundtruth, along with the accuracy, latency, and framerate for that frame. The numbers printed after ~
are the averages of those so far, with the last value reported being the mean over the entire dataset processed.
# INT4\nstep 355 [-0.02692 0.00776 -0.00299 0.08160 0.07292 0.04791 0.99608] accuracy 0.8466 ~0.9017 time=336.2 ms fps=2.96 ~2.97\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP8\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9523 time=469.7 ms fps=2.13 ~2.12\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP16\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9531 time=842.3 ms fps=1.19 ~1.18\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n
The per-frame metrics and averages can be saved with the --save-stats
argument, and in the interests of time you can cap the number of episodes processed with --max-episodes
. As mentioned above, the Bridge dataset used was included in the training data, and further below we run this again on data we generated ourselves that is not from the training dataset and has significant variation. This tool can also load other datasets in RLDS/TFDS format from Open X-Embodiment, and HDF5 from Robomimic/MimicGen. You can also create your own agents and scripts using the exposed APIs from the coding examples below.
The code is simple for running VLA inference on camera streams using the NanoLLM library in the container:
VLA on Videofrom nano_llm import NanoLLM\nfrom nano_llm.plugins import VideoSource\n\n# load vision/language action model\nmodel = NanoLLM.from_pretrained(model, quantization='q4f16_ft')\ncamera = VideoSource(video_source, cuda_stream=0)\n\nassert(model.vla) # make sure this is a VLA\n\nwhile True:\n # returns a cudaImage, np.ndarray, or torch.Tensor on the GPU\n image = camera.capture()\n\n if image is None: # in case of timeout, keep trying\n continue\n\n # returns a np.ndarray or torch.Tensor with vla.dof elements\n # for OpenVLA, this is (\u0394x, \u0394y, \u0394z, \u0394roll, \u0394pitch, \u0394yaw, gripper)\n actions = model.vla.predict_action(\n image, \n instruction=\"pick up the nearest object\", \n action_space=\"normalized\",\n return_tensors='np',\n )\n\n # send the actions to your robot controller or IK solver\n ...\n
VLA models are also supported in Agent Studio, which includes the simulator components as well.
"},{"location":"openvla.html#online-validation","title":"Online Validation","text":"Given the challenging task domain, dynamic feedback loops, and computational demands for sim/training/inference, using VLAs for language-guided dexterous manipulation involves a significant increase in complexity over baseline usage of LLMs and VLMs. To go from predicting logits at the token level to actions consistently correct enough over an extended series of frames to form useful behaviors, it's important to cross-check outputs and measure accuracy at each stage of the training/inference workflow to be able to identify the source of potential regressions when they occur.
Unlike typical applications in supervised learning, the metrics for end-task completion and success aren't measured from static pre-recorded datasets that don't account for the temporal domain and feedback from physical interactions along with compounding errors - they require online validation, either in simulation or real-world tests.
Closing the Sim-to-Real Gap: Training Spot Quadruped Locomotion with NVIDIA Isaac Lab
During training the token classification accuracy is measured from the groundtruth action labels (i.e. how many action tokens were predicted exactly right), with the model optimizing to minimize this loss (as is normal for LLMs). Action accuracy in the continuous domain is also measured during training from the L1 error of the detokenized real-valued outputs. Continuous action accuracy trends slightly higher than token classification accuracy, as the latter does not provide any reward for being closer to the desired result. In practice, these should be >95% accurate at this level for completing tasks successfully in similar environments. To achieve that high degree of accuracy, it seems intentional in the work and related research to overfit the model by training it for many epochs (upwards of 30 epochs on the same 900K episodes for OpenVLA). Transformers are known to recall specific knowledge from few training examples, and are sensitive to overfitting and forgetting previously learned information. As such, LLMs are normally only trained for a few epochs at most to preserve their zero-shot capabilities and ability to generalize to out-of-distribution inputs. During the fine-tuning part of this project, we characterize the impacts on model accuracy and task success from the number of distinct training episodes versus the number of epochs over repeated data.
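The difference between those two training metrics (token classification accuracy versus continuous action accuracy) can be illustrated with a toy example, where a prediction that is off by one bin gets no credit from token classification but is still rewarded by the L1-based continuous accuracy (the values here are arbitrary):
import numpy as np\n\ndef token_accuracy(pred_tokens, gt_tokens):\n    '''Fraction of action tokens predicted exactly right (no credit for near misses).'''\n    return float(np.mean(np.asarray(pred_tokens) == np.asarray(gt_tokens)))\n\ndef continuous_accuracy(pred_actions, gt_actions):\n    '''1 - mean L1 error of the detokenized real-valued actions (rewards near misses).'''\n    return 1.0 - float(np.mean(np.abs(np.asarray(pred_actions) - np.asarray(gt_actions))))\n\nprint(token_accuracy([200, 17, 52], [201, 17, 52]))               # 0.67 - one token off by a single bin\nprint(continuous_accuracy([0.504, 0.1, -0.2], [0.5, 0.1, -0.2]))  # ~0.999 - nearly full credit\n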
The actual task success rate doesn't get measured until the inference stage, when it is either connected to a simulator or physically tested in a series of time-consuming trials under similar conditions. We integrated MimicGen directly with the OpenVLA training scripts for an endless source of unseen data, but encountered gradient instabilities after the model had received a significant number of episodes.
"},{"location":"openvla.html#simulation-with-mimicgen","title":"Simulation with MimicGen","text":"MimicGen creates randomized episodes from as few as 10 teleoperated examples by utilizing scene graph information and task/subtask metadata about which objects in the environment are targets of the current subtask, in order to interpolate the original teloperated trajectories into their new random locations and poses. This generates large amounts of unique training data to improve robustness, without needing large amounts of human effort for the robot learning new skills and behaviors.
MimicGen is built on the Robomimic and Robosuite simulators and is able to run headlessly onboard Jetson alongside the VLA, simplifying the setup for reproducibility. The RoboCasa project is built on MimicGen and is being integrated with NVIDIA Omniverse and OSMO, and in future work we'd use Isaac Lab for scalability, more accurate physics, and photorealistic rendering.
MimicGen includes 12 tasks like block stacking, pick and place, assembly, and kitchen scenarios, and each type of task has variants increasing in difficulty as learning progresses - it would be interesting to compare such curated approaches to the purely random sequencing that OpenVLA uses with Open X-Embodiment. In this phase of the tutorial, we focus on the block stacking task to understand the training requirements and runtime performance needed to master a new task with success rates of >75-80%, similar to the paper. This will help inform scaling to multiple behaviors and more complex scenarios that vary significantly from in-distribution examples like the MimicGen environments (as evidenced by the original OpenVLA weights scoring zero successes in them).
"},{"location":"openvla.html#data-generation","title":"Data Generation","text":"We built MimicGen containers for Jetson from a fork of the code with some patches for aarch64+igpu along with enhancements like generation of natural language labels with random variations for the relevant tasks, along with additional domain randomization for the colors/materials of objects (these environments were added as Stack_D2
, Stack_D3
, and Stack_D4
). For training OpenVLA, the images and labels are saved to disk, whereas later inference is done with online simulation to measure the task success rate. To that effect we integrated MimicGen with Agent Studio for interactively testing the models and quickly dropping in components like ASR for verbally commanding the robot.
Online Training
There's initial support for direct integration of MimicGen in this fork of OpenVLA for live simulation and validation during training, with endless episodes and no repeated epochs. The models experienced spiking gradients later into the LoRA runs, so it's worth trying again with lower learning rates, or similarly integrating MimicGen into the full fine-tuning script that uses FSDP to increase the batch size on dGPU.
This command will generate the specified number of training episodes, saved in Robomimic HDF5 format. We provide the rendered datasets for these on HuggingFace Hub with 1000 and 2500 episodes. OpenVLA suggests needing only 10-150 episodes for fine-tuning and data-efficient adaptation, which may hold in comparable settings, but we ultimately found to be insufficient for the MimicGen environments.
jetson-containers run $(autotag nano_llm) \\\n python3 -m mimicgen.generate \\\n --tasks Stack_D4 \\\n --episodes 100 \\\n --output /data/datasets/mimicgen \\\n --cameras agentview \\\n --camera-width 224 \\\n --camera-height 224\n
The HDF5 dataset will be saved to /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5
(which is in a mounted volume under your jetson-containers/data
directory outside of the container), along with a video of sample episodes that were rendered:
This video is actually of Stack_D2 to avoid subjecting everyone to flickering colors. Stack_D4 is used for training and generates blocks with random colors and positions each frame, along with language labels augmented through the random combination of various nouns, adjectives, and verbs that form the instruction (Stack the red block on the green block
, Put the little cube on top
). Stack_D3 randomizes colors/positions each frame, and instructions each episode. Stack_D2 does them all per-episode (which is typically used at runtime). Since OpenVLA uses a single frame at a time with no temporal aspect during training, applying domain randomization per-frame as opposed to per-episode is feasible and provides more variance in the dataset. The block-stacking episodes typically come out to around ~110 frames each, and take around 10-15 seconds to generate per episode on Jetson AGX Orin with per-frame domain randomization, and 5 seconds per episode without domain randomization.
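The instruction augmentation itself amounts to randomly composing verbs, adjectives, and nouns into a label each episode (or each frame). Below is a minimal sketch of the idea, with an illustrative vocabulary rather than the exact word lists used in the fork:
import random\n\nVERBS = ['Stack', 'Put', 'Place']\nADJECTIVES = ['red', 'green', 'little', 'small']\nNOUNS = ['block', 'cube']\n\ndef random_instruction():\n    '''Compose a label such as: Put the little cube on top of the green block.'''\n    return (f'{random.choice(VERBS)} the {random.choice(ADJECTIVES)} {random.choice(NOUNS)} '\n            f'on top of the {random.choice(ADJECTIVES)} {random.choice(NOUNS)}')\n\nprint(random_instruction())\n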
The agentview camera looks onward from the front of the scene towards the robot. There are others available like sideview and eye_in_hand (wrist view) - we tried using the onboard wrist camera, but found the model would too easily veer off track and get 'lost' offscreen. It may be possible for wrist-only to work should the dataset add examples of the robot recovering and returning to a wider vantage point. Other ViT-based embodied models like Octo and CrossFormer use both cameras, which is a future experiment for VLAs based on multi-image VLMs like VILA.
"},{"location":"openvla.html#rlds-conversion","title":"RLDS Conversion","text":"OpenVLA uses datasets in RLDS format (which is based on TFDS), so we provide a converter from HDF5. This extra step can also be time-consuming for a large number of epiodes, like those used here. This is one of the reasons we desire to run MimicGen online with training and performed the initial integration directly with OpenVLA. Unless you are generating different data, you can skip this and use the MimicGen datasets that we uploaded here in RLDS format.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.datasets \\\n --dataset /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5 \\\n --dataset-type mimicgen \\\n --convert rlds \\\n --remap-keys agentview:image \\\n --output /data/datasets/mimicgen/rlds/stack_d4_ep2500\n
This will create a set of tfrecord files under the output directory that are able to be loaded by the OpenVLA training scripts.
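If you want to sanity-check the converted data, the output directory should be loadable with the standard TFDS APIs along these lines (assuming tensorflow_datasets is available in your environment; the nested feature names follow the usual RLDS convention and may differ slightly in this converter's output):
import tensorflow_datasets as tfds\n\nbuilder = tfds.builder_from_directory('/data/datasets/mimicgen/rlds/stack_d4_ep2500')\ndataset = builder.as_dataset(split='train')\n\nfor episode in dataset.take(1):          # episodes are nested datasets of steps (RLDS convention)\n    for step in episode['steps'].take(3):\n        print(step['action'])            # 7-DoF action vector for this frame\n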
"},{"location":"openvla.html#fine-tuning","title":"Fine Tuning","text":"A primary objective of this project is to characterize the training needed to adapt the model to different robots and tasks. Our development primarily consisted of running test LoRA's onboard Jetson AGX Orin 64GB and debugging issues locally, and when the results were encouraging to perform a full fine-tuning with FDSP on multiple A100/H100's from spot instance providers like Brev.dev, Vast.ai, and RunPod. Full fine-tuning on 2x Jetson AGX Orin's was attempted with FDSP, but ran out of memory with all settings that we tried. We provide the test models we trained on HuggingFace for others to try in the inference + sim setup below. Below are the training GPU configurations that were used, along with their batch sizes that maximized memory usage:
Batch Size FPS $/hr Jetson AGX Orin 64GB 8 (LoRA) 1.57 - 2x A100 SXM4 80GB 48 13.07 ~$1.50 8x H100 NVL 94GB 256 92.4 ~$25 The rental fees are ballpark averages over the spot instances available with these GPUs at the time of writing, and become quite reasonable when used alongside a Jetson repurposed for training daily test LoRAs on a reduced amount of data. Training until convergence on Jetson and 2xA100 took roughly 24-36 hours depending on the amount of data and number of epochs. We kept to <5 epochs for the full fine-tunes in an attempt to prevent the aforementioned overfitting, instead opting to increase the number of episodes.
Below we provide the steps to run the OpenVLA LoRA training on Jetson, and for the dGPU systems refer to Fully Fine-Tuning OpenVLA. Typically you will launch a spot instance with your provider of choice in a CUDA or PyTorch container, then install the OpenVLA repo and its dependencies with pip, and download your dataset to the system before launching the command (or create a bundled container with it all included to save time). Here's the WandB Dashboard from the full fine-tuning runs that you can inspect, comparing a fewer number of episodes for more epochs, versus a larger number of episodes trained for fewer epochs:
"},{"location":"openvla.html#on-device-lora","title":"On-Device LoRA","text":"The OpenVLA repo provides working training scripts for LoRA/qLoRA and multi-node multi-GPU full fine-tunes using PyTorch FDSP. It was not difficult to go in a make changes and enhancements, some of which we have done for our purposes of on-device training in this fork. Overall we found the process to be more similar than not to training other vision DNNs, just with larger datasets and rigorous validation required of the data pipeline that all the coordinate spaces and transformations matched up at every step of the sim\u2192training\u2192inference workflow.
We built an OpenVLA container for JetPack that runs the LoRA training, which you can find the specific documentation about from the OpenVLA readme (it's also recommended to read their paper which includes many insights into the training process).
jetson-containers run $(autotag openvla) \\\n torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune.py \\\n --vla_path openvla/openvla-7b \\\n --data_root_dir /data/datasets/mimicgen/rlds \\\n --dataset_name stack_d4_ep2500 \\\n --run_root_dir /data/models/openvla \\\n --lora_rank 32 \\\n --batch_size 8 \\\n --grad_accumulation_steps 2 \\\n --learning_rate 5e-4 \\\n --image_aug False \\\n --save_steps 250 \\\n --epochs 5\n
This will start a TensorBoard server on port 6006 to monitor the training progress. Typically you would set the script running for more epochs than you intend to actually run, so that you can instead stop when the model converges (typically occurring with a loss below 0.5 and token accuracy >95%). This script was adapted so that if you interrupt training by pressing Ctrl+D
from the terminal, it will gracefully stop early and still merge the LoRA weights before exiting. If training is terminated otherwise, we added a merge.py
script that you should run afterwards to get the model ready for inference.
Now that we have trained our test model (or you can download one from here), let's re-validate it like we did above on the original OpenVLA model, but this time on unseen data from MimicGen with a different random seed (dusty-nv/mimicgen-stack_d4-ep100
). These commands will download and run the fully fine-tuned checkpoint (trained on 2500 episodes for 4 epochs) that we released as dusty-nv/openvla-7b-mimicgen
. If you trained your own model, you can substitute the local path to the HF checkpoint.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_fp16.json\n
The results from this are collated in the next section along with the end-task success rates. Time to see it in action!
"},{"location":"openvla.html#inference-simulation","title":"Inference + Simulation","text":"To measure how well our model actually performs at completing the task, we spin up a MimicGen environment in Agent Studio that's connected to the VLA model. It counts the number of successful episodes by checking the reward issued by the sim, which is not used by the model but signals when the task was completed. We use a horizon of 200 frames for evaluation, after which it is deemed to be a failure.
INT4FP8FP16jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-INT4\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP8\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP16\n
To start the benchmarking, connect the output of the AutoPrompt
node to the OpenVLA
node. On its own it will run forever - we did it for 100 episodes each, which can take several hours since the sim operates in lock step with the model (future experiments will train on actions accumulated from multiple timesteps and also reduce the model size to improve performance).
This is using the model fine-tuned on 2500 episodes for 4 epochs, and although the task may have been simple, it is evidence of achieving the sought-after success rates of ~85%. Quantization has a negligible ~1% impact while scaling performance almost linearly. The average number of frames is how long it took the robot to complete the task - efficiency is another important end-metric to evaluate models by (consider that the source teleop episodes were ~110 frames long, and we realized afterwards that these averages include failed episodes during evaluation). The training dataset dusty-nv/bridge_orig_ep2500
was used to measure the action Train Accuracy, while the previously unused and distinct dusty-nv/bridge_orig_ep100
was used for Validation Accuracy.
Upon measuring the success rates of the other fine-tuned models that were trained on fewer episodes for more epochs, we can see the impact of increasing the size of the dataset:
Episodes Epochs Frames Task Success Avg Frames 500 10 550K 23% 186 1000 6 660K 48% 165 2500 4 1.1M 84% 138 That isn't to say the 2500-episode model still isn't overfit - it is, having learned to always stack the smaller block and essentially ignore the instructions. More task diversity in the training is required, which we can do now that we have the workflow, and we want to branch out into real-world domains instead of test environments in simulation to prove model viability. The amount of variance and data required to achieve generalist models in the challenging problem space of manipulation does raise interesting questions about purely random task learning versus more guided and curated approaches that ramp up in complexity as training progresses.
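As a side note on the Frames column in the table above, the totals follow directly from the ~110-frame episode length - for example, 500 episodes x 10 epochs x 110 frames per episode comes to roughly 550K training frames. A quick check:
frames_per_episode = 110   # approximate length of the block-stacking episodes\n\nfor episodes, epochs in [(500, 10), (1000, 6), (2500, 4)]:\n    total = episodes * epochs * frames_per_episode\n    print(f'{episodes} episodes x {epochs} epochs -> {total / 1e6:.2f}M training frames')\n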
"},{"location":"openvla.html#future-research","title":"Future Research","text":"Embodied agents are an exciting emerging area at the forefront of robotics and physical AI, with many promising avenues to investigate further. Follow-up publications to OpenVLA include Embodied-CoT and CrossFormer, along with others sure to be in development. In addition to proceeding to physical testing, these areas of interest we'll experiment with as research progresses:
The Jetson AI Lab Research Group is a global collective for advancing open-source Edge ML, open to anyone to join and collaborate with others from the community and leverage each other's work. Our goal is using advanced AI for good in real-world applications in accessible and responsible ways. By coordinating together as a group, we can keep up with the rapidly evolving pace of AI and more quickly arrive at deploying intelligent multimodal agents and autonomous robots into the field.
There are virtual meetings that anyone is welcome to join, offline discussion on the Jetson Projects forum, and guidelines for upstreaming open-source contributions.
Next Meeting - 10/15
The next team meeting is on Tuesday, October 15th at 9am PST - see the invite below or click here to join the meeting in progress.
"},{"location":"research.html#topics-of-interest","title":"Topics of Interest","text":"These are some initial research topics for us to discuss and investigate. This list will vary over time as experiments evolve and the SOTA progresses:
\u2022 Controller LLMs for dynamic pipeline code generation \u2022 Fine-tuning LLM/VLM onboard Jetson AGX Orin 64GB \u2022 HomeAssistant.io integration for smart home [1] [2] \u2022 Continuous multi-image VLM streaming and change detection \u2022 Recurrent LLM architectures (Mamba, RWKV, etc) [1] \u2022 Lightweight low-memory streaming ASR/TTS models \u2022 Diffusion models for image processing and enhancement \u2022 Time Series Forecasting with Transformers [1] [2] \u2022 Guidance, grammars, and guardrails for constrained output \u2022 Inline LLM function calling / plugins from API definitions \u2022 ML DevOps, edge deployment, and orchestration \u2022 Robotics, IoT, and cyberphysical systems integration New topics can be raised to the group either during the meetings or on the forums (people are welcome to work on whatever they want of course)
"},{"location":"research.html#contribution-guidelines","title":"Contribution Guidelines","text":"When experiments are successful, ideally the results will be packaged in such a way that they are easily reusable for others to integrate into their own projects:
Ongoing technical discussions are encouraged to occur on the forums or GitHub Issues, with status updates on projects given during the meetings.
"},{"location":"research.html#meeting-schedule","title":"Meeting Schedule","text":"We'll aim to meet monthly or bi-weekly as a team in virtual meetings that anyone is welcome to join and speak during. We'll discuss the latest updates and experiments that we want to explore. Please remain courteous to others during the calls. We'll stick around after for anyone who has questions or didn't get the chance to be heard.
Tuesday October 15th at 9am PST (10/15/24)
264 770 145 196
Uwbdgj
Jetson AI Lab Research Group.ics
The agenda will be listed here beforehand - post to the forum to add agenda items. The meetings will be recorded so anyone unable to attend live can watch them after.
"},{"location":"research.html#past-meetings","title":"Past Meetings","text":"October 1, 2024Topics Covered:
Topics covered included NousResearch/Hermes-2-Pro-Llama-3-8B, jetson-container updates, and project kickoffs.
Below are shown some of the sustaining members of the group who have been working on generative AI in edge computing:
Dustin Franklin, NVIDIA Principal Engineer | Pittsburgh, PA (jetson-inference, jetson-containers) Nurgaliyev Shakhizat Institute of Smart Systems and AI | Kazakhstan (Assistive Devices, Vision2Audio, HPC) Kris Kersey, Kersey Fabrications Embedded Software Engineer | Atlanta, GA (The OASIS Project, AR/VR, 3D Fabrication) Johnny N\u00fa\u00f1ez Cano PhD Researcher in CV/AI | Barcelona, Spain (Recurrent LLMs, Pose & Behavior Analysis) Doruk S\u00f6nmez, ConnectTech Intelligent Video Analytics Engineer | Turkey (NVIDIA DLI Certified Instructor, IVA, VLM) Akash James, Spark Cognition AI Architect, UC Berkeley Researcher | Oakland (NVIDIA AI Ambassador, Personal Assistants) Mieszko Syty, MS/1 Design AI/ML Engineer | Warsaw, Poland (LLM, Home Assistants, ML DevOps) Jim Benson, JetsonHacks DIY Extraordinaire | Los Angeles, CA (AI in Education, RACECAR/J) Chitoku Yato, NVIDIA Jetson AI DevTech | Santa Clara, CA (JetBot, JetRacer, MinDisk, Containers) Dana Sheahen, NVIDIA DLI Curriculum Developer | Santa Clara, CA (AI in Education, Jetson AI Fundamentals) Sammy Ochoa, NVIDIA Jetson AI DevTech | Austin, TX (Metropolis Microservices) John Welsh, NVIDIA (NanoOWL, NanoSAM, JetBot, JetRacer, torch2trt, trt_pose, Knowledge Distillation) Dave Niewinski Dave's Armoury | Waterloo, Ontario (GLaDOS, Fetch, Offroad La-Z-Boy, KUKA Bot) Gary Hilgemann, REBOTNIX CEO & AI Roboticist | L\u00fcnen, Germany (GUSTAV, SPIKE, VisionTools, GenAI) Elaine Wu, Seeed Studio AI & Robotics Partnerships | Shenzhen, China (reComputer, YOLOv8, LocalJARVIS, Voice Bot) Patty Delafuente, NVIDIA Data Scientist & UMBC PhD Student | MD (AI in Education, DLI Robotics Teaching Kit) Song Han, MIT HAN Lab NVIDIA Research | Cambridge, MA (Efficient Large Models, AWQ, VILA) Bryan Hughes, Mimzy AI Founder, Entrepreneur | SF Bay Area (Multimodal Assistants, AI at the Edge) Tianqi Chen, CMU Catalyst OctoML, CTO | Seattle, WA (MLC, Apache TVM, XGBoost) Michael Gr\u00fcner, RidgeRun Team Lead / Architect | Costa Rica (Embedded Vision & AI, Multimedia) Jesse Flot, CMU Robotics Academy Co-Director | Pittsburgh, PA (Applied AI & Robotics, Autonomy Foundations) Paul DeCarlo, Microsoft Professor | University of Houston (Azure IoT, Video Analytics, Microsoft JARVIS) Mike Hansen, Nabu Casa Voice Engineering Lead | Urbandale, Iowa (Home Assistant, Piper TTS, Wyoming) Lakshantha Dissanayake, Ultralytics Embedded CV Engineer | Vancouver, BC (YOLOv8, TensorRT, DeepStream) Kerry Shih, GenAI Nerds Founder, CEO | Los Angeles, CA (Gapi) Ziad Youssfi, CMU ECE Professor | Pittsburgh, PA (ML in Robotics & Embedded Systems) Walter Lucetti, Stereolabs Robotics & Vision Engineer | Paris, France (MyzharBot, ROS2, GStreamer) Raffaello Bonghi, NVIDIA AI & Robotics Engineer | Manchester, UK (Nanosaur, Panther, jetson-stats) Alvaro Costa, ANS Group AI & Robotics Lead | Epsom, UK (TeknTrash, StipraPOD) David Pearson, ConnectTech Embedded Systems Engineer | Ontario, CA (Edge AI Systems, Vision/Language Models) Jason Seawall, Numurus CEO | Seattle, WA (NEPI, Edge AI & Automation) Martin Cerven AI Researcher | Germany (Autonomous Robotics, Voicecraft) Romil Shah, Amazon GenAI IIoT @ AWS | San Jose, CA (aws-samples/genai-at-edge
) Kabilan Kb, Roboticist NVIDIA DLI Ambassador | Tamil Nadu, IN (ROS2 Tutorials, Autonomous Wheelchair)"},{"location":"ros.html","title":"ROS2 Nodes for Generative AI","text":"The ros2_nanollm
package provides ROS2 nodes for running optimized LLMs and VLMs locally inside a container. These are built on NanoLLM and ROS2 Humble for deploying generative AI models onboard your robot with Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm:humble
container image>10GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Recommended
Before you start, please review NanoVLM and Live LLaVa demos. For primary documentation, view ROS2 NanoLLM.
Ensure you have a camera device connected
ls /dev/video*\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nano_llm:humble) \\\n ros2 launch ros2_nanollm camera_input_example.launch.py\n
This command will start the launch file of the container.
By default this will load the Efficient-Large-Model/Llama-3-VILA1.5-8B
VLM and publish the image captions and overlay to topics that can be subscribed to by your other nodes, or visualized with RViz or Foxglove. Refer to the ros2_nanollm
repo for documentation on the input/output topics that are exposed.
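As a starting point for consuming those outputs from your own node, a minimal rclpy subscriber could look like the sketch below (the output topic name and message type are assumptions here - confirm them against the ros2_nanollm documentation):
import rclpy\nfrom rclpy.node import Node\nfrom std_msgs.msg import String\n\nclass CaptionListener(Node):\n    def __init__(self):\n        super().__init__('caption_listener')\n        # 'output' is an assumed topic name - check the ros2_nanollm docs for the actual one\n        self.subscription = self.create_subscription(String, 'output', self.on_caption, 10)\n\n    def on_caption(self, msg):\n        self.get_logger().info(f'VLM output: {msg.data}')\n\ndef main(args=None):\n    rclpy.init(args=args)\n    rclpy.spin(CaptionListener())\n    rclpy.shutdown()\n\nif __name__ == '__main__':\n    main()\n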
To build your own ROS2 node using an LLM or VLM, first create a ROS 2 workspace and package in a directory mounted to the container (following the ROS 2 Humble Documentation). Your src folder should then look like this:
\u2514\u2500\u2500 src \n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 launch \n \u2514\u2500\u2500 camera_input.launch.py\n \u251c\u2500\u2500 resource\n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 your-package-name\n \u2514\u2500\u2500 __init__.py \n \u2514\u2500\u2500 your-node-name_py.py\n \u251c\u2500\u2500 test\n \u2514\u2500\u2500 test_copyright.py\n \u2514\u2500\u2500 test_flake8.py\n \u2514\u2500\u2500 test_pep257.py\n \u251c\u2500\u2500 package.xml\n \u251c\u2500\u2500 setup.cfg\n \u251c\u2500\u2500 setup.py\n \u2514\u2500\u2500 README.md\n
We will create the launch folder, as well as the camera_input.launch.py and your-node-name_py.py files in later steps.
"},{"location":"ros.html#editing-the-setup","title":"Editing the Setup","text":"Let\u2019s begin by editing the setup.py
file. At the top of the file, add
from glob import glob \n
In the setup method, find the data_files=[]
line, and make sure it looks like this:
data_files=[\n ('share/ament_index/resource_index/packages',\n ['resource/' + package_name]),\n ('share/' + package_name, ['package.xml']),\n ('share/' + package_name, glob('launch/*.launch.py')),\n ],\n
Edit the maintainer line with your name. Edit the maintainer email to your email. Edit the description line to describe your package.
maintainer='kshaltiel', \nmaintainer_email='kshaltiel@nvidia.com', \ndescription='YOUR DESCRIPTION', \n
Find the console_scripts
line in the entry_points method. Edit the inside to be:
'your-node-name_py = your-package-name.your-node-name_py:main'\n
For example:
entry_points={\n 'console_scripts': [\n 'nano_llm_py = ros2_nanollm.nano_llm_py:main'\n ],\n },\n
All done for this file!"},{"location":"ros.html#creating-the-node","title":"Creating the Node","text":"Inside your package, under the folder that shares your package's name and contains the __init__.py
file, create a file named after your node. For NanoLLM, this file would be called nano_llm_py.py
.
Paste the following code into the empty file:
import rclpy \nfrom std_msgs.msg import String\nfrom sensor_msgs.msg import Image\nfrom cv_bridge import CvBridge\nfrom PIL import Image as im\nfrom MODEL_NAME import NECESSARY_MODULES\n\nclass Your_Model_Subscriber(Node):\n\n def __init__(self):\n super().__init__('your_model_subscriber')\n\n #EDIT PARAMETERS HERE \n self.declare_parameter('param1', \"param1_value\") \n self.declare_parameter('param2', \"param2_value\")\n\n # Subscriber for input query\n self.query_subscription = self.create_subscription(\n String,\n 'input_query',\n self.query_listener_callback,\n 10)\n self.query_subscription # prevent unused variable warning\n\n # Subscriber for input image\n self.image_subscription = self.create_subscription(\n Image,\n 'input_image',\n self.image_listener_callback,\n 10)\n self.image_subscription # prevent unused variable warning\n\n # To convert ROS image message to OpenCV image\n self.cv_br = CvBridge() \n\n #LOAD THE MODEL\n self.model = INSERT_MODEL.from_pretrained(\"PATH-TO-MODEL\")\n\n #chatHistory var \n self.chat_history = ChatHistory(self.model)\n\n ## PUBLISHER\n self.output_publisher = self.create_publisher(String, 'output', 10)\n self.query = \"Describe the image.\"\n\n def query_listener_callback(self, msg):\n self.query = msg.data\n\n def image_listener_callback(self, data): \n input_query = self.query\n\n # call model with input_query and input_image \n cv_img = self.cv_br.imgmsg_to_cv2(data, 'rgb8')\n PIL_img = im.fromarray(cv_img)\n\n # Parsing input text prompt\n prompt = input_query.strip(\"][()\")\n text = prompt.split(',')\n self.get_logger().info('Your query: %s' % text) #prints the query\n\n #chat history \n self.chat_history.append('user', image=PIL_img)\n self.chat_history.append('user', prompt, use_cache=True)\n embedding, _ = self.chat_history.embed_chat()\n\n #GENERATE OUTPUT\n output = self.model.generate(\n inputs=embedding,\n kv_cache=self.chat_history.kv_cache,\n min_new_tokens = 10,\n streaming = False, \n do_sample = True,\n )\n\n output_msg = String()\n output_msg.data = output\n self.output_publisher.publish(output_msg)\n self.get_logger().info(f\"Published output: {output}\")\n\ndef main(args=None):\n rclpy.init(args=args)\n\n your_model_subscriber = Your_Model_Subscriber()\n\n rclpy.spin(your_model_subscriber)\n\n # Destroy the node explicitly\n # (optional - otherwise it will be done automatically\n # when the garbage collector destroys the node object)\n nano_llm_subscriber.destroy_node()\n rclpy.shutdown()\n\nif __name__ == '__main__':\n main()\n
Edit the import statements at the top of the file to import the necessary modules from the model (you will also need from rclpy.node import Node for the Node base class that the subscriber inherits from).
Next, edit the class name and name inside the __init__()
function to reflect the model that will be used.
Find the comment that reads #EDIT PARAMETERS HERE
. Declare all parameters except for the model name following the format in the file. Under the #LOAD THE MODEL
comment, include the path to the model.
Lastly, edit the generate method under the GENERATE OUTPUT
comment to include any additional parameters.
All done for this file!
"},{"location":"ros.html#creating-the-launch-file","title":"Creating the Launch File","text":"Inside your package, create the launch folder. Create your launch file inside of it.
mkdir launch\ncd launch \ntouch camera_input.launch.py\n
You can edit this file externally, and it will update within the container. Paste the following code into the empty file.
from launch import LaunchDescription\nfrom launch_ros.actions import Node\nfrom launch.substitutions import LaunchConfiguration\nfrom launch.actions import DeclareLaunchArgument\n\ndef generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'param1',\n default_value='param1_default',\n description='Description of param1'),\n DeclareLaunchArgument(\n 'param2',\n default_value='param2_default',\n description='Description of param2'),\n ]\n\n\n #Your model parameters \n param1 = LaunchConfiguration('param1')\n param2 = LaunchConfiguration('param2')\n\n #camera node for camera input\n cam2image_node = Node(\n package='image_tools',\n executable='cam2image',\n remappings=[('image', 'input_image')],\n )\n\n #model node\n model_node = Node(\n package='your-package-name', #make sure your package is named this\n executable='your-node-name_py', \n parameters=[{\n 'param1': param1, \n 'param2': param2,\n }]\n )\n\n final_launch_description = launch_args + [cam2image_node] + [model_node]\n\n return LaunchDescription(final_launch_description)\n
Find the required parameters for your model. You can view this by looking at the Model API for your specific model and taking note of how the model is called. For example, NanoLLM retrieves models through the following:
model = NanoLLM.from_pretrained(\n \"meta-llama/Llama-3-8b-hf\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n
The parameters for NanoLLM would be the model name, api, and quantization.
In the generate_launch_description
function, edit the DeclareLaunchArgument
to accommodate all parameters except the model name. For NanoLLM, this would look like:
def generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'api',\n default_value='mlc',\n description='The model backend to use'),\n DeclareLaunchArgument(\n 'quantization',\n default_value='q4f16_ft',\n description='The quantization method to use'),\n ]\n
Then edit the lines under #Your model Parameters
to match the parameters of your model, again excluding the model name. Lastly, fill in the code under the #model node
comment with your package name, the name of your node file, and all of your parameters, this time including the model.
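For reference, a filled-in model node for ros2_nanollm might look roughly like the snippet below (the parameter names are assumptions based on the NanoLLM arguments above - match them to whatever parameters your node actually declares):
#model node\nmodel_node = Node(\n    package='ros2_nanollm',                 # your package name\n    executable='nano_llm_py',               # your node file's entry point\n    parameters=[{\n        'model': 'Efficient-Large-Model/Llama-3-VILA1.5-8B',  # this time the model is included\n        'api': api,\n        'quantization': quantization,\n    }]\n)\n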
All done for this file!
"},{"location":"tips_ram-optimization.html","title":"RAM Optimization","text":"Running a LLM requires a huge RAM space.
Especially if you are on Jetson Orin Nano that only has 8GB of RAM, it is crucial to leave as much RAM space available for models.
Here we share a couple of ways to optimize the system RAM usage.
"},{"location":"tips_ram-optimization.html#disabling-the-desktop-gui","title":"Disabling the Desktop GUI","text":"If you use your Jetson remotely through SSH, you can disable the Ubuntu desktop GUI. This will free up extra memory that the window manager and desktop uses (around ~800MB for Unity/GNOME).
You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:
$ sudo init 3 # stop the desktop\n# log your user back into the console (Ctrl+Alt+F1, F2, ect)\n$ sudo init 5 # restart the desktop\n
If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:
To disable desktop on boot
sudo systemctl set-default multi-user.target\n
To enable desktop on boot
sudo systemctl set-default graphical.target\n
If you are not using the CSI cameras, you can also disable the Argus camera daemon:
sudo systemctl disable nvargus-daemon.service\n
"},{"location":"tips_ram-optimization.html#mounting-swap","title":"Mounting Swap","text":"If you're building containers or working with large models, it's advisable to mount SWAP (typically correlated with the amount of memory in the board). Run these commands to disable ZRAM and create a swap file:
If you have NVMe SSD storage available, it's preferred to allocate the swap file on the NVMe SSD.
sudo systemctl disable nvzramconfig\nsudo fallocate -l 16G /ssd/16GB.swap\nsudo mkswap /ssd/16GB.swap\nsudo swapon /ssd/16GB.swap\n
Then add the following line to the end of /etc/fstab to make the change persistent:
/ssd/16GB.swap none swap sw 0 0\n
"},{"location":"tips_ssd-docker.html","title":"Tips - SSD + Docker","text":"Once you have your Jetson set up by flashing the latest Jetson Linux (L4T) BSP on it or by flashing the SD card with the whole JetPack image, before embarking on testing out all the great generative AI application using jetson-containers
, you want to make sure you have a huge storage space for all the containers and the models you will download.
We are going to show how you can install SSD on your Jetson, and set it up for Docker.
"},{"location":"tips_ssd-docker.html#ssd","title":"SSD","text":""},{"location":"tips_ssd-docker.html#physical-installation","title":"Physical installation","text":"Once the system is up, verify that your Jetson identifies a new memory controller on PCI bus:
lspci\n
The output should look like the following:
0007:01:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1322 (rev 02)\n
Run lsblk
to find the device name.
lsblk\n
The output should look like the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT\nloop0 7:0 0 16M 1 loop \nmmcblk1 179:0 0 59.5G 0 disk \n\u251c\u2500mmcblk1p1 179:1 0 58G 0 part /\n\u251c\u2500mmcblk1p2 179:2 0 128M 0 part \n\u251c\u2500mmcblk1p3 179:3 0 768K 0 part \n\u251c\u2500mmcblk1p4 179:4 0 31.6M 0 part \n\u251c\u2500mmcblk1p5 179:5 0 128M 0 part \n\u251c\u2500mmcblk1p6 179:6 0 768K 0 part \n\u251c\u2500mmcblk1p7 179:7 0 31.6M 0 part \n\u251c\u2500mmcblk1p8 179:8 0 80M 0 part \n\u251c\u2500mmcblk1p9 179:9 0 512K 0 part \n\u251c\u2500mmcblk1p10 179:10 0 64M 0 part \n\u251c\u2500mmcblk1p11 179:11 0 80M 0 part \n\u251c\u2500mmcblk1p12 179:12 0 512K 0 part \n\u251c\u2500mmcblk1p13 179:13 0 64M 0 part \n\u2514\u2500mmcblk1p14 179:14 0 879.5M 0 part \nzram0 251:0 0 1.8G 0 disk [SWAP]\nzram1 251:1 0 1.8G 0 disk [SWAP]\nzram2 251:2 0 1.8G 0 disk [SWAP]\nzram3 251:3 0 1.8G 0 disk [SWAP]\nnvme0n1 259:0 0 238.5G 0 disk \n
Identify the device corresponding to your SSD. In this case, it is nvme0n1
.
Format the SSD, create a mount point, and mount it to the filesystem.
sudo mkfs.ext4 /dev/nvme0n1\n
You can choose any name for the mount point directory. We use /ssd
here, but in jetson-containers
' setup.md documentation, /mnt
is used.
sudo mkdir /ssd\n
sudo mount /dev/nvme0n1 /ssd\n
In order to ensure that the mount persists after boot, add an entry to the fstab
file:
First, identify the UUID for your SSD:
lsblk -f\n
Then, add a new entry to the fstab
file:
sudo vi /etc/fstab\n
Insert the following line, replacing the UUID with the value found from lsblk -f
:
UUID=************-****-****-****-******** /ssd/ ext4 defaults 0 2\n
Finally, change the ownership of the /ssd
directory.
sudo chown ${USER}:${USER} /ssd\n
Install nvidia-container
package.
Note: If you used an NVIDIA-supplied SD card image to flash your SD card, all necessary JetPack components (including nvidia-containers
) and Docker are already pre-installed, so this step can be skipped.
sudo apt update\nsudo apt install -y nvidia-container\n
JetPack 6.0 DP users
If you flash Jetson Linux (L4T) R36.2 (JetPack 6.0 DP) on your Jetson using SDK Manager, and install nvidia-container
using apt
, on JetPack 6.0 it no longer automatically installs Docker.
Therefore, you need to run the following to manually install Docker and set it up.
sudo apt update\nsudo apt install -y nvidia-container curl\ncurl https://get.docker.com | sh && sudo systemctl --now enable docker\nsudo nvidia-ctk runtime configure --runtime=docker\n
Restart the Docker service and add your user to the docker
group, so that you don't need to use the command with sudo
.
sudo systemctl restart docker\nsudo usermod -aG docker $USER\nnewgrp docker\n
Add default runtime in /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert the \"default-runtime\": \"nvidia\"
line as following:
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\"\n}\n
Restart Docker
sudo systemctl daemon-reload && sudo systemctl restart docker\n
Now that the SSD is installed and available on your device, you can use the extra storage capacity to hold the storage-demanding Docker directory.
Stop the Docker service.
sudo systemctl stop docker\n
Move the existing Docker folder
sudo du -csh /var/lib/docker/ && \\\n sudo mkdir /ssd/docker && \\\n sudo rsync -axPS /var/lib/docker/ /ssd/docker/ && \\\n sudo du -csh /ssd/docker/ \n
Edit /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert \"data-root\"
line like the following.
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\",\n \"data-root\": \"/ssd/docker\"\n}\n
Rename the old Docker data directory
sudo mv /var/lib/docker /var/lib/docker.old\n
Restart the docker daemon
sudo systemctl daemon-reload && \\\n sudo systemctl restart docker && \\\n sudo journalctl -u docker\n
[Terminal 1] First, open a terminal to monitor the disk usage while pulling a Docker image.
watch -n1 df \n
[Terminal 2] Next, open a new terminal and start Docker pull.
docker pull nvcr.io/nvidia/l4t-base:r35.2.1\n
[Terminal 1] Observe that the disk usage on /ssd
goes up as the container image is downloaded and extracted.
~$ docker image ls\nREPOSITORY TAG IMAGE ID CREATED SIZE\nnvcr.io/nvidia/l4t-base r35.2.1 dc07eb476a1d 7 months ago 713MB\n
Reboot your Jetson, and verify that you observe the following:
~$ sudo blkid | grep nvme\n/dev/nvme0n1: UUID=\"9fc06de1-7cf3-43e2-928a-53a9c03fc5d8\" TYPE=\"ext4\"\n\n~$ df -h\nFilesystem Size Used Avail Use% Mounted on\n/dev/mmcblk1p1 116G 18G 94G 16% /\nnone 3.5G 0 3.5G 0% /dev\ntmpfs 3.6G 108K 3.6G 1% /dev/shm\ntmpfs 734M 35M 699M 5% /run\ntmpfs 5.0M 4.0K 5.0M 1% /run/lock\ntmpfs 3.6G 0 3.6G 0% /sys/fs/cgroup\ntmpfs 734M 88K 734M 1% /run/user/1000\n/dev/nvme0n1 458G 824M 434G 1% /ssd\n\n~$ docker info | grep Root\n Docker Root Dir: /ssd/docker\n\n~$ sudo ls -l /ssd/docker/\ntotal 44\ndrwx--x--x 4 root root 4096 Mar 22 11:44 buildkit\ndrwx--x--- 2 root root 4096 Mar 22 11:44 containers\ndrwx------ 3 root root 4096 Mar 22 11:44 image\ndrwxr-x--- 3 root root 4096 Mar 22 11:44 network\ndrwx--x--- 13 root root 4096 Mar 22 16:20 overlay2\ndrwx------ 4 root root 4096 Mar 22 11:44 plugins\ndrwx------ 2 root root 4096 Mar 22 16:19 runtimes\ndrwx------ 2 root root 4096 Mar 22 11:44 swarm\ndrwx------ 2 root root 4096 Mar 22 16:20 tmp\ndrwx------ 2 root root 4096 Mar 22 11:44 trust\ndrwx-----x 2 root root 4096 Mar 22 16:19 volumes\n\n~$ sudo du -chs /ssd/docker/\n752M /ssd/docker/\n752M total\n\n~$ docker info | grep -e \"Runtime\" -e \"Root\"\n Runtimes: io.containerd.runtime.v1.linux nvidia runc io.containerd.runc.v2\n Default Runtime: nvidia\n Docker Root Dir: /ssd/docker\n
Your Jetson is now set up with the SSD!
"},{"location":"try.html","title":"Try","text":"Jump to NVIDIA Jetson Store.
"},{"location":"tutorial-intro.html","title":"Tutorial - Introduction","text":""},{"location":"tutorial-intro.html#overview","title":"Overview","text":"Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
"},{"location":"tutorial-intro.html#text-llm","title":"Text (LLM)","text":"text-generation-webui Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui Ollama Get started effortlessly deploying GGUF models for chat and web UI llamaspeak Talk live with Llama using Riva ASR/TTS, and chat about images with Llava! NanoLLM Optimized inferencing library for LLMs, multimodal agents, and speech. Small LLM (SLM) Deploy Small Language Models (SLM) with reduced memory usage and higher throughput. API Examples Learn how to write Python code for doing LLM inference using popular APIs."},{"location":"tutorial-intro.html#text-vision-vlm","title":"Text + Vision (VLM)","text":"Give your locally running LLM an access to vision!
LLaVA Different ways to run LLaVa vision/language model on Jetson for visual understanding. Live LLaVA Run multimodal models interactively on live video streams over a repeating set of prompts. NanoVLM Use mini vision/language models and the optimized multimodal pipeline for live streaming. Llama 3.2 Vision Run Meta's multimodal Llama-3.2-11B-Vision model on Orin with HuggingFace Transformers."},{"location":"tutorial-intro.html#vision-transformers","title":"Vision Transformers","text":"EfficientVIT MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction NanoOWL OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT NanoSAM NanoSAM, SAM model variant capable of running in real-time on Jetson SAM Meta's SAM, Segment Anything model TAM TAM, Track-Anything model, is an interactive tool for video object tracking and segmentation"},{"location":"tutorial-intro.html#image-generation","title":"Image Generation","text":"Flux + ComfyUI Set up and run the ComfyUI with Flux model for image generation on Jetson Orin. Stable Diffusion Run AUTOMATIC1111'sstable-diffusion-webui
to generate images from prompts SDXL Ensemble pipeline consisting of a base model and refiner with enhanced image generation. nerfstudio Experience neural reconstruction and rendering with nerfstudio and onboard training."},{"location":"tutorial-intro.html#audio","title":"Audio","text":"Whisper OpenAI's Whisper, pre-trained model for automatic speech recognition (ASR) AudioCraft Meta's AudioCraft, to produce high-quality audio and music Voicecraft Interactive speech editing and zero shot TTS"},{"location":"tutorial-intro.html#rag-vector-database","title":"RAG & Vector Database","text":"NanoDB Interactive demo to witness the impact of Vector Database that handles multimodal data LlamaIndex Realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents LlamaIndex Reference application for building your own local AI assistants using LLM, RAG, and VectorDB"},{"location":"tutorial-intro.html#api-integrations","title":"API Integrations","text":"ROS2 Nodes Optimized LLM and VLM provided as ROS2 nodes for robotics Holoscan SDK Use the Holoscan-SDK to run high-throughput, low-latency edge AI pipelines Jetson Platform Services Quickly build microservice driven vision applications with Jetson Platform Services Gapi Workflows Integrating generative AI into real world environments Gapi Micro Services Wrapping models and code to participate in systems Ultralytics YOLOv8 Run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT."},{"location":"tutorial-intro.html#about-nvidia-jetson","title":"About NVIDIA Jetson","text":"Note
We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin Developer Kit Jetson Orin Nano Developer Kit GPU 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores RAM(CPU+GPU) 64GB 32GB 8GB Storage 64GB eMMC (+ NVMe SSD) microSD card (+ NVMe SSD)"},{"location":"tutorial_api-examples.html","title":"Tutorial - API Examples","text":"It's good to know the code for generating text with LLM inference, and ancillary things like tokenization, chat templates, and prompting. On this page we give Python examples of running various LLM APIs, and their benchmarks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35) JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for l4t-text-generation
container image>10GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
The HuggingFace Transformers API is the de-facto API that models are released for, often serving as the reference implementation. It's not terribly fast, but it does have broad model support, and also supports quantization (AutoGPTQ, AWQ). This uses streaming:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer\nfrom threading import Thread\n\nmodel_name='meta-llama/Llama-2-7b-chat-hf'\nmodel = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nstreamer = TextIteratorStreamer(tokenizer)\n\nprompt = [{'role': 'user', 'content': 'Can I get a recipe for French Onion soup?'}]\ninputs = tokenizer.apply_chat_template(\n prompt,\n add_generation_prompt=True,\n return_tensors='pt'\n).to(model.device)\n\nThread(target=lambda: model.generate(inputs, max_new_tokens=256, streamer=streamer)).start()\n\nfor text in streamer:\n print(text, end='', flush=True)\n
To run this (it can be found here), you can mount a directory containing the script or your jetson-containers directory:
jetson-containers run --volume $PWD/packages/llm:/mount --workdir /mount \\\n $(autotag l4t-text-generation) \\\n python3 transformers/test.py\n
We use the l4t-text-generation
container because it includes the quantization libraries in addition to Transformers, for running the quanztized versions of the models like TheBloke/Llama-2-7B-Chat-GPTQ
The huggingface-benchmark.py
script will benchmark the models:
./run.sh --volume $PWD/packages/llm/transformers:/mount --workdir /mount \\\n $(./autotag l4t-text-generation) \\\n python3 huggingface-benchmark.py --model meta-llama/Llama-2-7b-chat-hf\n
* meta-llama/Llama-2-7b-chat-hf AVG = 20.7077 seconds, 6.2 tokens/sec memory=10173.45 MB\n* TheBloke/Llama-2-7B-Chat-GPTQ AVG = 12.3922 seconds, 10.3 tokens/sec memory=7023.36 MB\n* TheBloke/Llama-2-7B-Chat-AWQ AVG = 11.4667 seconds, 11.2 tokens/sec memory=4662.34 MB\n
"},{"location":"tutorial_api-examples.html#nanollm","title":"NanoLLM","text":"The NanoLLM
library uses the optimized MLC/TVM library for inference, like on the Benchmarks page:
from nano_llm import NanoLLM, ChatHistory, ChatTemplates\n\n# load model\nmodel = NanoLLM.from_pretrained(\n model='meta-llama/Meta-Llama-3-8B-Instruct', \n quantization='q4f16_ft', \n api='mlc'\n)\n\n# create the chat history\nchat_history = ChatHistory(model, system_prompt=\"You are a helpful and friendly AI assistant.\")\n\nwhile True:\n # enter the user query from terminal\n print('>> ', end='', flush=True)\n prompt = input().strip()\n\n # add user prompt and generate chat tokens/embeddings\n chat_history.append(role='user', msg=prompt)\n embedding, position = chat_history.embed_chat()\n\n # generate bot reply\n reply = model.generate(\n embedding, \n streaming=True, \n kv_cache=chat_history.kv_cache,\n stop_tokens=chat_history.template.stop,\n max_new_tokens=256,\n )\n\n # append the output stream to the chat history\n bot_reply = chat_history.append(role='bot', text='')\n\n for token in reply:\n bot_reply.text += token\n print(token, end='', flush=True)\n\n print('\\n')\n\n # save the inter-request KV cache \n chat_history.kv_cache = reply.kv_cache\n
This example keeps an interactive chat running with text being entered from the terminal. You can start it like this:
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat.example\n
Or for easy editing from the host device, copy the source into your own script and mount it into the container with the --volume
flag. And for authenticated models, request access through HuggingFace (like with Llama) and substitute your account's API token above.
Let's run Meta's AudioCraft, to produce high-quality audio and music on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
10.7 GB
for audiocraft
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag audiocraft)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
AudioCraft repo comes with demo Jupyter notebooks.
On Jupyter Lab navigation pane on the left, double-click demos
folder.
Run cells with Shift + Enter
, first one will download models, which can take some time.
Info
You may encounter an error message like the following when executing the first cell, but you can keep going.
A matching Triton is not available, some optimizations will not be enabled.\nError caught was: No module named 'triton'\n
In the Audio Continuation cells, you can generate continuation based on text, while in Text-conditional Generation you can generate audio based just on text descriptions.
You can also use your own audio as a prompt, together with text descriptions, to generate a continuation:
prompt_waveform, prompt_sr = torchaudio.load(\"../assets/sirens_and_a_humming_engine_approach_and_pass.mp3\") # you can upload your own audio\nprompt_duration = 2\nprompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\noutput = model.generate_continuation(prompt_waveform.expand(3, -1, -1), prompt_sample_rate=prompt_sr,descriptions=[\n 'Subway train blowing its horn', # text descriptions for continuation\n 'Horse neighing furiously',\n 'Cat hissing'\n], progress=True)\ndisplay_audio(output, sample_rate=16000)\n
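For reference, text-conditional generation (as used in the Text-conditional Generation cells) looks roughly like the sketch below; the exact model name and duration are assumptions here, so match them to whatever the notebook loads:
# minimal sketch of text-conditional generation with AudioCraft - model name and duration are assumptions\nfrom audiocraft.models import AudioGen\nfrom audiocraft.utils.notebook import display_audio\n\nmodel = AudioGen.get_pretrained('facebook/audiogen-medium')\nmodel.set_generation_params(duration=5)  # seconds of audio per sample\n\n# one waveform is generated per text description\noutput = model.generate([\n    'Subway train blowing its horn',\n    'A dog barking in the distance',\n], progress=True)\n\ndisplay_audio(output, sample_rate=16000)\n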
"},{"location":"tutorial_audiocraft.html#musicgen-and-magnet-demos","title":"MusicGen and MAGNeT demos","text":"The two other jupyter notebooks are similar to AuidioGen, where you can generate continuation or generate audio, while using models trained to generate music.
"},{"location":"tutorial_comfyui_flux.html","title":"ComfyUI and Flux on Jetson Orin","text":"Hey there, fellow developer! \ud83d\udc4b I'm excited to share with you our latest project: Flux, an open-source model for image generation. Here at NVIDIA, we're pushing the boundaries to make Flux work seamlessly across all platforms, including our Jetson Orin devices. While we're still fine-tuning the model for the Jetson Orin Nano, we've already got it running smoothly on the Jetson AGX Orin.
In this tutorial, I'm going to walk you through every step needed to get Flux up and running on your Jetson Orin, even if you've just flashed your system. Follow along, and you should have no trouble getting everything set up. And hey, if something doesn't work out, reach out to me\u2014I\u2019ll keep this guide updated to make sure it's always on point.
So, let's dive in and get Flux running on your Jetson!
"},{"location":"tutorial_comfyui_flux.html#1-install-miniconda-and-create-a-python-310-environment","title":"1. Install Miniconda and Create a Python 3.10 Environment","text":"First things first, you'll need to install Miniconda on your Jetson Orin and create a Python 3.10 environment called comfyui
. This will ensure all dependencies are handled properly within an isolated environment.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh\nchmod +x Miniconda3-latest-Linux-aarch64.sh\n./Miniconda3-latest-Linux-aarch64.sh\n\nconda update conda\n\nconda create -n comfyui python=3.10\nconda activate comfyui\n
"},{"location":"tutorial_comfyui_flux.html#2-install-cuda-cudnn-and-tensorrt","title":"2. Install CUDA, cuDNN, and TensorRT","text":"Once your environment is set up, install CUDA 12.4 along with the necessary cuDNN and TensorRT libraries to ensure compatibility and optimal performance on your Jetson Orin.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install cuda-toolkit-12-4 cuda-compat-12-4\nsudo apt-get install cudnn python3-libnvinfer python3-libnvinfer-dev tensorrt\n
"},{"location":"tutorial_comfyui_flux.html#3-verify-and-configure-cuda","title":"3. Verify and Configure CUDA","text":"After installing CUDA, you'll want to verify that the correct version (12.4) is being used and make this change permanent in your environment.
ls -l /usr/local | grep cuda\nsudo ln -s /usr/local/cuda-12.4 /usr/local/cuda\n\nexport PATH=/usr/local/cuda/bin:$PATH\nnvcc --version\n\necho 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc\necho 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc\necho 'export CUDA_PATH=/usr/local/cuda' >> ~/.bashrc\nsource ~/.bashrc\n
"},{"location":"tutorial_comfyui_flux.html#4-compile-and-install-bitsandbytes-with-cuda-support","title":"4. Compile and Install bitsandbytes
with CUDA Support","text":"Now it\u2019s time to compile and install bitsandbytes
with CUDA support. This involves cloning the repository, configuring the build with CMake, compiling using all available cores, and installing the resulting package.
export BNB_CUDA_VERSION=124\nexport LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH\n\ngit clone https://github.com/timdettmers/bitsandbytes.git\ncd bitsandbytes\n\nmkdir -p build\ncd build\ncmake .. -DCOMPUTE_BACKEND=cuda -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4\nmake -j$(nproc)\n\ncd ..\npython setup.py install\n
Verify the installation by importing the package in Python:
python\n>>> import bitsandbytes as bnb\n>>> print(bnb.__version__)\n
"},{"location":"tutorial_comfyui_flux.html#5-install-pytorch-torchvision-and-torchaudio","title":"5. Install PyTorch, TorchVision, and TorchAudio","text":"Next up, install the essential libraries PyTorch
, torchvision
, and torchaudio
for Jetson Orin. You can always check for the latest links here.
pip install http://jetson.webredirect.org/jp6/cu124/+f/5fe/ee5f5d1a75229/torch-2.3.0-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/988/cb71323efff87/torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/0aa/a066463c02b4a/torchaudio-2.3.0+952ea74-cp310-cp310-linux_aarch64.whl\n
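Optionally, verify the wheels the same way as with bitsandbytes, and confirm that CUDA is visible to PyTorch:
python\n>>> import torch\n>>> print(torch.__version__)\n>>> print(torch.cuda.is_available())\n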
"},{"location":"tutorial_comfyui_flux.html#6-clone-the-comfyui-repository","title":"6. Clone the ComfyUI Repository","text":"Clone the ComfyUI repository from GitHub to get the necessary source code.
git clone https://github.com/comfyanonymous/ComfyUI.git\ncd ComfyUI\n
"},{"location":"tutorial_comfyui_flux.html#7-update-dependencies","title":"7. Update Dependencies","text":"Make sure all the necessary dependencies are installed by running the requirements.txt
file.
pip install -r requirements.txt\n
"},{"location":"tutorial_comfyui_flux.html#8-resolve-issues-with-numpy","title":"8. Resolve Issues with NumPy","text":"If you encounter issues with NumPy, downgrade to a version below 2.0 to avoid compatibility problems.
pip install \"numpy<2\"\n
"},{"location":"tutorial_comfyui_flux.html#9-run-comfyui","title":"9. Run ComfyUI","text":"Finally, run ComfyUI to ensure everything is set up correctly.
python main.py\n
Great! Now that you\u2019ve got ComfyUI up and running, let's load the workflow to start using the Flux model.
flux1-schnell.safetensors
and vae ae.safetensors
from Hugging Face and place the model in the models/unet
folder and vae in models/vae
within ComfyUI.clip_l.safetensors
and t5xxl_fp8_e4m3fn.safetensors
from Stability's Hugging Face and place them inside models/clip
folder.Alright, you're all set to launch your first run! Head over to the URL provided by ComfyUI (127.0.0.1:8188) on your Jetson AGX Orin, and hit that Queue Prompt button. The first time might take a little longer as the model loads, but after that, each generation should take around 21 seconds. Plus, you can queue up multiple prompts and let it generate images for hours!!
Happy generating! \ud83c\udf89
ASIER \ud83d\ude80
Some examples:
"},{"location":"tutorial_distillation.html","title":"CLIP model distillation","text":"
See \"Jetson Introduction to Knowledge Distillation\" repo's README.md.
https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation
"},{"location":"tutorial_gapi_microservices.html","title":"Tutorial - Gapi Micro Services","text":"A Micro Service is a process that runs a wrapper python script that integrates your custom code/models so they can integrate into Gapi Workflows.
You can run a Micro Service wherever you like and connect it to a Gapi Server via the streaming, hybrid binary+json message protocol.
There are some out-of-the-box \u201cCommunity Micro Services\" that we integrate, test, and pack into Docker images. When you run them, they auto-integrate, load the NVIDIA layers correctly, and offer logging to the host system.
Even better, create your own! Just implement an on_message Python handler to process requests and respond. The rest is handled for you.
Think of a Micro Service as a simple wrapper around some code or model you have. It works the same as any other Node inside a Workflow. When it's your Micro Service Node's turn, your on_message function will be called. Your script gets the rolling Transaction data for context, and you then publish your data directly back into the flow.
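As a rough illustration only (the actual handler signature and field names come from the message_handler.py included in the Micro Service zip, so treat everything below as an assumption), an on_message handler is conceptually just a function that reads the rolling Transaction data and returns whatever it wants to publish back into the flow:
# hypothetical sketch - the signature and field names are assumptions,\n# the real contract is defined by the message_handler.py in the downloaded zip\ndef on_message(transaction):\n    # 'transaction' carries the rolling input data accumulated so far\n    prompt = transaction.get('text', '')\n\n    # call into your own code or model here (hypothetical helper)\n    result = my_model_inference(prompt)\n\n    # the returned data is published back into the Workflow\n    return {'text': result}\n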
"},{"location":"tutorial_gapi_microservices.html#running-the-community-micro-services","title":"Running the Community Micro Services","text":"Requirements for Community Micro Services
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space.
#1 Login and go to the Micro Services tab\n#2 Follow the instructions in the blue box on that page to download your custom configuration\n#3 Then follow the instructions below that for installing the Micro Service you want\n
Example of instruction page:
"},{"location":"tutorial_gapi_microservices.html#congrats-you-can-go-through-the-workflow-tips-now","title":"Congrats! You Can Go Through the Workflow Tips Now","text":""},{"location":"tutorial_gapi_microservices.html#creating-your-own-micro-service","title":"Creating Your Own Micro Service","text":"The entire Micro Service zip file is just 4KB with 4 files:
Full documentation here: Gapi Micro Service Docs. Synopsis below...
#1 Create logical Micro Service in UI and copy the key\n#2 Download the zip file from the UI\n#3 python gapi-ms.py ws://0.0.0.0:8090/gapi-ws [MICROSERVICE_KEY]\n#4 Refresh the UI to confirm it's online\n#5 Edit the message_handler.py to handle binary+json input and change the output\n#6 Add a Micro Service Node to a Workflow and tie it to your Micro Service. Hit Test.\n
"},{"location":"tutorial_gapi_workflows.html","title":"Tutorial - Gapi","text":"Gapi is an embeddable API gateway that creates streaming integrations between AI micro services and the systems that users leverage everyday.
The project's goal is to accelerate the creation of pilots and demos that bring Jetson AI Lab achievements into real-world environments.
\"On Device\" generative AI doesn't mean it has to live on an island!
Gapi is embeddable API gateway software that runs in the background, with a low-code workflow UI for testing. The server is a message hub and state machine for workflow 'nodes' that talk to Micro Services. Think of it as connective tissue for applications.
A Micro Service is a process that runs wrapper Python scripts that integrate custom code/models into Workflows using a streaming API.
Gapi Server can run on any Jetson Orin, or really any computer, as the Micro Services connect outbound over secure websockets. It doesn't use any GPU resources. There is also a demo version that lets you skip the Server install (but you'll still need to run your own Micro Services).
Gapi Project Page
What you need to run Gapi Server on Jetson
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin NX (8GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Docker
Sufficient storage space.
Gapi Server will run on other environments. Email us at support@GenAINerds.com if that's something you think is worthwhile.
Explaining the Steps:
Copy and Run the Steps:
mkdir ~/gapiData && cd ~/gapiData\ncurl -L https://raw.githubusercontent.com/genai-nerds/Gapi/main/gapiConfigs.zip -o gapiConfigs.zip\nunzip -q gapiConfigs.zip\ndocker run -d --name gapi --network host -v ~/gapiData:/opt/gapi/vdata genainerds/gapi:arm64 /bin/bash -c \"cd /opt/gapi/bin && ./startGapi.sh\"\necho \"You may need to hit Enter now. Afterwards the Docker container 'gapi' should be running\"\n
Troubleshooting:
NOTE: You will need to run some Micro Services before doing anything meaningful, so please review the mini tour below, but don't do any of it in the UI until you complete the setup (instructions at the bottom).
"},{"location":"tutorial_gapi_workflows.html#ui","title":"UI","text":"Pass: !gapi2024
Change the password in Settings! The docs show how to add an SSL cert.
When you login there will be an array of Tip Workflows that have notes and explain core concepts.
Tips:
Workflows visually connect the execution and flow of data between Nodes.
A Transaction (or single firing) has \"Rolling Input\" data that it accumulates as the Node-to-Node steps each talk to Micro Services and APIs. All Nodes support variables and flow control using familiar JSON and JavaScript concepts.
Each Node can append or reference the rolling data in the Transaction while making decisions along the way.
Watch live Transactions as they start from clients, webhooks and published messages from Micro Services with visual feedback and debugging.
"},{"location":"tutorial_gapi_workflows.html#apis-to-business-systems","title":"APIs to Business Systems","text":"Gapi can help smoothly integrate generative AI into systems that people already use everyday via APIs. It has the streaming API to Micro Services plus the state management and chops to handle the outward (webhook) style APIs to existing systems.
Our hope is to rally contributions from the community to keep growing the out-of-the-box/tested Nodes, but there is a DIY Node as well to manually map what you need into your Workflows.
Some of the out-of-the-box API Nodes: Slack (IM), Twilio (SMS), SendGrid (Email), Service Now (Ticketing), DIY Webhook
"},{"location":"tutorial_gapi_workflows.html#micro-services","title":"Micro Services","text":"There are community published Micro Services as well as custom ones you can make yourself. Gapi Server becomes most useful when leveraging them so please follow the How To below.
Current Community Micro Services:
Complete the Setup: How To Run and/or Create Micro Services
"},{"location":"tutorial_gapi_workflows.html#support-contribute","title":"Support / Contribute","text":"Gapi is a project from the GenAI Nerds and hosted on Github.
The Holoscan SDK is part of NVIDIA Holoscan, the AI sensor processing platform that combines hardware systems for low-latency sensor and network connectivity, optimized libraries for data processing and AI, and core microservices to run streaming, imaging, and other applications, from embedded to edge to cloud.
"},{"location":"tutorial_holoscan.html#holoscan-sdk-core-concepts","title":"Holoscan SDK - Core Concepts","text":"A Holoscan SDK Application
is composed of Fragments
, each of which runs a graph of Operators
. The implementation of that graph is sometimes referred to as a pipeline or workflow, which can be visualized below:
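As a minimal sketch of how these concepts map to Python code (patterned after the SDK's basic ping example; module paths may vary by SDK version), an Application composes a graph of Operators and connects them with add_flow:
from holoscan.conditions import CountCondition\nfrom holoscan.core import Application, Operator, OperatorSpec\n\nclass TxOp(Operator):\n    # emits a value on its 'out' port each time it is scheduled\n    def setup(self, spec: OperatorSpec):\n        spec.output('out')\n    def compute(self, op_input, op_output, context):\n        op_output.emit(42, 'out')\n\nclass RxOp(Operator):\n    # receives the value on its 'in' port and prints it\n    def setup(self, spec: OperatorSpec):\n        spec.input('in')\n    def compute(self, op_input, op_output, context):\n        print(op_input.receive('in'))\n\nclass PingApp(Application):\n    def compose(self):\n        # the graph: run TxOp 10 times and stream its output into RxOp\n        tx = TxOp(self, CountCondition(self, 10), name='tx')\n        rx = RxOp(self, name='rx')\n        self.add_flow(tx, rx)\n\nif __name__ == '__main__':\n    PingApp().run()\n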
The best place to get started using the Holoscan SDK is the HoloHub repo. This is a central repository for the NVIDIA Holoscan AI sensor processing community to share apps and extensions.
So, let's walk through how to run the Surgical Tool Tracking example application from HoloHub!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
13.7 GB
for holoscan
container image850 MB
for Tool Tracking ONNX model + example videoClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build the Holoscan container.
Use the -v
option to mount HoloHub's ./build
and ./data
directories to the local Holoscan package directory so builds and data are cached across runs.
jetson-containers run \\\n -v ./packages/holoscan/holohub/data:/opt/nvidia/holohub/data \\\n -v ./packages/holoscan/holohub/build:/opt/nvidia/holohub/build \\\n $(autotag holoscan)\n
"},{"location":"tutorial_holoscan.html#running-holohubs-endoscopy-tool-tracking-app","title":"Running HoloHub's Endoscopy Tool Tracking App","text":"An example application from HoloHub is the Endoscopy Tool Tracking application. This sample application demonstrates how the Holoscan SDK can be used to build an efficient pipeline that streams a video feed, preprocesses the data, runs inference using TensorRT, post-processes the data, and renders the video feed with the inference overlays.
"},{"location":"tutorial_holoscan.html#building-the-app","title":"Building The App","text":"The Holoscan SDK uses CMake to build C++ applications and also leverages CMake to pull and build app dependencies. So, regardless of whether an application is implemented using C++ or Python, many apps will still require that you \"build\" them first.
The Endoscopy Tool Tracking App has both a Python and C++ implementation. Building this app creates the C++ application program, pulls in an example video, and builds the TensorRT engine used for inference.
Go to the HoloHub directory
cd /opt/nvidia/holohub\n
Build the app using the 'run' script ./run build endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#running-the-python-app","title":"Running The Python App","text":"First, add the Holoscan SDK and the HoloHub build directory to your PYTHONPATH environment variable.
export HOLOHUB_BUILD_PATH=/opt/nvidia/holohub/build/endoscopy_tool_tracking\nexport PYTHONPATH=$PYTHONPATH:$HOLOHUB_BUILD_PATH/python/lib:/opt/nvidia/holoscan/python/lib\n
Next, run the application using Python! python3 /opt/nvidia/holohub/applications/endoscopy_tool_tracking/python/endoscopy_tool_tracking.python --data /opt/nvidia/holohub/data/endoscopy/\n
Shortly after launching the application, you will see the HoloViz window that visualizes the sample video and the model's outputs:
"},{"location":"tutorial_holoscan.html#running-the-c-app","title":"Running The C++ App","text":"The C++ app can be run using the run script by specifying the app name:
./run launch endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#next-steps","title":"Next Steps","text":"Congratulations! You've successfully run a Holoscan SDK application!
To dive deeper into the Holoscan SDK, explore the SDK's documentation on Core Concepts, Holoscan by Example, and Creating an Application.
"},{"location":"tutorial_jetson-copilot.html","title":"Tutorial - Jetson Copilot","text":"Jetson Copilot is a reference application for a local AI assistant, which demonstrates;
What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6 GB
for jetrag
container image4 GB
for downloading some default models (llama3
and mxbai-embed-large
)Info
To run Jetson Copilot, you do not need to have jetson-containers
on your system. It uses the jetrag
container image that is being managed and built on jetson-containers
.
If this is your first time to run Jetson Copilot on your Jetson, first run setup.sh
to ensure you have all the necessary software installed and the environment set up.
git clone https://github.com/NVIDIA-AI-IOT/jetson-copilot/\ncd jetson-copilot\n./setup_environment.sh\n
It will install the following, if not yet.
cd jetson-copilot\n./launch_jetson_copilot.sh\n
This will start a Docker container and start a Ollama server and Streamlit app inside the container. It will shows the URL on the console in order to access the web app hosted on your Jetson.
With your web browser on Jetson, open the Local URL (http://localhost:8501
).Or on a PC connected on the same network as on your Jetson, access the Network URL.
Local URL: http://localhost:8501\nNetwork URL: http://10.110.50.252:8501 \n
Info
You will need the Internet connection on Jetson when it launches for the first time, as it will pull the container image (and download the default LLM and embedding model when web UI starts for the first time).
When you access the web UI for the first time, it will dowload the default LLM (llama3
) and the embedding model (mxbai-embed-large
).
Tips
If you are on Ubuntu Desktop, a frameless Chromium window will pop up to access the web app, to make it look like an independent application. You need to close the window as stopping the container on the console won't shutdown Chromium.
"},{"location":"tutorial_jetson-copilot.html#how-to-use-jetson-copilot","title":"\ud83d\udcd6 How to use Jetson Copilot","text":""},{"location":"tutorial_jetson-copilot.html#0-interact-with-the-plain-llama3-8b","title":"0. Interact with the plain Llama3 (8b)","text":"You can use Jetson Copilot just to interact with a LLM withut enabling RAG feature.
By default, Llama3 (8b) model is downloaded when running for the first time and use as the default LLM.
You will be surprized how much a model like Llama3 is capable, but may soon find limitations as it does not have information prior to its cutoff date nor know anything about your specific subject matter.
"},{"location":"tutorial_jetson-copilot.html#1-ask-jetson-related-question-using-pre-built-index","title":"1. Ask Jetson related question using pre-built index","text":"On the side panel, you can toggle \"Use RAG\" on to enable RAG pipeline. The LLM will have an access to a custom knowledge/index that is selected under \"Index\".
As a sample, a pre-build index \"_L4T_README
\" is provided. This is built on all the README text files that supplied in the \"L4T-README\" folder on the Jetson desktop.
It is mounted as /media/<USER_NAME>/L4T-README/
once you execute udisksctl mount -b /dev/disk/by-label/L4T-README
.
You can ask questions like:
What IP address does Jetson gets assigned when connected to a PC via a USB cable in USB Device Mode?\n
"},{"location":"tutorial_jetson-copilot.html#2-build-your-own-index-based-on-your-documents","title":"2. Build your own index based on your documents","text":"You can build your own index based on your local and/or online documents.
First, on the console (or on the desktop) create a directory under Documents
directory to store your documents.
cd jetson-copilot\nmkdir Documents/Jetson-Orin-Nano\ncd Documents/Jetson-Orin-Nano\nwget https://developer.nvidia.com/downloads/assets/embedded/secure/jetson/orin_nano/docs/jetson_orin_nano_devkit_carrier_board_specification_sp.pdf\n
Now back on the web UI, open the side bar, toggle on \"Use RAG\", then click on \"\u2795Build a new index\" to jump to a \"Build Index\" page.
Give a name for the Index you are to build. (e.g. \"JON Carrier Board\") Type in the field and hit Enter
key, then it will check and show what path will be created for your index.
And then from the drop select box under \"Local documents\", select the directory you created and saved your documents in. (e.g. /opt/jetson_copilot/Documents/Jetson-Orin-Nano
).
It will show the summary of files found in the selected directory.
If you want to rather only or additionally supply URLs for the online docuemnts to be ingested, fill the text area with one URL per a line. You can skip this if you are building your index only based on your local documents.
Info
On the sidebar, make sure mxbai-embed-large
is selected for the embedding model.
Use of OpenAI embedding models is not well supported and needs more testing.
Finally, hit \"Build Index\" button. It will show the progress in the drop-down \"status container\", so you can check the status by clicking on it. Once done, it will show the summary of your index and time it took.
You can go back to the home screen to now select the index you just built.
"},{"location":"tutorial_jetson-copilot.html#3-test-different-llm-or-embedding-model","title":"3. Test different LLM or Embedding model","text":"TODO
"},{"location":"tutorial_jetson-copilot.html#development","title":"\ud83c\udfd7\ufe0f Development","text":"Streamlit based web app is very easy to develop.
On web UI, at the top-right of the screen to choose \"Always rerun\" to automatically update your app every time you change the source codes.
See Streamlit Documentation for the detail.
"},{"location":"tutorial_jetson-copilot.html#manually-run-streamlit-app-inside-the-container","title":"Manually run streamlit app inside the container","text":"In case you make more fundamental changes, you can also manually run streamlit app.
cd jetson-copilot\n./launch_dev.sh\n
Once in container;
streamlit run app.py\n
"},{"location":"tutorial_jetson-copilot.html#directory-structure","title":"\ud83e\uddf1 Directory structure","text":"\u2514\u2500\u2500 jetson-copilot\n \u251c\u2500\u2500 launch_jetson_copilot.sh\n \u251c\u2500\u2500 setup_environment.sh\n \u251c\u2500\u2500 Documents \n \u2502 \u2514\u2500\u2500 your_abc_docs\n \u251c\u2500\u2500 Indexes\n \u2502 \u251c\u2500\u2500 _L4T_README\n \u2502 \u2514\u2500\u2500 your_abc_index\n \u251c\u2500\u2500 logs\n \u2502 \u251c\u2500\u2500 container.log\n \u2502 \u2514\u2500\u2500 ollama.log\n \u251c\u2500\u2500 ollama_models\n \u2514\u2500\u2500 Streamlit_app\n \u251c\u2500\u2500 app.py\n \u251c\u2500\u2500 build_index.py\n \u2514\u2500\u2500 download_model.py\n
Following directories inside the jetson-copilot
directory are mounted in the Docker container.
Docuemtns
Directory to store your documents to be indexed Indexes
Directory to store pre-built (or built-by-you) indexes for LLM to perform RAG on logs
Directory for the app to store log files ollama_models
Directory for the ollama server to store download models stremlit_app
Directory for Python scripts to make up the web app"},{"location":"tutorial_jetson-copilot.html#troubleshooting","title":"\ud83d\udcab Troubleshooting","text":"If you find any issue, please check GitHub Issues of the Jetson Copilot repo.
"},{"location":"tutorial_jps.html","title":"Tutorial - Jetson Platform Services","text":"Jetson Plaform Services (JPS) provide a platform to simplify development, deployment and management of Edge AI applications on NVIDIA Jetson. JPS is a modular & extensible architecture for developers to distill large complex applications into smaller modular microservice with APIs to integrate into other apps & services. At its core are a collection of AI services leveraging generative AI, deep learning, and analytics, which provide state of the art capabilities including video analytics, video understanding and summarization, text based prompting, zero shot detection and spatio temporal analysis of object movement.
VLM Alert Workflow built with JPS
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
To get started with Jetson Platform Services, follow the quickstart guide to install and setup JPS. Then explore the reference workflows to learn how to use DeepStream, Analytics, Generative AI and more with JPS:
1) Quick Start Guide 2) AI Powered Network Video Recorder 3) Zero Shot Detection with NanoOWL 4) Visual Language Model Alerts
The reference workflows demonstrate how to use the microservices provided in JPS to build full end to end systems on your Jetson.
VLM Alert Workflow Architecture
View the links below to learn more about Jetson Platform Services: VLM Alert Blog JPS Product Page JPS Documentation VLM Alert Demo Video
"},{"location":"tutorial_live-llava.html","title":"Tutorial - Live LLaVA","text":"Recommended
Follow the NanoVLM tutorial first to familiarize yourself with vision/language models, and see Agent Studio for in interactive pipeline editor built from live VLMs.
This multimodal agent runs a vision-language model on a live camera feed or video stream, repeatedly applying the same prompts to it:
It uses models like LLaVA or VILA and has been quantized with 4-bit precision. This runs an optimized multimodal pipeline from the NanoLLM
library, including running the CLIP/SigLIP vision encoder in TensorRT, event filters and alerts, and multimodal RAG (see the NanoVLM page for benchmarks)
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>10GB
)Follow the chat-based LLaVA and NanoVLM tutorials first.
Supported vision/language models:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GBThe VideoQuery agent applies prompts to the incoming video feed with the VLM. Navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera (Chrome is recommended with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled)
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
This uses jetson_utils
for video I/O, and for options related to protocols and file formats, see Camera Streaming and Multimedia. In the example above, it captures a V4L2 USB webcam connected to the Jetson (under the device /dev/video0
) and outputs a WebRTC stream.
The example above was running on a live camera, but you can also read and write a video file or network stream by substituting the path or URL to the --video-input
and --video-output
command-line arguments like this:
jetson-containers run \\\n -v /path/to/your/videos:/mount\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /mount/my_video.mp4 \\\n --video-output /mount/output.mp4 \\\n --prompt \"What does the weather look like?\"\n
This example processes and pre-recorded video (in MP4, MKV, AVI, FLV formats with H.264/H.265 encoding), but it also can input/output live network streams like RTP, RTSP, and WebRTC using Jetson's hardware-accelerated video codecs.
"},{"location":"tutorial_live-llava.html#nanodb-integration","title":"NanoDB Integration","text":"If you launch the VideoQuery agent with the --nanodb
flag along with a path to your NanoDB database, it will perform reverse-image search on the incoming feed against the database by re-using the CLIP embeddings generated by the VLM.
To enable this mode, first follow the NanoDB tutorial to download, index, and test the database. Then launch VideoQuery like this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output \\\n --nanodb /data/nanodb/coco/2017\n
You can also tag incoming images and add them to the database using the web UI, for one-shot recognition tasks:
"},{"location":"tutorial_live-llava.html#video-vila","title":"Video VILA","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The vision/video.py
example keeps a rolling history of frames:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_live-llava.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_live-llava.html#walkthrough-videos","title":"Walkthrough Videos","text":""},{"location":"tutorial_llamaindex.html","title":"Tutorial - LlamaIndex","text":"Let's use LlamaIndex, to realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents!
What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.5 GB
for llama-index
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag llama-index:samples)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
You can follow along LlamaIndex_Local-Models_L4T.ipynb
(which is based on the official LlamaIndex tutorial).
Talk live with Llama using streaming ASR/TTS, and chat about images with Llava!
NanoLLM
library provides optimized inference for LLM and speech models.The WebChat
agent has responsive conversational abilities and multimodal support for chatting about images with vision/language models, including overlapping ASR/LLM/TTS generation and verbal interruptability.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>10GB
)Start the Riva server first and test the ASR examples.
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --asr=riva --tts=piper\n
This will start llamaspeak with text LLM and ASR/TTS enabled. You can then navigate your browser to https://IP_ADDRESS:8050
--web-port
(and --ws-port
for the websocket port)The code and docs for the WebAgent
that runs llamaspeak can be found in the NanoLLM library. This block diagram shows the speech pipeline with interleaved model generation, user interruption, and streaming I/O:
If you load a multimodal vision/language model instead, you can drag images into the chat and ask questions about them:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model Efficient-Large-Model/VILA-7b \\\n --asr=riva --tts=piper\n
For more info about the supported vision/language models, see the NanoVLM page.
"},{"location":"tutorial_llamaspeak.html#function-calling","title":"Function Calling","text":"There's the ability to define functions from Python that the bot has access to and can invoke based on the chat flow:
This works by using the bot_function()
decorator and adding the API description's to the system prompt:
from nano_llm import NanoLLM, ChatHistory, BotFunctions, bot_function\nfrom datetime import datetime\n\n@bot_function\ndef DATE():\n \"\"\" Returns the current date. \"\"\"\n return datetime.now().strftime(\"%A, %B %-m %Y\")\n\n@bot_function\ndef TIME():\n \"\"\" Returns the current time. \"\"\"\n return datetime.now().strftime(\"%-I:%M %p\")\n\nsystem_prompt = \"You are a helpful and friendly AI assistant.\" + BotFunctions.generate_docs()\n
The system prompt can be autogenerated from the Python docstrings embedded in the functions themselves, and can include parameters that the bot can supply (for example, selectively saving relevant user info to a vector database for RAG like is shown in the video).
For more information about this topic, see the Function Calling section of the NanoLLM documentation.
"},{"location":"tutorial_llava.html","title":"Tutorial - LLaVA","text":"LLaVA is a popular multimodal vision/language model that you can run locally on Jetson to answer questions about image prompts and queries. Llava uses the CLIP vision encoder to transform images into the same embedding space as its LLM (which is the same as Llama architecture). Below we cover different methods to run Llava on Jetson, with increasingly optimized performance:
text-generation-webui
llava.serve.cli
llama.cpp
NanoVLM
text-generation-webui
4-bit (GPTQ) 2.3 9.7 GB llava.serve.cli
FP16 (None) 4.2 27.7 GB llama.cpp
4-bit (Q4_K) 10.1 9.2 GB NanoVLM
4-bit (MLC) 21.1 8.7 GB In addition to Llava, the NanoVLM
pipeline supports VILA and mini vision models that run on Orin Nano as well.
text-generation-webui
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for text-generation-webui
container image1.7GB
7.25GB
Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 download-model.py --output=/data/models/text-generation-webui \\\n TheBloke/llava-v1.5-13B-GPTQ\n
"},{"location":"tutorial_llava.html#start-web-ui-with-multimodal-extension","title":"Start Web UI with Multimodal Extension","text":"jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 server.py --listen \\\n --model-dir /data/models/text-generation-webui \\\n --model TheBloke_llava-v1.5-13B-GPTQ \\\n --multimodal-pipeline llava-v1.5-13b \\\n --loader autogptq \\\n --disable_exllama \\\n --verbose\n
Go to Chat tab, drag and drop an image into the Drop Image Here area, and your question in the text area and hit Generate:
"},{"location":"tutorial_llava.html#result","title":"Result","text":""},{"location":"tutorial_llava.html#2-run-from-the-terminal-with-llavaservecli","title":"2. Run from the terminal withllava.serve.cli
","text":"What you need
One of the following Jetson:
Jetson AGX Orin 64GB Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1GB
for llava
container14GB
for Llava-7B (or 26GB
for Llava-13B)This example uses the upstream Llava repo to run the original, unquantized Llava models from the command-line. It uses more memory due to using FP16 precision, and is provided mostly as a reference for debugging. See the Llava container readme for more info.
"},{"location":"tutorial_llava.html#llava-v15-7b","title":"llava-v1.5-7b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-7b \\\n --image-file /data/images/hoover.jpg\n
"},{"location":"tutorial_llava.html#llava-v15-13b","title":"llava-v1.5-13b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-13b \\\n --image-file /data/images/hoover.jpg\n
Unquantized 13B may run only on Jetson AGX Orin 64GB due to memory requirements.
"},{"location":"tutorial_llava.html#3-quantized-gguf-models-with-llamacpp","title":"3. Quantized GGUF models withllama.cpp
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
llama.cpp is one of the faster LLM API's, and can apply a variety of quantization methods to Llava to reduce its memory usage and runtime. Despite its name, it uses CUDA. There are pre-quantized versions of Llava-1.5 available in GGUF format for 4-bit and 5-bit:
jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/hoover.jpg \\\n --prompt \"What does the sign say\"'\n
Quantization Bits Response Tokens/sec Memory Q4_K
4 The sign says \"Hoover Dam, Exit 9.\" 10.17 9.2 GB Q5_K
5 The sign says \"Hoover Dam exit 9.\" 9.73 10.4 GB A lower temperature like 0.1 is recommended for better quality (--temp 0.1
), and if you omit --prompt
it will describe the image:
jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/lake.jpg'\n\nIn this image, a small wooden pier extends out into a calm lake, surrounded by tall trees and mountains. The pier seems to be the only access point to the lake. The serene scene includes a few boats scattered across the water, with one near the pier and the others further away. The overall atmosphere suggests a peaceful and tranquil setting, perfect for relaxation and enjoying nature.\n
You can put your own images in the mounted jetson-containers/data
directory. The C++ code for llava-cli can be found here. The llama-cpp-python bindings also support Llava, however they are slower from Python (potentially handling of the tokens)
NanoVLM
","text":"What's Next
This section got too long and was moved to the NanoVLM page - check it out there for performance optimizations, mini VLMs, and live streaming!
"},{"location":"tutorial_minigpt4.html","title":"Tutorial - MiniGPT-4","text":"Give your locally running LLM an access to vision, by running MiniGPT-4 on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.8GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
minigpt4
container with models","text":"To start the MiniGPT4 container and webserver with the recommended models, run this command:
jetson-containers run $(autotag minigpt4) /bin/bash -c 'cd /opt/minigpt4.cpp/minigpt4 && python3 webui.py \\\n $(huggingface-downloader --type=dataset maknee/minigpt4-13b-ggml/minigpt4-13B-f16.bin) \\\n $(huggingface-downloader --type=dataset maknee/ggml-vicuna-v0-quantized/ggml-vicuna-13B-v0-q5_k.bin)'\n
Then, open your web browser and access http://<IP_ADDRESS>:7860
.
17# First steps with Metropolis Microservices for Jetson
NVIDIA Metropolis Microservices for Jetson simplifies the development of vision AI applications, offering a suite of customizable, cloud-native tools. Before diving into this tutorial, ensure you've filled out the Metropolis Microservices for Jetson Early Access form to gain the necessary access to launch the services. This step is crucial as it enables you to utilize all the features and capabilities discussed in this guide.
Perfect for both newcomers and experts, this tutorial provides straightforward steps to kick-start your edge AI projects. Whether you're a student or an ecosystem partner working on a use case, this guide offers a straightforward start for every skill level.
"},{"location":"tutorial_mmj.html#0-install-nvidia-jetson-services","title":"0. Install NVIDIA Jetson Services:","text":"Ok, let's start by installing NVIDIA Jetson Services:
sudo apt install nvidia-jetson-services\n
Let's add some performance hacks that will be needed to run the demo faster and without streaming artifacts:
If you don't have the Orin at max performance, you can use these two commands, a reboot is needed after:
sudo nvpmodel -m 0 \nsudo jetson_clocks\n
After these two commands, a reboot is needed if your Jetson wasn't already in high-performance mode. These are optional, but they fine-tune your network buffers to ensure smoother streaming by optimizing how much data can be sent and received:
sudo sysctl -w net.core.rmem_default=2129920\nsudo sysctl -w net.core.rmem_max=10000000\nsudo sysctl -w net.core.wmem_max=2000000\n
Download NGC for ARM64 from the NGC for CLI site:
unzip ngccli_arm64.zip\nchmod u+x ngc-cli/ngc\necho \"export PATH=\\\"\\$PATH:$(pwd)/ngc-cli\\\"\" >> ~/.bash_profile && source ~/.bash_profile\nngc config set\n
Here it will ask for your API Key, and the organization name, to get those you need to login into NGC and generate an API key here. You should then paste the API key and use the organization name you are using. You can also press [Enter] to select the default values for the remaining options. After this, you should get the message:
Successfully saved NGC configuration to /home/jetson/.ngc/config\n
Then, login with the same API key:
sudo docker login nvcr.io -u \"\\$oauthtoken\" -p <NGC-API-KEY>\n
Now launch the Redis and Ingress services, as we need them for this tutorial.
sudo systemctl start jetson-redis\nsudo systemctl start jetson-ingress\n
"},{"location":"tutorial_mmj.html#2-download-and-launch-nvstreamer","title":"2. Download and launch NVStreamer","text":""},{"location":"tutorial_mmj.html#_1","title":"Tutorial mmj","text":"First, we need to install NVStreamer, an app that streams the videos MMJs will need to run AI on them. Follow this NVStreamer Link (In the top-left, click Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf nvstreamer.tar.gz\ncd nvstreamer\n
Launch it: sudo docker compose -f compose_nvstreamer.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#3-download-ai_nvr-and-launch","title":"3. Download AI_NVR and launch:","text":""},{"location":"tutorial_mmj.html#_2","title":"Tutorial mmj","text":"AI NVR (NGC) Link (Top-left -> Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf ai_nvr.tar.gz\nsudo cp ai_nvr/config/ai-nvr-nginx.conf /opt/nvidia/jetson/services/ingress/config/\ncd ai_nvr\nsudo docker compose -f compose_agx.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#4-download-some-sample-videos-and-upload-them-to-nvstreamer","title":"4. Download some sample videos and upload them to NVStreamer","text":"Download them from here.
unzip files.zip\n
Ok, now, this is important to understand, there are 2 web interfaces: So, first we need to upload the file in the Streamer interface, it looks like this:
There, go to File Upload, and drag and drop the file in the upload squared area.
After uploading it, go to the Dashboad option of the left menu, and copy the RTSP URL of the video you just uploaded, you will need it for the Camera Management Dashboard.
Now jump to the Camera Management Dashboard (http://localhost:30080/vst), it looks like this:
Go to the Camera Management option of the menu, then use the Add device manually option, and paste the RTSP URL, add the name of your video to the Name and Location text boxes, so it will be displayed on top of the stream.
Finally, click in the Live Streams option of the left menu, and you should be able to watch your video stream.
"},{"location":"tutorial_mmj.html#5-watch-rtsp-ai-processed-streaming-from-vlc","title":"5. Watch RTSP AI processed streaming from VLC","text":"Open VLC from another computer (localhost doesn't work here), and point to your Jetson Orin's IP address (you should be in the same network, or not having a firewal to access).
The easiest way to get Jetson's ip is launching:
ifconfig\n
And checking the IP of the interface (usually wlan0, inet IP). Then go to rtsp://[JETSON_IP]:8555/ds-test using VLC like this:
"},{"location":"tutorial_mmj.html#6-android-app","title":"6. Android app","text":"There is an Android app that allows you to track events and create areas of interest to monitor, you can find it on Google Play as AI NVR.
Here is a quick walkthough where you can see how to:
NanoLLM
is a lightweight, high-performance library using optimized inferencing APIs for quantized LLM\u2019s, multimodality, speech services, vector databases with RAG, and web frontends like Agent Studio.
It provides similar APIs to HuggingFace, backed by highly-optimized inference libraries and quantization tools:
NanoLLM Reference Documentationfrom nano_llm import NanoLLM\n\nmodel = NanoLLM.from_pretrained(\n \"meta-llama/Meta-Llama-3-8B-Instruct\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n api_token='hf_abc123def', # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n\nresponse = model.generate(\"Once upon a time,\", max_new_tokens=128)\n\nfor token in response:\n print(token, end='', flush=True)\n
"},{"location":"tutorial_nano-llm.html#containers","title":"Containers","text":"To test a chat session with Llama from the command-line, install jetson-containers
and run NanoLLM like this:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Llama CLIAgent Studio jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --prompt \"Can you tell me a joke about llamas?\"\n
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
If you haven't already, request access to the Llama models on HuggingFace and substitute your account's API token above.
"},{"location":"tutorial_nano-llm.html#resources","title":"Resources","text":"Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:
Benchmarks Benchmarking results for LLM, SLM, VLM using MLC/TVM backend. API Examples Python code examples for chat, completion, and multimodal. Documentation Reference documentation for the NanoLLM model and agent APIs. Llamaspeak Talk verbally with LLMs using low-latency ASR/TTS speech models. Small LLM (SLM) Focus on language models with reduced footprint (7B params and below) Live LLaVA Realtime live-streaming vision/language models on recurring prompts. Nano VLM Efficient multimodal pipeline with one-shot image tagging and RAG support. Agent Studio Rapidly design and experiment with creating your own automation agents. OpenVLA Robot learning with Vision/Language Action models and manipulation in simulator."},{"location":"tutorial_nano-vlm.html","title":"NanoVLM - Efficient Multimodal Pipeline","text":"We saw in the previous LLaVA tutorial how to run vision-language models through tools like text-generation-webui
and llama.cpp
. In a similar vein to the SLM page on Small Language Models, here we'll explore optimizing VLMs for reduced memory usage and higher performance that reaches interactive levels (like in Liva LLava). These are great for fitting on Orin Nano and increasing the framerate.
There are 3 model families currently supported: Llava, VILA, and Obsidian (mini VLM)
"},{"location":"tutorial_nano-vlm.html#vlm-benchmarks","title":"VLM Benchmarks","text":"This FPS measures the end-to-end pipeline performance for continuous streaming like with Live Llava (on yes/no question)
"},{"location":"tutorial_nano-vlm.html#multimodal-chat","title":"Multimodal Chat","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>10GB
)Supported VLM models in NanoLLM
:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GBThe optimized NanoLLM
library uses MLC/TVM for quantization and inference provides the highest performance. It efficiently manages the CLIP embeddings and KV cache. You can find Python code for the chat program used in this example here.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32\n
This starts an interactive console-based chat with Llava, and on the first run the model will automatically be downloaded from HuggingFace and quantized using MLC and W4A16 precision (which can take some time). See here for command-line options.
You'll end up at a >> PROMPT:
in which you can enter the path or URL of an image file, followed by your question about the image. You can follow-up with multiple questions about the same image. Llava does not understand multiple images in the same chat, so when changing images, first reset the chat history by entering clear
or reset
as the prompt. VILA supports multiple images (area of active research)
During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'what does the road sign say?' \\\n --prompt 'what kind of environment is it?' \\\n --prompt 'reset' \\\n --prompt '/data/images/lake.jpg' \\\n --prompt 'please describe the scene.' \\\n --prompt 'are there any hazards to be aware of?'\n
You can also use --prompt /data/prompts/images.json
to run the test sequence, the results of which are in the table below.
\u2022 \u00a0 The model responses are with 4-bit quantization enabled, and are truncated to 128 tokens for brevity. \u2022 \u00a0 These chat questions and images are from /data/prompts/images.json
(found in jetson-containers)
When prompted, these models can also output in constrained JSON formats (which the LLaVA authors cover in their LLaVA-1.5 paper), and can be used to programatically query information about the image:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model liuhaotian/llava-v1.5-13b \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'extract any text from the image as json'\n\n{\n \"sign\": \"Hoover Dam\",\n \"exit\": \"2\",\n \"distance\": \"1 1/2 mile\"\n}\n
"},{"location":"tutorial_nano-vlm.html#web-ui","title":"Web UI","text":"To use this through a web browser instead, see the llamaspeak tutorial:
"},{"location":"tutorial_nano-vlm.html#live-streaming","title":"Live Streaming","text":"These models can also be used with the Live Llava agent for continuous streaming - just substitute the desired model name below:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
Then navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera. Using Chrome or Chromium is recommended for a stable WebRTC connection, with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled.
The Live Llava tutorial shows how to enable additional features like vector database integration, image tagging, and RAG.
"},{"location":"tutorial_nano-vlm.html#video-sequences","title":"Video Sequences","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. By manipulating the KV cache and dropping off the last frame from the chat history, we can keep the stream rolling continuously beyond the maximum context length of the model. The vision/video.py
example shows how to use this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_nano-vlm.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_nanodb.html","title":"Tutorial - NanoDB","text":"Let's run NanoDB's interactive demo to witness the impact of Vector Database that handles multimodal data.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
16GB
for container image40GB
for MS COCO datasetClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
As an example, let's use the MS COCO dataset:
cd jetson-containers\nmkdir -p data/datasets/coco/2017\ncd data/datasets/coco/2017\n\nwget http://images.cocodataset.org/zips/train2017.zip\nwget http://images.cocodataset.org/zips/val2017.zip\nwget http://images.cocodataset.org/zips/unlabeled2017.zip\n\nunzip train2017.zip\nunzip val2017.zip\nunzip unlabeled2017.zip\n
"},{"location":"tutorial_nanodb.html#download-index","title":"Download Index","text":"You can download a pre-indexed NanoDB that was already prepared over the COCO dataset from here:
cd jetson-containers/data\nwget https://nvidia.box.com/shared/static/icw8qhgioyj4qsk832r4nj2p9olsxoci.gz -O nanodb_coco_2017.tar.gz\ntar -xzvf nanodb_coco_2017.tar.gz\n
This allows you to skip the indexing process in the next step and jump straight to starting the Web UI.
"},{"location":"tutorial_nanodb.html#indexing-data","title":"Indexing Data","text":"If you didn't download the NanoDB index for COCO from above, we need to build the index by scanning your dataset directory:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --scan /data/datasets/coco/2017 \\\n --path /data/nanodb/coco/2017 \\\n --autosave --validate \n
This will take a few hours on AGX Orin. Once the database has loaded and completed any start-up operations, it will drop down to a >
prompt from which the user can run search queries. You can quickly check the operation by typing your query on this prompt:
> a girl riding a horse\n\n* index=80110 /data/datasets/coco/2017/train2017/000000393735.jpg similarity=0.29991915822029114\n* index=158747 /data/datasets/coco/2017/unlabeled2017/000000189708.jpg similarity=0.29254037141799927\n* index=123846 /data/datasets/coco/2017/unlabeled2017/000000026239.jpg similarity=0.292171448469162\n* index=127338 /data/datasets/coco/2017/unlabeled2017/000000042508.jpg similarity=0.29118549823760986\n* index=77416 /data/datasets/coco/2017/train2017/000000380634.jpg similarity=0.28964102268218994\n* index=51992 /data/datasets/coco/2017/train2017/000000256290.jpg similarity=0.28929752111434937\n* index=228640 /data/datasets/coco/2017/unlabeled2017/000000520381.jpg similarity=0.28642547130584717\n* index=104819 /data/datasets/coco/2017/train2017/000000515895.jpg similarity=0.285491943359375\n
You can press Ctrl+C to exit. For more info about the various options available, see the NanoDB container documentation.
"},{"location":"tutorial_nanodb.html#interactive-web-ui","title":"Interactive Web UI","text":"Spin up the Gradio server:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --path /data/nanodb/coco/2017 \\\n --server --port=7860\n
Then navigate your browser to http://<IP_ADDRESS>:7860
, and you can enter text search queries as well as drag/upload images:
To use the dark theme, navigate to http://<IP_ADDRESS>:7860/?__theme=dark
instead"},{"location":"tutorial_ollama.html","title":"Tutorial - Ollama","text":"
Ollama is a popular LLM tool that's easy to get started with. It includes a built-in library of pre-quantized model weights that are automatically downloaded and run using llama.cpp underneath for inference. The ollama container was compiled with CUDA support.
ollama
client can run inside or outside container after starting the server.What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7GB
for ollama
container image>5GB
)# models cached under jetson-containers/data\njetson-containers run --name ollama $(autotag ollama)\n\n# models cached under your user's home directory\ndocker run --runtime nvidia --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.2.0\n
Running either of these will start the local Ollama server as a daemon in the background. It will save the models it downloads under your mounted jetson-containers/data/models/ollama
directory (or another directory that you override with OLLAMA_MODELS
)
Start the Ollama command-line chat client with your desired model (for example: llama3
, phi3
, mistral
)
# if running inside the same container as launched above\n/bin/ollama run phi3\n\n# if launching a new container for the client in another terminal\njetson-containers run $(autotag ollama) /bin/ollama run phi3\n
Or you can install Ollama's binaries for arm64 outside of the container (without CUDA, which only the server needs):
# download the latest ollama release for arm64 into /bin\nsudo wget https://github.com/ollama/ollama/releases/download/$(git ls-remote --refs --sort=\"version:refname\" --tags https://github.com/ollama/ollama | cut -d/ -f3- | sed 's/-rc.*//g' | tail -n1)/ollama-linux-arm64 -O /bin/ollama\nsudo chmod +x /bin/ollama\n\n# use the client like normal outside container\n/bin/ollama run phi3\n
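You can also talk to the Ollama server programmatically - it listens on port 11434 by default and exposes a simple REST API. A minimal sketch from Python (the model name is just an example and should be one you've already pulled):

import requests

# assumes the Ollama server started above is reachable on this machine (default port 11434)
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'phi3',                   # any model previously pulled with `ollama run` or `ollama pull`
        'prompt': 'Why is the sky blue?',
        'stream': False,                   # return the whole completion as a single JSON object
    },
    timeout=300,
)

print(response.json()['response'])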
"},{"location":"tutorial_ollama.html#open-webui","title":"Open WebUI","text":"To run an Open WebUI server for client browsers to connect to, use the open-webui
container:
docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main\n
You can then navigate your browser to http://JETSON_IP:8080
, and create a fake account to login (these credentials are only local)
Ollama uses llama.cpp for inference, for which various API benchmarks and comparisons are provided on the Llava page. It reaches roughly half the peak performance of faster APIs like NanoLLM, but is generally fast enough for interactive text chat.
"},{"location":"tutorial_slm.html","title":"Tutorial - Small Language Models (SLM)","text":"Small Language Models (SLMs) represent a growing class of language models that have <7B parameters - for example StableLM, Phi-2, and Gemma-2B. Their smaller memory footprint and faster performance make them good candidates for deploying on Jetson Orin Nano. Some are very capable with abilities at a similar level as the larger models, having been trained on high-quality curated datasets.
This tutorial shows how to run optimized SLMs with quantization using the NanoLLM
library and MLC/TVM backend. You can run these models through tools like text-generation-webui
and llama.cpp as well, just not as fast - and since the focus of SLMs is reduced computational and memory requirements, here we'll use the most optimized path available. Those shown below have been profiled:
\u2022 \u00a0 The HuggingFace Open LLM Leaderboard is a collection of multitask benchmarks including reasoning & comprehension, math, coding, history, geography, etc. \u2022 \u00a0 The model's memory footprint includes 4-bit weights and KV cache at full context length (factor in extra for process overhead, library code, etc.) \u2022 \u00a0 The Chat Model
is the instruction-tuned variant for chatting with in the commands below, as opposed to the base completion model.
Based on user interactions, the recommended models to try are stabilityai/stablelm-zephyr-3b
and princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
, for having output quality on par with Llama-2-7B and well-optimized neural architectures. These models have also been used as the base for various fine-tunes (for example Nous-Capybara-3B-V1.9
) and mini VLMs. Others may not be particularly coherent.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>5GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
The nano_llm.chat
program will automatically download and quantize models from HuggingFace like those listed in the table above:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT\n
\u2022 \u00a0 For models requiring authentication, use --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN>
\u2022 \u00a0 Press Ctrl+C twice in succession to exit (once will interrupt bot output)
This will enter an interactive mode where you chat back and forth using the keyboard (entering reset
will clear the chat history)
During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model stabilityai/stablelm-zephyr-3b \\\n --max-new-tokens 512 \\\n --prompt 'hi, how are you?' \\\n --prompt 'whats the square root of 900?' \\\n --prompt 'can I get a recipie for french onion soup?'\n
You can also load JSON files containing prompt sequences, like with --prompt /data/prompts/qa.json
(the output of which is below)
\u2022 \u00a0 The model responses were generated with 4-bit quantization, and are truncated to 256 tokens for brevity. \u2022 \u00a0 These chat questions are from /data/prompts/qa.json
(found in jetson-containers)
Nemotron-Mini-4B-Instruct is a 4B SLM tuned for on-device deployment, RAG, and function calling, and is based on Minitron-4B (pruned and distilled from Nemotron-4 15B). Inference on Jetson is available through HuggingFace Transformers and llama.cpp for quantization. Here's how to run a local OpenAI-compatible server with llama.cpp and a 4-bit quantized GGUF:
jetson-containers run $(autotag llama_cpp) \\\n llama-server \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --host 0.0.0.0 \\\n --port 8080\n
For a quick test, you can navigate your browser to http://JETSON_IP:8080
, connect other clients like Open WebUI, or have applications send requests to your server's OpenAI chat completion endpoints (i.e. from openai-python, REST, JavaScript, etc.)
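For example, a minimal request from Python with the openai client might look like the sketch below (this assumes the openai package is installed on the client machine; substitute your Jetson's IP address, and note that llama-server serves whichever GGUF it was launched with, regardless of the model name sent):

from openai import OpenAI

# point the client at llama-server's OpenAI-compatible endpoint (no real API key is needed locally)
client = OpenAI(base_url='http://JETSON_IP:8080/v1', api_key='none')

completion = client.chat.completions.create(
    model='nemotron-mini-4b-instruct',   # informational here; the server responds with its loaded model
    messages=[{'role': 'user', 'content': 'Write a haiku about edge AI on Jetson.'}],
    max_tokens=128,
)

print(completion.choices[0].message.content)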
You can more easily see the performance with the llama-cli
tool:
jetson-containers run $(autotag llama_cpp) \\\n llama-cli \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --ignore-eos \\\n -n 128 \\\n -p \"The meaning to life and the universe is\"\n
# Jetson AGX Orin\nllama_print_timings: load time = 1408.27 ms\nllama_print_timings: sample time = 70.05 ms / 128 runs ( 0.55 ms per token, 1827.32 tokens per second)\nllama_print_timings: prompt eval time = 120.08 ms / 9 tokens ( 13.34 ms per token, 74.95 tokens per second)\nllama_print_timings: eval time = 3303.93 ms / 127 runs ( 26.02 ms per token, 38.44 tokens per second)\nllama_print_timings: total time = 3597.17 ms / 136 tokens\n
The model can also be previewed at build.nvidia.com (example client requests for OpenAI API are also there)
"},{"location":"tutorial_slm.html#llama-32","title":"Llama 3.2","text":"Meta has released multilingual 1B and 3B SLMs in the latest additions to the Llama family with Llama-3.2-1B
and Llama-3.2-3B
. These can be run with INT4 quantization using the latest MLC container for Jetson (dustynv/mlc:0.1.2-r36.3.0
). After having requested access to the models from Meta with your HuggingFace API key, you can download, quantize, and benchmark them with these commands:
HUGGINGFACE_KEY=YOUR_API_KEY \\\nMLC_VERSION=0.1.2 \\\n jetson-containers/packages/llm/mlc/benchmark.sh \\\n meta-llama/Llama-3.2-1B\n
Llama-3.2-1B
\u00a0 Jetson Orin Nano 54.8 tokens/sec, Jetson AGX Orin 163.9 tokens/secLlama-3.2-3B
\u00a0 Jetson Orin Nano 27.7 tokens/sec, Jetson AGX Orin 80.4 tokens/secThe Llama-3.2 SLMs use the same core Llama architecture as previous Llama releases (except tie_word_embeddings=True
), so it is already supported with quantization and full performance on edge devices. Thanks to Meta for continuing to advance open generative AI models with Llama.
Stable Diffusion XL is a newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. All told, SDXL 1.0 has 6.6 billion model parameters, in comparison to 0.98 billion for the original SD 1.5 model.
What you need
One of the following Jetson devices (SDXL requires >= ~13GB memory)
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image12.4GB
for SDXL modelsHave followed the previous stable-diffusion-webui
tutorial and have the webserver container running.
Stable Diffusion XL is supported through AUTOMATIC1111's stable-diffusion-webui
with some additional settings. First you need to download the SDXL models to your jetson-containers
data directory (which is automatically mounted into the container)
# run these outside of container, and replace CONTAINERS_DIR with the path to the jetson-containers repo on your device\nCONTAINERS_DIR=/path/to/your/jetson-containers\nMODEL_DIR=$CONTAINERS_DIR/data/models/stable-diffusion/models/Stable-diffusion/\n\nsudo chown -R $USER $MODEL_DIR\n\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors\n
It's assumed that you already have the stable-diffusion-webui
container and webserver running from the previous tutorial.
After the models have finished downloading, click the \ud83d\udd04 button to refresh the model list in the web UI. Select sd_xl_base_1.0.safetensors
from the Stable Diffusion checkpoint drop-down:
Then under the Generation tab, expand the Refiner section, and select sd_xl_refiner_1.0.safetensors
from the drop-down:
Guidance on other relevant settings:
Switch at
setting dictates the step at which the refiner takes over from the base model. At this point, additional subject content will no longer be added to the scene, but rather its details further refined in the image.Sampling steps
are between 20-40 and Switch at
is between 0.7-0.9. This takes experimentation to find the best combination for the characteristics of your desired output.low quality, jpeg artifacts, blurry, ect
)CFG Scale
controls how closely the model conforms to your prompt versus how creative it is.When you get a good image, remember to save your random seed and settings so you can re-generate it later!
"},{"location":"tutorial_stable-diffusion-xl.html#results","title":"Results","text":"photograph of a friendly robot alongside a person climbing a mountain (seed 1576166644, steps 25, switch @ 0.8, cfg scale 15)
a girl and a boy building a friendly robot in their basement workshop (seed 642273464, steps 25, switch @ 0.9, cfg scale 7)
small friendly robots playing games with people, in a futuristic Tokyo central park gardens with cherry blossoms and water, coy fish swimming in the water, sunshine (seed 642273464, steps 40, switch @ 0.8, cfg scale 7)
small friendly robots playing games with people in a futuristic New York City Central Park in autumn, water (seed 642273464, steps 25, switch @ 0.8, cfg scale 7)
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
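As a rough sketch of what that can look like with the Hugging Face diffusers library (assuming an environment with diffusers and PyTorch installed and enough free memory for both models - the Switch at setting above corresponds to denoising_end / denoising_start here):

import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16, variant='fp16', use_safetensors=True,
).to('cuda')

refiner = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-refiner-1.0',
    text_encoder_2=base.text_encoder_2, vae=base.vae,   # share components with the base to save memory
    torch_dtype=torch.float16, variant='fp16', use_safetensors=True,
).to('cuda')

prompt = 'photograph of a friendly robot alongside a person climbing a mountain'

# run the base model for the first 80% of the steps, then hand the latents off to the refiner
latents = base(prompt=prompt, num_inference_steps=25, denoising_end=0.8, output_type='latent').images
image = refiner(prompt=prompt, num_inference_steps=25, denoising_start=0.8, image=latents).images[0]
image.save('sdxl_output.png')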
Let's run AUTOMATIC1111's stable-diffusion-webui
on NVIDIA Jetson to generate images from our prompts!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image4.1GB
for SD 1.5 modelThe jetson-containers project provides pre-built Docker images for stable-diffusion-webui
. You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag stable-diffusion-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/stable-diffusion-webui && python3 launch.py \\\n --data=/data/models/stable-diffusion \\\n --enable-insecure-extension-access \\\n --xformers \\\n --listen \\\n --port=7860\n
You should see it downloading the model checkpoint on the first run.
Open your browser and access http://<IP_ADDRESS>:7860
To generate even higher-quality and detailed images, check out the next part of the tutorial that uses the latest Stable Diffusion XL models!
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui
on NVIDIA Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for container imageThe jetson-containers project provides pre-built Docker images for text-generation-webui
along with all of the loader API's built with CUDA enabled (llama.cpp, ExLlama, AutoGPTQ, Transformers, ect). You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag text-generation-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/text-generation-webui && python3 server.py \\\n --model-dir=/data/models/text-generation-webui \\\n --chat \\\n --listen\n
Open your browser and access http://<IP_ADDRESS>:7860
.
See the oobabooga documentation for instructions for downloading models - either from within the web UI, or using download-model.py
jetson-containers run --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \\\n 'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'\n
From within the web UI, select Model tab and navigate to \"Download model or LoRA\" section.
You can find text generation models on Hugging Face Hub, then enter the Hugging Face username/model path (which you can have copied to your clipboard from the Hub). Then click the Download button.
"},{"location":"tutorial_text-generation.html#gguf-models","title":"GGUF models","text":"The fastest oobabooga model loader to use is currently llama.cpp with 4-bit quantized GGUF models.
You can download a single model file for a particular quantization, like *.Q4_K_M.bin
. Input the file name and hit Download button.
TheBloke/Llama-2-7b-Chat-GGUF
llama-2-7b-chat.Q4_K_M.gguf
5,268 TheBloke/Llama-2-13B-chat-GGUF
llama-2-13b-chat.Q4_K_M.gguf
8,609 TheBloke/LLaMA-30b-GGUF
llama-30b.Q4_K_S.gguf
19,045 TheBloke/Llama-2-70B-chat-GGUF
llama-2-70b-chat.Q4_K_M.gguf
37,655 Info
"},{"location":"tutorial_text-generation.html#model-selection-for-jetson-orin-nano","title":"Model selection for Jetson Orin Nano","text":"Jetson Orin Nano Developer Kit has only 8GB RAM for both CPU (system) and GPU, so you need to pick a model that fits in the RAM size - see the Model Size section below. The 7B models with 4-bit quantization are the ones to use on Jetson Orin Nano. Make sure you go through the RAM optimization steps before attempting to load such model on Jetson Orin Nano.
"},{"location":"tutorial_text-generation.html#load-a-model","title":"Load a model","text":"After you have downloaded a model, click the \ud83d\udd04 button to refresh your model list, and select the model you want to use.
For a GGUF model, remember to
n-gpu-layers
to 128
n_gqa
to 8
if you using Llama-2-70B (on Jetson AGX Orin 64GB)Then click the Load button.
"},{"location":"tutorial_text-generation.html#chat-template","title":"Chat Template","text":"If you're using a Llama model fine-tuned for chat, like the models listed above (except for LLaMA-30b
), you need to use the oobabooga Instruct mode and set the template. On the Parameters tab, go to the Instruction Template sub-tab, then select Llama-v2
from the Instruction Template drop-down (or Vicuna, Guanaco, ect if you are using one of those models)
For the base text completion models (like LLaMA-30b
), use the Default or Notebook tab.
Selecting the right chat template will make sure the model is being prompted correctly - you can also change the system prompt in the Context box to alter the agent's personality and behavior. There are a lot of other settings under the Generation tab, like the maximum length it should output per reply, and token sampling parameters like temperature
and top_p
for controlling randomness.
Then change back to the Chat tab, and under the mode section, make sure Instruct is selected (confusingly, not chat mode). Then you can start chatting with the LLM!
"},{"location":"tutorial_text-generation.html#results","title":"Results","text":""},{"location":"tutorial_text-generation.html#things-to-do-with-your-llm","title":"Things to do with your LLM","text":"Here are some common test prompts for coding, math, history ect. You can also ask it about geography, travel, nature, recipies, fixing things, general life advice, and practically everything else. Also Llama-2 is quite playful and likes to play games to test its logic abilities!
>> What games do you like to play?\n\nI'm a large language model, so I can play text-based games and answer questions on a wide variety of topics. Here are some game ideas:\n\n- 20 Questions: You or I can think of an object, and the other player tries to guess what it is by asking up to 20 yes-or-no questions. The goal is to guess the object in as few questions as possible.\n- Hangman: I can think of a word or phrase and you try to guess it by suggesting letters. Each incorrect guess will bring the hangman one step closer to being complete!\n- Word association: I'll give you a word and you have to come up with a word that is related to it. We can keep playing back and forth as long as you'd like!\n- Trivia: I can provide questions on a wide range of topics, from history to science to pop culture. See how many questions you can answer correctly!\n- Storytelling: I can start telling a story and then stop at a cliffhanger. You can then try to guess what happens next or even take over the storytelling and continue it in your own\n- Jokes: I love to tell jokes and make people laugh with my \"Dad humor\"! Knock knock! *giggles*\n
"},{"location":"tutorial_text-generation.html#model-size-tested","title":"Model size tested","text":"With llama.cpp, GGUF model, 4-bit quantization:
Model size Jetson AGX Orin 64GB Jetson AGX Orin 32GB Jetson Orin Nano 8GB 70B model \u2705 30B model \u2705 \u2705 13B model \u2705 \u2705 7B model \u2705 \u2705 \u2705Want to explore using Python APIs to run LLMs directly? See jetson-containers
for its LLM related packages and containers.
Limited to 7B model (4-bit quantized).\u00a0\u21a9
Let's run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT.
Ultralytics YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Nano (4GB)
Running one of the following versions of JetPack:
JetPack 4 (L4T r32.x) JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Execute the below commands according to the JetPack version to pull the corresponding Docker container and run on Jetson.
JetPack 4JetPack 5JetPack 6t=ultralytics/ultralytics:latest-jetson-jetpack4\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack5\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack6\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
"},{"location":"tutorial_ultralytics.html#convert-model-to-tensorrt-and-run-inference","title":"Convert model to TensorRT and run inference","text":"The YOLOv8n model in PyTorch format is converted to TensorRT to run inference with the exported model.
Example
PythonCLIfrom ultralytics import YOLO\n\n# Load a YOLOv8n PyTorch model\nmodel = YOLO(\"yolov8n.pt\")\n\n# Export the model\nmodel.export(format=\"engine\") # creates 'yolov8n.engine'\n\n# Load the exported TensorRT model\ntrt_model = YOLO(\"yolov8n.engine\")\n\n# Run inference\nresults = trt_model(\"https://ultralytics.com/images/bus.jpg\")\n
# Export a YOLOv8n PyTorch model to TensorRT format\nyolo export model=yolov8n.pt format=engine # creates 'yolov8n.engine'\n\n# Run inference with the exported model\nyolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'\n
Manufacturing Sports Wildlife Vehicle Spare Parts Detection Football Player Detection Tiger pose Detection Note
Visit the Export page to access additional arguments when exporting models to different model formats. Note that the default arguments require inference using fixed image dimensions when dynamic=False
. To change the input source for inference, please refer to Model Prediction page.
Benchmarks of the YOLOv8 variants with TensorRT were run by Seeed Studio on their reComputer systems:
Xavier NX 8GBOrin NX 16GBAGX Orin 32GB Model PyTorch FP32 FP16 INT8 YOLOv8n 32 63 120 167 YOLOv8s 25 26 69 112 YOLOv8m 11 11 33 56 YOLOv8l 6 6 20 38 Model PyTorch FP32 FP16 INT8 YOLOv8n 56 115 204 256 YOLOv8s 53 67 128 196 YOLOv8m 26 31 63 93 YOLOv8l 16 20 42 69 Model PyTorch FP32 FP16 INT8 YOLOv8n 77 192 323 385 YOLOv8s 67 119 213 303 YOLOv8m 40 56 105 145 YOLOv8l 27 38 73.5 114To learn more, visit our comprehensive guide on running Ultralytics YOLOv8 on NVIDIA Jetson including benchmarks!
Note
Ultralytics YOLOv8 models are offered under AGPL-3.0 License
which is an OSI-approved open-source license and is ideal for students and enthusiasts, promoting open collaboration and knowledge sharing. See the LICENSE file for more details.
Install the package:
pip install jetson-examples\n
Restart your reComputer:
sudo reboot\n
Run Ultralytics YOLO on Jetson with one command:
reComputer run ultralytics-yolo\n
http://127.0.0.1:5001
or http://device_ip:5001
in your browser to access the Web UI.
For more details, please read: Jetson-Example: Run Ultralytics YOLO Platform Service on NVIDIA Jetson Orin .
"},{"location":"tutorial_ultralytics.html#follow-ultralytics-to-stay-updated","title":"Follow Ultralytics to stay updated!","text":""},{"location":"tutorial_voicecraft.html","title":"Tutorial - VoiceCraft","text":"Let's run VoiceCraft, a Zero-Shot Speech Editing and Text-to-Speech in the Wild!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
15.6 GB
for voicecraft
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag voicecraft)\n
The container has a default run command (CMD
) that will automatically start the Gradio app.
Open your browser and access http://<IP_ADDRESS>:7860
.
VoiceCraft repo comes with Gradio demo app.
/data
folder, where are saved to from previous runsWarning
For TTS it's okay to use only first few seconds of audio as prompt, since it consumes a lot of memory. On AGX 32GB Orin the maximal TTS length of generated audio is around ~16 seconds in headless mode.
"},{"location":"tutorial_voicecraft.html#resources","title":"Resources","text":"If you want to know how it works under the hood, you can read following papers:
Let's run OpenAI's Whisper, pre-trained model for automatic speech recognition on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1 GB
for whisper
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag whisper)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server, with SSL enabled.
Open your browser and access https://<IP_ADDRESS>:8888
.
Attention
Note it is https
(not http
).
HTTPS (SSL) connection is needed to allow ipywebrtc
widget to have access to your microphone (for record-and-transcribe.ipynb
).
You will see a warning message like this.
Press \"Advanced\" button and then click on \"Proceed to (unsafe)\" link to proceed to the Jupyter Lab web interface.
The default password for Jupyter Lab is nvidia
.
Whisper repo comes with demo Jupyter notebooks, which you can find under /notebooks/
directory.
jetson-containers
also adds one convenient notebook (record-and-transcribe.ipynb
) to record your audio sample on Jupyter notebook in order to run transcribe on your recorded audio.
record-and-transcribe.ipynb
","text":"This notebook is to let you record your own audio sample using your PC's microphone and apply Whisper's medium
model to transcribe the audio sample.
It uses Jupyter notebook/lab's ipywebrtc
extension to record an audio sample on your web browser.
Attention
When you click the \u23fa botton, your web browser may show a pop-up to ask you to allow it to use your microphone. Be sure to allow the access.
Final checkOnce done, if you click on the \"\u26a0 Not secure\" part in the URL bar, you should see something like this.
"},{"location":"tutorial_whisper.html#result","title":"Result","text":"Once you go through all the steps, you should see the transcribe result in text like this.
"},{"location":"vit/index.html","title":"Vision Transformers & Computer Vision","text":"You can quickly run these realtime Vision Transformers (ViT) and computer vision models onboard your Jetson:
"},{"location":"vit/index.html#efficient-vit","title":"Efficient ViT","text":""},{"location":"vit/index.html#nanosam","title":"NanoSAM","text":""},{"location":"vit/index.html#nanoowl","title":"NanoOWL","text":""},{"location":"vit/index.html#sam","title":"SAM","text":""},{"location":"vit/index.html#tam","title":"TAM","text":""},{"location":"vit/index.html#ultralytics-yolov8","title":"Ultralytics YOLOv8","text":""},{"location":"vit/tutorial_efficientvit.html","title":"Tutorial - EfficientViT","text":"Let's run MIT Han Lab's EfficientViT on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
10.9 GB
for efficientvit
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag efficientvit)\n
"},{"location":"vit/tutorial_efficientvit.html#usage-of-efficientvit","title":"Usage of EfficientViT","text":"The official EfficientViT repo shows the complete usage information: https://github.com/mit-han-lab/efficientvit#usage
Inside the container, a small benchmark script benchmark.py
is added under /opt/efficientvit
directory by the jetson-container build process.
It is to test EfficientViT-L2-SAM in bounding box mode, so we can use this as an example and verify the output.
"},{"location":"vit/tutorial_efficientvit.html#download-l2pt-model","title":"Downloadl2.pt
model","text":"mkdir -p /data/models/efficientvit/sam/\ncd /data/models/efficientvit/sam/\nwget https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt\n
The downloaded checkpoint file is stored on the /data/
directory that is mounted from the Docker host.
cd /opt/efficientvit\npython3 ./benchmark.py\n
At the end you should see a summary like the following.
AVERAGE of 2 runs:\n encoder --- 0.062 sec\n latency --- 0.083 sec\nMemory consumption : 3419.68 MB\n
"},{"location":"vit/tutorial_efficientvit.html#check-the-outputresult","title":"Check the output/result","text":"The output image file (of the last inference result) is stored as /data/benchmarks/efficientvit_sam_demo.png
.
It is stored under /data/
directory that is mounted from the Docker host. So you can go back to your host machine, and check jetson-containers/data/benchmark/
directory.
You should find the output like this.
"},{"location":"vit/tutorial_nanoowl.html","title":"Tutorial - NanoOWL","text":"Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7.2 GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run --workdir /opt/nanoowl $(autotag nanoowl)\n
"},{"location":"vit/tutorial_nanoowl.html#how-to-run-the-tree-prediction-live-camera-example","title":"How to run the tree prediction (live camera) example","text":"Ensure you have a camera device connected
ls /dev/video*\n
If no video device is found, exit from the container and check if you can see a video device on the host side.
Launch the demo
cd examples/tree_demo\npython3 tree_demo.py ../../data/owl_image_encoder_patch32.engine\n
Info
If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.
python3 -m nanoowl.build_image_encoder_engine \\\n data/owl_image_encoder_patch32.engine\n
Second, open your browser to http://<ip address>:7860
Type whatever prompt you like to see what works!
Here are some examples
[a face [a nose, an eye, a mouth]]
[a face (interested, yawning / bored)]
(indoors, outdoors)
Let's run NVIDIA's NanoSAM to check out the performance gain by distillation.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.3GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nanosam)\n
"},{"location":"vit/tutorial_nanosam.html#run-examples","title":"Run examples","text":"Inside the container, you can move to /opt/nanosam
directory, to go through all the examples demonstrated on the repo.
cd /opt/nanosam\n
To run the \"Example 1 - Segment with bounding box\":
python3 examples/basic_usage.py \\\n --image_encoder=\"data/resnet18_image_encoder.engine\" \\\n --mask_decoder=\"data/mobile_sam_mask_decoder.engine\"\n
The result is saved under /opt/nanosam/data/basic_usage_out.jpg
.
To check on your host machine, you can copy that into /data
directory of the container where that is mounted from the host.
cp data/basic_usage_out.jpg /data/\n
Then you can go to your host system, and find the file under jetson-containers/data/basic_usage_out.jpg
Let's run Meta's SAM
on NVIDIA Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag sam)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
In Jupyter Lab, navigate to notebooks
and open automatic_mask_generator_example.py
notebook.
Create a new cell at the top, insert the model download command below and run the cell.
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\n
Then go through executing all the cells below Set-up.
"},{"location":"vit/tutorial_sam.html#results","title":"Results","text":"The biggest vit_h
(2.4GB) model may not ran due to OOM, but vit_l
(1.1GB) runs on Jetson Orin Nano.\u00a0\u21a9
Let's run TAM
to perform Segment Anything on videos on NVIDIA Jetson.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag tam)\n
The container has a default run command (CMD
) that will automatically start TAM's web server.
Open your browser and access http://<IP_ADDRESS>:12212
.
Check out the official tutorial to learn how to operate the web UI.
"},{"location":"vit/tutorial_tam.html#results","title":"Results","text":""},{"location":"vit/tutorial_tam.html#troubleshooting","title":"Troubleshooting","text":""},{"location":"vit/tutorial_tam.html#filenotfounderror-errno-2-no-such-file-or-directory-checkpointse2fgvi-hq-cvpr22pth","title":"FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/E2FGVI-HQ-CVPR22.pth'
","text":"You may find the TAM app fails to download a checkpoint file E2FGVI-HQ-CVPR22.pth
.
Downloading checkpoints from Google Drive... tips: If you cannot see the progress bar, please try to download it manuall and put it in the checkpointes directory. E2FGVI-HQ-CVPR22.pth: https://github.com/MCG-NKU/E2FGVI(E2FGVI-HQ model)\nAccess denied with the following error:\n\n Cannot retrieve the public link of the file. You may need to change\n the permission to 'Anyone with the link', or have had many accesses. \n\nYou may still be able to access the file from the browser:\n\n https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \n
You can manually download the checkpoint file on your Docker host machine.
cd jetson-containers/\npip install gdown\nsource ~/.profile\ngdown https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \nmv E2FGVI-HQ-CVPR22.pth ./data/models/tam/\n
And you can try running the TAM container.
jetson-containers run $(autotag tam)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"agent_studio.html","title":"Agent Studio","text":"Rapidly design and experiment with creating your own automation agents, personal assistants, and edge AI systems in an interactive sandbox for connecting multimodal LLMs, speech and vision transformers, vector databases, prompt templating, and function calling to live sensors and I/O. Optimized for deployment onboard Jetson with on-device compute, low-latency streaming, and unified memory.
\u2022 Edge LLM inference with quantization and KV caching (NanoLLM) \u2022 Realtime vision/language models (ala Live Llava and Video VILA) \u2022 Speech recognition and synthesis (Whisper ASR, Piper TTS, Riva) \u2022 Multimodal vector database from NanoDB \u2022 Audio and video streaming (WebRTC, RTP, RTSP, V4L2) \u2022 Performance monitoring and profiling \u2022 Native bot-callable functions and agent tools \u2022 Extensible plugins with auto-generated UI controls \u2022 Save, load, and export pipeline presets \u2022 Native bot-callable functions and agent tools"},{"location":"agent_studio.html#running-agent-studio","title":"Running Agent Studio","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>5GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
This will start the server running on your device. You can then navigate your browser to https://IP_ADDRESS:8050
--load
flag (like --load /data/nano_llm/presets/xyz.json
) --web-port
(and --ws-port
for the websocket port)To make code changes without rebuilding the container, clone the NanoLLM sources and then mount them over /opt/NanoLLM
git clone https://github.com/dusty-nv/NanoLLM\njetson-containers run \\\n -v ${PWD}/NanoLLM:/opt/NanoLLM \\\n $(autotag nano_llm)\n
You can then edit the source from outside the container. And in the terminal that starts, you can install other packages from apt/pip/ect.
"},{"location":"agent_studio.html#plugins","title":"Plugins","text":"The models, I/O, and logic blocks share a lightweight plugin interface, exposing a process()
function and any runtime properties:
from nano_llm import Plugin\n\nclass TextFilter(Plugin):\n def __init__(self, search: str='bad word', replace: str='censored', **kwargs):\n \"\"\"\n A simple plugin performing text replacement. Write docs because they're extracted for the UI,\n and for the tool descriptions that go to the bot for function calling (and use the type hints) \n\n Args:\n search: The string to look for an replace in incoming text.\n replace: What to replace occurances of the search string with.\n \"\"\"\n super().__init__(inputs=['text'], outputs=['text'])\n\n # set these attributes and register them as configurable\n self.add_parameters(search=search, replace=replace)\n\n def process(self, input, **kwargs):\n \"\"\" Substitute the text, and the return value will be sent to the plugins connected to this. \"\"\"\n return input.replace(self.search, self.replace)\n
Typically each plugin runs asynchronously its own thread and queue of data, although some lightweight routines are executed inline. Usually they're in the same process, but could make requests to microservices. Websockets are used for clients and the web UI.
Below are descriptions of commonly-used components. Help text for these is extracted from their Python docs and shown in the UI, along with settings that you can control. In the node editor, click on a plugin to open its grid widget or configuration window.
"},{"location":"agent_studio.html#llm","title":"LLM","text":"NanoLLMAutoPromptUserPromptTextStream >nano_llm.plugins.NanoLLM
(source)
Load quantized LLM/VLM with MLC (speed), AWQ (quality), or HF Transformers (compatability)\n\n - Inputs\n * str, list[str], np.ndarray, torch.Tensor, cudaImage, PIL.Image \n\n - Commands \n * /reset /refresh /generate (these inputs can control the chat)\n\n - Outputs\n * delta The latest detokenized text to be generated\n * partial All of the text generated in the response so far\n * final The complete response sent at the end of the request\n * words Similar to 'delta', but with at least one word at a time\n * history A list of the messages in the chat history (as dictionaries)\n * tools Enable the bot to call functions in plugins connected on this channel\n
> nano_llm.plugins.AutoPrompt
(source)
Apply a template each time new data comes in, for example '<image> Describe the image' \n would tag each incoming image along with that text prompt (with images, usually for VLM)\n\n AutoPrompt can be used to compose structured messages that reference prior inputs, \n like `Image 1: <image> Image 2: <image>` or using <text> for last text recieved: \n\n The three most recent replies were:\n * <text>\n * <text>\n * <text> \n\nThese most recent inputs are used in newest to oldest order from a LIFO queue.\n
> nano_llm.plugins.UserPrompt
(source)
This plugin is for inputting text on the keyboard, either from the terminal or text box in the UI.\n\n It can also load prompts from a text or json file, and those files can reference other files to load.\n
> nano_llm.plugins.TextStream
(source)
Simple plugin for viewing any stream of text from the system in a text box in the UI. \n\n For sources emanating from ASR or LLM, can apply color highlighting to partial/final responses.\n
"},{"location":"agent_studio.html#speech","title":"Speech","text":"Whisper ASRPiper TTSVAD FilterRiva > nano_llm.plugins.WhisperASR
(source)
Whisper streaming speech-to-text with TensorRT (github.com/NVIDIA-AI-IOT/whisper_trt)\n\n Supports these Whisper models: 'tiny' (39M), 'base' (74M), 'small' (244M)\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * final The text transcript of the entire segment spoken in the input\n * partial Rolling partial transcript of the words spoken so far in the audio segment\n
The Whisper plugin is designed to be used in a pipeline with the VAD Filter to reduce erroneous output from background noise:
>nano_llm.plugins.PiperTTS
(source)
Piper text-to-speech using CUDA and onnxruntime (github.com/rhasspy/piper)\n\n Populates a list of the available Piper models that it can download, along with the speaker voices \n for multi-speaker models. The default model is 'en_US-libritts-high'. Has controls for voice speed.\n\n - Inputs\n * str either individual words, sentences, or an entire passage to synthesize\n\n - Outputs\n * audio np.ndarray (int16) at 16KHz sample rate (for low-quality models) or 22.5KHz (for medium/high)\n
This PiperTTS preset is available, which uses RateLimit to stream the output at 1x realtime factor (so it can be muted later)
>nano_llm.plugins.VADFilter
(source)
Voice activity detection model using Silero. Drops incoming audio unless it exceeds the VAD threshold.\n\n Use this on incoming audio before ASR plugins to reduce spurious transcripts produced from background noise.\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * audio np.ndarray or torch.Tensor (only when VAD confidence > threshold)\n
Riva ASR / TTS services - these run in another container over GRPC. See these resources for setup:
NGC - riva-embedded quickstart guide
jetson-containers - riva container setup
JetsonHacks - Speech AI on NVIDIA Jetson Tutorial
AudioInputDevice
USB/I2S soundcard input via PulseAudio/PyAudio (for microphone directly attached to system) AudioOutputDevice
USB/I2S soundcard output via PulseAudio/PyAudio (for speakers directly attached to system) AudioRecorder
Save an audio stream to WAV file on the server WebAudioIn
Recieve audio samples streamed from the client over websockets WebAudioOut
Transmit audio samples to the client over websockets"},{"location":"agent_studio.html#video","title":"Video","text":"VideoSource
Capture images from camera device (V4L2/CSI), network stream (RTP, RTSP), video file (MP4, MKV, AVI, FLV) VideoOutput
Output H264/H265-encoded video to network stream (RTP, RTSP, WebRTC), display, or file (MP4, MKV, AVI, FLV) VideoOverlay
Draw text on top of video streams for HUD or OSD-style display RateLimit
Throttle transmission to a specified rate (can also be used with audio)"},{"location":"agent_studio.html#database","title":"Database","text":"NanoDB
Optimized in-memory multimodal vectorDB for txt2img/img2img similarity search and image tagging [[\u2197]](tutorial_nanodb.md) Deduplicate
Filter incoming text against previous entries with embedding model and cosine similarity over a timeout period EventFilter
Look for keywords or search strings in streams of text, keeping track of the begin/end times that these occur."},{"location":"agent_studio.html#tips-tricks","title":"Tips & Tricks","text":"Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recreated in the dynamic environment of Agent Studio without needing to touch any code. Here are some practical tips as you go about trying different combinations of model pipelines:
"},{"location":"agent_studio.html#ui","title":"UI","text":"jetson-containers/data/nano_llm/presets
--load
flag at startup.Clear Cache
button in the Agent menu. RateLimiter
plugin to throttle data sources and balance resources.drop_inputs
setting that some plugins expose will always keep them up with the latest request.${DATE}
and ${TIME}
${HOMEASSISTANT.DEVICES}
Image 1: <image> Image 2: <image>
or: The three most recent replies were:\n * <text>\n * <text>\n * <text>\n
\ud83e\udd16 Have fun bot building! If you need help, reach out on the Jetson Forums or GitHub Issues.
"},{"location":"benchmarks.html","title":"Benchmarks","text":""},{"location":"benchmarks.html#large-language-models-llm","title":"Large Language Models (LLM)","text":"For running LLM benchmarks, see the MLC
container documentation.
Small language models are generally defined as having fewer than 7B parameters (Llama-7B shown for reference) For more data and info about running these models, see the SLM
tutorial and MLC
container documentation.
This measures the end-to-end pipeline performance for continuous streaming like with Live Llava. For more data and info about running these models, see the NanoVLM
tutorial.
VIT performance data from [1] [2] [3]
"},{"location":"benchmarks.html#stable-diffusion","title":"Stable Diffusion","text":""},{"location":"benchmarks.html#riva","title":"Riva","text":"For running Riva benchmarks, see ASR Performance and TTS Performance.
"},{"location":"benchmarks.html#vector-database","title":"Vector Database","text":"For running vector database benchmarks, see the NanoDB
container documentation.
Below, you'll find a collection of guides, tutorials, and articles contributed by the community showcasing the implementation of generative AI on the Jetson platform.
"},{"location":"community_articles.html#github-jetbot-voice-activated-copilot-tools-with-ros2-riva-and-nanollm-9-21-2024","title":"GitHub Jetbot Voice-Activated Copilot Tools with ROS2, RIVA, and NanoLLM (9-21-2024)","text":"Jen Hung Ho created ROS2 nodes for ASR/TTS and LLM/VLM on Jetson that can can be used to control JetBot, including customizable voice commands and the execution of advanced actions. Check it out on GitHub under Jen-Hung-Ho/ros2_jetbot_tools
and Jen-Hung-Ho/ros2_jetbot_voice
and on the forums here.
Nurgaliyev Shakhizat creates an interactive talking avatar using ASR, TTS, LLM, and Audio2Face with NVIDIA Omniverse and Unreal Engine:
"},{"location":"community_articles.html#hackster-an-eye-for-an-item-4-26-2024","title":"Hackster An Eye for an Item (4-26-2024)","text":"Allen Tao builds an indoor robot with Isaac ROS that maps your home and remembers where things are using SLAM and NanoDB!
"},{"location":"community_articles.html#hackster-escalator-people-tracker-4-2-2024","title":"Hackster Escalator People Tracker (4-2-2024)","text":"George Profenza & team install local CV solution for detecting and tracking people in large retail spaces to drive live Generative AI graphics:
"},{"location":"community_articles.html#hackster-edge-style-fashion-preview-at-the-edge-4-1-2024","title":"Hackster Edge Style: Fashion Preview at the Edge (4-1-2024)","text":"AI-powered app from Andrei Ciobanu shows virtual try-ons with customer images, enhancing retail shopping using Jetson Orin for speed and privacy:
"},{"location":"community_articles.html#hackster-cooking-meals-with-a-local-ai-assistant-on-jetson-axg-orin-4-1-2024","title":"Hackster Cooking meals with a local AI assistant on Jetson AXG Orin (4-1-2024)","text":"Dimiter Kendri builds a multimodal, multi AI agent, fully local, conversational chatbot with multi agent research capabilities via speech queries:
"},{"location":"community_articles.html#hackster-realtime-language-segment-anything-on-jetson-orin-3-4-2024","title":"Hackster Realtime Language-Segment-Anything on Jetson Orin (3-4-2024)","text":"Huy Mai enables Segment Anything (SAM) with natural language prompting using GroundingDINO for object detection:
"},{"location":"community_articles.html#github-japanese-nmt-translation-for-stable-diffusion-2-23-2024","title":"GitHub Japanese NMT Translation for Stable Diffusion (2-23-2024)","text":"Toshihiko Aoki has created a prompt generator for stable-diffusion-webui
that translates Japanese queries into English using a fine-tuned GPT-2 NMT model before feeding them into Stable Diffusion. Check out the full guide on GitHub under to-aoki/ja-tiny-sd-webui
, including the training dataset and LoRA building!
Vy Pham has created a novel denoising pipeline using a custom trained Transformer-based diffusion model and GAN upscaler for image enhancement, running on Jetson AGX Orin. It runs interactively in a Streamlit web UI for photo capturing and the processing of images and videos. Great work!
Your browser does not support the video tag."},{"location":"community_articles.html#hackster-ai-powered-application-for-the-blind-and-visually-impaired-12-13-2023","title":"Hackster AI-Powered Application for the Blind and Visually Impaired (12-13-2023)","text":"Nurgaliyev Shakhizat demonstrates a locally-hosted Blind Assistant Device running on Jetson AGX Orin 64GB Developer Kit for realtime image-to-speech translation:
\u00a0 Find more resources about this project here: [Hackster] [GitHub]
"},{"location":"community_articles.html#daves-armoury-bringing-glados-to-life-with-robotics-and-ai-2-8-2024","title":"Dave's Armoury Bringing GLaDOS to life with Robotics and AI (2-8-2024)","text":"See how DIY robotics legend Dave Niewinski from davesarmoury.com brings GLaDOS to life using Jetson AGX Orin, running LLMs onboard alongside object + depth tracking, and RIVA ASR/TTS with a custom-trained voice model for speech recognition and synthesis! Using Unitree Z1 arm with 3D printing and StereoLabs ZED2.
\u00a0 Find more resources about this project here: [Forums] [GitHub]
"},{"location":"community_articles.html#hackster-seeed-studios-local-voice-chatbot-puts-a-speech-recognizing-llama-2-llm-on-your-jetson-2-7-2024","title":"Hackster Seeed Studio's Local Voice Chatbot Puts a Speech-Recognizing LLaMa-2 LLM on Your Jetson (2-7-2024)","text":"Seeed Studio has announced the launch of the Local Voice Chatbot, an NVIDIA Riva- and LLaMa-2-based large language model (LLM) chatbot with voice recognition capabilities \u2014 running entirely locally on NVIDIA Jetson devices, including the company's own reComputer range. Follow the step-by-step guide on the Seeed Studio wiki.
"},{"location":"community_articles.html#youtube-genai-nerds-react-insider-look-at-nvidias-newest-generative-ai-2-6-2024","title":"YouTube GenAI Nerds React - Insider Look at NVIDIA's Newest Generative AI (2-6-2024)","text":"Watch this panel about the latest trends & tech in edge AI, featuring Kerry Shih from OStream, Jim Benson from JetsonHacks, and Dusty from NVIDIA.
"},{"location":"community_articles.html#nvidia-bringing-generative-ai-to-life-with-nvidia-jetson-11-7-2023","title":"NVIDIA Bringing Generative AI to Life with NVIDIA Jetson (11-7-2023)","text":"Watch this webinar about deploying LLMs, VLMs, ViTs, and vector databases onboard Jetson Orin for building next-generation applications using Generative AI:
"},{"location":"community_articles.html#jetsonhacks-jetson-ai-labs-generative-ai-playground-10-31-2023","title":"JetsonHacks Jetson AI Labs \u2013 Generative AI Playground (10-31-2023)","text":"JetsonHacks publishes an insightful video that walks developers through the typical steps for running generative AI models on Jetson following this site's tutorials. The video shows the interaction with the LLaVA model.
"},{"location":"community_articles.html#hackster-vision2audio-giving-the-blind-an-understanding-through-ai-10-15-2023","title":"Hackster Vision2Audio - Giving the blind an understanding through AI (10-15-2023)","text":"Nurgaliyev Shakhizat demonstrates Vision2Audio running on Jetson AGX Orin 64GB Developer Kit to harness the power of LLaVA to help visually impaired people:
"},{"location":"community_articles.html#nvidia-generative-ai-models-at-the-edge-10-19-2023","title":"NVIDIA Generative AI Models at the Edge (10-19-2023)","text":"Follow this walkthrough of the Jetson AI Lab tutorials along with coverage of the latest features and advances coming to JetPack 6 and beyond:
\u00a0 Technical Blog - https://developer.nvidia.com/blog/bringing-generative-ai-to-life-with-jetson/
"},{"location":"community_articles.html#medium-how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-10-02-2023","title":"Medium How to set up your Jetson device for LLM inference and fine-tuning (10-02-2023)","text":"Michael Yuan's guide demonstrating how to set up the Jetson AGX Orin 64GB Developer Kit specifically for large language model (LLM) inference, highlighting the crucial role of GPUs and the cost-effectiveness of the Jetson AGX Orin for LLM tasks.
https://medium.com/@michaelyuan_88928/how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-682e36444d43
"},{"location":"community_articles.html#hackster-getting-started-with-ai-on-nvidia-jetson-agx-orin-dev-kit-09-16-2023","title":"Hackster Getting Started with AI on Nvidia Jetson AGX Orin Dev Kit (09-16-2023)","text":"Nurgaliyev Shakhizat demonstrates llamaspeak
on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
Nick Bild provides an insightful introduction to the Jetson Generative AI Playground:
https://www.hackster.io/news/new-ai-tool-is-generating-a-lot-of-buzz-3cc5f23a3598
"},{"location":"community_articles.html#jetsonhacks-use-these-jetson-docker-containers-tutorial-09-04-2023","title":"JetsonHacks Use These! Jetson Docker Containers Tutorial (09-04-2023)","text":"JetsonHacks has a in-depth tutorial on how to use jetson-containers
and even show text-generation-webui
and stable-diffusion-webui
containers in action!
Paul DeCarlo demonstrates 13B and 70B parameter LLama 2 models running locally on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
"},{"location":"community_articles.html#hackster-running-a-chatgpt-like-llm-llama2-on-a-nvidia-jetson-cluster-08-14-2023","title":"Hackster Running a ChatGPT-Like LLM-LLaMA2 on a Nvidia Jetson Cluster (08-14-2023)","text":"Discover how to run a LLaMA-2 7B model on an NVIDIA Jetson cluster in this insightful tutorial by Nurgaliyev Shakhizat:
"},{"location":"community_articles.html#jetsonhacks-speech-ai-on-nvidia-jetson-tutorial-08-07-2023","title":"JetsonHacks Speech AI on NVIDIA Jetson Tutorial (08-07-2023)","text":"JetsonHacks gives a nice introduction to NVIDIA RIVA SDK and demonstrate its automated speech recognition (ASR) capability on Jetson Orin Nano Developer Kit.
"},{"location":"community_articles.html#hackster-llm-based-multimodal-ai-w-azure-open-ai-nvidia-jetson-07-12-2023","title":"Hackster LLM based Multimodal AI w/ Azure Open AI & NVIDIA Jetson (07-12-2023)","text":"Learn how to harness the power of Multimodal AI by running Microsoft JARVIS on an Jetson AGX Orin 64GB Developer Kit, enabling a wide range of AI tasks with ChatGPT-like capabilities, image generation, and more, in this comprehensive guide by Paul DeCarlo.
"},{"location":"community_articles.html#hackster-how-to-run-a-chatgpt-like-llm-on-nvidia-jetson-board-06-13-2023","title":"Hackster How to Run a ChatGPT-Like LLM on NVIDIA Jetson board (06-13-2023)","text":"Nurgaliyev Shakhizat explores voice AI assistant on Jetson using FastChat and VoskAPI.
"},{"location":"getting-started.html","title":"Getting started","text":""},{"location":"hello_ai_world.html","title":"Hello AI World","text":"Hello AI World is an in-depth tutorial series for DNN-based inference and training of image classification, object detection, semantic segmentation, and more. It is built on the jetson-inference
library using TensorRT for optimized performance on Jetson.
It's highly recommended to familiarize yourself with the concepts of machine learning and computer vision before diving into the more advanced topics of generative AI here on the Jetson AI Lab. Many of these models will prove useful to have during your development.
HELLO AI WORLD >> https://github.com/dusty-nv/jetson-inference
"},{"location":"initial_setup_jon.html","title":"Initial Setup Guide for Jetson Orin Nano Developer Kit","text":"Note
This guide is to supplement the official Jetson Orin Nano Developer Kit Getting Started Guide.
The NVIDIA\u00ae Jetson Orin Nano\u2122 Developer Kit is a perfect kit to start your journey of local generative AI evaluation and development.
This guide explains the complete flow from opening the box, updating the firmware if needed, flashing the latest JetPack 6.0 GA image on SD card, and the initial software setup, so that you will be ready for tutorials listed on this site and other AI projects.
"},{"location":"initial_setup_jon.html#check-your-inventory","title":"Check your inventory","text":"Following item is needed or highly desired to set up your Jetson Orin Nano Developer Kit. If you don't have them in your inventory, you want to arrange them and come back to this guide once they are available.
What does not come in the box - What you need/want to prepare
"},{"location":"initial_setup_jon.html#storage","title":"Storage","text":"You need either of the following set:
What you find in the box
"},{"location":"initial_setup_jon.html#jetson-orin-nano-developer-kit","title":"Jetson Orin Nano Developer Kit","text":"The Jetson Orin Nano Developer Kit consists of Jetson Orin Nano module (enlarged SO-DIMM form factor), and the reference carrier board.
It is designed to use a microSD card as the primary storage, thus the module (that has a big black heat sink with a fan) has a microSD card slot at the bottom side of the module.
"},{"location":"initial_setup_jon.html#19v-dc-power-supply","title":"19V DC power supply","text":""},{"location":"initial_setup_jon.html#overall-flow","title":"Overall flow","text":"Jetson Orin Nano Initial Setup Flowchart (Click to expand)
flowchart\n A(start) --> B{1. Check<br>Manufactured after May 2024?<br>i.e. Jetson UEFI Firmware<br>newer than version 36.0}\n B --[YES] --> O[6. Flash JetPack 6.0 GA image on microSD card]\n B --[No] --> C[2. Flash JetPack 5.1.3 image on microSD card]\n C --> D[3. Reboot] --> E{{Firmware update during reboot}}\n E --> F[4. Run QSPI updater] --> G[5. Reboot] --> H{{Firmware update during reboot}}\n H --> O\n O --> P(7. Start developing on JetPack 6.0) \n\n style C fill:#fee\n style D fill:#DEE,stroke:#333\n style G fill:#DEE,stroke:#333\n style F stroke-width:4px\n style E stroke-width:2px,stroke-dasharray: 5 5\n style H stroke-width:2px,stroke-dasharray: 5 5\n style O fill:#fee
"},{"location":"initial_setup_jon.html#1-check-if-jetson-uefi-firmware-version-363","title":"1. Check if Jetson UEFI Firmware version > 36.3
","text":"Your Jetson Orin Nano Developer Kit may have the latest firmware (\"Jetson UEFI firmware\" on QSPI-NOR flash memory) flashed at the factory.
If not, we need to go through a set of procedures to upgrade to the latest firmware. (Luckily, we can now do this all just on Jetson, meaning we don't need to use a host Ubuntu PC any more!)
So let's first check the version of your Jetson UEFI Firmware. You can take one of the following methods.
Monitor-attached Headless\ud83d\ude01I'm feeling luckyJ14
\"button\" header of carrier board located under the Jetson module. RXD
(Pin 3) Adafruit adaptor cable Green TXD
(Pin 4) Adafruit adaptor cable White GND
(Pin 7) Adafruit adaptor cable Black For the detail, refer to Jetson Orin Nano Developer Kit Carrier Board Specification.
You could skip to 6. Flash JetPack 6.0 GA image onto your microSD card, and try your luck to see if your Jetson just boots your Jetson Orin Nano Developer Kit up to the initial software set up (OEM-config).
"},{"location":"initial_setup_jon.html#360-upgrade-the-jetson-uefi-firmware-to-36x","title":"[< 36.0] Upgrade the Jetson UEFI firmware to 36.x","text":"Attention
Select the appropriate tab below based on your firmware version you found in the above step.
If you found your Jetson Orin Nano needs its firmware updated to run JetPack 6, click \" Firmware < 36.0\" tab, and then additional step 2 to 5 will appear for you to follow.
If you know your Jetson Orin Nano has the latest firmware, stay on \" Firmware 36.x\" tab, and skip to the next section (6. Flash JetPack 6.0 GA image onto your microSD card)
Firmware 36.x Firmware < 36.0Your Jetson Orin Nano has the latest firmware that is ready for JetPack 6 SD card.
Skip to the next section (6. Flash JetPack 6.0 GA image onto your microSD card)
Your Jetson Orin Nano needs its firmware updated in order to make JetPack 6 SD card work.
Perform the following steps (2 to 5).
"},{"location":"initial_setup_jon.html#2-flash-jetpack-513-image-onto-your-microsd-card","title":"2. Flash JetPack 5.1.3 image onto your microSD card","text":"First we need to run JetPack 5.1.3 in order let its nvidia-l4t-bootloader
package to get its bootloader/firmware updater activated, so that the firmware update automatically runs the next time it reboots.
Download SD card image on to your PC
On your PC, download JetPack 5.1.3 image for Jetson Orin Nano Developer Kit from the official JetPack 5.1.3 page or from the below direct link button.
Warning
NVIDIA had updated the JetPack 5.1.3 image on 5/28/2024, as the old version had some issue and the following process did not work.So please download and use the latest image (the new file name is JP513-orin-nano-sd-card-image_b29.zip
).
Jetson Orin Nano Developer KitJetPack 5.1.3 image
Use Balena Etcher to flash image to SD card
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the flashed microSD card into the slot on Jetson module
Power-on
Turn on the Jetson Orin Nano Developer Kit with JetPack 5.1.3 SD card inserted by plugging in the DC power supply.
Complete the initial software setup (oem-config
)
Ensure firmware update is scheduled.
Once Jetson boots into Jetson Linux system, a background service automatically runs to schedule a firmware update (if needed) to be performed during the next boot-up process.
Once you see the following, or just wait about 5 minutes after powering on to ensure the scheduling is done, reboot.
GUI CUI$ sudo systemctl status nv-l4t-bootloader-config\n[sudo] password for jetson: \n\u25cf nv-l4t-bootloader-config.service - Configure bootloader service\n Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Fri 2024-05-03 13:36:13 PDT; 1min 57s ago\n Process: 11439 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)\nMain PID: 11439 (code=exited, status=0/SUCCESS)\n
Reboot
Reboot your Jetson Orin Nano Developer Kit.
You should see the following during the boot up process.
With monitor Headless (serial)Once done, you will boot into JetPack 5.1.3 (again), with underlying firmware updated to 5.0-35550185
.
Double-check your firmware version is up to date (35.5.0
= JetPack 5.1.3)
Once it reboots back into Jetson Linux system, on Jetson terminal, run the following:
sudo nvbootctrl dump-slots-info\n
You should see something like the following, with the Current version indicating 35.5.0
.
Current version: 35.5.0\nCapsule update status: 0\nCurrent bootloader slot: A\nActive bootloader slot: A\nnum_slots: 2\nslot: 0, status: normal\nslot: 1, status: normal\n
Install QSPI Updater Debian package to trigger another (final) firmware update
On Jetson terminal, run the following:
sudo apt-get install nvidia-l4t-jetson-orin-nano-qspi-updater\n
Installing the nvidia-l4t-jetson-orin-nano-qspi-updater
automatically runs its script to schedule another (final) firmware update to be performed during the next boot process, so that the firmware is ready for JetPack 6.
Reboot
Once the QSPI update is scheduled, reboot your Jetson Orin Nano Developer Kit.
Observe update
You can observe the update during the boot up process.
Power off
Once the update is done, it reboots and tries to boot, however it will get stuck UNLESS you change the SD card to JetPack 6 one.
Therefore you should just power off the developer kit by disconnecting the DC power supply.
Attention
This part may look very confusing as neither the attached monitor nor the debug UART shows any explicit message on what action to take next.
What is going on here is that the Jetson's firmware (inside the QSPI-NOR flash memory) is now updated and ready for the JetPack 6 SD card; however, it is now incompatible with the JetPack 5.1.3 SD card left in the Jetson module's slot, so after the reboot it gets stuck in the boot process.
So there is nothing problematic with this boot halt (or endless rebooting). We just need to power off and insert a new SD card.
Once we know the onboard firmware is up-to-date and ready for JetPack 6, we can boot Jetson Orin Nano Developer Kit with a microSD card for JetPack 6.
Download SD card image on to your PC
On your PC, download JetPack 6.0 GA image for Jetson Orin Nano Developer Kit from the official JetPack 6.0 page or from the below direct link button.
Jetson Orin Nano Developer KitJetPack 6.0 GA image
Use Balena Etcher to flash image to SD card
Insert your microSD card into your PC's SD card slot, and use Balena Etcher to flash the SD card with the image you just downloaded.
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the JetPack 6 microSD card into the slot on Jetson module
Power-on by plugging the DC power supply
Complete the initial software setup (oem-config
)
\ud83c\udf8a Congratulations! Your Jetson Orin Nano Developer Kit is set up with JetPack 6 SD card and you are ready to develop on JetPack 6.
"},{"location":"initial_setup_jon.html#next-step","title":"Next step","text":""},{"location":"initial_setup_jon.html#nvme-ssd-installation","title":"NVMe SSD installation","text":"Take a look at this page for installing NVMe SSD and setting up Docker with it.
"},{"location":"lerobot.html","title":"HuggingFace LeRobot","text":"Let's run HuggingFace LeRobot
to train Transformer-based action diffusion policies and ACT onboard NVIDIA Jetson. These models learn to predict actions for a particular task from visual inputs and prior trajectories, typically collected during teleoperation or in simulation.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 GA (L4T r36.3) JetPack 6.1 (L4T r36.4)
Sufficient storage space (preferably with NVMe SSD).
16.5GB
for lerobot
container image>2GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
This section gives the guide on how you can work through the LeRobot official example of Getting Started with Real-World Robots (7_get_started_with_real_robot.md
) on your Jetson.
Tip
It's recommended to work on your Jetson in monitor-attached mode.
lerobot
is designed to show camera view in windows and playback TTS audio while capturing dataset, so it is more convenient to setup your Jetson with its monitor (and speakers) attached to Jetson.d
jetson-container
's location","text":"Through out the course of all the workflows of lerobot
, we will be generating a lot of data, especially for capturing dataset.
We will clone the lerobot
directory on host and mount the directory in the container to keep all the data persistant, but first make sure your jetson-containers
directory is placed on your SSD, not on your eMMC or microSD card.
If you have created the jetson-containers
directory on eMMC or microSD card (likely the case if you first set up your Jetson device without SSD first and later added SSD), then use the rsync
command to move the entire directory under SSD mount point.
rsync -aHAX /home/jetson/jetson-containers/ /ssd/jetson-containers/\n
Then run the installer again.
bash /ssd/jetson-containers/install.sh\n
"},{"location":"lerobot.html#b-create-lerobot-dir-on-host","title":"b. Create lerobot
dir on host","text":"As described above, we will setup the lerobot
directory under data
directory of jetson-containers
for monting it inside the container so that generated data persist.
cd jetson-containers\n./packages/robots/lerobot/clone_lerobot_dir_under_data.sh\n./packages/robots/lerobot/copy_overlay_files_in_data_lerobot.sh\n
"},{"location":"lerobot.html#c-pulseaudio-setup","title":"c. PulseAudio setup","text":"LeRobot's dataset capture flow (control_robot.py
) utilizes Speech Dispatcher to use espeak TTS, in order to give operators audio queues for notifying the status and signaling the next operation. It's actually very helpful.
Speech Dispatcher utilizes Pulse Audio, so rather than just sharing the /dev/snd
device when docker run
(which is good for ALSA), we need to add the following arguments.
--device /dev/snd \\\n -e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native \\\n -v ${XDG_RUNTIME_DIR}/pulse:${XDG_RUNTIME_DIR}/pulse \\\n
This is already added to run.sh
of jetson-containers
, however, we need to edit /etc/pulse/default.pa
in order to allow the root user access to the socket file.
sudo vi /etc/pulse/default.pa\n
Find the section loading module-native-protomocl-unix
and add auth-anonymous=1
### Load several protocols\n.ifexists module-esound-protocol-unix.so\nload-module module-esound-protocol-unix auth-anonymous=1\n.endif\nload-module module-nativ\n
Then restart PulseAudio service to make the config take effect.
pulseaudio --kill\npulseaudio --start\n
For troubleshootings or details, please check the docs.md
of speech-dispatcher
package.
It is more convenient if the lerobot programs can always find the device of leader and follower arm with unique names.
For that, we set an udev rule so that arms always get assigned the same device name as following. This is first done on Jetson host side.
/dev/ttyACM_kochleader
: Leader arm/dev/ttyACM_kochfollower
: Follower armFirst only connect the leader arm to Jetson and record the serial ID by running the following:
ll /dev/serial/by-id/\n
The output should look like this.
lrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_BA98C8C350304A46462E3120FF121B06-if00 -> ../../ttyACM1\n
Then edit the first line of ./99-usb-serial.rules
like the following.
You can find the template of this file under ./packages/robots/lerobot
directory.
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"00000000000000000000000000000000\", SYMLINK+=\"ttyACM_kochfollower\"\n
Now disconnect the leader arm, and then only connect the follower arm to Jetson.
Repeat the same steps to record the serial to edit the second line of 99-usb-serial.rules
file.
$ ll /dev/serial/by-id/\nlrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_483F88DC50304A46462E3120FF0C081A-if00 -> ../../ttyACM0\n$ vi ./packages/robots/lerobot\n
You should have ./99-usb-serial.rules
now looking like this:
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"483F88DC50304A46462E3120FF0C081A\", SYMLINK+=\"ttyACM_kochfollower\"\n
Finally copy this under /etc/udev/rules.d/
(of host), and restart Jetson.
sudo cp ./99-usb-serial.rules /etc/udev/rules.d/\nsudo reboot\n
After reboot, check if we now have achieved the desired fixed simlinks names for the arms.
ls -l /dev/ttyACM*\n
You should get something like this:
crw-rw---- 1 root dialout 166, 0 Sep 24 17:20 /dev/ttyACM0\ncrw-rw---- 1 root dialout 166, 1 Sep 24 16:13 /dev/ttyACM1\nlrwxrwxrwx 1 root root 7 Sep 24 17:20 /dev/ttyACM_kochfollower -> ttyACM0\nlrwxrwxrwx 1 root root 7 Sep 24 16:13 /dev/ttyACM_kochleader -> ttyACM1\n
"},{"location":"lerobot.html#e-optional-csi-cameras","title":"e. (Optional) CSI cameras","text":"If you plan to use CSI cameras (not USB webcams) for data capture, you will use the new --csi2webcam
options of jetson-containers
, which exposes V4L2loopback devices that performs like USB webcams (MJPEG) for CSI cameras using Jetson's hardware JPEG encoder.
This feature require some packages to be installed.
sudo apt update && sudo apt install v4l2loopback-dkms v4l-utils\n
"},{"location":"lerobot.html#f-increse-the-swap-file-size","title":"f. Increse the swap file size","text":"You may ran out of memory when are setting up to perform ACT model training.
swapoff -a -v\nsudo rm /swfile\nsudo systemctl disable nvzramconfi\nsudo fallocate -l 8G /ssd/8GB.swap\nsudo chmod 600 /ssd/8GB.swap\nsudo mkswap /ssd/8GB.swap\nsudo echo \"/ssd/8GB.swap swap swap defaults 0 0\" >> /etc/fstab\nsudo reboot\n
"},{"location":"lerobot.html#g-starting-the-lerobot-container","title":"g. Starting the lerobot
container","text":"USB webcamsCSI cameras cd jetson-containers\n./run.sh \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
cd jetson-containers\n./run.sh \\\n --csi2webcam --csi-capture-res='1640x1232@30' --csi-output-res='640x480@30' \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
"},{"location":"lerobot.html#work-with-real-world-robots-once-in-container","title":"Work with Real-World Robots - Once in container","text":"JupyerLab tip
Inside the lerobot
container, JupyterLab server process starts.
You can access with http://localhost:8888/
(or http://<IP_ADDRESS>:8888/
from other PC on the same network).
In the notebooks
, there are some Jupyter notebooks for each segment of the official tutorial Getting Started with Real-World Robots (7_get_started_with_real_robot.md
).
Please note that some of them (like notebooks/7-2_real-robot_configure-motors.ipynb
) can be used as a real work notebook to execute python codes and scritps convniently inside the notebook along with instructions (rather than switching to console).
However, keep in mind that you are encouraged to always check the original official tutorial, and some operation like training is much better executed on console.
Bash history tip
Inside the container, on the console, you can press Up key to scroll through some of the frequently used commands pre-registered in bash history.
"},{"location":"lerobot.html#q-setup-audio","title":"q. Setup audio","text":"Check if PulseAudio is available.
pactl info\n
If you need to set the default audio output device, use set-default-sink
.
pactl list short sinks\npactl set-default-sink [SINK_NAME_OR_INDEX]\n
"},{"location":"lerobot.html#1-order-and-assemble-your-koch-v11","title":"1. Order and Assemble your Koch v1.1","text":"You can order the Koch v1.1 kits from ROBOTIS. (Note: they don't come with 3d printed parts)
TODO:
Follow the Jupyter notebook 7-2_real-robot_configure-motors.ipynb
.
You should mostly operate on the container's terminal.
Follow the official document's section.
Camera config tip
The official document demonstrates the two camera positions, one at the top (\"phone\") and the other at directly in front facing the arm (\"laptop\").
In our trials, this camera placement worked, but we needed to make the camera zoom-up to the scene so that they capture better spacial resolution.
Another thing worth experimenting is the wrist cam. More to come later.
Tip
If you plan to perfom training on a different machine, scp
the dataset directory.
scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/data/${HF_USER}/\n
scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/home/<USER>/lerobot/data/${HF_USER}/\n
"},{"location":"lerobot.html#4-train-a-policy-on-your-data","title":"4. Train a policy on your data","text":"You should operate on ther container's terminal.
Follow the official document's section.
Tip
Following commands are registered in Bash history inside the lerobot
container.
wandb login\nexport HF_USER=\npython lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/koch_test_$(date +%Y%m%d_%H%M%S) \\\n --tags tutorial \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10\n
Tip
If you perform the training on other Jetson or PC, scp
the outputs directory content back to the orinal Jetson that has the leader and follower arm attached.
scp -r outputs/train/act_koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/outputs/train/ \n
"},{"location":"lerobot.html#5-evaluate-your-policy","title":"5. Evaluate your policy","text":"You should operate on the container's terminal.
Follow the official document's section.
Tip for a. Use koch.yaml
and our record
function
Modify the command in the bash history to add -p
arugment to points to the policy checkpoint.
python lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test_01 \\\n --tags tutorial eval \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10 \\\n -p outputs/train/act_koch_test/checkpoints/last/pretrained_model \\\n --run-compute-stats 0\n
Tip for Visualize evaluation afterwards
python lerobot/scripts/visualize_dataset.py \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test\n
If everything goes well, you should see
"},{"location":"lerobot.html#basic-walkthrough","title":"Basic Walkthrough","text":"This is from the lerobot top README.md.
"},{"location":"lerobot.html#visualize-datasets","title":"Visualize Datasets","text":"Outside of container, first launch the rerun.io visualization tool that LeRobot uses [\u2197]
pip3 install rerun-sdk\nrerun\n
Then, start the docker container to playback one of these LeRobot datasets.
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/visualize_dataset.py \\\n --repo-id lerobot/pusht \\\n --episode-index 0\n
"},{"location":"lerobot.html#evaluate-a-pretrained-diffusion-policy","title":"Evaluate a Pretrained Diffusion Policy","text":"This will download and run a pre-trained diffusion model on the PushT environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/eval.py \\\n -p lerobot/diffusion_pusht \\\n eval.n_episodes=10 \\\n eval.batch_size=10\n
"},{"location":"lerobot.html#train-your-own-act-policy","title":"Train your own ACT Policy","text":"Next, train ACT on the Aloha manipulation environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/train.py \\\n policy=act \\\n env=aloha \\\n env.task=AlohaInsertion-v0 \\\n dataset_repo_id=lerobot/aloha_sim_insertion_human \n
See Trossen Robotics for dual-arm ALOHA kits, and Robotis for the low-cost Koch v1.1 kit used in the LeRobot tutorial:
HuggingFace LeRobot - Get Started with Real-World Robots (YouTube Playlist)
"},{"location":"llama_vlm.html","title":"Llama 3.2 Vision","text":"The latest additions to Meta's family of foundation LLMs include multimodal vision/language models (VLMs) in 11B and 90B sizes with high-resolution image inputs (1120x1120) and cross-attention with base completion and instruction-tuned chat variants:
Llama-3.2-11B-Vision
Llama-3.2-11B-Vision-Instruct
Llama-3.2-90B-Vision
Llama-3.2-90B-Vision-Instruct
While quantization and optimization efforts are underway, we have started with running the unquantized 11B model in a container based on HuggingFace Transformers that has been updated with the latest support for Llama-3.2-Vision a jump start on trying out these exciting new multimodal models - thanks to Meta for continuing to release open Llama models!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
12.8GB
for llama-vision
container image>25GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Request access to the gated models here with your HuggingFace API key.
Today Llama-3.2-11B-Vision is able to be run on Jetson AGX Orin in FP16 via HuggingFace Transformers. Here's a simple code example from the model card for using it:
import time\nimport requests\nimport torch\n\nfrom PIL import Image\nfrom transformers import MllamaForConditionalGeneration, AutoProcessor\n\nmodel_id = \"meta-llama/Llama-3.2-11B-Vision\"\nmodel = MllamaForConditionalGeneration.from_pretrained(model_id, device_map=\"auto\", torch_dtype=torch.bfloat16)\nprocessor = AutoProcessor.from_pretrained(model_id)\n\nprompt = \"<|image|><|begin_of_text|>If I had to write a haiku for this one\"\nurl = \"https://llava-vl.github.io/static/images/view.jpg\"\nraw_image = Image.open(requests.get(url, stream=True).raw)\n\ninputs = processor(text=prompt, images=raw_image, return_tensors=\"pt\").to(model.device)\noutput = model.generate(**inputs, do_sample=False, max_new_tokens=32)\n
If I had to write a haiku for this one, it would be:\n\nA dock on a lake.\nA mountain in the distance.\nA long exposure.\n
Initial testing seems that Llama-3.2-Vision has more conversational abilities than VLMs typically retain after VQA alignment. This llama_vision.py script has interactive completion and image loading to avoid re-loading the model. It can be launched from the container like this:
jetson-containers run \\\n -e HUGGINGFACE_TOKEN=YOUR_API_KEY \\\n $(autotag llama-vision) \\\n python3 /opt/llama_vision.py \\\n --model \"meta-llama/Llama-3.2-11B-Vision\" \\\n --image \"/data/images/hoover.jpg\" \\\n --prompt \"I'm out in the\" \\\n --max-new-tokens 32 \\\n --interactive\n
After processing the initial image, it will ask you to submit another prompt or image:
total 4.8346s (39 tokens, 8.07 tokens/sec)\n\nEnter prompt or image path/URL:\n\n>> \n
We will update this page and container as support for the Llama-3.2-Vision architecture is added to quantization APIs like MLC and llama.cpp for GGUF, which will reduce the memory and latency.
"},{"location":"nerf.html","title":"Nerfstudio - Neural Reconstruction","text":"Nerfstudio is a tool for training and visualizing Neural Radiance Fields (NeRF) models for 3D volumetric reconstruction from a collection of images taken of a scene or environment. It's designed to be easy to use and starts generating imagery at the outset, which becomes further refined as training progresses. Neural reconstruction is being applied to edge devices and robotics like with FruitNeRF. This tutorial will guide you through the process of training high-quality NeRF models using nerfstudio on Jetson.
Special thanks to Johnny N\u00fa\u00f1ez Cano for porting the NeRF stack to Jetson! See NeRF in 2023: Theory and Practice by It-Jim for useful background info.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
17.6GB
for nerfstudio
container image>5GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use this command to automatically run, build, or pull a compatible container image for nerfstudio:
jetson-containers run $(autotag nerfstudio)\n
To mount your own directories into the container, use the -v
or --volume
flags: jetson-containers run -v /path/on/host:/path/in/container $(autotag nerfstudio)\n
Later a webserver will be reachable at http://localhost:7007
. The official docs include command-line usage and models [\u2197]
It's recommended to train a test scene first by following these steps from the getting started tutorial:
# Download some test data\nns-download-data nerfstudio --capture-name=poster\n\n# Train model\nns-train nerfacto --data /data/nerfstudio/models\n
After it loads, you can navigate your browser to http://localhost:7007
(or substitute your Jetson's IP address)
The 3D visualizations will begin rendering after the first training step, and progressively become more refined. This process typically takes an hour on Jetson AGX Orin to complete the default setting of 30,000 training steps. The model checkpoints are saved under the mounted jetson-containers/data/nerfstudio
directory, and generate the same output quality as before.
The creators behind FruitNeRF fused NeRFs with segmentation for more accurate counting of fruits in 3D. By training a semantic neural radiance field on both RGB and segmentation labels from SAM, the model is robust to occlusions, double-counting, and mistaken or undesirable fruits.
Extended sampling of the volumetric grid can then blob objects for pose estimation, occupancy mapping, and navigation. Such approaches are promising for combining the strengths of NeRFs for 3D reconstruction, segmentation VITs for open-vocabulary classification, and Jetson's onboard compute for high-definition environmental scanning and perception while in the field.
"},{"location":"nerf.html#data-preparation","title":"Data Preparation","text":"We'll use the FruitNeRF datasets as an example of training NeRFs on custom data with nerfstudio. To train a NeRF model on a different scene, you first need to capture a set of images and corresponding camera poses (which can be estimated with the included photogrammetry and image registration tools like COLMAP and OpenSFM as shown here)
The images should be taken from different viewpoints of the scene you want to model, structured in the nerfstudio dataset format. For FruitNeRF, you can just download and extract pre-recorded real or synthetic data from here:
Real DataSynthetic datacd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Real.zip\nunzip FruitNeRF_Real.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
cd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Synthetic.zip\nunzip FruitNeRF_Synthetic.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
"},{"location":"nerf.html#training-a-fruitnerf-model","title":"Training a FruitNeRF Model","text":"This command will generate a NeRF on the first tree (there are multiple trees to try from each dataset). You can find the complete usage information and models on the official GitHub repo: github.com/meyerls/FruitNeRF
ns-train nerfacto \\\n --data /data/nerfstudio/FruitNeRF_Dataset/tree_01 \\\n --output-dir /data/nerfstudio/models \\\n --pipeline.datamanager.camera-res-scale-factor 0.5\n
The memory usage depends on the number of images in the dataset and their resolution, so if you're running low on memory try adjusting the scaling factors and image sampling parameters. The simulated data has a lower resolution and can run at full size.
"},{"location":"openvla.html","title":"OpenVLA - Vision/Language Action Models for Embodied Robotics","text":"Fine Tuning and Deployment Guide
The tutorials's goal is to provide optimized quantization and inference for deploying VLA models, along with reference fine-tuning workflows for adapting models for new robots, tasks, and environments. Rigorous performance and accuracy validation is applied in a self-contained sim environment with scenario generation and domain randomization (MimicGen). Future phases will include sim2real with Isaac Lab and ROS2 integration, study of related models like CrossFormer and optimizations to the neural architecture for realtime performance.
\u2705 Quantization and inference optimizations for VLA models \u2705 Accuracy validation of the original OpenVLA-7B weights \u2705 Reference fine-tuning workflow with synthetic data generation \u2705 On-device training with LoRA's on Jetson AGX Orin and full fine-tuning on A100/H100 instances \u2705 85% accuracy on an example block-stacking task with domain randomization \u2705 Sample datasets and test models for reproducing results
Thank you to OpenVLA, Open X-Embodiment, MimicGen, Robosuite and many others with related work for sharing their promising research, models, and tools for advancing physical AI and robotics.
"},{"location":"openvla.html#vla-architecture","title":"VLA Architecture","text":"OpenVLA is a vision/language action model for embodied robotics and behavioral learning built on LLM/VLMs (this base model is a Prismatic VLM using Llama-7B, DINOv2, and SigLIP). Instead of image captioning or visual question/answering, VLA models generate action tokens from camera images and natural language instructions that are used for controlling the robot. Action tokens are discrete token ID's reserved from the text tokenizer's vocabulary that map to continuous values, normalized against the range of motion of each robot. These real-valued tokens are more efficient and accurate than the model outputting numerical data as text in JSON or Pydantic formats, where each digit, decimal point, separator, and whitespace takes an additional token to generate. Other hybrid vision/language models like Florence-2 have adopted similar approaches for continuous-domain prediction using Transformers.
Each action token generated by the model represents a degree-of-freedom of the output coordinate space (i.e. xyz, rotation pose), or a component of the robot that can be controlled (like the gripper). OpenVLA-7B was trained on the Open X-Embodiment dataset for manipulation, with a 7-DoF action space consisting of (delta xyz, delta roll/pitch/yaw, gripper)
. The position and rotation are relative changes to the end-effector (EEF) pose, with an external inverse kinematics (IK) solution like cuMotion solving joint constraints specific to each robotic arm. The gripper dimension is an absolute control between 0 (open) and 1 (closed) that does not recieve further scaling/normalization.
OpenVLA reserves 256 of the least-frequently used tokens out of the Llama-7B vocabulary for action values, which gives it 8-bit resolution over the controls. It has an input image resolution of 224x224 to stacked DINOv2/SigLIP vision encoders that are projected to ~275 input tokens (plus the text prompt), and outputs 7 tokens mapped to (\u0394pos, \u0394rotation, gripper)
coordinates.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>15GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Support for OpenVLA has been added to NanoLLM on top of its streaming VLM pipeline with INT4/FP8 quantization using MLC and vision encoders in FP16 with TensorRT. First we'll test the model on BridgeData V2, one of the top weighted datasets from the Open X-Embodiment collection. The model was trained on this data and is used to confirm that the quantization and inference are working correctly during deployment. This is what the dataset looks like, courtesy of their website:
The following command starts the container, downloads the dataset and model (if needed), quantizes it on the first run, and measures the accuracy of the action values against the groundtruth from the dataset using normalized mean-squared error (NRMSE) to unbias the varying ranges each dimension of the action space can have. We extracted a 100-episode subset of the original Bridge data here on HuggingFace Hub, so you don't need to download the entire ~400GB dataset just for these tests.
INT4FP8FP16jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model openvla/openvla-7b \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp16.json\n
Quantization Accuracy Latency FPS FP16 95.3% 840 ms 1.19 FP8 95.2% 471 ms 2.12 INT4 90.1% 336 ms 2.97 These results were run on Jetson AGX Orin 64GB with JetPack 6, and we will see later with our fine-tuned model the INT4 accuracy match FP8/FP16.
Each frame, the 7D action vector predicted by the model is printed along with the groundtruth, along with the accuracy, latency, and framerate for that frame. The numbers printed after ~
are the averages of those so far, with the last value reported being the mean over the entire dataset processed.
# INT4\nstep 355 [-0.02692 0.00776 -0.00299 0.08160 0.07292 0.04791 0.99608] accuracy 0.8466 ~0.9017 time=336.2 ms fps=2.96 ~2.97\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP8\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9523 time=469.7 ms fps=2.13 ~2.12\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP16\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9531 time=842.3 ms fps=1.19 ~1.18\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n
The per-frame metrics and averages can be saved with the --save-stats
argument, and in the interests of time you can cap the amount of episodes processed with --max-episodes
. As mentioned above, the Bridge dataset used was included in the training dataset, and further below we run this again on data we generated not from the training dataset with significant variation. This tool can also load other datasets in RLDS/TFDS format from Open X-Embodiment, and HDF5 from Robomimic/MimicGen. You can also create your own agents and scripts using the exposed APIs from the coding examples below.
The code is simple for running VLA inference on camera streams using the NanoLLM library in the container:
VLA on Videofrom nano_llm import NanoLLM\nfrom nano_llm.plugins import VideoSource\n\n# load vision/language action model\nmodel = NanoLLM.from_pretrained(model, quantization='q4f16_ft')\ncamera = VideoSource(video_source, cuda_stream=0)\n\nassert(model.vla) # make sure this is a VLA\n\nwhile True:\n # returns a cudaImage, np.ndarray, or torch.Tensor on the GPU\n image = camera.capture()\n\n if image is None: # in case of timeout, keep trying\n continue\n\n # returns a np.ndarray or torch.Tensor with vla.dof elements\n # for OpenVLA, this is (\u0394x, \u0394y, \u0394z, \u0394roll, \u0394pitch, \u0394yaw, gripper)\n actions = model.vla.predict_action(\n image, \n instruction=\"pick up the nearest object\", \n action_space=\"normalized\",\n return_tensors='np',\n )\n\n # send the actions to your robot controller or IK solver\n ...\n
VLA models are also supported in Agent Studio, which includes the simulator components as well.
"},{"location":"openvla.html#online-validation","title":"Online Validation","text":"Given the challenging task domain, dynamic feedback loops, and computational demands for sim/training/inference, using VLAs for language-guided dexterous manipulation involves a significant increase in complexity over baseline usage of LLMs and VLMs. To go from predicting logits at the token level to actions consistently correct enough over an extended series of frames to form useful behaviors, it's important to cross-check outputs and measure accuracy at each stage of the training/inference workflow to be able to identify the source of potential regressions when they occur.
Unlike typical applications in supervised learning, the metrics for end-task completion and success aren't measured from static pre-recorded datasets that don't account for the temporal domain and feedback from physical interactions along with compounding errors - they require online validation, either in simulation or real-world tests.
Closing the Sim-to-Real Gap: Training Spot Quadruped Locomotion with NVIDIA Isaac Lab
During training the token classification accuracy is measured from the groundtruth action labels (i.e. how many action tokens were predicted exactly right), with the model optimizing to minimize this loss (as is normal for LLMs). Action accuracy in the continuous domain is also is also measured during training from the L1 error of the detokenized real-valued outputs. Continuous action accuracy trends slightly higher than token classification accuracy, as the later does not provide any reward for being closer to the desired result. In practice, these should be >95% accurate at this level for completing tasks successfully in similar environments. To achieve that high degree of accuracy, it seems intentional in the work and related research to overfit the model by training it for many epochs (upwards of 30 epochs on the same 900K episodes for OpenVLA). Transformers are known to recall specific knowledge from few training examples, and are sensitive to overfitting and forgetting previously learned information. As such, LLMs are normally only trained for a few epochs at most to preserved their zero-shot capabilities and ability to generatize to out-of-distribution inputs. During the fine-tuning part of this project, we characterize the impacts on model accuracy and task success from the number of distinct training episodes versus the number of epochs over repeated data.
The actual task success rate doesn't get measured until the inference stage, when it is either connected to a simulator or physically tested in a series of time-consuming trials under similar conditions. We integrated MimicGen directly with the OpenVLA training scripts for an endless source of unseen data, but encountered gradient instabilities after the model had received a significant number of episodes.
"},{"location":"openvla.html#simulation-with-mimicgen","title":"Simulation with MimicGen","text":"MimicGen creates randomized episodes from as few as 10 teleoperated examples by utilizing scene graph information and task/subtask metadata about which objects in the environment are targets of the current subtask, in order to interpolate the original teloperated trajectories into their new random locations and poses. This generates large amounts of unique training data to improve robustness, without needing large amounts of human effort for the robot learning new skills and behaviors.
MimicGen is built on the Robomimic and Robosuite simulators and are able to run onboard Jetson headlessly alongside the VLA, simplifying the setup for reproducibility. The RoboCasa project is built on MimicGen and being integrated with NVIDIA Omniverse and OSMO, and in future work we'd use Isaac Lab for scalability, more accurate physics, and photorealistic rendering.
MimicGen includes 12 tasks like block stacking, pick and place, assembly, and kitchen scenarios. And each type of task has variants increasing in difficulty as learning progresses, which would be interesting to compare curated approaches to the purely random sequencing that OpenVLA uses with Open X-Embodiment. In this phase of the tutorial, we focus on the block stacking task to understand the training requirements and runtime performance needed to master a new task with success rates of >75-80%, similar to the paper. This will help inform scaling to multiple behaviors and more complex scenarios that vary significantly from in-distribution examples like the MimicGen environments (as evidenced by the original OpenVLA weights scoring zero successes in them).
"},{"location":"openvla.html#data-generation","title":"Data Generation","text":"We built MimicGen containers for Jetson from a fork of the code with some patches for aarch64+igpu along with enhancements like generation of natural language labels with random variations for the relevant tasks, along with additional domain randomization for the colors/materials of objects (these environments were added as Stack_D2
, Stack_D3
, and Stack_D4
). For training OpenVLA, the images and labels are saved to disk, whereas later inference is done with online simulation to measure the task success rate. To that effect we integrated MimicGen with Agent Studio for interactively testing the models and quickly dropping in components like ASR for verbally commanding the robot.
Online Training
There's initial support for direct integration of MimicGen in this fork of OpenVLA for live simulation and validation during training and endless episodes without repeating epochs. The models experienced spiking gradients later into LoRA's, and should try again with lower learning rates or by similarly integrating MimicGen into their full fine-tuning script using FDSP for increasing the batch size on dGPU.
This command will generate the specified number of training episodes, saved in Robomimic HDF5 format. We provide the rendered datasets for these on HuggingFace Hub with 1000 and 2500 episodes. OpenVLA suggests only needing 10-150 episodes for fine-tuning and data-efficient adaptation, which perhaps performs similarly in comparable spaces, but we ultimately found insufficient for the MimicGen environments.
jetson-containers run $(autotag nano_llm) \\\n python3 -m mimicgen.generate \\\n --tasks Stack_D4 \\\n --episodes 100 \\\n --output /data/datasets/mimicgen \\\n --cameras agentview \\\n --camera-width 224 \\\n --camera-height 224\n
The HDF5 dataset will be saved to /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5
(which is in a mounted volume under your jetson-containers/data
directory outside of the container), along with a video of sample episodes that were rendered:
This video is actually of Stack_D2 to avoid subjecting everyone to flickering colors. Stack_D4 is used for training and generates blocks with random colors and positions each frame, along with language labels augmented through the random combination of various nouns, adjectives, and verbs that form the instruction (Stack the red block on the green block
, Put the little cube on top
). Stack_D3 randomizes colors/positions each frame, and instructions each episode. Stack_D2 does them all per-episode (which is typically used at runtime). Since OpenVLA uses a single frame at a time with no temporal aspect during training, applying domain randomization per-frame as opposed to per-episode is feasible provides more variance in the dataset. The block-stacking episodes typically come out to be around ~110 frames each, and take around 10-15 seconds to generate per episode on Jetson AGX Orin with per-frame domain randomization, and 5 seconds per episode without domain randomization.
The agentview camera looks onward from the front of the scene towards the robot. There are others available like sideview and eye_in_hand (wrist view) - we tried using the onboard wrist camera, but found the model would too easily veer off track and get 'lost' offscreen. It may be possible for wrist-only to work should the dataset add examples of the robot recovering and returning to a wider vantage point. Other VIT-based embodied models like Octo and CrossFormer use both cameras, and is a future experiment with VLA's based on multi-image VLM's like VILA.
"},{"location":"openvla.html#rlds-conversion","title":"RLDS Conversion","text":"OpenVLA uses datasets in RLDS format (which is based on TFDS), so we provide a converter from HDF5. This extra step can also be time-consuming for a large number of epiodes, like those used here. This is one of the reasons we desire to run MimicGen online with training and performed the initial integration directly with OpenVLA. Unless you are generating different data, you can skip this and use the MimicGen datasets that we uploaded here in RLDS format.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.datasets \\\n --dataset /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5 \\\n --dataset-type mimicgen \\\n --convert rlds \\\n --remap-keys agentview:image \\\n --output /data/datasets/mimicgen/rlds/stack_d4_ep2500\n
This will create a set of tfrecord files under the output directory that are able to be loaded by the OpenVLA training scripts.
"},{"location":"openvla.html#fine-tuning","title":"Fine Tuning","text":"A primary objective of this project is to characterize the training needed to adapt the model to different robots and tasks. Our development primarily consisted of running test LoRA's onboard Jetson AGX Orin 64GB and debugging issues locally, and when the results were encouraging to perform a full fine-tuning with FDSP on multiple A100/H100's from spot instance providers like Brev.dev, Vast.ai, and RunPod. Full fine-tuning on 2x Jetson AGX Orin's was attempted with FDSP, but ran out of memory with all settings that we tried. We provide the test models we trained on HuggingFace for others to try in the inference + sim setup below. Below are the training GPU configurations that were used, along with their batch sizes that maximized memory usage:
Batch Size FPS $/hr Jetson AGX Orin 64GB 8 (LoRA) 1.57 - 2x A100 SMX4 80GB 48 13.07 ~$1.50 8x H100 NVL 94GB 256 92.4 ~$25The rental fees are ballpark averages over the spot instances available with these GPUs at the time of writing, and becomes quite reasonable when used alongside a Jetson repurposed for training daily test LoRA's on a reduced amount of data. Training until convergence on Jetson and 2xA100 took roughly 24-36 hours depending on the amount of data and number of epochs. We kept to <5 epochs for the full fine-tunes in an attempt to prevent the afformentioned overfitting, instead opting to increase the number of episodes.
Below we provide the steps to run the OpenVLA LoRA training on Jetson, and for the dGPU systems refer to Fully Fine-Tuning OpenVLA. Typically you will launch a spot instance with your provider of choice in a CUDA or PyTorch container, then install the OpenVLA repo and its dependencies with pip, and download your dataset to the system before launching the command (or create a bundled container with it all included to save time). Here's the WandB Dashboard from the full fine-tuning runs that you can inspect, comparing a fewer number of episodes for more epochs, versus a larger number of episodes trained for fewer epochs:
"},{"location":"openvla.html#on-device-lora","title":"On-Device LoRA","text":"The OpenVLA repo provides working training scripts for LoRA/qLoRA and multi-node multi-GPU full fine-tunes using PyTorch FDSP. It was not difficult to go in a make changes and enhancements, some of which we have done for our purposes of on-device training in this fork. Overall we found the process to be more similar than not to training other vision DNNs, just with larger datasets and rigorous validation required of the data pipeline that all the coordinate spaces and transformations matched up at every step of the sim\u2192training\u2192inference workflow.
We built an OpenVLA container for JetPack that runs the LoRA training; the specific documentation for it can be found in the OpenVLA readme (it's also recommended to read their paper, which includes many insights into the training process).
jetson-containers run $(autotag openvla) \\\n torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune.py \\\n --vla_path openvla/openvla-7b \\\n --data_root_dir /data/datasets/mimicgen/rlds \\\n --dataset_name stack_d4_ep2500 \\\n --run_root_dir /data/models/openvla \\\n --lora_rank 32 \\\n --batch_size 8 \\\n --grad_accumulation_steps 2 \\\n --learning_rate 5e-4 \\\n --image_aug False \\\n --save_steps 250 \\\n --epochs 5\n
This will start a TensorBoard server on port 6006 to monitor the training progress. Typically you would set the script running for more epochs than you intend to actually run, so that you can instead stop when the model converges (typically occurring with a loss below 0.5 and token accuracy >95%). This script was adapted so that if you interrupt training by pressing Ctrl+D
from the terminal, it will gracefully stop early and still merge the LoRA weights before exiting. If training is terminated otherwise, we added a merge.py
script that you should run afterwards to get the model ready for inference.
Now that we have trained our test model (or you can download one from here), let's re-validate it like we did above with the original OpenVLA model, but this time on unseen data from MimicGen with a different random seed (dusty-nv/mimicgen-stack_d4-ep100
). These commands will download and run the fully fine-tuned checkpoint (trained on 2500 episodes for 4 epochs) that we released as dusty-nv/openvla-7b-mimicgen
. If you trained your own model, you can substitute the local path to the HF checkpoint.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_fp16.json\n
The results from this are collated in the next section along with the end-task success rates. Time to see it in action!
"},{"location":"openvla.html#inference-simulation","title":"Inference + Simulation","text":"To measure how well our model actually performs at completing the task, we spin up a MimicGen environment in Agent Studio that's connected to the VLA model. It counts the number of successful episodes by checking the reward issued by the sim, which is not used by the model but signals when the task was completed. We use a horizon of 200 frames for evaluation, after which it is deemed to be a failure.
jetson-containers run $(autotag nano_llm) \\\n    python3 -m nano_llm.studio --load OpenVLA-MimicGen-INT4\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP8\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP16\n
To start the benchmarking, connect the output of the AutoPrompt
node to the OpenVLA
node. On its own it will run forever - we did it for 100 episodes each, which can take several hours since the sim operates in lock step with the model (future experiments will train on actions accumulated from multiple timesteps and also reduce the model size to improve performance).
This is using the model fine-tuned on 2500 episodes for 4 epochs, and although the task may have been simple, it is evidence of achieving the sought-after success rates of ~85%. Quantization has a negligible ~1% impact while scaling performance almost linearly. The average number of frames is how long it took the robot to complete the task - efficiency is another important end metric to evaluate models by (consider that the source teleop episodes were ~110 frames long, and we realized afterwards that these averages include failed episodes during evaluation). The training dataset dusty-nv/bridge_orig_ep2500
was used to measure the action Train Accuracy, while the previously unused and distinct dusty-nv/bridge_orig_ep100
was used for Validation Accuracy.
Upon measuring the success rates of the other fine-tuned models that were trained on fewer episodes for more epochs, we can see the impact of increasing the size of the dataset:
| Episodes | Epochs | Frames | Task Success | Avg Frames |
| --- | --- | --- | --- | --- |
| 500 | 10 | 550K | 23% | 186 |
| 1000 | 6 | 660K | 48% | 165 |
| 2500 | 4 | 1.1M | 84% | 138 |

That isn't to say that the 2500-episode model isn't still overfit - it is, having learned to always stack the smaller block and essentially ignore the instructions. More task diversity in the training is required, which we can do now that we have the workflow. And we want to branch out into real-world domains instead of test environments in simulation to prove model viability. The amount of variance and data required to achieve generalist models in the challenging problem space of manipulation does raise interesting questions about purely random task learning versus more guided and curated approaches that ramp up in complexity as training progresses.
"},{"location":"openvla.html#future-research","title":"Future Research","text":"Embodied agents are an exciting emerging area at the forefront of robotics and physical AI, with many promising avenues to investigate further. Follow-up publications to OpenVLA include Embodied-CoT and CrossFormer, along with others sure to be in development. In addition to proceeding to physical testing, these areas of interest we'll experiment with as research progresses:
The Jetson AI Lab Research Group is a global collective for advancing open-source Edge ML, open to anyone to join and collaborate with others from the community and leverage each other's work. Our goal is to use advanced AI for good in real-world applications in accessible and responsible ways. By coordinating together as a group, we can keep up with the rapidly evolving pace of AI and more quickly arrive at deploying intelligent multimodal agents and autonomous robots into the field.
There are virtual meetings that anyone is welcome to join, offline discussion on the Jetson Projects forum, and guidelines for upstreaming open-source contributions.
Next Meeting - 10/15
The next team meeting is on Tuesday, October 15th at 9am PST - see the invite below or click here to join the meeting in progress.
"},{"location":"research.html#topics-of-interest","title":"Topics of Interest","text":"These are some initial research topics for us to discuss and investigate. This list will vary over time as experiments evolve and the SOTA progresses:
\u2022 Controller LLMs for dynamic pipeline code generation \u2022 Fine-tuning LLM/VLM onboard Jetson AGX Orin 64GB \u2022 HomeAssistant.io integration for smart home [1] [2] \u2022 Continuous multi-image VLM streaming and change detection \u2022 Recurrent LLM architectures (Mamba, RWKV, etc.) [1] \u2022 Lightweight low-memory streaming ASR/TTS models \u2022 Diffusion models for image processing and enhancement \u2022 Time Series Forecasting with Transformers [1] [2] \u2022 Guidance, grammars, and guardrails for constrained output \u2022 Inline LLM function calling / plugins from API definitions \u2022 ML DevOps, edge deployment, and orchestration \u2022 Robotics, IoT, and cyberphysical systems integration

New topics can be raised to the group either during the meetings or on the forums (people are welcome to work on whatever they want, of course).
"},{"location":"research.html#contribution-guidelines","title":"Contribution Guidelines","text":"When experiments are successful, ideally the results will be packaged in such a way that they are easily reusable for others to integrate into their own projects:
Ongoing technical discussions are encouraged to occur on the forums or GitHub Issues, with status updates on projects given during the meetings.
"},{"location":"research.html#meeting-schedule","title":"Meeting Schedule","text":"We'll aim to meet monthly or bi-weekly as a team in virtual meetings that anyone is welcome to join and speak during. We'll discuss the latest updates and experiments that we want to explore. Please remain courteous to others during the calls. We'll stick around after for anyone who has questions or didn't get the chance to be heard.
Tuesday October 15th at 9am PST (10/15/24)
264 770 145 196
Uwbdgj
Jetson AI Lab Research Group.ics
The agenda will be listed here beforehand - post to the forum to add agenda items. The meetings will be recorded so anyone unable to attend live can watch them after.
"},{"location":"research.html#past-meetings","title":"Past Meetings","text":"October 1, 2024Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
Topics Covered:
NousResearch/Hermes-2-Pro-Llama-3-8B
)Topics Covered:
Topics Covered:
jetson-container
UpdatesProject Kickoffs:
Below are some of the sustaining members of the group who have been working on generative AI in edge computing:
Dustin Franklin, NVIDIA Principal Engineer | Pittsburgh, PA (jetson-inference, jetson-containers) Nurgaliyev Shakhizat Institute of Smart Systems and AI | Kazakhstan (Assistive Devices, Vision2Audio, HPC) Kris Kersey, Kersey Fabrications Embedded Software Engineer | Atlanta, GA (The OASIS Project, AR/VR, 3D Fabrication) Johnny N\u00fa\u00f1ez Cano PhD Researcher in CV/AI | Barcelona, Spain (Recurrent LLMs, Pose & Behavior Analysis) Doruk S\u00f6nmez, ConnectTech Intelligent Video Analytics Engineer | Turkey (NVIDIA DLI Certified Instructor, IVA, VLM) Akash James, Spark Cognition AI Architect, UC Berkeley Researcher | Oakland (NVIDIA AI Ambassador, Personal Assistants) Mieszko Syty, MS/1 Design AI/ML Engineer | Warsaw, Poland (LLM, Home Assistants, ML DevOps) Jim Benson, JetsonHacks DIY Extraordinaire | Los Angeles, CA (AI in Education, RACECAR/J) Chitoku Yato, NVIDIA Jetson AI DevTech | Santa Clara, CA (JetBot, JetRacer, MinDisk, Containers) Dana Sheahen, NVIDIA DLI Curriculum Developer | Santa Clara, CA (AI in Education, Jetson AI Fundamentals) Sammy Ochoa, NVIDIA Jetson AI DevTech | Austin, TX (Metropolis Microservices) John Welsh, NVIDIA (NanoOWL, NanoSAM, JetBot, JetRacer, torch2trt, trt_pose, Knowledge Distillation) Dave Niewinski Dave's Armoury | Waterloo, Ontario (GLaDOS, Fetch, Offroad La-Z-Boy, KUKA Bot) Gary Hilgemann, REBOTNIX CEO & AI Roboticist | L\u00fcnen, Germany (GUSTAV, SPIKE, VisionTools, GenAI) Elaine Wu, Seeed Studio AI & Robotics Partnerships | Shenzhen, China (reComputer, YOLOv8, LocalJARVIS, Voice Bot) Patty Delafuente, NVIDIA Data Scientist & UMBC PhD Student | MD (AI in Education, DLI Robotics Teaching Kit) Song Han, MIT HAN Lab NVIDIA Research | Cambridge, MA (Efficient Large Models, AWQ, VILA) Bryan Hughes, Mimzy AI Founder, Entrepreneur | SF Bay Area (Multimodal Assistants, AI at the Edge) Tianqi Chen, CMU Catalyst OctoML, CTO | Seattle, WA (MLC, Apache TVM, XGBoost) Michael Gr\u00fcner, RidgeRun Team Lead / Architect | Costa Rica (Embedded Vision & AI, Multimedia) Jesse Flot, CMU Robotics Academy Co-Director | Pittsburgh, PA (Applied AI & Robotics, Autonomy Foundations) Paul DeCarlo, Microsoft Professor | University of Houston (Azure IoT, Video Analytics, Microsoft JARVIS) Mike Hansen, Nabu Casa Voice Engineering Lead | Urbandale, Iowa (Home Assistant, Piper TTS, Wyoming) Lakshantha Dissanayake, Ultralytics Embedded CV Engineer | Vancouver, BC (YOLOv8, TensorRT, DeepStream) Kerry Shih, GenAI Nerds Founder, CEO | Los Angeles, CA (Gapi) Ziad Youssfi, CMU ECE Professor | Pittsburgh, PA (ML in Robotics & Embedded Systems) Walter Lucetti, Stereolabs Robotics & Vision Engineer | Paris, France (MyzharBot, ROS2, GStreamer) Raffaello Bonghi, NVIDIA AI & Robotics Engineer | Manchester, UK (Nanosaur, Panther, jetson-stats) Alvaro Costa, ANS Group AI & Robotics Lead | Epsom, UK (TeknTrash, StipraPOD) David Pearson, ConnectTech Embedded Systems Engineer | Ontario, CA (Edge AI Systems, Vision/Language Models) Jason Seawall, Numurus CEO | Seattle, WA (NEPI, Edge AI & Automation) Martin Cerven AI Researcher | Germany (Autonomous Robotics, Voicecraft) Romil Shah, Amazon GenAI IIoT @ AWS | San Jose, CA (aws-samples/genai-at-edge
) Kabilan Kb, Roboticist NVIDIA DLI Ambassador | Tamil Nadu, IN (ROS2 Tutorials, Autonomous Wheelchair)"},{"location":"ros.html","title":"ROS2 Nodes for Generative AI","text":"The ros2_nanollm
package provides ROS2 nodes for running optimized LLMs and VLMs locally inside a container. These are built on NanoLLM and ROS2 Humble for deploying generative AI models onboard your robot with Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm:humble
container image>10GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Recommended
Before you start, please review NanoVLM and Live LLaVa demos. For primary documentation, view ROS2 NanoLLM.
Ensure you have a camera device connected
ls /dev/video*\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nano_llm:humble) \\\n ros2 launch ros2_nanollm camera_input_example.launch.py\n
This command will run the launch file inside the container.
By default this will load the Efficient-Large-Model/Llama-3-VILA1.5-8B
VLM and publish the image captions and overlay to topics that can be subscribed to by your other nodes, or visualized with RViz or Foxglove. Refer to the ros2_nanollm
repo for documentation on the input/output topics that are exposed.
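As a quick way to consume the generated text from your own node, here is a minimal subscriber sketch - it assumes the captions are published as std_msgs/String on a topic named 'output', so check the ros2_nanollm repo for the actual topic names and message types:
import rclpy\nfrom rclpy.node import Node\nfrom std_msgs.msg import String\n\nclass CaptionListener(Node):\n    def __init__(self):\n        super().__init__('caption_listener')\n        # topic name assumed - see the ros2_nanollm docs for the actual output topics\n        self.create_subscription(String, 'output', self.on_caption, 10)\n\n    def on_caption(self, msg):\n        self.get_logger().info(f'caption: {msg.data}')\n\ndef main(args=None):\n    rclpy.init(args=args)\n    rclpy.spin(CaptionListener())\n    rclpy.shutdown()\n\nif __name__ == '__main__':\n    main()\n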
To build your own ROS2 node using LLM or VLM, first create a ROS 2 workspace and package in a directory mounted to the container (following the ROS 2 Humble Documentation). Your src folder should then look like this:
\u2514\u2500\u2500 src \n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 launch \n \u2514\u2500\u2500 camera_input.launch.py\n \u251c\u2500\u2500 resource\n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 your-package-name\n \u2514\u2500\u2500 __init__.py \n \u2514\u2500\u2500 your-node-name_py.py\n \u251c\u2500\u2500 test\n \u2514\u2500\u2500 test_copyright.py\n \u2514\u2500\u2500 test_flake8.py\n \u2514\u2500\u2500 test_pep257.py\n \u251c\u2500\u2500 package.xml\n \u251c\u2500\u2500 setup.cfg\n \u251c\u2500\u2500 setup.py\n \u2514\u2500\u2500 README.md\n
We will create the launch folder, as well as the camera_input.launch.py and your-node-name_py.py files in later steps.
"},{"location":"ros.html#editing-the-setup","title":"Editing the Setup","text":"Let\u2019s begin by editing the setup.py
file. At the top of the file, add
from glob import glob \n
In the setup method, find the data_files=[]
line, and make sure it looks like this:
data_files=[\n ('share/ament_index/resource_index/packages',\n ['resource/' + package_name]),\n ('share/' + package_name, ['package.xml']),\n ('share/' + package_name, glob('launch/*.launch.py')),\n ],\n
Edit the maintainer line with your name. Edit the maintainer email to your email. Edit the description line to describe your package.
maintainer='kshaltiel', \nmaintainer_email='kshaltiel@nvidia.com', \ndescription='YOUR DESCRIPTION', \n
Find the console_scripts
line in the entry_points method. Edit the inside to be:
'your-node-name_py = your-package-name.your-node-name_py:main'\n
For example:
entry_points={\n 'console_scripts': [\n 'nano_llm_py = ros2_nanollm.nano_llm_py:main'\n ],\n },\n
All done for this file!"},{"location":"ros.html#creating-the-node","title":"Creating the Node","text":"Inside your package, under the folder that shares your package's name and contains the __init__.py
file, create a file named after your node. For NanoLLM, this file would be called nano_llm_py.py
.
Paste the following code into the empty file:
import rclpy \nfrom rclpy.node import Node\nfrom std_msgs.msg import String\nfrom sensor_msgs.msg import Image\nfrom cv_bridge import CvBridge\nfrom PIL import Image as im\nfrom MODEL_NAME import NECESSARY_MODULES\n\nclass Your_Model_Subscriber(Node):\n\n    def __init__(self):\n        super().__init__('your_model_subscriber')\n\n        #EDIT PARAMETERS HERE \n        self.declare_parameter('param1', \"param1_value\") \n        self.declare_parameter('param2', \"param2_value\")\n\n        # Subscriber for input query\n        self.query_subscription = self.create_subscription(\n            String,\n            'input_query',\n            self.query_listener_callback,\n            10)\n        self.query_subscription # prevent unused variable warning\n\n        # Subscriber for input image\n        self.image_subscription = self.create_subscription(\n            Image,\n            'input_image',\n            self.image_listener_callback,\n            10)\n        self.image_subscription # prevent unused variable warning\n\n        # To convert ROS image message to OpenCV image\n        self.cv_br = CvBridge() \n\n        #LOAD THE MODEL\n        self.model = INSERT_MODEL.from_pretrained(\"PATH-TO-MODEL\")\n\n        #chatHistory var \n        self.chat_history = ChatHistory(self.model)\n\n        ## PUBLISHER\n        self.output_publisher = self.create_publisher(String, 'output', 10)\n        self.query = \"Describe the image.\"\n\n    def query_listener_callback(self, msg):\n        self.query = msg.data\n\n    def image_listener_callback(self, data): \n        input_query = self.query\n\n        # call model with input_query and input_image \n        cv_img = self.cv_br.imgmsg_to_cv2(data, 'rgb8')\n        PIL_img = im.fromarray(cv_img)\n\n        # Parsing input text prompt\n        prompt = input_query.strip(\"][()\")\n        text = prompt.split(',')\n        self.get_logger().info('Your query: %s' % text) #prints the query\n\n        #chat history \n        self.chat_history.append('user', image=PIL_img)\n        self.chat_history.append('user', prompt, use_cache=True)\n        embedding, _ = self.chat_history.embed_chat()\n\n        #GENERATE OUTPUT\n        output = self.model.generate(\n            inputs=embedding,\n            kv_cache=self.chat_history.kv_cache,\n            min_new_tokens = 10,\n            streaming = False, \n            do_sample = True,\n        )\n\n        output_msg = String()\n        output_msg.data = output\n        self.output_publisher.publish(output_msg)\n        self.get_logger().info(f\"Published output: {output}\")\n\ndef main(args=None):\n    rclpy.init(args=args)\n\n    your_model_subscriber = Your_Model_Subscriber()\n\n    rclpy.spin(your_model_subscriber)\n\n    # Destroy the node explicitly\n    # (optional - otherwise it will be done automatically\n    # when the garbage collector destroys the node object)\n    your_model_subscriber.destroy_node()\n    rclpy.shutdown()\n\nif __name__ == '__main__':\n    main()\n
Edit the import statement at the top of the file to import the necessary modules from the model.
Next, edit the class name and the node name inside the __init__()
function to reflect the model that will be used.
Find the comment that reads #EDIT PARAMETERS HERE
. Declare all parameters except for the model name following the format in the file. Under the #LOAD THE MODEL
comment, include the path to the model.
Lastly, edit the generate method under the GENERATE OUTPUT
comment to include any additional parameters.
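For example, if you were wiring this up for NanoLLM, the declared parameters can be read back and passed into the model load under the #LOAD THE MODEL comment, as in the sketch below (the parameter names mirror the launch file example later on this page, and the model name is just an example):
#EDIT PARAMETERS HERE\nself.declare_parameter('api', 'mlc')\nself.declare_parameter('quantization', 'q4f16_ft')\n\napi = self.get_parameter('api').get_parameter_value().string_value\nquantization = self.get_parameter('quantization').get_parameter_value().string_value\n\n#LOAD THE MODEL\n# (with 'from nano_llm import NanoLLM' among the imports at the top of the file)\nself.model = NanoLLM.from_pretrained(\n    'Efficient-Large-Model/Llama-3-VILA1.5-8B',  # example model - substitute your own\n    api=api,\n    quantization=quantization,\n)\n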
All done for this file!
"},{"location":"ros.html#creating-the-launch-file","title":"Creating the Launch File","text":"Inside your package, create the launch folder. Create your launch file inside of it.
mkdir launch\ncd launch \ntouch camera_input.launch.py\n
You can edit this file externally, and it will update within the container. Paste the following code into the empty file.
from launch import LaunchDescription\nfrom launch_ros.actions import Node\nfrom launch.substitutions import LaunchConfiguration\nfrom launch.actions import DeclareLaunchArgument\n\ndef generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'param1',\n default_value='param1_default',\n description='Description of param1'),\n DeclareLaunchArgument(\n 'param2',\n default_value='param2_default',\n description='Description of param2'),\n ]\n\n\n #Your model parameters \n param1 = LaunchConfiguration('param1')\n param2 = LaunchConfiguration('param2')\n\n #camera node for camera input\n cam2image_node = Node(\n package='image_tools',\n executable='cam2image',\n remappings=[('image', 'input_image')],\n )\n\n #model node\n model_node = Node(\n package='your-package-name', #make sure your package is named this\n executable='your-node-name_py', \n parameters=[{\n 'param1': param1, \n 'param2': param2,\n }]\n )\n\n final_launch_description = launch_args + [cam2image_node] + [model_node]\n\n return LaunchDescription(final_launch_description)\n
Find the required parameters for your model. You can view this by looking at the Model API for your specific model and taking note to how the model is called. For example, NanoLLM retrieves models through the following:
model = NanoLLM.from_pretrained(\n \"meta-llama/Llama-3-8b-hf\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n
The parameters for NanoLLM would be the model name, api, and quantization.
In the generate_launch_description
function, edit the DeclareLaunchArgument
to accommodate all parameters except the model name. For NanoLLM, this would look like:
def generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'api',\n default_value='mlc',\n description='The model backend to use'),\n DeclareLaunchArgument(\n 'quantization',\n default_value='q4f16_ft',\n description='The quantization method to use'),\n ]\n
Then edit the lines under #Your model Parameters
to match the parameters of your model, again excluding the model name. Lastly, fill in the code under the #model node
comment with your package name, the name of your node file, and all of your parameters, this time including the model.
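For the NanoLLM case, the filled-in #model node section might look like the sketch below (package and executable names follow the ros2_nanollm example from earlier on this page, the model name is just an example, and api/quantization are assumed to be defined above via LaunchConfiguration):
    #model node\n    model_node = Node(\n        package='ros2_nanollm',\n        executable='nano_llm_py',\n        parameters=[{\n            'model': 'Efficient-Large-Model/Llama-3-VILA1.5-8B',\n            'api': api,\n            'quantization': quantization,\n        }]\n    )\n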
All done for this file!
"},{"location":"tips_ram-optimization.html","title":"RAM Optimization","text":"Running a LLM requires a huge RAM space.
Especially if you are on a Jetson Orin Nano, which only has 8GB of RAM, it is crucial to leave as much RAM as possible available for models.
Here we share a couple of ways to optimize the system RAM usage.
"},{"location":"tips_ram-optimization.html#disabling-the-desktop-gui","title":"Disabling the Desktop GUI","text":"If you use your Jetson remotely through SSH, you can disable the Ubuntu desktop GUI. This will free up extra memory that the window manager and desktop uses (around ~800MB for Unity/GNOME).
You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:
$ sudo init 3     # stop the desktop\n# log your user back into the console (Ctrl+Alt+F1, F2, etc.)\n$ sudo init 5     # restart the desktop\n
If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:
To disable desktop on boot
sudo systemctl set-default multi-user.target\n
To enable desktop on boot
sudo systemctl set-default graphical.target\n
sudo systemctl disable nvargus-daemon.service\n
"},{"location":"tips_ram-optimization.html#mounting-swap","title":"Mounting Swap","text":"If you're building containers or working with large models, it's advisable to mount SWAP (typically correlated with the amount of memory in the board). Run these commands to disable ZRAM and create a swap file:
If you have NVMe SSD storage available, it's preferred to allocate the swap file on the NVMe SSD.
sudo systemctl disable nvzramconfig\nsudo fallocate -l 16G /ssd/16GB.swap\nsudo mkswap /ssd/16GB.swap\nsudo swapon /ssd/16GB.swap\n
Then add the following line to the end of /etc/fstab to make the change persistent:
/ssd/16GB.swap none swap sw 0 0\n
"},{"location":"tips_ssd-docker.html","title":"Tips - SSD + Docker","text":"Once you have your Jetson set up by flashing the latest Jetson Linux (L4T) BSP on it or by flashing the SD card with the whole JetPack image, before embarking on testing out all the great generative AI application using jetson-containers
, you want to make sure you have a huge storage space for all the containers and the models you will download.
We are going to show how you can install SSD on your Jetson, and set it up for Docker.
"},{"location":"tips_ssd-docker.html#ssd","title":"SSD","text":""},{"location":"tips_ssd-docker.html#physical-installation","title":"Physical installation","text":"Once the system is up, verify that your Jetson identifies a new memory controller on PCI bus:
lspci\n
The output should look like the following:
0007:01:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1322 (rev 02)\n
Run lsblk
to find the device name.
lsblk\n
The output should look like the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT\nloop0 7:0 0 16M 1 loop \nmmcblk1 179:0 0 59.5G 0 disk \n\u251c\u2500mmcblk1p1 179:1 0 58G 0 part /\n\u251c\u2500mmcblk1p2 179:2 0 128M 0 part \n\u251c\u2500mmcblk1p3 179:3 0 768K 0 part \n\u251c\u2500mmcblk1p4 179:4 0 31.6M 0 part \n\u251c\u2500mmcblk1p5 179:5 0 128M 0 part \n\u251c\u2500mmcblk1p6 179:6 0 768K 0 part \n\u251c\u2500mmcblk1p7 179:7 0 31.6M 0 part \n\u251c\u2500mmcblk1p8 179:8 0 80M 0 part \n\u251c\u2500mmcblk1p9 179:9 0 512K 0 part \n\u251c\u2500mmcblk1p10 179:10 0 64M 0 part \n\u251c\u2500mmcblk1p11 179:11 0 80M 0 part \n\u251c\u2500mmcblk1p12 179:12 0 512K 0 part \n\u251c\u2500mmcblk1p13 179:13 0 64M 0 part \n\u2514\u2500mmcblk1p14 179:14 0 879.5M 0 part \nzram0 251:0 0 1.8G 0 disk [SWAP]\nzram1 251:1 0 1.8G 0 disk [SWAP]\nzram2 251:2 0 1.8G 0 disk [SWAP]\nzram3 251:3 0 1.8G 0 disk [SWAP]\nnvme0n1 259:0 0 238.5G 0 disk \n
Identify the device corresponding to your SSD. In this case, it is nvme0n1
.
Format the SSD, create a mount point, and mount it to the filesystem.
sudo mkfs.ext4 /dev/nvme0n1\n
You can choose any name for the mount point directory. We use /ssd
here, but in jetson-containers
' setup.md documentation, /mnt
is used.
sudo mkdir /ssd\n
sudo mount /dev/nvme0n1 /ssd\n
In order to ensure that the mount persists after boot, add an entry to the fstab
file:
First, identify the UUID for your SSD:
lsblk -f\n
Then, add a new entry to the fstab
file:
sudo vi /etc/fstab\n
Insert the following line, replacing the UUID with the value found from lsblk -f
:
UUID=************-****-****-****-******** /ssd/ ext4 defaults 0 2\n
Finally, change the ownership of the /ssd
directory.
sudo chown ${USER}:${USER} /ssd\n
Install nvidia-container
package.
Note: If you used an NVIDIA-supplied SD card image to flash your SD card, all necessary JetPack components (including nvidia-containers
) and Docker are already pre-installed, so this step can be skipped.
sudo apt update\nsudo apt install -y nvidia-container\n
JetPack 6.0 DP users
If you flash Jetson Linux (L4T) R36.2 (JetPack 6.0 DP) on your Jetson using SDK Manager, and install nvidia-container
using apt
, on JetPack 6.0 it no longer automatically installs Docker.
Therefore, you need to run the following to manually install Docker and set it up.
sudo apt update\nsudo apt install -y nvidia-container curl\ncurl https://get.docker.com | sh && sudo systemctl --now enable docker\nsudo nvidia-ctk runtime configure --runtime=docker\n
Restart the Docker service and add your user to the docker
group, so that you don't need to use the command with sudo
.
sudo systemctl restart docker\nsudo usermod -aG docker $USER\nnewgrp docker\n
Add default runtime in /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert the \"default-runtime\": \"nvidia\"
line as following:
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\"\n}\n
Restart Docker
sudo systemctl daemon-reload && sudo systemctl restart docker\n
Now that the SSD is installed and available on your device, you can use the extra storage capacity to hold the storage-demanding Docker directory.
Stop the Docker service.
sudo systemctl stop docker\n
Move the existing Docker folder
sudo du -csh /var/lib/docker/ && \\\n sudo mkdir /ssd/docker && \\\n sudo rsync -axPS /var/lib/docker/ /ssd/docker/ && \\\n sudo du -csh /ssd/docker/ \n
Edit /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert \"data-root\"
line like the following.
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\",\n \"data-root\": \"/ssd/docker\"\n}\n
Rename the old Docker data directory
sudo mv /var/lib/docker /var/lib/docker.old\n
Restart the docker daemon
sudo systemctl daemon-reload && \\\n sudo systemctl restart docker && \\\n sudo journalctl -u docker\n
[Terminal 1] First, open a terminal to monitor the disk usage while pulling a Docker image.
watch -n1 df \n
[Terminal 2] Next, open a new terminal and start Docker pull.
docker pull nvcr.io/nvidia/l4t-base:r35.2.1\n
[Terminal 1] Observe that the disk usage on /ssd
goes up as the container image is downloaded and extracted.
~$ docker image ls\nREPOSITORY TAG IMAGE ID CREATED SIZE\nnvcr.io/nvidia/l4t-base r35.2.1 dc07eb476a1d 7 months ago 713MB\n
Reboot your Jetson, and verify that you observe the following:
~$ sudo blkid | grep nvme\n/dev/nvme0n1: UUID=\"9fc06de1-7cf3-43e2-928a-53a9c03fc5d8\" TYPE=\"ext4\"\n\n~$ df -h\nFilesystem Size Used Avail Use% Mounted on\n/dev/mmcblk1p1 116G 18G 94G 16% /\nnone 3.5G 0 3.5G 0% /dev\ntmpfs 3.6G 108K 3.6G 1% /dev/shm\ntmpfs 734M 35M 699M 5% /run\ntmpfs 5.0M 4.0K 5.0M 1% /run/lock\ntmpfs 3.6G 0 3.6G 0% /sys/fs/cgroup\ntmpfs 734M 88K 734M 1% /run/user/1000\n/dev/nvme0n1 458G 824M 434G 1% /ssd\n\n~$ docker info | grep Root\n Docker Root Dir: /ssd/docker\n\n~$ sudo ls -l /ssd/docker/\ntotal 44\ndrwx--x--x 4 root root 4096 Mar 22 11:44 buildkit\ndrwx--x--- 2 root root 4096 Mar 22 11:44 containers\ndrwx------ 3 root root 4096 Mar 22 11:44 image\ndrwxr-x--- 3 root root 4096 Mar 22 11:44 network\ndrwx--x--- 13 root root 4096 Mar 22 16:20 overlay2\ndrwx------ 4 root root 4096 Mar 22 11:44 plugins\ndrwx------ 2 root root 4096 Mar 22 16:19 runtimes\ndrwx------ 2 root root 4096 Mar 22 11:44 swarm\ndrwx------ 2 root root 4096 Mar 22 16:20 tmp\ndrwx------ 2 root root 4096 Mar 22 11:44 trust\ndrwx-----x 2 root root 4096 Mar 22 16:19 volumes\n\n~$ sudo du -chs /ssd/docker/\n752M /ssd/docker/\n752M total\n\n~$ docker info | grep -e \"Runtime\" -e \"Root\"\n Runtimes: io.containerd.runtime.v1.linux nvidia runc io.containerd.runc.v2\n Default Runtime: nvidia\n Docker Root Dir: /ssd/docker\n
Your Jetson is now set up with the SSD!
"},{"location":"try.html","title":"Try","text":"Jump to NVIDIA Jetson Store.
"},{"location":"tutorial-intro.html","title":"Tutorial - Introduction","text":""},{"location":"tutorial-intro.html#overview","title":"Overview","text":"Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
"},{"location":"tutorial-intro.html#text-llm","title":"Text (LLM)","text":"text-generation-webui Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui Ollama Get started effortlessly deploying GGUF models for chat and web UI llamaspeak Talk live with Llama using Riva ASR/TTS, and chat about images with Llava! NanoLLM Optimized inferencing library for LLMs, multimodal agents, and speech. Small LLM (SLM) Deploy Small Language Models (SLM) with reduced memory usage and higher throughput. API Examples Learn how to write Python code for doing LLM inference using popular APIs."},{"location":"tutorial-intro.html#text-vision-vlm","title":"Text + Vision (VLM)","text":"Give your locally running LLM an access to vision!
LLaVA Different ways to run LLaVa vision/language model on Jetson for visual understanding. Live LLaVA Run multimodal models interactively on live video streams over a repeating set of prompts. NanoVLM Use mini vision/language models and the optimized multimodal pipeline for live streaming. Llama 3.2 Vision Run Meta's multimodal Llama-3.2-11B-Vision model on Orin with HuggingFace Transformers."},{"location":"tutorial-intro.html#vision-transformers","title":"Vision Transformers","text":"EfficientVIT MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction NanoOWL OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT NanoSAM NanoSAM, SAM model variant capable of running in real-time on Jetson SAM Meta's SAM, Segment Anything model TAM TAM, Track-Anything model, is an interactive tool for video object tracking and segmentation"},{"location":"tutorial-intro.html#image-generation","title":"Image Generation","text":"Flux + ComfyUI Set up and run the ComfyUI with Flux model for image generation on Jetson Orin. Stable Diffusion Run AUTOMATIC1111'sstable-diffusion-webui
to generate images from prompts SDXL Ensemble pipeline consisting of a base model and refiner with enhanced image generation. nerfstudio Experience neural reconstruction and rendering with nerfstudio and onboard training."},{"location":"tutorial-intro.html#audio","title":"Audio","text":"Whisper OpenAI's Whisper, pre-trained model for automatic speech recognition (ASR) AudioCraft Meta's AudioCraft, to produce high-quality audio and music Voicecraft Interactive speech editing and zero shot TTS"},{"location":"tutorial-intro.html#rag-vector-database","title":"RAG & Vector Database","text":"NanoDB Interactive demo to witness the impact of Vector Database that handles multimodal data LlamaIndex Realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents LlamaIndex Reference application for building your own local AI assistants using LLM, RAG, and VectorDB"},{"location":"tutorial-intro.html#api-integrations","title":"API Integrations","text":"ROS2 Nodes Optimized LLM and VLM provided as ROS2 nodes for robotics Holoscan SDK Use the Holoscan-SDK to run high-throughput, low-latency edge AI pipelines Jetson Platform Services Quickly build microservice driven vision applications with Jetson Platform Services Gapi Workflows Integrating generative AI into real world environments Gapi Micro Services Wrapping models and code to participate in systems Ultralytics YOLOv8 Run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT."},{"location":"tutorial-intro.html#about-nvidia-jetson","title":"About NVIDIA Jetson","text":"Note
We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin Developer Kit Jetson Orin Nano Developer Kit GPU 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores RAM(CPU+GPU) 64GB 32GB 8GB Storage 64GB eMMC (+ NVMe SSD) microSD card (+ NVMe SSD)"},{"location":"tutorial_api-examples.html","title":"Tutorial - API Examples","text":"It's good to know the code for generating text with LLM inference, and ancillary things like tokenization, chat templates, and prompting. On this page we give Python examples of running various LLM APIs, and their benchmarks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35) JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for l4t-text-generation
container image>10GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
The HuggingFace Transformers API is the de-facto API that models are released for, often serving as the reference implementation. It's not terribly fast, but it does have broad model support, and also supports quantization (AutoGPTQ, AWQ). This uses streaming:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer\nfrom threading import Thread\n\nmodel_name='meta-llama/Llama-2-7b-chat-hf'\nmodel = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nstreamer = TextIteratorStreamer(tokenizer)\n\nprompt = [{'role': 'user', 'content': 'Can I get a recipe for French Onion soup?'}]\ninputs = tokenizer.apply_chat_template(\n prompt,\n add_generation_prompt=True,\n return_tensors='pt'\n).to(model.device)\n\nThread(target=lambda: model.generate(inputs, max_new_tokens=256, streamer=streamer)).start()\n\nfor text in streamer:\n print(text, end='', flush=True)\n
To run this (it can be found here), you can mount a directory containing the script or your jetson-containers directory:
jetson-containers run --volume $PWD/packages/llm:/mount --workdir /mount \\\n $(autotag l4t-text-generation) \\\n python3 transformers/test.py\n
We use the l4t-text-generation
container because it includes the quantization libraries in addition to Transformers, for running the quanztized versions of the models like TheBloke/Llama-2-7B-Chat-GPTQ
The huggingface-benchmark.py
script will benchmark the models:
./run.sh --volume $PWD/packages/llm/transformers:/mount --workdir /mount \\\n $(./autotag l4t-text-generation) \\\n python3 huggingface-benchmark.py --model meta-llama/Llama-2-7b-chat-hf\n
* meta-llama/Llama-2-7b-chat-hf AVG = 20.7077 seconds, 6.2 tokens/sec memory=10173.45 MB\n* TheBloke/Llama-2-7B-Chat-GPTQ AVG = 12.3922 seconds, 10.3 tokens/sec memory=7023.36 MB\n* TheBloke/Llama-2-7B-Chat-AWQ AVG = 11.4667 seconds, 11.2 tokens/sec memory=4662.34 MB\n
"},{"location":"tutorial_api-examples.html#nanollm","title":"NanoLLM","text":"The NanoLLM
library uses the optimized MLC/TVM library for inference, like on the Benchmarks page:
from nano_llm import NanoLLM, ChatHistory, ChatTemplates\n\n# load model\nmodel = NanoLLM.from_pretrained(\n model='meta-llama/Meta-Llama-3-8B-Instruct', \n quantization='q4f16_ft', \n api='mlc'\n)\n\n# create the chat history\nchat_history = ChatHistory(model, system_prompt=\"You are a helpful and friendly AI assistant.\")\n\nwhile True:\n # enter the user query from terminal\n print('>> ', end='', flush=True)\n prompt = input().strip()\n\n # add user prompt and generate chat tokens/embeddings\n chat_history.append(role='user', msg=prompt)\n embedding, position = chat_history.embed_chat()\n\n # generate bot reply\n reply = model.generate(\n embedding, \n streaming=True, \n kv_cache=chat_history.kv_cache,\n stop_tokens=chat_history.template.stop,\n max_new_tokens=256,\n )\n\n # append the output stream to the chat history\n bot_reply = chat_history.append(role='bot', text='')\n\n for token in reply:\n bot_reply.text += token\n print(token, end='', flush=True)\n\n print('\\n')\n\n # save the inter-request KV cache \n chat_history.kv_cache = reply.kv_cache\n
This example keeps an interactive chat running with text being entered from the terminal. You can start it like this:
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat.example\n
Or for easy editing from the host device, copy the source into your own script and mount it into the container with the --volume
flag. And for authenticated models, request access through HuggingFace (like with Llama) and substitute your account's API token above.
Let's run Meta's AudioCraft, to produce high-quality audio and music on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
10.7 GB
for audiocraft
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag audiocraft)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
AudioCraft repo comes with demo Jupyter notebooks.
On Jupyter Lab navigation pane on the left, double-click demos
folder.
Run cells with Shift + Enter
, first one will download models, which can take some time.
Info
You may encounter an error message like the following when executing the first cell, but you can keep going.
A matching Triton is not available, some optimizations will not be enabled.\nError caught was: No module named 'triton'\n
In the Audio Continuation cells, you can generate continuation based on text, while in Text-conditional Generation you can generate audio based just on text descriptions.
You can also use your own audio as prompt, and use text descriptions to generate continuation:
prompt_waveform, prompt_sr = torchaudio.load(\"../assets/sirens_and_a_humming_engine_approach_and_pass.mp3\") # you can upload your own audio\nprompt_duration = 2\nprompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\noutput = model.generate_continuation(prompt_waveform.expand(3, -1, -1), prompt_sample_rate=prompt_sr,descriptions=[\n 'Subway train blowing its horn', # text descriptions for continuation\n 'Horse neighing furiously',\n 'Cat hissing'\n], progress=True)\ndisplay_audio(output, sample_rate=16000)\n
"},{"location":"tutorial_audiocraft.html#musicgen-and-magnet-demos","title":"MusicGen and MAGNeT demos","text":"The two other jupyter notebooks are similar to AuidioGen, where you can generate continuation or generate audio, while using models trained to generate music.
"},{"location":"tutorial_comfyui_flux.html","title":"ComfyUI and Flux on Jetson Orin","text":"Hey there, fellow developer! \ud83d\udc4b I'm excited to share with you our latest project: Flux, an open-source model for image generation. Here at NVIDIA, we're pushing the boundaries to make Flux work seamlessly across all platforms, including our Jetson Orin devices. While we're still fine-tuning the model for the Jetson Orin Nano, we've already got it running smoothly on the Jetson AGX Orin.
In this tutorial, I'm going to walk you through every step needed to get Flux up and running on your Jetson Orin, even if you've just flashed your system. Follow along, and you should have no trouble getting everything set up. And hey, if something doesn't work out, reach out to me\u2014I\u2019ll keep this guide updated to make sure it's always on point.
So, let's dive in and get Flux running on your Jetson!
"},{"location":"tutorial_comfyui_flux.html#1-install-miniconda-and-create-a-python-310-environment","title":"1. Install Miniconda and Create a Python 3.10 Environment","text":"First things first, you'll need to install Miniconda on your Jetson Orin and create a Python 3.10 environment called comfyui
. This will ensure all dependencies are handled properly within an isolated environment.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh\nchmod +x Miniconda3-latest-Linux-aarch64.sh\n./Miniconda3-latest-Linux-aarch64.sh\n\nconda update conda\n\nconda create -n comfyui python=3.10\nconda activate comfyui\n
"},{"location":"tutorial_comfyui_flux.html#2-install-cuda-cudnn-and-tensorrt","title":"2. Install CUDA, cuDNN, and TensorRT","text":"Once your environment is set up, install CUDA 12.4 along with the necessary cuDNN and TensorRT libraries to ensure compatibility and optimal performance on your Jetson Orin.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install cuda-toolkit-12-4 cuda-compat-12-4\nsudo apt-get install cudnn python3-libnvinfer python3-libnvinfer-dev tensorrt\n
"},{"location":"tutorial_comfyui_flux.html#3-verify-and-configure-cuda","title":"3. Verify and Configure CUDA","text":"After installing CUDA, you'll want to verify that the correct version (12.4) is being used and make this change permanent in your environment.
ls -l /usr/local | grep cuda\nsudo ln -s /usr/local/cuda-12.4 /usr/local/cuda\n\nexport PATH=/usr/local/cuda/bin:$PATH\nnvcc --version\n\necho 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc\necho 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc\necho 'export CUDA_PATH=/usr/local/cuda' >> ~/.bashrc\nsource ~/.bashrc\n
"},{"location":"tutorial_comfyui_flux.html#4-compile-and-install-bitsandbytes-with-cuda-support","title":"4. Compile and Install bitsandbytes
with CUDA Support","text":"Now it\u2019s time to compile and install bitsandbytes
with CUDA support. This involves cloning the repository, configuring the build with CMake, compiling using all available cores, and installing the resulting package.
export BNB_CUDA_VERSION=124\nexport LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH\n\ngit clone https://github.com/timdettmers/bitsandbytes.git\ncd bitsandbytes\n\nmkdir -p build\ncd build\ncmake .. -DCOMPUTE_BACKEND=cuda -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4\nmake -j$(nproc)\n\ncd ..\npython setup.py install\n
Verify the installation by importing the package in Python:
python\n>>> import bitsandbytes as bnb\n>>> print(bnb.__version__)\n
"},{"location":"tutorial_comfyui_flux.html#5-install-pytorch-torchvision-and-torchaudio","title":"5. Install PyTorch, TorchVision, and TorchAudio","text":"Next up, install the essential libraries PyTorch
, torchvision
, and torchaudio
for Jetson Orin. You can always check for the latest links here.
pip install http://jetson.webredirect.org/jp6/cu124/+f/5fe/ee5f5d1a75229/torch-2.3.0-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/988/cb71323efff87/torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/0aa/a066463c02b4a/torchaudio-2.3.0+952ea74-cp310-cp310-linux_aarch64.whl\n
"},{"location":"tutorial_comfyui_flux.html#6-clone-the-comfyui-repository","title":"6. Clone the ComfyUI Repository","text":"Clone the ComfyUI repository from GitHub to get the necessary source code.
git clone https://github.com/comfyanonymous/ComfyUI.git\ncd ComfyUI\n
"},{"location":"tutorial_comfyui_flux.html#7-update-dependencies","title":"7. Update Dependencies","text":"Make sure all the necessary dependencies are installed by running the requirements.txt
file.
pip install -r requirements.txt\n
"},{"location":"tutorial_comfyui_flux.html#8-resolve-issues-with-numpy","title":"8. Resolve Issues with NumPy","text":"If you encounter issues with NumPy, downgrade to a version below 2.0 to avoid compatibility problems.
pip install \"numpy<2\"\n
"},{"location":"tutorial_comfyui_flux.html#9-run-comfyui","title":"9. Run ComfyUI","text":"Finally, run ComfyUI to ensure everything is set up correctly.
python main.py\n
Great! Now that you\u2019ve got ComfyUI up and running, let's load the workflow to start using the Flux model.
flux1-schnell.safetensors
and vae ae.safetensors
from Hugging Face and place the model in the models/unet
folder and vae in models/vae
within ComfyUI.clip_l.safetensors
and t5xxl_fp8_e4m3fn.safetensors
from Stability's Hugging Face and place them inside models/clip
folder.Alright, you're all set to launch your first run! Head over to the URL provided by ComfyUI (127.0.0.1:8188) on your Jetson AGX Orin, and hit that Queue Prompt button. The first time might take a little longer as the model loads, but after that, each generation should take around 21 seconds. Plus, you can queue up multiple prompts and let it generate images for hours!!
Happy generating! \ud83c\udf89
ASIER \ud83d\ude80
Some examples:
"},{"location":"tutorial_distillation.html","title":"CLIP model distillation","text":"
See \"Jetson Introduction to Knowledge Distillation\" repo's README.md.
https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation
"},{"location":"tutorial_gapi_microservices.html","title":"Tutorial - Gapi Micro Services","text":"A Micro Service is a process that runs a wrapper python script that integrates your custom code/models so they can integrate into Gapi Workflows.
You can run a Micro Service wherever you like and connect it to a Gapi Server via the streaming, hybrid binary+json message protocol.
There are some out-of-the-box \u201cCommunity Micro Services\" that we integrate, test and pack into Docker images. When you run them, they auto integrate, load NVIDIA layers correctly and offer logging to the host system.
Even better, create your own! Just implement an on_message Python handler to process requests and respond. The rest is handled for you.
Think of a Micro Service as a simple wrapper to some code or model you have. It works the same as any other Node inside a Workflow. When it's your Micro Service Node's turn your on_message function will be called. Your script gets the rolling Transaction data for context and you then publish your data directly back into the flow.
"},{"location":"tutorial_gapi_microservices.html#running-the-community-micro-services","title":"Running the Community Micro Services","text":"Requirements for Community Micro Services
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space.
#1 Login and go to the Micro Services tab\n#2 Follow the instructions in the blue box on that page to download your custom configuration\n#3 Then follow the instructions below that for installing the Micro Service you want\n
Example of instruction page:
"},{"location":"tutorial_gapi_microservices.html#congrats-you-can-go-through-the-workflow-tips-now","title":"Congrats! You Can Go Through the Workflow Tips Now","text":""},{"location":"tutorial_gapi_microservices.html#creating-your-own-micro-service","title":"Creating Your Own Micro Service","text":"The entire Micro Service zip file is just 4KB with 4 files:
Full documentation here: Gapi Micro Service Docs. Synopsis below...
#1 Create logical Micro Service in UI and copy the key\n#2 Download the zip file from the UI\n#3 python gapi-ms.py ws://0.0.0.0:8090/gapi-ws [MICROSERVICE_KEY]\n#4 Refresh the UI to confirm it's online\n#5 Edit the message_handler.py to handle binary+json input and change the output\n#6 Add a Micro Service Node to a Workflow and tie it to your Micro Service. Hit Test.\n
"},{"location":"tutorial_gapi_workflows.html","title":"Tutorial - Gapi","text":"Gapi is an embeddable API gateway that creates streaming integrations between AI micro services and the systems that users leverage everyday.
The project's goal is to accelerate the speed of creating pilots and demos of Jetson AI Lab achievements into real world environments
\"On Device\" generative AI doesn't mean it has to live on an island!
Embeddable API gateway software that runs in the background with a low code workflow UI for testing. The server is a message hub and state machine for workflow 'nodes' that talk to Micro Services. Think of it as connective-tissue for applications.
A Micro Service is a process that runs some wrapper python scripts that integrates custom code/models into Workflows using a streaming API.
Gapi Server can run on any Jetson Orin or really any computer as the Micro Services connect outbound over secure web sockets. It doesn't use any GPU resources. There is a also a little demo version to skip the Server install (but you'll still need to run your own Micro Services).
Gapi Project Page
What you need to run Gapi Server on Jetson
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin NX (8GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Docker
Sufficient storage space.
Gapi Server will run on other environments. Email us at support@GenAINerds.com if that's something you think is worthwhile.
Explaining the Steps:
Copy and Run the Steps:
mkdir ~/gapiData && cd ~/gapiData\ncurl -L https://raw.githubusercontent.com/genai-nerds/Gapi/main/gapiConfigs.zip -o gapiConfigs.zip\nunzip -q gapiConfigs.zip\ndocker run -d --name gapi --network host -v ~/gapiData:/opt/gapi/vdata genainerds/gapi:arm64 /bin/bash -c \"cd /opt/gapi/bin && ./startGapi.sh\"\necho \"You may need to hit Enter now. Afterwards the Docker container 'gapi' should be running\"\n
Troubleshooting:
NOTE: You will need to run some Micro Services before doing anything meaningful, so please review the mini tour below but don't do any of it in the UI untill you complete the setup (instructions at the bottom)
"},{"location":"tutorial_gapi_workflows.html#ui","title":"UI","text":"Pass: !gapi2024
Change password in Settings! Docs shows how to add SSL cert.
When you login there will be an array of Tip Workflows that have notes and explain core concepts.
Tips:
Workflows visually connect the execution and flow of data between Nodes.
A Transaction (or single firing) has \"Rolling Input\" data it accumulates as Node to Node steps each talk to Micro Services and APIs. All Nodes enjoy variables and flow control using familiar json and javascript concepts.
Each Node can append or reference the rolling data in the Transaction while making decisions along the way.
Watch live Transactions as they start from clients, webhooks and published messages from Micro Services with visual feedback and debugging.
"},{"location":"tutorial_gapi_workflows.html#apis-to-business-systems","title":"APIs to Business Systems","text":"Gapi can help smoothly integrate generative AI into systems that people already use everyday via APIs. It has the streaming API to Micro Services plus the state management and chops to handle the outward (webhook) style APIs to existing systems.
Our hope is to rally contributions by the community to keep growing the out-of-the-box/tested Nodes but there is a DIY one as well to manually map what you need into your Workflows.
Some of the out-of-the-box API Nodes: Slack (IM), Twilio (SMS), SendGrid (Email), Service Now (Ticketing), DIY Webhook
"},{"location":"tutorial_gapi_workflows.html#micro-services","title":"Micro Services","text":"There are community published Micro Services as well as custom ones you can make yourself. Gapi Server becomes most useful when leveraging them so please follow the How To below.
Current Community Micro Services:
Complete the Setup: How To Run and/or Create Micro Services
"},{"location":"tutorial_gapi_workflows.html#support-contribute","title":"Support / Contribute","text":"Gapi is a project from the GenAI Nerds and hosted on Github.
The Holoscan SDK is part of NVIDIA Holoscan, the AI sensor processing platform that combines hardware systems for low-latency sensor and network connectivity, optimized libraries for data processing and AI, and core microservices to run streaming, imaging, and other applications, from embedded to edge to cloud.
"},{"location":"tutorial_holoscan.html#holoscan-sdk-core-concepts","title":"Holoscan SDK - Core Concepts","text":"A Holoscan SDK Application
is composed of Fragments
, each of which runs a graph of Operators
. The implementation of that graph is sometimes referred to as a pipeline or workflow, which can be visualized below:
The best place to get started using the Holoscan SDK is the HoloHub repo. This is a central repository for the NVIDIA Holoscan AI sensor processing community to share apps and extensions.
So, let's walk through how to run the Surgical Tool Tracking example application from HoloHub!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
13.7 GB
for holoscan
container image 850 MB
for Tool Tracking ONNX model + example videoClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build the Holoscan container.
Use the -v
option to mount HoloHub's ./build
and ./data
directories to the local Holoscan package directory so builds and data are cached across runs
jetson-containers run \\\n -v ./packages/holoscan/holohub/data:/opt/nvidia/holohub/data \\\n -v ./packages/holoscan/holohub/build:/opt/nvidia/holohub/build \\\n $(autotag holoscan)\n
"},{"location":"tutorial_holoscan.html#running-holohubs-endoscopy-tool-tracking-app","title":"Running HoloHub's Endoscopy Tool Tracking App","text":"An example application from HoloHub is the Endoscopy Tool Tracking application. This sample application demonstrates how the Holoscan SDK can be used to build an efficient pipeline that streams a video feed, preprocesses the data, runs inference using TensorRT, post-processes the data, and renders the video feed with the inference overlays.
"},{"location":"tutorial_holoscan.html#building-the-app","title":"Building The App","text":"The Holoscan SDK uses CMake to build C++ applications and also leverages CMake to pull and build app dependencies. So, regardless of whether an application is implemented using C++ or Python, many apps will still require that you \"build\" them first.
The Endoscopy Tool Tracking App has both a Python and C++ implementation. Building this app creates the C++ application program, pulls in an example video, and builds the TensorRT engine used for inference.
Go to the HoloHub directory
cd /opt/nvidia/holohub\n
Build the app using the 'run' script ./run build endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#running-the-python-app","title":"Running The Python App","text":"First, add the Holoscan SDK and the HoloHub build directory to your PYTHONPATH environment variable.
export HOLOHUB_BUILD_PATH=/opt/nvidia/holohub/build/endoscopy_tool_tracking\nexport PYTHONPATH=$PYTHONPATH:$HOLOHUB_BUILD_PATH/python/lib:/opt/nvidia/holoscan/python/lib\n
Next, run the application using Python! python3 /opt/nvidia/holohub/applications/endoscopy_tool_tracking/python/endoscopy_tool_tracking.py --data /opt/nvidia/holohub/data/endoscopy/\n
Shortly after launching the application, you will see the HoloViz window that visualizes the sample video and the model's outputs:
"},{"location":"tutorial_holoscan.html#running-the-c-app","title":"Running The C++ App","text":"The C++ app can be run using the run script by specifying the app name:
./run launch endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#next-steps","title":"Next Steps","text":"Congratulations! You've successfully run a Holoscan SDK application!
To dive deeper into the Holoscan SDK, explore the SDK's documentation on Core Concepts, Holoscan by Example, and Creating an Application.
"},{"location":"tutorial_jetson-copilot.html","title":"Tutorial - Jetson Copilot","text":"Jetson Copilot is a reference application for a local AI assistant, which demonstrates;
What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6 GB
for jetrag
container image4 GB
for downloading some default models (llama3
and mxbai-embed-large
)Info
To run Jetson Copilot, you do not need to have jetson-containers
on your system. It uses the jetrag
container image that is being managed and built on jetson-containers
.
If this is your first time running Jetson Copilot on your Jetson, first run setup_environment.sh
to ensure you have all the necessary software installed and the environment set up.
git clone https://github.com/NVIDIA-AI-IOT/jetson-copilot/\ncd jetson-copilot\n./setup_environment.sh\n
It will install the following, if not already installed.
cd jetson-copilot\n./launch_jetson_copilot.sh\n
This will start a Docker container and launch an Ollama server and a Streamlit app inside it. It will show the URL on the console for accessing the web app hosted on your Jetson.
With your web browser on Jetson, open the Local URL (http://localhost:8501
). Or, on a PC connected to the same network as your Jetson, access the Network URL.
Local URL: http://localhost:8501\nNetwork URL: http://10.110.50.252:8501 \n
Info
You will need an Internet connection on the Jetson when it launches for the first time, as it will pull the container image (and download the default LLM and embedding model when the web UI starts for the first time).
When you access the web UI for the first time, it will download the default LLM (llama3
) and the embedding model (mxbai-embed-large
).
Tips
If you are on Ubuntu Desktop, a frameless Chromium window will pop up to access the web app, making it look like an independent application. You need to close this window yourself, as stopping the container from the console won't shut down Chromium.
"},{"location":"tutorial_jetson-copilot.html#how-to-use-jetson-copilot","title":"\ud83d\udcd6 How to use Jetson Copilot","text":""},{"location":"tutorial_jetson-copilot.html#0-interact-with-the-plain-llama3-8b","title":"0. Interact with the plain Llama3 (8b)","text":"You can use Jetson Copilot just to interact with a LLM withut enabling RAG feature.
By default, the Llama3 (8b) model is downloaded on the first run and used as the default LLM.
You will be surprised how capable a model like Llama3 is, but you may soon find its limitations, as it has no information beyond its training cutoff date and knows nothing about your specific subject matter.
"},{"location":"tutorial_jetson-copilot.html#1-ask-jetson-related-question-using-pre-built-index","title":"1. Ask Jetson related question using pre-built index","text":"On the side panel, you can toggle \"Use RAG\" on to enable RAG pipeline. The LLM will have an access to a custom knowledge/index that is selected under \"Index\".
As a sample, a pre-built index \"_L4T_README
\" is provided. This is built on all the README text files that supplied in the \"L4T-README\" folder on the Jetson desktop.
It is mounted as /media/<USER_NAME>/L4T-README/
once you execute udisksctl mount -b /dev/disk/by-label/L4T-README
.
You can ask questions like:
What IP address does Jetson get assigned when connected to a PC via a USB cable in USB Device Mode?\n
"},{"location":"tutorial_jetson-copilot.html#2-build-your-own-index-based-on-your-documents","title":"2. Build your own index based on your documents","text":"You can build your own index based on your local and/or online documents.
First, on the console (or on the desktop), create a directory under the Documents
directory to store your documents.
cd jetson-copilot\nmkdir Documents/Jetson-Orin-Nano\ncd Documents/Jetson-Orin-Nano\nwget https://developer.nvidia.com/downloads/assets/embedded/secure/jetson/orin_nano/docs/jetson_orin_nano_devkit_carrier_board_specification_sp.pdf\n
Now back on the web UI, open the side bar, toggle on \"Use RAG\", then click on \"\u2795Build a new index\" to jump to a \"Build Index\" page.
Give a name for the index you are about to build (e.g. \"JON Carrier Board\"). Type it in the field and hit the Enter
key, then it will check and show what path will be created for your index.
Then, from the drop-down select box under \"Local documents\", select the directory you created and saved your documents in (e.g. /opt/jetson_copilot/Documents/Jetson-Orin-Nano
).
It will show a summary of the files found in the selected directory.
If you instead (or additionally) want to supply URLs of online documents to be ingested, fill the text area with one URL per line. You can skip this if you are building your index only from your local documents.
Info
On the sidebar, make sure mxbai-embed-large
is selected for the embedding model.
Use of OpenAI embedding models is not well supported and needs more testing.
Finally, hit \"Build Index\" button. It will show the progress in the drop-down \"status container\", so you can check the status by clicking on it. Once done, it will show the summary of your index and time it took.
You can now go back to the home screen and select the index you just built.
"},{"location":"tutorial_jetson-copilot.html#3-test-different-llm-or-embedding-model","title":"3. Test different LLM or Embedding model","text":"TODO
"},{"location":"tutorial_jetson-copilot.html#development","title":"\ud83c\udfd7\ufe0f Development","text":"Streamlit based web app is very easy to develop.
On the web UI, at the top-right of the screen, choose \"Always rerun\" to automatically update your app every time you change the source code.
See the Streamlit Documentation for details.
"},{"location":"tutorial_jetson-copilot.html#manually-run-streamlit-app-inside-the-container","title":"Manually run streamlit app inside the container","text":"In case you make more fundamental changes, you can also manually run streamlit app.
cd jetson-copilot\n./launch_dev.sh\n
Once inside the container:
streamlit run app.py\n
"},{"location":"tutorial_jetson-copilot.html#directory-structure","title":"\ud83e\uddf1 Directory structure","text":"\u2514\u2500\u2500 jetson-copilot\n \u251c\u2500\u2500 launch_jetson_copilot.sh\n \u251c\u2500\u2500 setup_environment.sh\n \u251c\u2500\u2500 Documents \n \u2502 \u2514\u2500\u2500 your_abc_docs\n \u251c\u2500\u2500 Indexes\n \u2502 \u251c\u2500\u2500 _L4T_README\n \u2502 \u2514\u2500\u2500 your_abc_index\n \u251c\u2500\u2500 logs\n \u2502 \u251c\u2500\u2500 container.log\n \u2502 \u2514\u2500\u2500 ollama.log\n \u251c\u2500\u2500 ollama_models\n \u2514\u2500\u2500 Streamlit_app\n \u251c\u2500\u2500 app.py\n \u251c\u2500\u2500 build_index.py\n \u2514\u2500\u2500 download_model.py\n
The following directories inside the jetson-copilot
directory are mounted in the Docker container.
Documents
Directory to store your documents to be indexed Indexes
Directory to store pre-built (or built-by-you) indexes for LLM to perform RAG on logs
Directory for the app to store log files ollama_models
Directory for the ollama server to store downloaded models Streamlit_app
Directory for Python scripts to make up the web app"},{"location":"tutorial_jetson-copilot.html#troubleshooting","title":"\ud83d\udcab Troubleshooting","text":"If you find any issue, please check GitHub Issues of the Jetson Copilot repo.
"},{"location":"tutorial_jps.html","title":"Tutorial - Jetson Platform Services","text":"Jetson Plaform Services (JPS) provide a platform to simplify development, deployment and management of Edge AI applications on NVIDIA Jetson. JPS is a modular & extensible architecture for developers to distill large complex applications into smaller modular microservice with APIs to integrate into other apps & services. At its core are a collection of AI services leveraging generative AI, deep learning, and analytics, which provide state of the art capabilities including video analytics, video understanding and summarization, text based prompting, zero shot detection and spatio temporal analysis of object movement.
VLM Alert Workflow built with JPS
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
To get started with Jetson Platform Services, follow the quickstart guide to install and setup JPS. Then explore the reference workflows to learn how to use DeepStream, Analytics, Generative AI and more with JPS:
1) Quick Start Guide 2) AI Powered Network Video Recorder 3) Zero Shot Detection with NanoOWL 4) Visual Language Model Alerts
The reference workflows demonstrate how to use the microservices provided in JPS to build full end to end systems on your Jetson.
VLM Alert Workflow Architecture
View the links below to learn more about Jetson Platform Services: VLM Alert Blog JPS Product Page JPS Documentation VLM Alert Demo Video
"},{"location":"tutorial_live-llava.html","title":"Tutorial - Live LLaVA","text":"Recommended
Follow the NanoVLM tutorial first to familiarize yourself with vision/language models, and see Agent Studio for in interactive pipeline editor built from live VLMs.
This multimodal agent runs a vision-language model on a live camera feed or video stream, repeatedly applying the same prompts to it:
It uses models like LLaVA or VILA and has been quantized with 4-bit precision. This runs an optimized multimodal pipeline from the NanoLLM
library, including running the CLIP/SigLIP vision encoder in TensorRT, event filters and alerts, and multimodal RAG (see the NanoVLM page for benchmarks)
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>10GB
)Follow the chat-based LLaVA and NanoVLM tutorials first.
Supported vision/language models:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GBThe VideoQuery agent applies prompts to the incoming video feed with the VLM. Navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera (Chrome is recommended with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled)
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
This uses jetson_utils
for video I/O, and for options related to protocols and file formats, see Camera Streaming and Multimedia. In the example above, it captures a V4L2 USB webcam connected to the Jetson (under the device /dev/video0
) and outputs a WebRTC stream.
The example above was running on a live camera, but you can also read and write a video file or network stream by substituting the path or URL to the --video-input
and --video-output
command-line arguments like this:
jetson-containers run \\\n -v /path/to/your/videos:/mount\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /mount/my_video.mp4 \\\n --video-output /mount/output.mp4 \\\n --prompt \"What does the weather look like?\"\n
This example processes a pre-recorded video (in MP4, MKV, AVI, or FLV format with H.264/H.265 encoding), but it can also input/output live network streams like RTP, RTSP, and WebRTC using Jetson's hardware-accelerated video codecs.
"},{"location":"tutorial_live-llava.html#nanodb-integration","title":"NanoDB Integration","text":"If you launch the VideoQuery agent with the --nanodb
flag along with a path to your NanoDB database, it will perform reverse-image search on the incoming feed against the database by re-using the CLIP embeddings generated by the VLM.
To enable this mode, first follow the NanoDB tutorial to download, index, and test the database. Then launch VideoQuery like this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output \\\n --nanodb /data/nanodb/coco/2017\n
You can also tag incoming images and add them to the database using the web UI, for one-shot recognition tasks:
"},{"location":"tutorial_live-llava.html#video-vila","title":"Video VILA","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The vision/video.py
example keeps a rolling history of frames:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_live-llava.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_live-llava.html#walkthrough-videos","title":"Walkthrough Videos","text":""},{"location":"tutorial_llamaindex.html","title":"Tutorial - LlamaIndex","text":"Let's use LlamaIndex, to realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents!
What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.5 GB
for llama-index
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag llama-index:samples)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
You can follow along LlamaIndex_Local-Models_L4T.ipynb
(which is based on the official LlamaIndex tutorial).
Talk live with Llama using streaming ASR/TTS, and chat about images with Llava!
NanoLLM
library provides optimized inference for LLM and speech models.The WebChat
agent has responsive conversational abilities and multimodal support for chatting about images with vision/language models, including overlapping ASR/LLM/TTS generation and verbal interruptability.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>10GB
)Start the Riva server first and test the ASR examples.
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --asr=riva --tts=piper\n
This will start llamaspeak with text LLM and ASR/TTS enabled. You can then navigate your browser to https://IP_ADDRESS:8050
--web-port
(and --ws-port
for the websocket port)The code and docs for the WebAgent
that runs llamaspeak can be found in the NanoLLM library. This block diagram shows the speech pipeline with interleaved model generation, user interruption, and streaming I/O:
If you load a multimodal vision/language model instead, you can drag images into the chat and ask questions about them:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model Efficient-Large-Model/VILA-7b \\\n --asr=riva --tts=piper\n
For more info about the supported vision/language models, see the NanoVLM page.
"},{"location":"tutorial_llamaspeak.html#function-calling","title":"Function Calling","text":"There's the ability to define functions from Python that the bot has access to and can invoke based on the chat flow:
This works by using the bot_function()
decorator and adding the API description's to the system prompt:
from nano_llm import NanoLLM, ChatHistory, BotFunctions, bot_function\nfrom datetime import datetime\n\n@bot_function\ndef DATE():\n \"\"\" Returns the current date. \"\"\"\n return datetime.now().strftime(\"%A, %B %-m %Y\")\n\n@bot_function\ndef TIME():\n \"\"\" Returns the current time. \"\"\"\n return datetime.now().strftime(\"%-I:%M %p\")\n\nsystem_prompt = \"You are a helpful and friendly AI assistant.\" + BotFunctions.generate_docs()\n
The system prompt can be autogenerated from the Python docstrings embedded in the functions themselves, and can include parameters that the bot can supply (for example, selectively saving relevant user info to a vector database for RAG like is shown in the video).
For more information about this topic, see the Function Calling section of the NanoLLM documentation.
"},{"location":"tutorial_llava.html","title":"Tutorial - LLaVA","text":"LLaVA is a popular multimodal vision/language model that you can run locally on Jetson to answer questions about image prompts and queries. Llava uses the CLIP vision encoder to transform images into the same embedding space as its LLM (which is the same as Llama architecture). Below we cover different methods to run Llava on Jetson, with increasingly optimized performance:
text-generation-webui
llava.serve.cli
llama.cpp
NanoVLM
text-generation-webui
4-bit (GPTQ) 2.3 9.7 GB llava.serve.cli
FP16 (None) 4.2 27.7 GB llama.cpp
4-bit (Q4_K) 10.1 9.2 GB NanoVLM
4-bit (MLC) 21.1 8.7 GB In addition to Llava, the NanoVLM
pipeline supports VILA and mini vision models that run on Orin Nano as well.
text-generation-webui
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for text-generation-webui
container image1.7GB
7.25GB
Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 download-model.py --output=/data/models/text-generation-webui \\\n TheBloke/llava-v1.5-13B-GPTQ\n
"},{"location":"tutorial_llava.html#start-web-ui-with-multimodal-extension","title":"Start Web UI with Multimodal Extension","text":"jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 server.py --listen \\\n --model-dir /data/models/text-generation-webui \\\n --model TheBloke_llava-v1.5-13B-GPTQ \\\n --multimodal-pipeline llava-v1.5-13b \\\n --loader autogptq \\\n --disable_exllama \\\n --verbose\n
Go to Chat tab, drag and drop an image into the Drop Image Here area, and your question in the text area and hit Generate:
"},{"location":"tutorial_llava.html#result","title":"Result","text":""},{"location":"tutorial_llava.html#2-run-from-the-terminal-with-llavaservecli","title":"2. Run from the terminal withllava.serve.cli
","text":"What you need
One of the following Jetson:
Jetson AGX Orin 64GB Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1GB
for llava
container14GB
for Llava-7B (or 26GB
for Llava-13B)This example uses the upstream Llava repo to run the original, unquantized Llava models from the command-line. It uses more memory due to using FP16 precision, and is provided mostly as a reference for debugging. See the Llava container readme for more info.
"},{"location":"tutorial_llava.html#llava-v15-7b","title":"llava-v1.5-7b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-7b \\\n --image-file /data/images/hoover.jpg\n
"},{"location":"tutorial_llava.html#llava-v15-13b","title":"llava-v1.5-13b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-13b \\\n --image-file /data/images/hoover.jpg\n
Unquantized 13B may run only on Jetson AGX Orin 64GB due to memory requirements.
"},{"location":"tutorial_llava.html#3-quantized-gguf-models-with-llamacpp","title":"3. Quantized GGUF models withllama.cpp
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
llama.cpp is one of the faster LLM API's, and can apply a variety of quantization methods to Llava to reduce its memory usage and runtime. Despite its name, it uses CUDA. There are pre-quantized versions of Llava-1.5 available in GGUF format for 4-bit and 5-bit:
jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/hoover.jpg \\\n --prompt \"What does the sign say\"'\n
Quantization Bits Response Tokens/sec Memory Q4_K
4 The sign says \"Hoover Dam, Exit 9.\" 10.17 9.2 GB Q5_K
5 The sign says \"Hoover Dam exit 9.\" 9.73 10.4 GB A lower temperature like 0.1 is recommended for better quality (--temp 0.1
), and if you omit --prompt
it will describe the image:
jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/lake.jpg'\n\nIn this image, a small wooden pier extends out into a calm lake, surrounded by tall trees and mountains. The pier seems to be the only access point to the lake. The serene scene includes a few boats scattered across the water, with one near the pier and the others further away. The overall atmosphere suggests a peaceful and tranquil setting, perfect for relaxation and enjoying nature.\n
You can put your own images in the mounted jetson-containers/data
directory. The C++ code for llava-cli can be found here. The llama-cpp-python bindings also support Llava, however they are slower from Python (potentially handling of the tokens)
NanoVLM
","text":"What's Next
This section got too long and was moved to the NanoVLM page - check it out there for performance optimizations, mini VLMs, and live streaming!
"},{"location":"tutorial_minigpt4.html","title":"Tutorial - MiniGPT-4","text":"Give your locally running LLM an access to vision, by running MiniGPT-4 on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.8GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
minigpt4
container with models","text":"To start the MiniGPT4 container and webserver with the recommended models, run this command:
jetson-containers run $(autotag minigpt4) /bin/bash -c 'cd /opt/minigpt4.cpp/minigpt4 && python3 webui.py \\\n $(huggingface-downloader --type=dataset maknee/minigpt4-13b-ggml/minigpt4-13B-f16.bin) \\\n $(huggingface-downloader --type=dataset maknee/ggml-vicuna-v0-quantized/ggml-vicuna-13B-v0-q5_k.bin)'\n
Then, open your web browser and access http://<IP_ADDRESS>:7860
.
17# First steps with Metropolis Microservices for Jetson
NVIDIA Metropolis Microservices for Jetson simplifies the development of vision AI applications, offering a suite of customizable, cloud-native tools. Before diving into this tutorial, ensure you've filled out the Metropolis Microservices for Jetson Early Access form to gain the necessary access to launch the services. This step is crucial as it enables you to utilize all the features and capabilities discussed in this guide.
Perfect for both newcomers and experts, this tutorial provides straightforward steps to kick-start your edge AI projects. Whether you're a student or an ecosystem partner working on a use case, this guide offers a straightforward start for every skill level.
"},{"location":"tutorial_mmj.html#0-install-nvidia-jetson-services","title":"0. Install NVIDIA Jetson Services:","text":"Ok, let's start by installing NVIDIA Jetson Services:
sudo apt install nvidia-jetson-services\n
Let's add some performance hacks that will be needed to run the demo faster and without streaming artifacts:
If you don't have the Orin at max performance, you can use these two commands, a reboot is needed after:
sudo nvpmodel -m 0 \nsudo jetson_clocks\n
After these two commands, a reboot is needed if your Jetson wasn't already in high-performance mode. These are optional, but they fine-tune your network buffers to ensure smoother streaming by optimizing how much data can be sent and received:
sudo sysctl -w net.core.rmem_default=2129920\nsudo sysctl -w net.core.rmem_max=10000000\nsudo sysctl -w net.core.wmem_max=2000000\n
Download NGC for ARM64 from the NGC for CLI site:
unzip ngccli_arm64.zip\nchmod u+x ngc-cli/ngc\necho \"export PATH=\\\"\\$PATH:$(pwd)/ngc-cli\\\"\" >> ~/.bash_profile && source ~/.bash_profile\nngc config set\n
Here it will ask for your API Key, and the organization name, to get those you need to login into NGC and generate an API key here. You should then paste the API key and use the organization name you are using. You can also press [Enter] to select the default values for the remaining options. After this, you should get the message:
Successfully saved NGC configuration to /home/jetson/.ngc/config\n
Then, login with the same API key:
sudo docker login nvcr.io -u \"\\$oauthtoken\" -p <NGC-API-KEY>\n
Now launch the Redis and Ingress services, as we need them for this tutorial.
sudo systemctl start jetson-redis\nsudo systemctl start jetson-ingress\n
"},{"location":"tutorial_mmj.html#2-download-and-launch-nvstreamer","title":"2. Download and launch NVStreamer","text":""},{"location":"tutorial_mmj.html#_1","title":"Tutorial mmj","text":"First, we need to install NVStreamer, an app that streams the videos MMJs will need to run AI on them. Follow this NVStreamer Link (In the top-left, click Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf nvstreamer.tar.gz\ncd nvstreamer\n
Launch it: sudo docker compose -f compose_nvstreamer.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#3-download-ai_nvr-and-launch","title":"3. Download AI_NVR and launch:","text":""},{"location":"tutorial_mmj.html#_2","title":"Tutorial mmj","text":"AI NVR (NGC) Link (Top-left -> Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf ai_nvr.tar.gz\nsudo cp ai_nvr/config/ai-nvr-nginx.conf /opt/nvidia/jetson/services/ingress/config/\ncd ai_nvr\nsudo docker compose -f compose_agx.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#4-download-some-sample-videos-and-upload-them-to-nvstreamer","title":"4. Download some sample videos and upload them to NVStreamer","text":"Download them from here.
unzip files.zip\n
Ok, now, this is important to understand, there are 2 web interfaces: So, first we need to upload the file in the Streamer interface, it looks like this:
There, go to File Upload, and drag and drop the file in the upload squared area.
After uploading it, go to the Dashboad option of the left menu, and copy the RTSP URL of the video you just uploaded, you will need it for the Camera Management Dashboard.
Now jump to the Camera Management Dashboard (http://localhost:30080/vst), it looks like this:
Go to the Camera Management option of the menu, then use the Add device manually option, and paste the RTSP URL, add the name of your video to the Name and Location text boxes, so it will be displayed on top of the stream.
Finally, click in the Live Streams option of the left menu, and you should be able to watch your video stream.
"},{"location":"tutorial_mmj.html#5-watch-rtsp-ai-processed-streaming-from-vlc","title":"5. Watch RTSP AI processed streaming from VLC","text":"Open VLC from another computer (localhost doesn't work here), and point to your Jetson Orin's IP address (you should be in the same network, or not having a firewal to access).
The easiest way to get Jetson's ip is launching:
ifconfig\n
And checking the IP of the interface (usually wlan0, inet IP). Then go to rtsp://[JETSON_IP]:8555/ds-test using VLC like this:
"},{"location":"tutorial_mmj.html#6-android-app","title":"6. Android app","text":"There is an Android app that allows you to track events and create areas of interest to monitor, you can find it on Google Play as AI NVR.
Here is a quick walkthough where you can see how to:
NanoLLM
is a lightweight, high-performance library using optimized inferencing APIs for quantized LLM\u2019s, multimodality, speech services, vector databases with RAG, and web frontends like Agent Studio.
It provides similar APIs to HuggingFace, backed by highly-optimized inference libraries and quantization tools:
NanoLLM Reference Documentationfrom nano_llm import NanoLLM\n\nmodel = NanoLLM.from_pretrained(\n \"meta-llama/Meta-Llama-3-8B-Instruct\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n api_token='hf_abc123def', # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n\nresponse = model.generate(\"Once upon a time,\", max_new_tokens=128)\n\nfor token in response:\n print(token, end='', flush=True)\n
"},{"location":"tutorial_nano-llm.html#containers","title":"Containers","text":"To test a chat session with Llama from the command-line, install jetson-containers
and run NanoLLM like this:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Llama CLIAgent Studio jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --prompt \"Can you tell me a joke about llamas?\"\n
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
If you haven't already, request access to the Llama models on HuggingFace and substitute your account's API token above.
"},{"location":"tutorial_nano-llm.html#resources","title":"Resources","text":"Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:
Benchmarks Benchmarking results for LLM, SLM, VLM using MLC/TVM backend. API Examples Python code examples for chat, completion, and multimodal. Documentation Reference documentation for the NanoLLM model and agent APIs. Llamaspeak Talk verbally with LLMs using low-latency ASR/TTS speech models. Small LLM (SLM) Focus on language models with reduced footprint (7B params and below) Live LLaVA Realtime live-streaming vision/language models on recurring prompts. Nano VLM Efficient multimodal pipeline with one-shot image tagging and RAG support. Agent Studio Rapidly design and experiment with creating your own automation agents. OpenVLA Robot learning with Vision/Language Action models and manipulation in simulator."},{"location":"tutorial_nano-vlm.html","title":"NanoVLM - Efficient Multimodal Pipeline","text":"We saw in the previous LLaVA tutorial how to run vision-language models through tools like text-generation-webui
and llama.cpp
. In a similar vein to the SLM page on Small Language Models, here we'll explore optimizing VLMs for reduced memory usage and higher performance that reaches interactive levels (like in Liva LLava). These are great for fitting on Orin Nano and increasing the framerate.
There are 3 model families currently supported: Llava, VILA, and Obsidian (mini VLM)
"},{"location":"tutorial_nano-vlm.html#vlm-benchmarks","title":"VLM Benchmarks","text":"This FPS measures the end-to-end pipeline performance for continuous streaming like with Live Llava (on yes/no question)
"},{"location":"tutorial_nano-vlm.html#multimodal-chat","title":"Multimodal Chat","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>10GB
)Supported VLM models in NanoLLM
:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GBThe optimized NanoLLM
library uses MLC/TVM for quantization and inference provides the highest performance. It efficiently manages the CLIP embeddings and KV cache. You can find Python code for the chat program used in this example here.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32\n
This starts an interactive console-based chat with Llava, and on the first run the model will automatically be downloaded from HuggingFace and quantized using MLC and W4A16 precision (which can take some time). See here for command-line options.
You'll end up at a >> PROMPT:
in which you can enter the path or URL of an image file, followed by your question about the image. You can follow-up with multiple questions about the same image. Llava does not understand multiple images in the same chat, so when changing images, first reset the chat history by entering clear
or reset
as the prompt. VILA supports multiple images (area of active research)
During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'what does the road sign say?' \\\n --prompt 'what kind of environment is it?' \\\n --prompt 'reset' \\\n --prompt '/data/images/lake.jpg' \\\n --prompt 'please describe the scene.' \\\n --prompt 'are there any hazards to be aware of?'\n
You can also use --prompt /data/prompts/images.json
to run the test sequence, the results of which are in the table below.
\u2022 \u00a0 The model responses are with 4-bit quantization enabled, and are truncated to 128 tokens for brevity. \u2022 \u00a0 These chat questions and images are from /data/prompts/images.json
(found in jetson-containers)
When prompted, these models can also output in constrained JSON formats (which the LLaVA authors cover in their LLaVA-1.5 paper), and can be used to programatically query information about the image:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model liuhaotian/llava-v1.5-13b \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'extract any text from the image as json'\n\n{\n \"sign\": \"Hoover Dam\",\n \"exit\": \"2\",\n \"distance\": \"1 1/2 mile\"\n}\n
"},{"location":"tutorial_nano-vlm.html#web-ui","title":"Web UI","text":"To use this through a web browser instead, see the llamaspeak tutorial:
"},{"location":"tutorial_nano-vlm.html#live-streaming","title":"Live Streaming","text":"These models can also be used with the Live Llava agent for continuous streaming - just substitute the desired model name below:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
Then navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera. Using Chrome or Chromium is recommended for a stable WebRTC connection, with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled.
The Live Llava tutorial shows how to enable additional features like vector database integration, image tagging, and RAG.
"},{"location":"tutorial_nano-vlm.html#video-sequences","title":"Video Sequences","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. By manipulating the KV cache and dropping off the last frame from the chat history, we can keep the stream rolling continuously beyond the maximum context length of the model. The vision/video.py
example shows how to use this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_nano-vlm.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_nanodb.html","title":"Tutorial - NanoDB","text":"Let's run NanoDB's interactive demo to witness the impact of Vector Database that handles multimodal data.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
16GB
for container image40GB
for MS COCO datasetClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Just for an example, let's use MS COCO dataset:
cd jetson-containers\nmkdir -p data/datasets/coco/2017\ncd data/datasets/coco/2017\n\nwget http://images.cocodataset.org/zips/train2017.zip\nwget http://images.cocodataset.org/zips/val2017.zip\nwget http://images.cocodataset.org/zips/unlabeled2017.zip\n\nunzip train2017.zip\nunzip val2017.zip\nunzip unlabeled2017.zip\n
"},{"location":"tutorial_nanodb.html#download-index","title":"Download Index","text":"You can download a pre-indexed NanoDB that was already prepared over the COCO dataset from here:
cd jetson-containers/data\nwget https://nvidia.box.com/shared/static/icw8qhgioyj4qsk832r4nj2p9olsxoci.gz -O nanodb_coco_2017.tar.gz\ntar -xzvf nanodb_coco_2017.tar.gz\n
This allow you to skip the indexing process in the next step, and jump to starting the Web UI.
"},{"location":"tutorial_nanodb.html#indexing-data","title":"Indexing Data","text":"If you didn't download the NanoDB index for COCO from above, we need to build the index by scanning your dataset directory:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --scan /data/datasets/coco/2017 \\\n --path /data/nanodb/coco/2017 \\\n --autosave --validate \n
This will take a few hours on AGX Orin. Once the database has loaded and completed any start-up operations , it will drop down to a >
prompt from which the user can run search queries. You can quickly check the operation by typing your query on this prompt:
> a girl riding a horse\n\n* index=80110 /data/datasets/coco/2017/train2017/000000393735.jpg similarity=0.29991915822029114\n* index=158747 /data/datasets/coco/2017/unlabeled2017/000000189708.jpg similarity=0.29254037141799927\n* index=123846 /data/datasets/coco/2017/unlabeled2017/000000026239.jpg similarity=0.292171448469162\n* index=127338 /data/datasets/coco/2017/unlabeled2017/000000042508.jpg similarity=0.29118549823760986\n* index=77416 /data/datasets/coco/2017/train2017/000000380634.jpg similarity=0.28964102268218994\n* index=51992 /data/datasets/coco/2017/train2017/000000256290.jpg similarity=0.28929752111434937\n* index=228640 /data/datasets/coco/2017/unlabeled2017/000000520381.jpg similarity=0.28642547130584717\n* index=104819 /data/datasets/coco/2017/train2017/000000515895.jpg similarity=0.285491943359375\n
You can press Ctrl+C to exit. For more info about the various options available, see the NanoDB container documentation.
"},{"location":"tutorial_nanodb.html#interactive-web-ui","title":"Interactive Web UI","text":"Spin up the Gradio server:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --path /data/nanodb/coco/2017 \\\n --server --port=7860\n
Then navigate your browser to http://<IP_ADDRESS>:7860
, and you can enter text search queries as well as drag/upload images:
To use the dark theme, navigate to http://<IP_ADDRESS>:7860/?__theme=dark
instead"},{"location":"tutorial_ollama.html","title":"Tutorial - Ollama","text":"
Ollama is a popular LLM tool that's easy to get started with, and includes a built-in model library of pre-quantized weights that will automatically be downloaded and run using llama.cpp underneath for inference. The ollama container was compiled with CUDA support.
ollama
client can run inside or outside container after starting the server.What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7GB
for ollama
container image>5GB
)# models cached under jetson-containers/data\njetson-containers run --name ollama $(autotag ollama)\n\n# models cached under your user's home directory\ndocker run --runtime nvidia --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.2.0\n
Running either of these will start the local Ollama server as a daemon in the background. It will save the models it downloads under your mounted jetson-containers/data/models/ollama
directory (or another directory that you override with OLLAMA_MODELS
)
Start the Ollama command-line chat client with your desired model (for example: llama3
, phi3
, mistral
)
# if running inside the same container as launched above\n/bin/ollama run phi3\n\n# if launching a new container for the client in another terminal\njetson-containers run $(autotag ollama) /bin/ollama run phi3\n
Or you can install Ollama's binaries for arm64 outside of container (without CUDA, which only the server needs)
# download the latest ollama release for arm64 into /bin\nsudo wget https://github.com/ollama/ollama/releases/download/$(git ls-remote --refs --sort=\"version:refname\" --tags https://github.com/ollama/ollama | cut -d/ -f3- | sed 's/-rc.*//g' | tail -n1)/ollama-linux-arm64 -O /bin/ollama\nsudo chmod +x /bin/ollama\n\n# use the client like normal outside container\n/bin/ollama run phi3\n
"},{"location":"tutorial_ollama.html#open-webui","title":"Open WebUI","text":"To run an Open WebUI server for client browsers to connect to, use the open-webui
container:
docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main\n
You can then navigate your browser to http://JETSON_IP:8080
, and create a fake account to login (these credentials are only local)
Ollama uses llama.cpp for inference, which various API benchmarks and comparisons are provided for on the Llava page. It gets roughly half of peak performance versus the faster APIs like NanoLLM, but is generally considered fast enough for text chat.
"},{"location":"tutorial_slm.html","title":"Tutorial - Small Language Models (SLM)","text":"Small Language Models (SLMs) represent a growing class of language models that have <7B parameters - for example StableLM, Phi-2, and Gemma-2B. Their smaller memory footprint and faster performance make them good candidates for deploying on Jetson Orin Nano. Some are very capable with abilities at a similar level as the larger models, having been trained on high-quality curated datasets.
This tutorial shows how to run optimized SLMs with quantization using the NanoLLM
library and MLC/TVM backend. You can run these models through tools like text-generation-webui
and llama.cpp as well, just not as fast - and since the focus of SLMs is reduced computational and memory requirements, here we'll use the most optimized path available. Those shown below have been profiled:
\u2022 \u00a0 The HuggingFace Open LLM Leaderboard is a collection of multitask benchmarks including reasoning & comprehension, math, coding, history, geography, ect. \u2022 \u00a0 The model's memory footprint includes 4-bit weights and KV cache at full context length (factor in extra for process overhead, library code, ect) \u2022 \u00a0 The Chat Model
is the instruction-tuned variant for chatting with in the commands below, as opposed to the base completion model.
Based on user interactions, the recommended models to try are stabilityai/stablelm-zephyr-3b
and princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
, for having output quality on par with Llama-2-7B and well-optimized neural architectures. These models have also been used as the base for various fine-tunes (for example Nous-Capybara-3B-V1.9
) and mini VLMs. Others may not be particularly coherent.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image>5GB
)Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
The nano_llm.chat
program will automatically download and quantize models from HuggingFace like those listed in the table above:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT\n
\u2022 \u00a0 For models requiring authentication, use --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN>
\u2022 \u00a0 Press Ctrl+C twice in succession to exit (once will interrupt bot output)
This will enter into interactive mode where you chat back and forth using the keyboard (entering reset
will clear the chat history)
During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model stabilityai/stablelm-zephyr-3b \\\n --max-new-tokens 512 \\\n --prompt 'hi, how are you?' \\\n --prompt 'whats the square root of 900?' \\\n --prompt 'can I get a recipie for french onion soup?'\n
You can also load JSON files containing prompt sequences, like with --prompt /data/prompts/qa.json
(the output of which is below)
\u2022 \u00a0 The model responses are with 4-bit quantization, and are truncated to 256 tokens for brevity. \u2022 \u00a0 These chat questions are from /data/prompts/qa.json
(found in jetson-containers)
Nemotron-Mini-4B-Instruct is a 4B SLM tuned for on-device deployment, RAG, and function calling and is based on Minitron-4B (pruned and distilled from Nemotron4 15B). Inference on Jetson is available through HuggingFace Transformers and llama.cpp for quantization. Here's how to run a local OpenAI-compatible server with llama.cpp and 4-bit quantized GGUF:
jetson-containers run $(autotag llama_cpp) \\\n llama-server \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --host 0.0.0.0 \\\n --port 8080\n
For a quick test, you can navigate your browser to http://JETSON_IP:8080
, connect other clients like Open WebUI, or have applications send requests to your server's OpenAI chat completion endpoints (i.e. from openai-python, REST, JavaScript, ect)
You can more easily see the performance with the llama-cli
tool:
jetson-containers run $(autotag llama_cpp) \\\n llama-cli \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --ignore-eos \\\n -n 128 \\\n -p \"The meaning to life and the universe is\"\n
# Jetson AGX Orin\nllama_print_timings: load time = 1408.27 ms\nllama_print_timings: sample time = 70.05 ms / 128 runs ( 0.55 ms per token, 1827.32 tokens per second)\nllama_print_timings: prompt eval time = 120.08 ms / 9 tokens ( 13.34 ms per token, 74.95 tokens per second)\nllama_print_timings: eval time = 3303.93 ms / 127 runs ( 26.02 ms per token, 38.44 tokens per second)\nllama_print_timings: total time = 3597.17 ms / 136 tokens\n
The model can also be previewed at build.nvidia.com (example client requests for OpenAI API are also there)
"},{"location":"tutorial_slm.html#llama-32","title":"Llama 3.2","text":"Meta has released multilingual 1B and 3B SLMs in the latest additions to the Llama family with Llama-3.2-1B
and Llama-3.2-3B
. These can be run with INT4 quantization using the latest MLC container for Jetson (dustynv/mlc:0.1.2-r36.3.0
). After having requested access to the models from Meta with your HuggingFace API key, you can download, quantize, and benchmark them with these commands:
HUGGINGFACE_KEY=YOUR_API_KEY \\\nMLC_VERSION=0.1.2 \\\n jetson-containers/packages/llm/mlc/benchmark.sh \\\n meta-llama/Llama-3.2-1B\n
Llama-3.2-1B
\u00a0 Jetson Orin Nano 54.8 tokens/sec, Jetson AGX Orin 163.9 tokens/secLlama-3.2-3B
\u00a0 Jetson Orin Nano 27.7 tokens/sec, Jetson AGX Orin 80.4 tokens/secThe Llama-3.2 SLMs use the same core Llama architecture as previous Llama releases (except tie_word_embeddings=True
), so it is already supported with quantization and full performance on edge devices. Thanks to Meta for continuing to advance open generative AI models with Llama.
Stable Diffusion XL is a newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. All told, SDXL 1.0 has 6.6 billion model parameters, in comparison to 0.98 billion for the original SD 1.5 model.
What you need
One of the following Jetson devices (SDXL requires >= ~13GB memory)
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image12.4GB
for SDXL modelsHave followed the previous stable-diffusion-webui
tutorial and have the webserver container running.
Stable Diffusion XL is supported through AUTOMATIC1111's stable-diffusion-webui
with some additional settings. First you need to download the SDXL models to your jetson-containers
data directory (which is automatically mounted into the container)
# run these outside of container, and replace CONTAINERS_DIR with the path to the jetson-containers repo on your device\nCONTAINERS_DIR=/path/to/your/jetson-containers\nMODEL_DIR=$CONTAINERS_DIR/data/models/stable-diffusion/models/Stable-diffusion/\n\nsudo chown -R $USER $MODEL_DIR\n\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors\n
It's assumed that you already have the stable-diffusion-webui
container and webserver running from the previous tutorial.
After the models have finished downloading, click the \ud83d\udd04 button to refresh the model list in the web UI. Select sd_xl_base_1.0.safetensors
from the Stable Diffusion checkpoint drop-down:
Then under the Generation tab, expand the Refiner section, and select sd_xl_refiner_1.0.safetensors
from the drop-down:
Guidance on other relevant settings:
Switch at
setting dictates the step at which the refiner takes over from the base model. At this point, additional subject content will no longer be added to the scene, but rather its details further refined in the image.Sampling steps
are between 20-40 and Switch at
 is between 0.7-0.9. This takes experimentation to find the best combination for the characteristics of your desired output. low quality, jpeg artifacts, blurry, etc
)CFG Scale
controls how closely the model conforms to your prompt versus how creative it is.When you get a good image, remember to save your random seed and settings so you can re-generate it later!
"},{"location":"tutorial_stable-diffusion-xl.html#results","title":"Results","text":"photograph of a friendly robot alongside a person climbing a mountain (seed 1576166644, steps 25, switch @ 0.8, cfg scale 15)
a girl and a boy building a friendly robot in their basement workshop (seed 642273464, steps 25, switch @ 0.9, cfg scale 7)
small friendly robots playing games with people, in a futuristic Tokyo central park gardens with cherry blossoms and water, coy fish swimming in the water, sunshine (seed 642273464, steps 40, switch @ 0.8, cfg scale 7)
small friendly robots playing games with people in a futuristic New York City Central Park in autumn, water (seed 642273464, steps 25, switch @ 0.8, cfg scale 7)
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
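For reference, here is a minimal diffusers sketch of the same base-plus-refiner ensemble outside the webui; it assumes a container with diffusers and CUDA-enabled PyTorch plus enough free memory, and the 0.8 hand-off mirrors the Switch at setting described above.
import torch\nfrom diffusers import DiffusionPipeline\n\n# Load the SDXL base and refiner in FP16 (assumes diffusers + CUDA-enabled PyTorch are installed)\nbase = DiffusionPipeline.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0',\n                                         torch_dtype=torch.float16, variant='fp16', use_safetensors=True).to('cuda')\nrefiner = DiffusionPipeline.from_pretrained('stabilityai/stable-diffusion-xl-refiner-1.0',\n                                            text_encoder_2=base.text_encoder_2, vae=base.vae,\n                                            torch_dtype=torch.float16, variant='fp16', use_safetensors=True).to('cuda')\n\nprompt = 'photograph of a friendly robot alongside a person climbing a mountain'\n\n# Run the first 80% of the denoising steps on the base model, then hand the latents to the refiner\nlatents = base(prompt=prompt, num_inference_steps=25, denoising_end=0.8, output_type='latent').images\nimage = refiner(prompt=prompt, num_inference_steps=25, denoising_start=0.8, image=latents).images[0]\nimage.save('sdxl_robot.png')\n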
Let's run AUTOMATIC1111's stable-diffusion-webui
on NVIDIA Jetson to generate images from our prompts!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image4.1GB
for SD 1.5 modelThe jetson-containers project provides pre-built Docker images for stable-diffusion-webui
. You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag stable-diffusion-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/stable-diffusion-webui && python3 launch.py \\\n --data=/data/models/stable-diffusion \\\n --enable-insecure-extension-access \\\n --xformers \\\n --listen \\\n --port=7860\n
You should see it downloading the model checkpoint on the first run.
Open your browser and access http://<IP_ADDRESS>:7860
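The webserver can also be driven programmatically if you add the --api flag to the launch command above; the sketch below is a hedged example of calling its txt2img endpoint from another machine (the IP address and output filename are assumptions).
import base64\nimport requests\n\n# Assumes the webui was started with the extra --api flag and is reachable at this address\nurl = 'http://192.168.1.100:7860/sdapi/v1/txt2img'\npayload = {'prompt': 'a photograph of an astronaut riding a horse', 'steps': 25, 'width': 512, 'height': 512}\n\nresponse = requests.post(url, json=payload, timeout=300)\nresponse.raise_for_status()\n\n# The API returns generated images as base64-encoded strings\nwith open('txt2img_output.png', 'wb') as f:\n    f.write(base64.b64decode(response.json()['images'][0]))\n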
To generate even higher-quality and detailed images, check out the next part of the tutorial that uses the latest Stable Diffusion XL models!
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
Interact with a local AI assistant by running an LLM with oobabooga's text-generation-webui
on NVIDIA Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for container imageThe jetson-containers project provides pre-built Docker images for text-generation-webui
 along with all of the loader APIs built with CUDA enabled (llama.cpp, ExLlama, AutoGPTQ, Transformers, etc.). You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag text-generation-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/text-generation-webui && python3 server.py \\\n --model-dir=/data/models/text-generation-webui \\\n --chat \\\n --listen\n
Open your browser and access http://<IP_ADDRESS>:7860
.
See the oobabooga documentation for instructions on downloading models - either from within the web UI, or using download-model.py
jetson-containers run --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \\\n 'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'\n
From within the web UI, select Model tab and navigate to \"Download model or LoRA\" section.
You can find text generation models on Hugging Face Hub, then enter the Hugging Face username/model path (which you can have copied to your clipboard from the Hub). Then click the Download button.
"},{"location":"tutorial_text-generation.html#gguf-models","title":"GGUF models","text":"The fastest oobabooga model loader to use is currently llama.cpp with 4-bit quantized GGUF models.
You can download a single model file for a particular quantization, like *.Q4_K_M.gguf
. Input the file name and hit Download button.
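If you prefer to fetch a single GGUF file from the command line instead of the web UI, a small sketch using huggingface_hub is shown below (run it wherever the package is installed; the repo and filename follow the table that comes next, and the target directory matches the --model-dir used by the server).
from huggingface_hub import hf_hub_download\n\n# Download one 4-bit GGUF file into the directory that text-generation-webui scans for models\nhf_hub_download(repo_id='TheBloke/Llama-2-7b-Chat-GGUF',\n                filename='llama-2-7b-chat.Q4_K_M.gguf',\n                local_dir='/data/models/text-generation-webui')\n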
TheBloke/Llama-2-7b-Chat-GGUF
llama-2-7b-chat.Q4_K_M.gguf
5,268 TheBloke/Llama-2-13B-chat-GGUF
llama-2-13b-chat.Q4_K_M.gguf
8,609 TheBloke/LLaMA-30b-GGUF
llama-30b.Q4_K_S.gguf
19,045 TheBloke/Llama-2-70B-chat-GGUF
llama-2-70b-chat.Q4_K_M.gguf
37,655 Info
"},{"location":"tutorial_text-generation.html#model-selection-for-jetson-orin-nano","title":"Model selection for Jetson Orin Nano","text":"Jetson Orin Nano Developer Kit has only 8GB RAM for both CPU (system) and GPU, so you need to pick a model that fits in the RAM size - see the Model Size section below. The 7B models with 4-bit quantization are the ones to use on Jetson Orin Nano. Make sure you go through the RAM optimization steps before attempting to load such model on Jetson Orin Nano.
"},{"location":"tutorial_text-generation.html#load-a-model","title":"Load a model","text":"After you have downloaded a model, click the \ud83d\udd04 button to refresh your model list, and select the model you want to use.
For a GGUF model, remember to
n-gpu-layers
to 128
n_gqa
to 8
if you using Llama-2-70B (on Jetson AGX Orin 64GB)Then click the Load button.
"},{"location":"tutorial_text-generation.html#chat-template","title":"Chat Template","text":"If you're using a Llama model fine-tuned for chat, like the models listed above (except for LLaMA-30b
), you need to use the oobabooga Instruct mode and set the template. On the Parameters tab, go to the Instruction Template sub-tab, then select Llama-v2
from the Instruction Template drop-down (or Vicuna, Guanaco, ect if you are using one of those models)
For the base text completion models (like LLaMA-30b
), use the Default or Notebook tab.
Selecting the right chat template will make sure the model is being prompted correctly - you can also change the system prompt in the Context box to alter the agent's personality and behavior. There are a lot of other settings under the Generation tab, like the maximum length it should output per reply, and token sampling parameters like temperature
and top_p
for controlling randomness.
Then change back to the Chat tab, and under the mode section, make sure Instruct is selected (confusingly, not chat mode). Then you can start chatting with the LLM!
"},{"location":"tutorial_text-generation.html#results","title":"Results","text":""},{"location":"tutorial_text-generation.html#things-to-do-with-your-llm","title":"Things to do with your LLM","text":"Here are some common test prompts for coding, math, history ect. You can also ask it about geography, travel, nature, recipies, fixing things, general life advice, and practically everything else. Also Llama-2 is quite playful and likes to play games to test its logic abilities!
>> What games do you like to play?\n\nI'm a large language model, so I can play text-based games and answer questions on a wide variety of topics. Here are some game ideas:\n\n- 20 Questions: You or I can think of an object, and the other player tries to guess what it is by asking up to 20 yes-or-no questions. The goal is to guess the object in as few questions as possible.\n- Hangman: I can think of a word or phrase and you try to guess it by suggesting letters. Each incorrect guess will bring the hangman one step closer to being complete!\n- Word association: I'll give you a word and you have to come up with a word that is related to it. We can keep playing back and forth as long as you'd like!\n- Trivia: I can provide questions on a wide range of topics, from history to science to pop culture. See how many questions you can answer correctly!\n- Storytelling: I can start telling a story and then stop at a cliffhanger. You can then try to guess what happens next or even take over the storytelling and continue it in your own\n- Jokes: I love to tell jokes and make people laugh with my \"Dad humor\"! Knock knock! *giggles*\n
"},{"location":"tutorial_text-generation.html#model-size-tested","title":"Model size tested","text":"With llama.cpp, GGUF model, 4-bit quantization:
Model size Jetson AGX Orin 64GB Jetson AGX Orin 32GB Jetson Orin Nano 8GB 70B model \u2705 30B model \u2705 \u2705 13B model \u2705 \u2705 7B model \u2705 \u2705 \u2705Want to explore using Python APIs to run LLMs directly? See jetson-containers
for its LLM related packages and containers.
Limited to 7B model (4-bit quantized).\u00a0\u21a9
Let's run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT.
Ultralytics YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Nano (4GB)
Running one of the following versions of JetPack:
JetPack 4 (L4T r32.x) JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Execute the below commands according to the JetPack version to pull the corresponding Docker container and run on Jetson.
JetPack 4JetPack 5JetPack 6t=ultralytics/ultralytics:latest-jetson-jetpack4\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack5\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack6\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
"},{"location":"tutorial_ultralytics.html#convert-model-to-tensorrt-and-run-inference","title":"Convert model to TensorRT and run inference","text":"The YOLOv8n model in PyTorch format is converted to TensorRT to run inference with the exported model.
Example
PythonCLIfrom ultralytics import YOLO\n\n# Load a YOLOv8n PyTorch model\nmodel = YOLO(\"yolov8n.pt\")\n\n# Export the model\nmodel.export(format=\"engine\") # creates 'yolov8n.engine'\n\n# Load the exported TensorRT model\ntrt_model = YOLO(\"yolov8n.engine\")\n\n# Run inference\nresults = trt_model(\"https://ultralytics.com/images/bus.jpg\")\n
# Export a YOLOv8n PyTorch model to TensorRT format\nyolo export model=yolov8n.pt format=engine # creates 'yolov8n.engine'\n\n# Run inference with the exported model\nyolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'\n
Manufacturing Sports Wildlife Vehicle Spare Parts Detection Football Player Detection Tiger pose Detection Note
Visit the Export page to access additional arguments when exporting models to different model formats. Note that the default arguments require inference using fixed image dimensions when dynamic=False
. To change the input source for inference, please refer to Model Prediction page.
Benchmarks of the YOLOv8 variants with TensorRT were run by Seeed Studio on their reComputer systems:
Xavier NX 8GBOrin NX 16GBAGX Orin 32GB Model PyTorch FP32 FP16 INT8 YOLOv8n 32 63 120 167 YOLOv8s 25 26 69 112 YOLOv8m 11 11 33 56 YOLOv8l 6 6 20 38 Model PyTorch FP32 FP16 INT8 YOLOv8n 56 115 204 256 YOLOv8s 53 67 128 196 YOLOv8m 26 31 63 93 YOLOv8l 16 20 42 69 Model PyTorch FP32 FP16 INT8 YOLOv8n 77 192 323 385 YOLOv8s 67 119 213 303 YOLOv8m 40 56 105 145 YOLOv8l 27 38 73.5 114To learn more, visit our comprehensive guide on running Ultralytics YOLOv8 on NVIDIA Jetson including benchmarks!
Note
Ultralytics YOLOv8 models are offered under AGPL-3.0 License
which is an OSI-approved open-source license and is ideal for students and enthusiasts, promoting open collaboration and knowledge sharing. See the LICENSE file for more details.
Install the package:
pip install jetson-examples\n
Restart your reComputer:
sudo reboot\n
Run Ultralytics YOLO on Jetson with one command:
reComputer run ultralytics-yolo\n
http://127.0.0.1:5001
or http://device_ip:5001
in your browser to access the Web UI.
For more details, please read: Jetson-Example: Run Ultralytics YOLO Platform Service on NVIDIA Jetson Orin .
"},{"location":"tutorial_ultralytics.html#follow-ultralytics-to-stay-updated","title":"Follow Ultralytics to stay updated!","text":""},{"location":"tutorial_voicecraft.html","title":"Tutorial - VoiceCraft","text":"Let's run VoiceCraft, a Zero-Shot Speech Editing and Text-to-Speech in the Wild!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
15.6 GB
for voicecraft
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag voicecraft)\n
The container has a default run command (CMD
) that will automatically start the Gradio app.
Open your browser and access http://<IP_ADDRESS>:7860
.
VoiceCraft repo comes with Gradio demo app.
/data
folder, where are saved to from previous runsWarning
For TTS it's okay to use only first few seconds of audio as prompt, since it consumes a lot of memory. On AGX 32GB Orin the maximal TTS length of generated audio is around ~16 seconds in headless mode.
"},{"location":"tutorial_voicecraft.html#resources","title":"Resources","text":"If you want to know how it works under the hood, you can read following papers:
Let's run OpenAI's Whisper, pre-trained model for automatic speech recognition on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1 GB
for whisper
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag whisper)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server, with SSL enabled.
Open your browser and access https://<IP_ADDRESS>:8888
.
Attention
Note it is https
(not http
).
HTTPS (SSL) connection is needed to allow ipywebrtc
widget to have access to your microphone (for record-and-transcribe.ipynb
).
You will see a warning message like this.
Press \"Advanced\" button and then click on \"Proceed to (unsafe)\" link to proceed to the Jupyter Lab web interface.
The default password for Jupyter Lab is nvidia
.
Whisper repo comes with demo Jupyter notebooks, which you can find under /notebooks/
directory.
jetson-containers
also adds one convenient notebook (record-and-transcribe.ipynb
) to record your audio sample on Jupyter notebook in order to run transcribe on your recorded audio.
record-and-transcribe.ipynb
","text":"This notebook is to let you record your own audio sample using your PC's microphone and apply Whisper's medium
model to transcribe the audio sample.
It uses Jupyter notebook/lab's ipywebrtc
extension to record an audio sample on your web browser.
Attention
When you click the \u23fa botton, your web browser may show a pop-up to ask you to allow it to use your microphone. Be sure to allow the access.
Final checkOnce done, if you click on the \"\u26a0 Not secure\" part in the URL bar, you should see something like this.
"},{"location":"tutorial_whisper.html#result","title":"Result","text":"Once you go through all the steps, you should see the transcribe result in text like this.
"},{"location":"vit/index.html","title":"Vision Transformers & Computer Vision","text":"You can quickly run these realtime Vision Transformers (ViT) and computer vision models onboard your Jetson:
"},{"location":"vit/index.html#efficient-vit","title":"Efficient ViT","text":""},{"location":"vit/index.html#nanosam","title":"NanoSAM","text":""},{"location":"vit/index.html#nanoowl","title":"NanoOWL","text":""},{"location":"vit/index.html#sam","title":"SAM","text":""},{"location":"vit/index.html#tam","title":"TAM","text":""},{"location":"vit/index.html#ultralytics-yolov8","title":"Ultralytics YOLOv8","text":""},{"location":"vit/tutorial_efficientvit.html","title":"Tutorial - EfficientViT","text":"Let's run MIT Han Lab's EfficientViT on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
10.9 GB
for efficientvit
container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag efficientvit)\n
"},{"location":"vit/tutorial_efficientvit.html#usage-of-efficientvit","title":"Usage of EfficientViT","text":"The official EfficientViT repo shows the complete usage information: https://github.com/mit-han-lab/efficientvit#usage
Inside the container, a small benchmark script benchmark.py
is added under /opt/efficientvit
directory by the jetson-container build process.
It is to test EfficientViT-L2-SAM in bounding box mode, so we can use this as an example and verify the output.
"},{"location":"vit/tutorial_efficientvit.html#download-l2pt-model","title":"Downloadl2.pt
model","text":"mkdir -p /data/models/efficientvit/sam/\ncd /data/models/efficientvit/sam/\nwget https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt\n
The downloaded checkpoint file is stored on the /data/
directory that is mounted from the Docker host.
cd /opt/efficientvit\npython3 ./benchmark.py\n
At the end you should see a summary like the following.
AVERAGE of 2 runs:\n encoder --- 0.062 sec\n latency --- 0.083 sec\nMemory consumption : 3419.68 MB\n
"},{"location":"vit/tutorial_efficientvit.html#check-the-outputresult","title":"Check the output/result","text":"The output image file (of the last inference result) is stored as /data/benchmarks/efficientvit_sam_demo.png
.
It is stored under /data/
directory that is mounted from the Docker host. So you can go back to your host machine, and check jetson-containers/data/benchmark/
directory.
You should find the output like this.
"},{"location":"vit/tutorial_nanoowl.html","title":"Tutorial - NanoOWL","text":"Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7.2 GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run --workdir /opt/nanoowl $(autotag nanoowl)\n
"},{"location":"vit/tutorial_nanoowl.html#how-to-run-the-tree-prediction-live-camera-example","title":"How to run the tree prediction (live camera) example","text":"Ensure you have a camera device connected
ls /dev/video*\n
If no video device is found, exit from the container and check if you can see a video device on the host side.
Launch the demo
cd examples/tree_demo\npython3 tree_demo.py ../../data/owl_image_encoder_patch32.engine\n
Info
If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.
python3 -m nanoowl.build_image_encoder_engine \\\n data/owl_image_encoder_patch32.engine\n
Second, open your browser to http://<ip address>:7860
Type whatever prompt you like to see what works!
Here are some examples
[a face [a nose, an eye, a mouth]]
[a face (interested, yawning / bored)]
(indoors, outdoors)
Let's run NVIDIA's NanoSAM to check out the performance gain by distillation.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.3GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nanosam)\n
"},{"location":"vit/tutorial_nanosam.html#run-examples","title":"Run examples","text":"Inside the container, you can move to /opt/nanosam
directory, to go through all the examples demonstrated on the repo.
cd /opt/nanosam\n
To run the \"Example 1 - Segment with bounding box\":
python3 examples/basic_usage.py \\\n --image_encoder=\"data/resnet18_image_encoder.engine\" \\\n --mask_decoder=\"data/mobile_sam_mask_decoder.engine\"\n
The result is saved under /opt/nanosam/data/basic_usage_out.jpg
.
To check on your host machine, you can copy that into /data
directory of the container where that is mounted from the host.
cp data/basic_usage_out.jpg /data/\n
Then you can go to your host system, and find the file under jetson-containers/data/basic_usage_out.jpg
Let's run Meta's SAM
on NVIDIA Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag sam)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
In Jupyter Lab, navigate to notebooks
and open automatic_mask_generator_example.py
notebook.
Create a new cell at the top, insert the model download command below and run the cell.
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\n
Then go through executing all the cells below Set-up.
"},{"location":"vit/tutorial_sam.html#results","title":"Results","text":"The biggest vit_h
(2.4GB) model may not ran due to OOM, but vit_l
(1.1GB) runs on Jetson Orin Nano.\u00a0\u21a9
Let's run TAM
to perform Segment Anything on videos on NVIDIA Jetson.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container imageClone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag tam)\n
The container has a default run command (CMD
) that will automatically start TAM's web server.
Open your browser and access http://<IP_ADDRESS>:12212
.
Check out the official tutorial to learn how to operate the web UI.
"},{"location":"vit/tutorial_tam.html#results","title":"Results","text":""},{"location":"vit/tutorial_tam.html#troubleshooting","title":"Troubleshooting","text":""},{"location":"vit/tutorial_tam.html#filenotfounderror-errno-2-no-such-file-or-directory-checkpointse2fgvi-hq-cvpr22pth","title":"FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/E2FGVI-HQ-CVPR22.pth'
","text":"You may find the TAM app fails to download a checkpoint file E2FGVI-HQ-CVPR22.pth
.
Downloading checkpoints from Google Drive... tips: If you cannot see the progress bar, please try to download it manuall and put it in the checkpointes directory. E2FGVI-HQ-CVPR22.pth: https://github.com/MCG-NKU/E2FGVI(E2FGVI-HQ model)\nAccess denied with the following error:\n\n Cannot retrieve the public link of the file. You may need to change\n the permission to 'Anyone with the link', or have had many accesses. \n\nYou may still be able to access the file from the browser:\n\n https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \n
You can manually download the checkpoint file on your Docker host machine.
cd jetson-containers/\npip install gdown\nsource ~/.profile\ngdown https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \nmv E2FGVI-HQ-CVPR22.pth ./data/models/tam/\n
And you can try running the TAM container.
jetson-containers run $(autotag tam)\n
"}]}
\ No newline at end of file