diff --git a/search/search_index.json b/search/search_index.json
index 2d338f2..b75c3f1 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"agent_studio.html","title":"Agent Studio","text":"Rapidly design and experiment with creating your own automation agents, personal assistants, and edge AI systems in an interactive sandbox for connecting multimodal LLMs, speech and vision transformers, vector databases, prompt templating, and function calling to live sensors and I/O. Optimized for deployment onboard Jetson with on-device compute, low-latency streaming, and unified memory.
\u2022 Edge LLM inference with quantization and KV caching (NanoLLM) \u2022 Realtime vision/language models (a la Live Llava and Video VILA) \u2022 Speech recognition and synthesis (Whisper ASR, Piper TTS, Riva) \u2022 Multimodal vector database from NanoDB \u2022 Audio and video streaming (WebRTC, RTP, RTSP, V4L2) \u2022 Performance monitoring and profiling \u2022 Native bot-callable functions and agent tools \u2022 Extensible plugins with auto-generated UI controls \u2022 Save, load, and export pipeline presets"},{"location":"agent_studio.html#running-agent-studio","title":"Running Agent Studio","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>5GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
This will start the server running on your device. You can then navigate your browser to https://IP_ADDRESS:8050
You can load a preset at startup with the --load
flag (like --load /data/nano_llm/presets/xyz.json
) The default port is 8050, but can be changed with --web-port
(and --ws-port
for the websocket port) Request access to the Llama models on HuggingFace and substitute your account's API token above. "},{"location":"agent_studio.html#dev-mode","title":"Dev Mode","text":"To make code changes without rebuilding the container, clone the NanoLLM sources and then mount them over /opt/NanoLLM
git clone https://github.com/dusty-nv/NanoLLM\njetson-containers run \\\n -v ${PWD}/NanoLLM:/opt/NanoLLM \\\n $(autotag nano_llm)\n
You can then edit the source from outside the container, and in the terminal that starts, you can install other packages from apt/pip/etc.
"},{"location":"agent_studio.html#plugins","title":"Plugins","text":"The models, I/O, and logic blocks share a lightweight plugin interface, exposing a process()
function and any runtime properties:
Custom Pluginfrom nano_llm import Plugin\n\nclass TextFilter(Plugin):\n def __init__(self, search: str='bad word', replace: str='censored', **kwargs):\n \"\"\"\n A simple plugin performing text replacement. Write docs because they're extracted for the UI,\n and for the tool descriptions that go to the bot for function calling (and use the type hints) \n\n Args:\n search: The string to look for and replace in incoming text.\n replace: What to replace occurrences of the search string with.\n \"\"\"\n super().__init__(inputs=['text'], outputs=['text'])\n\n # set these attributes and register them as configurable\n self.add_parameters(search=search, replace=replace)\n\n def process(self, input, **kwargs):\n \"\"\" Substitute the text, and the return value will be sent to the plugins connected to this. \"\"\"\n return input.replace(self.search, self.replace)\n
Typically each plugin runs asynchronously in its own thread with its own queue of data, although some lightweight routines are executed inline. Usually they run in the same process, but they could also make requests to microservices. Websockets are used for clients and the web UI.
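As a minimal sketch of exercising such a plugin outside the node editor (this assumes the TextFilter class from the example above has already been defined in your Python session; inside Agent Studio, the inputs and outputs would normally be connected in the graph instead):
# assumes the TextFilter example above is defined in this session\ntf = TextFilter(search='bad word', replace='censored')\nprint(tf.search, tf.replace) # parameters registered with add_parameters() are exposed as attributes\nprint(tf.process('that was a bad word')) # -> 'that was a censored'\n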
Below are descriptions of commonly-used components. Help text for these is extracted from their Python docs and shown in the UI, along with settings that you can control. In the node editor, click on a plugin to open its grid widget or configuration window.
"},{"location":"agent_studio.html#llm","title":"LLM","text":"NanoLLMAutoPromptUserPromptTextStream > nano_llm.plugins.NanoLLM
(source)
Load quantized LLM/VLM with MLC (speed), AWQ (quality), or HF Transformers (compatibility)\n\n - Inputs\n * str, list[str], np.ndarray, torch.Tensor, cudaImage, PIL.Image \n\n - Commands \n * /reset /refresh /generate (these inputs can control the chat)\n\n - Outputs\n * delta The latest detokenized text to be generated\n * partial All of the text generated in the response so far\n * final The complete response sent at the end of the request\n * words Similar to 'delta', but with at least one word at a time\n * history A list of the messages in the chat history (as dictionaries)\n * tools Enable the bot to call functions in plugins connected on this channel\n
> nano_llm.plugins.AutoPrompt
(source)
Apply a template each time new data comes in, for example '<image> Describe the image' \n would tag each incoming image along with that text prompt (with images, usually for VLM)\n\n AutoPrompt can be used to compose structured messages that reference prior inputs, \n like `Image 1: <image> Image 2: <image>` or using <text> for the last text received: \n\n The three most recent replies were:\n * <text>\n * <text>\n * <text> \n\nThese most recent inputs are used in newest to oldest order from a LIFO queue.\n
> nano_llm.plugins.UserPrompt
(source)
This plugin is for inputting text on the keyboard, either from the terminal or text box in the UI.\n\n It can also load prompts from a text or json file, and those files can reference other files to load.\n
> nano_llm.plugins.TextStream
(source)
Simple plugin for viewing any stream of text from the system in a text box in the UI. \n\n For sources emanating from ASR or LLM, can apply color highlighting to partial/final responses.\n
"},{"location":"agent_studio.html#speech","title":"Speech","text":"Whisper ASRPiper TTSVAD FilterRiva > nano_llm.plugins.WhisperASR
(source)
Whisper streaming speech-to-text with TensorRT (github.com/NVIDIA-AI-IOT/whisper_trt)\n\n Supports these Whisper models: 'tiny' (39M), 'base' (74M), 'small' (244M)\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * final The text transcript of the entire segment spoken in the input\n * partial Rolling partial transcript of the words spoken so far in the audio segment\n
The Whisper plugin is designed to be used in a pipeline with the VAD Filter to reduce erroneous output from background noise:
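In code, such a pipeline might be wired up roughly like the following hypothetical sketch (the default constructor arguments and the add() connection call are assumptions here, so check the linked plugin sources for the exact API):
from nano_llm.plugins import VADFilter, WhisperASR # plugin classes described in this section\nvad = VADFilter() # passes audio through only when VAD confidence exceeds the threshold\nasr = WhisperASR() # streaming Whisper speech-to-text\nvad.add(asr) # assumed connection call: route audio that clears the VAD into the ASR\n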
> nano_llm.plugins.PiperTTS
(source)
Piper text-to-speech using CUDA and onnxruntime (github.com/rhasspy/piper)\n\n Populates a list of the available Piper models that it can download, along with the speaker voices \n for multi-speaker models. The default model is 'en_US-libritts-high'. Has controls for voice speed.\n\n - Inputs\n * str either individual words, sentences, or an entire passage to synthesize\n\n - Outputs\n * audio np.ndarray (int16) at 16KHz sample rate (for low-quality models) or 22.5KHz (for medium/high)\n
This PiperTTS preset is available, which uses RateLimit to stream the output at 1x realtime factor (so it can be muted later)
> nano_llm.plugins.VADFilter
(source)
Voice activity detection model using Silero. Drops incoming audio unless it exceeds the VAD threshold.\n\n Use this on incoming audio before ASR plugins to reduce spurious transcripts produced from background noise.\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * audio np.ndarray or torch.Tensor (only when VAD confidence > threshold)\n
Riva ASR / TTS services - these run in another container over GRPC. See these resources for setup:
NGC - riva-embedded quickstart guide
jetson-containers - riva container setup
JetsonHacks - Speech AI on NVIDIA Jetson Tutorial
"},{"location":"agent_studio.html#audio","title":"Audio","text":"AudioInputDevice
USB/I2S soundcard input via PulseAudio/PyAudio (for microphone directly attached to system) AudioOutputDevice
USB/I2S soundcard output via PulseAudio/PyAudio (for speakers directly attached to system) AudioRecorder
Save an audio stream to WAV file on the server WebAudioIn
Receive audio samples streamed from the client over websockets WebAudioOut
Transmit audio samples to the client over websockets"},{"location":"agent_studio.html#video","title":"Video","text":"VideoSource
Capture images from camera device (V4L2/CSI), network stream (RTP, RTSP), video file (MP4, MKV, AVI, FLV) VideoOutput
Output H264/H265-encoded video to network stream (RTP, RTSP, WebRTC), display, or file (MP4, MKV, AVI, FLV) VideoOverlay
Draw text on top of video streams for HUD or OSD-style display RateLimit
Throttle transmission to a specified rate (can also be used with audio)"},{"location":"agent_studio.html#database","title":"Database","text":"NanoDB
Optimized in-memory multimodal vectorDB for txt2img/img2img similarity search and image tagging [[\u2197]](tutorial_nanodb.md) Deduplicate
Filter incoming text against previous entries with embedding model and cosine similarity over a timeout period EventFilter
Look for keywords or search strings in streams of text, keeping track of the begin/end times that these occur."},{"location":"agent_studio.html#tips-tricks","title":"Tips & Tricks","text":"Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recreated in the dynamic environment of Agent Studio without needing to touch any code. Here are some practical tips as you go about trying different combinations of model pipelines:
"},{"location":"agent_studio.html#ui","title":"UI","text":" To add an element to the agent pipeline, either use the menus in the node editor title bar, or right click inside the editor. Inserting elements via the context menu (right click) will position them at your cursor, so you needn't reposition them. To remove a component from the pipeline, right click on it and then select the little X button in the top right. To remove a connection between components, first select the link and then right click for the X button to appear. Click on a node in the graph to open its grid widget if it implements one. Click again to open its settings dialog. "},{"location":"agent_studio.html#save-load","title":"Save & Load","text":" Under the Agent menu, there is a Save button which will serialize the current pipeline to JSON. These presets are saved in a mounted directory on the host, under jetson-containers/data/nano_llm/presets
Some commonly-used subgraphs are already included to quickly add and combine (like ASR, TTS, and video VLM) You can load these via the Agent menu, by right-clicking inside the node editor, or with the --load
flag at startup. "},{"location":"agent_studio.html#memory","title":"Memory","text":" As you are adding models, keep an eye on the system resources in the top right to make sure you don't run out of memory. Due to loading times, the models are cached in memory even after you remove them from the node editor. To actually free the memory, use the Clear Cache
button in the Agent menu. "},{"location":"agent_studio.html#performance","title":"Performance","text":" You can monitor the system's CPU and GPU usage in the top right corner. Various statistics are refreshed in the nodes so you can see the performance of each component. Use the RateLimit
plugin to throttle data sources and balance resources. The drop_inputs
setting that some plugins expose will always keep them up with the latest request. "},{"location":"agent_studio.html#templating","title":"Templating","text":" \ud83e\udd16 Have fun bot building! If you need help, reach out on the Jetson Forums or GitHub Issues.
"},{"location":"benchmarks.html","title":"Benchmarks","text":"Below are AI inferencing benchmarks for Jetson Orin Nano Super and Jetson AGX Orin.
"},{"location":"benchmarks.html#jetson-orin-nano-super","title":"Jetson Orin Nano Super","text":"LLM / SLMVision / Language ModelsVision Transformers Model Jetson Orin Nano (original) Jetson Orin Nano Super Perf Gain (X) Llama 3.1 8B 14 19.14 1.37 Llama 3.2 3B 27.7 43.07 1.55 Qwen2.5 7B 14.2 21.75 1.53 Gemma 2 2B 21.5 34.97 1.63 Gemma 2 9B 7.2 9.21 1.28 Phi 3.5 3B 24.7 38.1 1.54 SmolLM2 41 64.5 1.57 For running these benchmarks, this script will launch a series of containers that download/build/run the models with MLC and INT4 quantization.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\nbash jetson-containers/packages/llm/mlc/benchmarks.sh\n
Model Jetson Orin Nano (original) Jetson Orin Nano Super Perf Gain (X) VILA 1.5 3B 0.7 1.06 1.51 VILA 1.5 8B 0.574 0.83 1.45 LLAVA 1.6 7B 0.412 0.57 1.38 Qwen2 VL 2B 2.8 4.4 1.57 InternVL2.5 4B 2.5 5.1 2.04 PaliGemma2 3B 13.7 21.6 1.58 SmolVLM 2B 8.1 12.9 1.59
Model Jetson Orin Nano (original) Jetson Orin Nano Super Perf Gain (X) clip-vit-base-patch32 196 314 1.60 clip-vit-base-patch16 95 161 1.69 DINOv2-base-patch14 75 126 1.68 SAM2 base 4.42 6.34 1.43 Grounding DINO 4.11 6.23 1.52 vit-base-patch16-224 98 158 1.61 vit-base-patch32-224 171 273 1.60"},{"location":"benchmarks.html#jetson-agx-orin","title":"Jetson AGX Orin","text":"Large Language Models (LLM)Small Language Models (SLM)Vision Language Models (VLM)Vision Transformers (ViT)Stable DiffusionRivaVector Database For running LLM benchmarks, see the MLC
container documentation.
Small language models are generally defined as having fewer than 7B parameters (Llama-7B shown for reference) For more data and info about running these models, see the SLM
tutorial and MLC
container documentation.
This measures the end-to-end pipeline performance for continuous streaming like with Live Llava. For more data and info about running these models, see the NanoVLM
tutorial.
VIT performance data from [1] [2] [3]
For running Riva benchmarks, see ASR Performance and TTS Performance.
For running vector database benchmarks, see the NanoDB
container documentation.
"},{"location":"community_articles.html","title":"Community Projects","text":"Below, you'll find a collection of guides, tutorials, and articles contributed by the community showcasing the implementation of generative AI on the Jetson platform.
"},{"location":"community_articles.html#github-jetbot-voice-activated-copilot-tools-with-ros2-riva-and-nanollm-9-21-2024","title":"GitHub Jetbot Voice-Activated Copilot Tools with ROS2, RIVA, and NanoLLM (9-21-2024) ","text":"Jen Hung Ho created ROS2 nodes for ASR/TTS and LLM/VLM on Jetson that can can be used to control JetBot, including customizable voice commands and the execution of advanced actions. Check it out on GitHub under Jen-Hung-Ho/ros2_jetbot_tools
and Jen-Hung-Ho/ros2_jetbot_voice
and on the forums here.
"},{"location":"community_articles.html#hackster-ai-powered-photorealistic-talking-avatar-4-26-2024","title":"Hackster AI-Powered Photorealistic Talking Avatar (4-26-2024) ","text":"Nurgaliyev Shakhizat creates an interactive talking avatar using ASR, TTS, LLM, and Audio2Face with NVIDIA Omniverse and Unreal Engine:
"},{"location":"community_articles.html#hackster-an-eye-for-an-item-4-26-2024","title":"Hackster An Eye for an Item (4-26-2024) ","text":"Allen Tao builds an indoor robot with Isaac ROS that maps your home and remembers where things are using SLAM and NanoDB!
"},{"location":"community_articles.html#hackster-escalator-people-tracker-4-2-2024","title":"Hackster Escalator People Tracker (4-2-2024) ","text":"George Profenza & team install local CV solution for detecting and tracking people in large retail spaces to drive live Generative AI graphics:
"},{"location":"community_articles.html#hackster-edge-style-fashion-preview-at-the-edge-4-1-2024","title":"Hackster Edge Style: Fashion Preview at the Edge (4-1-2024) ","text":"AI-powered app from Andrei Ciobanu shows virtual try-ons with customer images, enhancing retail shopping using Jetson Orin for speed and privacy:
"},{"location":"community_articles.html#hackster-cooking-meals-with-a-local-ai-assistant-on-jetson-axg-orin-4-1-2024","title":"Hackster Cooking meals with a local AI assistant on Jetson AXG Orin (4-1-2024) ","text":"Dimiter Kendri builds a multimodal, multi AI agent, fully local, conversational chatbot with multi agent research capabilities via speech queries:
"},{"location":"community_articles.html#hackster-realtime-language-segment-anything-on-jetson-orin-3-4-2024","title":"Hackster Realtime Language-Segment-Anything on Jetson Orin (3-4-2024) ","text":"Huy Mai enables Segment Anything (SAM) with natural language prompting using GroundingDINO for object detection:
"},{"location":"community_articles.html#github-japanese-nmt-translation-for-stable-diffusion-2-23-2024","title":"GitHub Japanese NMT Translation for Stable Diffusion (2-23-2024) ","text":"Toshihiko Aoki has created a prompt generator for stable-diffusion-webui
that translates Japanese queries into English using a fine-tuned GPT-2 NMT model before feeding them into Stable Diffusion. Check out the full guide on GitHub under to-aoki/ja-tiny-sd-webui
, including the training dataset and LoRA building!
Your browser does not support the video tag."},{"location":"community_articles.html#hackster-clearwater-underwater-image-enhancement-with-generative-ai-2-16-2024","title":"Hackster ClearWater: Underwater Image Enhancement with Generative AI (2-16-2024) ","text":"Vy Pham has created a novel denoising pipeline using a custom trained Transformer-based diffusion model and GAN upscaler for image enhancement, running on Jetson AGX Orin. It runs interactively in a Streamlit web UI for photo capturing and the processing of images and videos. Great work!
Your browser does not support the video tag."},{"location":"community_articles.html#hackster-ai-powered-application-for-the-blind-and-visually-impaired-12-13-2023","title":"Hackster AI-Powered Application for the Blind and Visually Impaired (12-13-2023) ","text":"Nurgaliyev Shakhizat demonstrates a locally-hosted Blind Assistant Device running on Jetson AGX Orin 64GB Developer Kit for realtime image-to-speech translation:
\u00a0 Find more resources about this project here: [Hackster] [GitHub]
"},{"location":"community_articles.html#daves-armoury-bringing-glados-to-life-with-robotics-and-ai-2-8-2024","title":"Dave's Armoury Bringing GLaDOS to life with Robotics and AI (2-8-2024) ","text":"See how DIY robotics legend Dave Niewinski from davesarmoury.com brings GLaDOS to life using Jetson AGX Orin, running LLMs onboard alongside object + depth tracking, and RIVA ASR/TTS with a custom-trained voice model for speech recognition and synthesis! Using Unitree Z1 arm with 3D printing and StereoLabs ZED2.
\u00a0 Find more resources about this project here: [Forums] [GitHub]
"},{"location":"community_articles.html#hackster-seeed-studios-local-voice-chatbot-puts-a-speech-recognizing-llama-2-llm-on-your-jetson-2-7-2024","title":"Hackster Seeed Studio's Local Voice Chatbot Puts a Speech-Recognizing LLaMa-2 LLM on Your Jetson (2-7-2024) ","text":"Seeed Studio has announced the launch of the Local Voice Chatbot, an NVIDIA Riva- and LLaMa-2-based large language model (LLM) chatbot with voice recognition capabilities \u2014 running entirely locally on NVIDIA Jetson devices, including the company's own reComputer range. Follow the step-by-step guide on the Seeed Studio wiki.
"},{"location":"community_articles.html#youtube-genai-nerds-react-insider-look-at-nvidias-newest-generative-ai-2-6-2024","title":"YouTube GenAI Nerds React - Insider Look at NVIDIA's Newest Generative AI (2-6-2024) ","text":"Watch this panel about the latest trends & tech in edge AI, featuring Kerry Shih from OStream, Jim Benson from JetsonHacks, and Dusty from NVIDIA.
"},{"location":"community_articles.html#nvidia-bringing-generative-ai-to-life-with-nvidia-jetson-11-7-2023","title":"NVIDIA Bringing Generative AI to Life with NVIDIA Jetson (11-7-2023) ","text":"Watch this webinar about deploying LLMs, VLMs, ViTs, and vector databases onboard Jetson Orin for building next-generation applications using Generative AI:
"},{"location":"community_articles.html#jetsonhacks-jetson-ai-labs-generative-ai-playground-10-31-2023","title":"JetsonHacks Jetson AI Labs \u2013 Generative AI Playground (10-31-2023) ","text":"JetsonHacks publishes an insightful video that walks developers through the typical steps for running generative AI models on Jetson following this site's tutorials. The video shows the interaction with the LLaVA model.
"},{"location":"community_articles.html#hackster-vision2audio-giving-the-blind-an-understanding-through-ai-10-15-2023","title":"Hackster Vision2Audio - Giving the blind an understanding through AI (10-15-2023) ","text":"Nurgaliyev Shakhizat demonstrates Vision2Audio running on Jetson AGX Orin 64GB Developer Kit to harness the power of LLaVA to help visually impaired people:
"},{"location":"community_articles.html#nvidia-generative-ai-models-at-the-edge-10-19-2023","title":"NVIDIA Generative AI Models at the Edge (10-19-2023) ","text":"Follow this walkthrough of the Jetson AI Lab tutorials along with coverage of the latest features and advances coming to JetPack 6 and beyond:
\u00a0 Technical Blog - https://developer.nvidia.com/blog/bringing-generative-ai-to-life-with-jetson/
"},{"location":"community_articles.html#medium-how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-10-02-2023","title":"Medium How to set up your Jetson device for LLM inference and fine-tuning (10-02-2023) ","text":"Michael Yuan's guide demonstrating how to set up the Jetson AGX Orin 64GB Developer Kit specifically for large language model (LLM) inference, highlighting the crucial role of GPUs and the cost-effectiveness of the Jetson AGX Orin for LLM tasks.
https://medium.com/@michaelyuan_88928/how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-682e36444d43
"},{"location":"community_articles.html#hackster-getting-started-with-ai-on-nvidia-jetson-agx-orin-dev-kit-09-16-2023","title":"Hackster Getting Started with AI on Nvidia Jetson AGX Orin Dev Kit (09-16-2023) ","text":"Nurgaliyev Shakhizat demonstrates llamaspeak
on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
"},{"location":"community_articles.html#hackster-new-ai-tool-is-generating-a-lot-of-buzz-09-13-2023","title":"Hackster New AI Tool Is Generating a Lot of Buzz (09-13-2023) ","text":"Nick Bild provides an insightful introduction to the Jetson Generative AI Playground:
https://www.hackster.io/news/new-ai-tool-is-generating-a-lot-of-buzz-3cc5f23a3598
"},{"location":"community_articles.html#jetsonhacks-use-these-jetson-docker-containers-tutorial-09-04-2023","title":"JetsonHacks Use These! Jetson Docker Containers Tutorial (09-04-2023) ","text":"JetsonHacks has a in-depth tutorial on how to use jetson-containers
and even shows text-generation-webui
and stable-diffusion-webui
containers in action!
"},{"location":"community_articles.html#hackster-llama-2-llms-w-nvidia-jetson-and-textgeneration-web-ui-08-17-2023","title":"Hackster LLaMa 2 LLMs w/ NVIDIA Jetson and textgeneration-web-ui (08-17-2023) ","text":"Paul DeCarlo demonstrates 13B and 70B parameter LLama 2 models running locally on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
"},{"location":"community_articles.html#hackster-running-a-chatgpt-like-llm-llama2-on-a-nvidia-jetson-cluster-08-14-2023","title":"Hackster Running a ChatGPT-Like LLM-LLaMA2 on a Nvidia Jetson Cluster (08-14-2023) ","text":"Discover how to run a LLaMA-2 7B model on an NVIDIA Jetson cluster in this insightful tutorial by Nurgaliyev Shakhizat:
"},{"location":"community_articles.html#jetsonhacks-speech-ai-on-nvidia-jetson-tutorial-08-07-2023","title":"JetsonHacks Speech AI on NVIDIA Jetson Tutorial (08-07-2023) ","text":"JetsonHacks gives a nice introduction to NVIDIA RIVA SDK and demonstrate its automated speech recognition (ASR) capability on Jetson Orin Nano Developer Kit.
"},{"location":"community_articles.html#hackster-llm-based-multimodal-ai-w-azure-open-ai-nvidia-jetson-07-12-2023","title":"Hackster LLM based Multimodal AI w/ Azure Open AI & NVIDIA Jetson (07-12-2023) ","text":"Learn how to harness the power of Multimodal AI by running Microsoft JARVIS on an Jetson AGX Orin 64GB Developer Kit, enabling a wide range of AI tasks with ChatGPT-like capabilities, image generation, and more, in this comprehensive guide by Paul DeCarlo.
"},{"location":"community_articles.html#hackster-how-to-run-a-chatgpt-like-llm-on-nvidia-jetson-board-06-13-2023","title":"Hackster How to Run a ChatGPT-Like LLM on NVIDIA Jetson board (06-13-2023) ","text":"Nurgaliyev Shakhizat explores voice AI assistant on Jetson using FastChat and VoskAPI.
"},{"location":"getting-started.html","title":"Getting started","text":""},{"location":"hello_ai_world.html","title":"Hello AI World","text":"Hello AI World is an in-depth tutorial series for DNN-based inference and training of image classification, object detection, semantic segmentation, and more. It is built on the jetson-inference
library using TensorRT for optimized performance on Jetson.
It's highly recommended to familiarize yourself with the concepts of machine learning and computer vision before diving into the more advanced topics of generative AI here on the Jetson AI Lab. Many of these models will prove useful to have during your development.
HELLO AI WORLD >> https://github.com/dusty-nv/jetson-inference
"},{"location":"initial_setup_jon.html","title":"Initial Setup Guide for Jetson Orin Nano Developer Kit","text":"Note
This guide is to supplement the official Jetson Orin Nano Developer Kit Getting Started Guide.
The NVIDIA\u00ae Jetson Orin Nano\u2122 Developer Kit is a perfect kit to start your journey of local generative AI evaluation and development.
With the December 2024 software update (JetPack 6.1 (rev.1)), this advanced edge computer delivers up to 70% more performance, making it an even more powerful platform for the era of generative AI.
This guide explains the complete flow from opening the box, updating the firmware if needed, flashing the latest JetPack 6.1 (rev. 1) image on SD card, and the initial software setup, so that you will be ready for tutorials listed on this site and other AI projects.
"},{"location":"initial_setup_jon.html#check-your-inventory","title":"Check your inventory","text":"The following item is needed or highly desired to set up your Jetson Orin Nano Developer Kit. If you don't have them in your inventory, you want to arrange them and return to this guide once they are available.
What does not come in the box - What you need/want to prepare
"},{"location":"initial_setup_jon.html#storage","title":"Storage","text":" microSD card (64GB or bigger) NVMe SSD (Optional, but highly recommended for following tutorials on this site) "},{"location":"initial_setup_jon.html#mean-to-access-terminal","title":"Mean to access terminal","text":"You need either of the following set:
DisplayPort cable, DisplayPort capable monitor and a USB keyboard DisplayPort to HDMI cable and HDMI capable monitor (or TV) and a USB keyboard USB to TTL Serial cable (Advanced) "},{"location":"initial_setup_jon.html#alternative-method-sdk-manager","title":"\ud83d\udef8 Alternative method : SDK Manager","text":"In case you have an x86 PC running Ubuntu 22.04 or 20.04, then you can flash your Jetson Orin Nano Developer Kit with the latest firmware and JetPack all at once using NVIDIA SDK Manager.
Also, if you wish to not use a microSD card but rather use a large NVMe SSD for the OS and data, then you need to use SDK Manager to flash the latest JetPack on the NVMe SSD.
flowchart LR\n A(start) --> B{Want to ditch microSD<br>and only use NVMe SSD?}\n B --[YES] --> S[\ud83d\udef8 SDK Manager method]\n B --[No] --> C{Have x86 PC running<br> Ubuntu 22.04/20.04?}\n C --[YES] --> S\n C --[No] --> U[\ud83d\ude80 microSD-only method]\n\n style S stroke-width:3px, fill:#f3e9f2,stroke:#b544c4\n style U stroke-width:3px, fill:#d2e9e5,stroke:#0e7a71
Click the button below to jump to a page that explains the alternative setup method using SDK Manager if you want to set up your Jetson Orin Nano Developer Kit with an NVMe SSD or just want to flash everything at once from your Ubuntu PC.
\ud83d\udef8 SDK Manager method
Otherwise, continue reading on this page for the microSD-only setup method.
"},{"location":"initial_setup_jon.html#overall-flow-microsd-only-method","title":"Overall flow (microSD-only method)","text":"Jetson Orin Nano Initial Setup Flowchart (microSD-only method)
flowchart\n A(start) --> B{1\ufe0f\u20e3 Check if Jetson UEFI Firmware<br>is newer than version 36.0}\n B --[YES] --> O[6\ufe0f\u20e3 Boot with JetPack 6.1 rev.1 microSD card<br> to schedule firmware update]\n B --[No] --> C[2\ufe0f\u20e3 Boot with JetPack 5.1.3 microSD card<br>to schedule firmware update]\n C --> D[3\ufe0f\u20e3 Reboot] --> E{{Firmware update to 5.0 during reboot}}\n E --> F[4\ufe0f\u20e3 Run QSPI updater] --> G[5\ufe0f\u20e3 Reboot] --> H{{QSPI update during reboot - Firmware 36.4.0}}\n H --> O\n O --> P[7\ufe0f\u20e3 Reboot]\n P --> Q{{Firmware update to 36.4.2 during reboot}}\n Q --> R[8\ufe0f\u20e3 Unlock super performance] \n R --> S(\ud83d\udc4d Start developing on JetPack 6.x) \n\n\n style C fill:#fee\n style D fill:#DEE,stroke:#333\n style E stroke-width:2px,stroke-dasharray: 5 5\n style F stroke-width:4px\n style G fill:#DEE,stroke:#333\n style H stroke-width:2px,stroke-dasharray: 5 5\n style O fill:#fee\n style P fill:#DEE,stroke:#333\n style Q stroke-width:2px,stroke-dasharray: 5 5\n style R fill:#f2d5ff
Note that it will undergo a total of three (3) reboot cycles.
"},{"location":"initial_setup_jon.html#walk-through-video","title":"Walk-through Video","text":"Click here to expand and watch video"},{"location":"initial_setup_jon.html#1-check-if-jetson-uefi-firmware-version-360","title":"1\ufe0f\u20e3 Check if Jetson UEFI Firmware version > 36.0
","text":"Your Jetson Orin Nano Developer Kit may have the latest firmware (\"Jetson UEFI firmware\" on QSPI-NOR flash memory) flashed at the factory.
If not, we need to go through a set of procedures to upgrade to the latest firmware. (Luckily, we can now do this all just on Jetson, meaning we don't need to use a host Ubuntu PC any more!)
So let's first check the version of your Jetson UEFI Firmware. You can take one of the following methods.
Monitor-attached Headless\ud83d\ude01I'm feeling lucky Connect your monitor and USB keyboard to your developer kit. Turn on the developer kit by plugging in the bundled DC power supply Repeatedly press Esc key on the keyboard, especially after NVIDIA logo boot splash screen first appears on the monitor You should see UEFI setup menu screen Check the third line from the top (below \"Not specified\"), which should be the version number of Jetson UEFI firmware Below image shows that this Jetson Orin Nano Developer Kit unit has the old UEFI firmware, indicating that the unit first needs its firmware/QSPI updated in order to run JetPack 6.x.
Connect USB to TTL Serial cable onto the following pins on J14
\"button\" header of carrier board located under the Jetson module. On your PC, run your console monitor program and open the USB serial port. Power on the developer kit by plugging in the bundled DC power supply On the PC console, repeatedly press Esc key on the keyboard, especially after NVIDIA logo boot splash screen first appears on the monitor You should see UEFI setup menu screen Check the third line from the top (below \"Not specified\"), which should be the version number of Jetson UEFI firmware Warning
Avoid leaving the device in a black screen state for extended periods or attempting to boot repeatedly with an SD card containing an incompatible JetPack version.
There is a chance that accumulated boot failures may trigger the L4T launcher to always boot into recovery kernel.
If this happens, even after switching to the proper JetPack 5.1.3 SD card, it won't boot from the SD card until we manually change the L4T Launcher setting.
Although we don't recommend for the above stated reason, you could skip to 6\ufe0f\u20e3 Boot with JetPack 6.x SD card, and try your luck to see if your Jetson just boots with the JetPack 6.x SD card.
If the Ubuntu desktop does not appear within 3 minutes and the screen remains black (as shown below), it indicates that the unit is running an older UEFI firmware version that is incompatible with the JetPack 6.x SD card.
Turn the unit off, and skip to the next section (2\ufe0f\u20e3 Boot with JetPack 5.1.3 SD card to schedule firmware update).
"},{"location":"initial_setup_jon.html#determine-qspi-update-is-necessary-or-not","title":"Determine QSPI update is necessary or not","text":"Attention
Select the appropriate tab below based on your firmware version you found in the above step.
If you found your Jetson Orin Nano needs its firmware updated to run JetPack 6.x, click \" Firmware < 36.0\" tab, and then additional step 2 to 5 will appear for you to follow.
If you know your Jetson Orin Nano has the latest firmware, stay on \" Firmware 36.x\" tab, and skip to the next section (6\ufe0f\u20e3 Boot with JetPack 6.x SD card)
Firmware < 36.0 Firmware 36.x Your Jetson Orin Nano needs its firmware updated in order to make JetPack 6.x SD card work.
Perform the following steps (2 to 5).
Your Jetson Orin Nano has the latest firmware that is ready for JetPack 6.x SD card.
Skip to the next section (6\ufe0f\u20e3 Boot with JetPack 6.x SD card)
"},{"location":"initial_setup_jon.html#2-boot-with-jetpack-513-sd-card-to-schedule-firmware-update","title":"2\ufe0f\u20e3 Boot with JetPack 5.1.3 SD card to schedule firmware update","text":"First, we need to run JetPack 5.1.3 in order to let its nvidia-l4t-bootloader
package get its bootloader/firmware updater activated, so that the firmware update automatically runs the next time it reboots.
Download SD card image on to your PC
On your PC, download JetPack 5.1.3 image for Jetson Orin Nano Developer Kit from the official JetPack 5.1.3 page or from the below direct link button.
Warning
NVIDIA had updated the JetPack 5.1.3 image on 5/28/2024, as the old version had some issue and the following process did not work.So please download and use the latest image (the new file name is JP513-orin-nano-sd-card-image_b29.zip
).
Jetson Orin Nano Developer KitJetPack 5.1.3 image
Use Balena Etcher to flash image to SD card
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the flashed microSD card into the slot on Jetson module
Power-on
Turn on the Jetson Orin Nano Developer Kit with JetPack 5.1.3 SD card inserted by plugging in the DC power supply.
Complete the initial software setup (oem-config
)
Info
There is a chance that JetPack 5.1.3 SD card does not show any desktop UI on the attached display (the display remains black).
It would still schedule the UEFI firmware update in the background, so when you power cycle the unit, most likely it will perform the firmware update on the next boot up.
So if this happens, skip to the next step (3\ufe0f\u20e3-2).
Ensure firmware update is scheduled.
Once Jetson boots into Jetson Linux system, a background service automatically runs to schedule a firmware update (if needed) to be performed during the next boot-up process.
Once you see the following, or just wait about 5 minutes after powering on to ensure the scheduling is done, reboot.
GUI CUI
$ sudo systemctl status nv-l4t-bootloader-config\n[sudo] password for jetson: \n\u25cf nv-l4t-bootloader-config.service - Configure bootloader service\n Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Fri 2024-05-03 13:36:13 PDT; 1min 57s ago\n Process: 11439 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)\nMain PID: 11439 (code=exited, status=0/SUCCESS)\n
"},{"location":"initial_setup_jon.html#3-reboot-and-observe-firmware-update-to-50","title":"3\ufe0f\u20e3 Reboot and observe firmware update to 5.0
","text":" Reboot
Reboot your Jetson Orin Nano Developer Kit.
GUI CUI On the Ubuntu desktop click the power icon () and select \"Restart...\".
$ sudo reboot\n
Observe firmware update
You should see the following during the boot up process.
With monitor Headless (serial)
Once done, you will boot into JetPack 5.1.3 (again), with underlying firmware updated to 5.0-35550185
.
"},{"location":"initial_setup_jon.html#4-run-qspi-updater-package-to-schedule-qspi-update","title":"4\ufe0f\u20e3 Run QSPI Updater package to schedule QSPI update","text":"Now that your UEFI firmware is updated to 35.5.0 ( = JetPack 5.1.3 ), it is capable of updating the entire QSPI content to make it ready for JetPack 6.x.
We will run a special tool so that the entire QSPI update is scheduled to run automatically on the next boot-up.
Double-check your firmware version is up to date (35.5.0
= JetPack 5.1.3)
Once it reboots back into Jetson Linux system, on Jetson terminal, run the following:
sudo nvbootctrl dump-slots-info\n
You should see something like the following, with the Current version indicating 35.5.0
.
Current version: 35.5.0\nCapsule update status: 0\nCurrent bootloader slot: A\nActive bootloader slot: A\nnum_slots: 2\nslot: 0, status: normal\nslot: 1, status: normal\n
Install QSPI Updater Debian package to trigger the entire QSPI update
On Jetson terminal, run the following:
sudo apt-get install nvidia-l4t-jetson-orin-nano-qspi-updater\n
Installing the nvidia-l4t-jetson-orin-nano-qspi-updater
automatically runs its script to schedule the entire QSPI update to be performed during the next boot process, so that the firmware is ready for JetPack 6.x.
"},{"location":"initial_setup_jon.html#5-reboot-observe-qspi-update-and-power-off","title":"5\ufe0f\u20e3 Reboot, observe QSPI update, and power off","text":" Reboot
Once the QSPI update is scheduled, reboot your Jetson Orin Nano Developer Kit.
Observe update
You can observe the QSPI update during the boot up process.
Power off
Once the update is done, it reboots and tries to boot, however it will get stuck UNLESS you change the SD card to the JetPack 6.x one.
Therefore you should just power off the developer kit simply by disconnecting the DC power supply.
Attention
This part may look very confusing as neither the attached monitor nor the debug UART shows any explicit message on what action to take next.
What is going on here is that the Jetson's firmware (inside the QSPI-NOR flash memory) is now updated, ready for the JetPack 6.x SD card, however it is now incompatible with JetPack 5.1.3 SD card left in the Jetson module's slot, so after the reboot it gets stuck in the boot process.
So there is no issue with this boot halt (or endless rebooting). Simply power off the device and insert the new SD card.
"},{"location":"initial_setup_jon.html#6-boot-with-jetpack-6x-sd-card","title":"6\ufe0f\u20e3 Boot with JetPack 6.x SD card","text":"Once we know the onboard firmware is up-to-date and ready for JetPack 6.x, we can boot Jetson Orin Nano Developer Kit with a microSD card for JetPack 6.
Download SD card image on to your PC
On your PC, download the latest JetPack 6.x image for Jetson Orin Nano Developer Kit from the official JetPack page or from the below direct link button.
Jetson Orin Nano Developer KitJetPack 6.1 (rev. 1) image
Use Balena Etcher to flash image to SD card
Insert your microSD card into your PC's SD card slot, and use Balena Etcher to flash the SD card with the image you just downloaded.
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the JetPack 6.x microSD card into the slot on Jetson module
Power-on by plugging the DC power supply
Complete the initial software setup (oem-config
)
Ensure firmware update is scheduled.
JetPack 6.1 (rev 1) SD card will automatically schedule another firmware/bootloader update in the next (3rd and final) reboot cycle.
GUI CUI
$ sudo systemctl status nv-l4t-bootloader-config\n[sudo] password for jetson: \n\u25cf nv-l4t-bootloader-config.service - Configure bootloader service\n Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Fri 2024-05-03 13:36:13 PDT; 1min 57s ago\n Process: 11439 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)\nMain PID: 11439 (code=exited, status=0/SUCCESS)\n
"},{"location":"initial_setup_jon.html#7-reboot-and-observe-firmware-update-to-3642","title":"7\ufe0f\u20e3 Reboot and observe firmware update to 36.4.2
","text":" Delete old power profile
Attention
If your Jetson Orin Developer Kit was previously running JetPack 6.0 or JetPack 6.1, execute the following command after the final login and reboot your device. This ensures that the MAXN performance mode becomes available on your system.
We need to delete the old power config in preparation for Super config introduction.
sudo rm -rf /etc/nvpmodel.conf\n
Reboot your Jetson Orin Nano Developer Kit on JetPack 6.1 (rev.1) SD card to trigger the final firmware update (to 36.4.2
).
GUI CUI On the Ubuntu desktop click the power icon () and select \"Restart...\".
$ sudo reboot\n
Observe firmware update
You should see the following during the boot up process.
With monitor Headless (serial)
Once done, you will boot into JetPack 6.1 (rev.1) again, with the underlying firmware updated to 36.4.2
, which unlock the Super performance.
"},{"location":"initial_setup_jon.html#8-unlock-super-performance","title":"8\ufe0f\u20e3 Unlock Super Performance","text":"Attention
If your Jetson Orin Developer Kit was previously running JetPack 6.0 or JetPack 6.1, execute the following command after the final login and reboot your device. This ensures that the MAXN performance mode becomes available on your system.
sudo rm -rf /etc/nvpmodel.conf\n
"},{"location":"initial_setup_jon.html#switch-to-maxn-mode","title":"Switch to MAXN mode","text":"Note that the default power mode is 15W. To switch to the new power mode and unlock the increased performance, follow below steps:
Click on the current power mode (15W) by clicking the NVIDIA icon on the right side of the Ubuntu desktop\u2019s top bar. Select Power mode from the menu. Choose MAXN to enable maximum performance.
"},{"location":"initial_setup_jon.html#8-start-developing-on-jetpack-6x","title":"8\ufe0f\u20e3 Start developing on JetPack 6.x","text":"\ud83c\udf8a Congratulations! Your Jetson Orin Nano Developer Kit is set up with JetPack 6.x SD card and you are ready to develop on JetPack 6.x.
"},{"location":"initial_setup_jon.html#next-step","title":"Next step","text":""},{"location":"initial_setup_jon.html#nvme-ssd-installation","title":"NVMe SSD installation","text":"Take a look at this page for installing NVMe SSD and setting up Docker with it.
"},{"location":"initial_setup_jon_sdkm.html","title":"\ud83d\udef8 Initial Setup using SDK Manager","text":"Note
This guide is to supplement the official Jetson Orin Nano Developer Kit Getting Started Guide.
The NVIDIA\u00ae Jetson Orin Nano\u2122 Developer Kit is a perfect kit to start your journey of local generative AI evaluation and development.
This guide explains the alternative method for setting up Jetson Orin Nano Developer Kit by flashing both the latest firmware (QSPI image) and the latest JetPack all at once, using a host x86 PC.
"},{"location":"initial_setup_jon_sdkm.html#check-your-toolbox","title":"Check your toolbox","text":"The following item is needed or highly desired for flashing your Jetson Orin Nano Developer Kit using your host PC.
If you don't have them, check the default microSD-only setup instruction or you want to arrange them first then return to this guide once they are available.
What not come in the box - What you need/want to prepare
"},{"location":"initial_setup_jon_sdkm.html#host-pc","title":"Host PC","text":" x86 PC running Ubuntu 22.04 or Ubuntu 20.04 Source: NVIDIA SDK Manager Host Operating System Compatibility Matrix
"},{"location":"initial_setup_jon_sdkm.html#storage","title":"Storage","text":"Either of the following.
microSD card (64GB or bigger) NVMe SSD (Recommended for better performance)
"},{"location":"initial_setup_jon_sdkm.html#flashing-supply","title":"Flashing supply","text":" USB cable (USB-C plug for Jetson Orin Nano Developer Kit side, other end depends on your PC) jumper pin (or metal paper clip) "},{"location":"initial_setup_jon_sdkm.html#mean-to-access-terminal","title":"Mean to access terminal","text":"You need either of the following set:
DisplayPort cable, DisplayPort capable monitor and a USB keyboard DisplayPort to HDMI cable and HDMI capable monitor (or TV) and a USB keyboard USB to TTL Serial cable (Advanced) "},{"location":"initial_setup_jon_sdkm.html#default-method-microsd-card-only-setup-method","title":"\ud83d\ude80 Default method : microSD card only setup method","text":"In case you do NOT have an x86 PC running Ubuntu 22.04 or 20.04, you can fall back to the default \"microSD card only\" setup method that does not require any host PC.
The decision process can look like this.
flowchart LR\n A(start) --> B{Want to ditch microSD<br>and only use NVMe SSD?}\n B --[YES] --> S[\ud83d\udef8 SDK Manager method]\n B --[No] --> C{Have x86 PC running<br> Ubuntu 22.04/20.04?}\n C --[YES] --> S\n C --[No] --> U[\ud83d\ude80 microSD-only method]\n\n style S stroke-width:3px, fill:#f3e9f2,stroke:#b544c4\n style U stroke-width:3px, fill:#d2e9e5,stroke:#0e7a71
Click the button below to jump to the default setup page.
\ud83d\ude80 microSD-only method
Otherwise, continue reading on this page for the SDK Manager method.
"},{"location":"initial_setup_jon_sdkm.html#overall-flow-sdk-manager-method","title":"Overall flow (SDK Manager method)","text":"Jetson Orin Nano Initial Setup Flowchart (microSD-only method)
flowchart\n A(start) --> B[0\ufe0f\u20e3 Install SDK Manager]\n B --> C[1\ufe0f\u20e3 Connect Target Hardware] \n C --> D[2\ufe0f\u20e3 Select Software Components to Install]\n D --> E[3\ufe0f\u20e3 Download on Host PC]\n E --> F[4\ufe0f\u20e3 Flash]\n F --> G[5\ufe0f\u20e3 Boot and initial setup]\n G --> P(\ud83d\udc4d Start developing on JetPack 6.x)
"},{"location":"initial_setup_jon_sdkm.html#0-install-sdk-manager","title":"0\ufe0f\u20e3 Install SDK Manager","text":"Reference
NVIDIA SDK Manager Documentation : Download and Install Download and Run SDK Manager
On your Ubuntu 22.04 / 20.04 PC, open a terminal and execute the following command.
Select the tab that matches your Ubuntu version.
Ubuntu 22.04 Ubuntu 20.04 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install sdkmanager\n
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install sdkmanager\n
Then launch SDK Manager with the following command.
sdkmanager\n
First time using SDK Manager When using SDK Manager for the first time, log in with your NVIDIA Developer credentials.
Remember to tick the checkbox for \"Stay logged in\" and click \"LOGIN\" button.
A browser window opens. Input your email address that is registered as NVIDIA Developer.
Two factor authentication prompt may pop up.
Once authenticated, the browser windows opens back the SDK Manager app window.
SDK Manager continues to log you in.
SDK Manager launched and the login completed.
"},{"location":"initial_setup_jon_sdkm.html#1-connect-the-target-hardware","title":"1\ufe0f\u20e3 Connect the Target Hardware","text":"Once SDK Manager is launched, connect your Jetson developer kit to your Ubuntu PC and power it on in Forced Recovery mode.
Connect your Jetson Orin Nano Developer Kit to your PC with a USB cable.Note that the USB cable goes into the USB-C port of the carrier board of the developer kit.
While shorting pin 9
and pin 10
of J14
header located below the Jetson module using a jumper pin, insert the DC power supply plug into the DC jack of the carrier board to power it on.
You can use a paper clip like a photo below as well.
Select \"Jetson Orin Nano [8GB developer kit version]\" and hit \"OK\"
Uncheck \"Host Machine\" and it should now look like this
Click \"Continue\" button to proceed to the next step.
"},{"location":"initial_setup_jon_sdkm.html#2-select-software-components-to-install","title":"2\ufe0f\u20e3 Select Software Components to Install]","text":" Leave the only \"Jetson Linux\" component checked, and uncheck everything.
Remember to also check \"I accept the terms and conditions of the license agreements\".
Click \"Continue\" button to proceed to the next step.
It will prompt for the sudo
command password.
"},{"location":"initial_setup_jon_sdkm.html#3-download-on-host-pc","title":"3\ufe0f\u20e3 Download on Host PC","text":" It will start downloading the \"BSP\" package and \"RootFS\" package.
Once downloads are complete, it will untar the package and start generating the images to flash in the background.
Once images are ready, SDK it will open the prompt for flashing.
"},{"location":"initial_setup_jon_sdkm.html#4-flash","title":"4\ufe0f\u20e3 Flash","text":" On the flashing prompt, select \"Runtime\" for \"OEM Configuration\".
On the flashing prompt, select \"NVMe\" if you want to flash Jetson Linux (BSP) to NVMe SSD.
Click \"Flash\" and the prompt popup will change like this.
Flash progress.
DetailsTerminals
Flash successfully completes.
DetailsTerminals
"},{"location":"initial_setup_jon_sdkm.html#5-boot-and-initial-setup","title":"5\ufe0f\u20e3 Boot and Initial Setup","text":" If still plugged, remove the jumper from header (that was used to put it in Forced Recovery mode)
Connect the DisplayPort cable or adapter and USB keyboard and mouse to Jetson Orin Nano Developer Kit, or hook up the USB to TTL Serial cable.
Unplug the power supply and put back in to power cycle.
Jetson should now boot into the Jetson Linux (BSP) of your selected JetPack version from the storage of your choice.
Complete the initial software setup (oem-config
)
"},{"location":"initial_setup_jon_sdkm.html#switch-to-maxn-mode","title":"Switch to MAXN mode","text":"Note that the default power mode is 15W. To switch to the new power mode and unlock the increased performance, follow below steps:
Click on the current power mode (15W) by clicking the NVIDIA icon on the right side of the Ubuntu desktop\u2019s top bar. Select Power mode from the menu. Choose MAXN to enable maximum performance.
\ud83c\udf8a Congratulations! Your Jetson Orin Nano Developer Kit is set up with JetPack 6.x image and you are ready to develop on JetPack 6.x.
"},{"location":"initial_setup_jon_sdkm.html#next-step","title":"Next step","text":"Check this page for \"SSD\" and/or \"Docker\" set up.
"},{"location":"lerobot.html","title":"HuggingFace LeRobot","text":"Let's run HuggingFace LeRobot
to train Transformer-based action diffusion policies and ACT onboard NVIDIA Jetson. These models learn to predict actions for a particular task from visual inputs and prior trajectories, typically collected during teleoperation or in simulation.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 GA (L4T r36.3) JetPack 6.1 (L4T r36.4)
Sufficient storage space (preferably with NVMe SSD).
16.5GB
for lerobot
container image Space for models (>2GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"lerobot.html#work-with-real-world-robots-before-starting-containers","title":"Work with Real-World Robots - Before starting containers","text":"This section gives the guide on how you can work through the LeRobot official example of Getting Started with Real-World Robots (7_get_started_with_real_robot.md
) on your Jetson.
Tip
It's recommended to work on your Jetson in monitor-attached mode.
lerobot
is designed to show camera view in windows and playback TTS audio while capturing dataset, so it is more convenient to setup your Jetson with its monitor (and speakers) attached to Jetson.d
"},{"location":"lerobot.html#a-check-jetson-containers-location","title":"a. Check jetson-container
's location","text":"Through out the course of all the workflows of lerobot
, we will be generating a lot of data, especially for capturing dataset.
We will clone the lerobot
directory on host and mount the directory in the container to keep all the data persistant, but first make sure your jetson-containers
directory is placed on your SSD, not on your eMMC or microSD card.
If you have created the jetson-containers
directory on eMMC or microSD card (likely the case if you first set up your Jetson device without SSD first and later added SSD), then use the rsync
command to move the entire directory under SSD mount point.
rsync -aHAX /home/jetson/jetson-containers/ /ssd/jetson-containers/\n
Then run the installer again.
bash /ssd/jetson-containers/install.sh\n
"},{"location":"lerobot.html#b-create-lerobot-dir-on-host","title":"b. Create lerobot
dir on host","text":"As described above, we will setup the lerobot
directory under data
directory of jetson-containers
for monting it inside the container so that generated data persist.
cd jetson-containers\n./packages/robots/lerobot/clone_lerobot_dir_under_data.sh\n./packages/robots/lerobot/copy_overlay_files_in_data_lerobot.sh\n
"},{"location":"lerobot.html#c-pulseaudio-setup","title":"c. PulseAudio setup","text":"LeRobot's dataset capture flow (control_robot.py
) utilizes Speech Dispatcher to use espeak TTS, in order to give operators audio queues for notifying the status and signaling the next operation. It's actually very helpful.
Speech Dispatcher utilizes Pulse Audio, so rather than just sharing the /dev/snd
device when docker run
(which is good for ALSA), we need to add the following arguments.
--device /dev/snd \\\n -e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native \\\n -v ${XDG_RUNTIME_DIR}/pulse:${XDG_RUNTIME_DIR}/pulse \\\n
This is already added to run.sh
of jetson-containers
, however, we need to edit /etc/pulse/default.pa
in order to allow the root user access to the socket file.
sudo vi /etc/pulse/default.pa\n
Find the section loading module-native-protomocl-unix
and add auth-anonymous=1
### Load several protocols\n.ifexists module-esound-protocol-unix.so\nload-module module-esound-protocol-unix auth-anonymous=1\n.endif\nload-module module-nativ\n
Then restart PulseAudio service to make the config take effect.
pulseaudio --kill\npulseaudio --start\n
For troubleshootings or details, please check the docs.md
of speech-dispatcher
package.
"},{"location":"lerobot.html#d-set-udev-rule-for-acm-devices","title":"d. Set udev rule for ACM devices","text":"It is more convenient if the lerobot programs can always find the device of leader and follower arm with unique names.
For that, we set an udev rule so that arms always get assigned the same device name as following. This is first done on Jetson host side.
/dev/ttyACM_kochleader
: Leader arm /dev/ttyACM_kochfollower
: Follower arm First only connect the leader arm to Jetson and record the serial ID by running the following:
ll /dev/serial/by-id/\n
The output should look like this.
lrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_BA98C8C350304A46462E3120FF121B06-if00 -> ../../ttyACM1\n
Then edit the first line of ./99-usb-serial.rules
like the following.
You can find the template of this file under ./packages/robots/lerobot
directory.
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"00000000000000000000000000000000\", SYMLINK+=\"ttyACM_kochfollower\"\n
Now disconnect the leader arm, and then only connect the follower arm to Jetson.
Repeat the same steps to record the serial to edit the second line of 99-usb-serial.rules
file.
$ ll /dev/serial/by-id/\nlrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_483F88DC50304A46462E3120FF0C081A-if00 -> ../../ttyACM0\n$ vi ./packages/robots/lerobot\n
You should have ./99-usb-serial.rules
now looking like this:
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"483F88DC50304A46462E3120FF0C081A\", SYMLINK+=\"ttyACM_kochfollower\"\n
Finally copy this under /etc/udev/rules.d/
(of host), and restart Jetson.
sudo cp ./99-usb-serial.rules /etc/udev/rules.d/\nsudo reboot\n
After reboot, check that the arms are now assigned the desired fixed symlink names.
ls -l /dev/ttyACM*\n
You should get something like this:
crw-rw---- 1 root dialout 166, 0 Sep 24 17:20 /dev/ttyACM0\ncrw-rw---- 1 root dialout 166, 1 Sep 24 16:13 /dev/ttyACM1\nlrwxrwxrwx 1 root root 7 Sep 24 17:20 /dev/ttyACM_kochfollower -> ttyACM0\nlrwxrwxrwx 1 root root 7 Sep 24 16:13 /dev/ttyACM_kochleader -> ttyACM1\n
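If you later tweak the rules file, the udev rules can usually be reloaded without another full reboot (an optional shortcut; the verified path in this tutorial is the reboot above).
sudo udevadm control --reload-rules\nsudo udevadm trigger\n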
"},{"location":"lerobot.html#e-optional-csi-cameras","title":"e. (Optional) CSI cameras","text":"If you plan to use CSI cameras (not USB webcams) for data capture, you will use the new --csi2webcam
option of jetson-containers
, which exposes v4l2loopback devices that behave like USB webcams (MJPEG) for CSI cameras, using Jetson's hardware JPEG encoder.
This feature requires some packages to be installed.
sudo apt update && sudo apt install v4l2loopback-dkms v4l-utils\n
"},{"location":"lerobot.html#f-increse-the-swap-file-size","title":"f. Increse the swap file size","text":"You may ran out of memory when are setting up to perform ACT model training.
sudo swapoff -a -v\nsudo rm /swfile\nsudo systemctl disable nvzramconfig\nsudo fallocate -l 8G /ssd/8GB.swap\nsudo chmod 600 /ssd/8GB.swap\nsudo mkswap /ssd/8GB.swap\necho \"/ssd/8GB.swap swap swap defaults 0 0\" | sudo tee -a /etc/fstab\nsudo reboot\n
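After the reboot, it's worth verifying that the new swap file is active before starting training (a quick sanity check with standard tools).
swapon --show\nfree -h\n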
"},{"location":"lerobot.html#g-starting-the-lerobot-container","title":"g. Starting the lerobot
container","text":"USB webcamsCSI cameras cd jetson-containers\n./run.sh \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
cd jetson-containers\n./run.sh \\\n --csi2webcam --csi-capture-res='1640x1232@30' --csi-output-res='640x480@30' \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
"},{"location":"lerobot.html#work-with-real-world-robots-once-in-container","title":"Work with Real-World Robots - Once in container","text":"JupyerLab tip
Inside the lerobot
container, JupyterLab server process starts.
You can access it at http://localhost:8888/
(or http://<IP_ADDRESS>:8888/
from another PC on the same network).
In the notebooks
directory, there are Jupyter notebooks for each segment of the official tutorial Getting Started with Real-World Robots (7_get_started_with_real_robot.md
).
Please note that some of them (like notebooks/7-2_real-robot_configure-motors.ipynb
) can be used as working notebooks to conveniently execute Python code and scripts inside the notebook alongside the instructions (rather than switching to the console).
However, keep in mind that you are encouraged to always check the original official tutorial, and some operations like training are much better executed in the console.
Bash history tip
Inside the container, on the console, you can press the Up key to scroll through some of the frequently used commands pre-registered in the bash history.
"},{"location":"lerobot.html#q-setup-audio","title":"q. Setup audio","text":"Check if PulseAudio is available.
pactl info\n
If you need to set the default audio output device, use set-default-sink
.
pactl list short sinks\npactl set-default-sink [SINK_NAME_OR_INDEX]\n
"},{"location":"lerobot.html#1-order-and-assemble-your-koch-v11","title":"1. Order and Assemble your Koch v1.1","text":"You can order the Koch v1.1 kits from ROBOTIS. (Note: they don't come with 3d printed parts)
TODO:
Document Jetson unique hardware setup Share custom 3D print models "},{"location":"lerobot.html#2-configure-motors-calibrate-arms-teleoperate-your-koch-v11","title":"2. Configure motors, calibrate arms, teleoperate your Koch v1.1","text":"Follow the Jupyter notebook 7-2_real-robot_configure-motors.ipynb
.
"},{"location":"lerobot.html#3-record-your-dataset-and-visualize-it","title":"3. Record your Dataset and Visualize it","text":"You should mostly operate on the container's terminal.
Follow the official document's section.
Camera config tip
The official document demonstrates two camera positions, one at the top (\"phone\") and the other directly in front facing the arm (\"laptop\").
In our trials, this camera placement worked, but we needed to zoom the cameras in on the scene so that they capture better spatial resolution.
Another thing worth experimenting with is a wrist cam. More to come later.
Tip
The following commands are registered in the Bash history inside the lerobot
container.
wandb login\nexport HF_USER=\npython lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/koch_test_$(date +%Y%m%d_%H%M%S) \\\n --tags tutorial \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10\n
Tip
If you plan to perform training on a different machine, scp
the dataset directory.
To another JetsonTo other PC scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/data/${HF_USER}/\n
scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/home/<USER>/lerobot/data/${HF_USER}/\n
"},{"location":"lerobot.html#4-train-a-policy-on-your-data","title":"4. Train a policy on your data","text":"You should operate on ther container's terminal.
Follow the official document's section.
Tip
wandb login\nDATA_DIR=data python lerobot/scripts/train.py \\\n dataset_repo_id=${HF_USER}/koch_test \\\n policy=act_koch_real \\\n env=koch_real \\\n hydra.run.dir=outputs/train/act_koch_test \\\n hydra.job.name=act_koch_test \\\n device=cuda \\\n wandb.enable=true\n
Tip
If you perform the training on another Jetson or PC, scp
the outputs directory contents back to the original Jetson that has the leader and follower arms attached.
scp -r outputs/train/act_koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/outputs/train/ \n
"},{"location":"lerobot.html#5-evaluate-your-policy","title":"5. Evaluate your policy","text":"You should operate on the container's terminal.
Follow the official document's section.
Tip for a. Use koch.yaml
and our record
function
Modify the command in the bash history to add -p
argument pointing to the policy checkpoint.
python lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test_01 \\\n --tags tutorial eval \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10 \\\n -p outputs/train/act_koch_test/checkpoints/last/pretrained_model \\\n --run-compute-stats 0\n
Tip for Visualize evaluation afterwards
python lerobot/scripts/visualize_dataset.py \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test\n
If everything goes well, you should see
"},{"location":"lerobot.html#basic-walkthrough","title":"Basic Walkthrough","text":"This is from the lerobot top README.md.
"},{"location":"lerobot.html#visualize-datasets","title":"Visualize Datasets","text":"Outside of container, first launch the rerun.io visualization tool that LeRobot uses [\u2197]
pip3 install rerun-sdk\nrerun\n
Then, start the docker container to playback one of these LeRobot datasets.
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/visualize_dataset.py \\\n --repo-id lerobot/pusht \\\n --episode-index 0\n
"},{"location":"lerobot.html#evaluate-a-pretrained-diffusion-policy","title":"Evaluate a Pretrained Diffusion Policy","text":"This will download and run a pre-trained diffusion model on the PushT environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/eval.py \\\n -p lerobot/diffusion_pusht \\\n eval.n_episodes=10 \\\n eval.batch_size=10\n
"},{"location":"lerobot.html#train-your-own-act-policy","title":"Train your own ACT Policy","text":"Next, train ACT on the Aloha manipulation environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/train.py \\\n policy=act \\\n env=aloha \\\n env.task=AlohaInsertion-v0 \\\n dataset_repo_id=lerobot/aloha_sim_insertion_human \n
See Trossen Robotics for dual-arm ALOHA kits, and Robotis for the low-cost Koch v1.1 kit used in the LeRobot tutorial:
HuggingFace LeRobot - Get Started with Real-World Robots (YouTube Playlist)
"},{"location":"llama_vlm.html","title":"Llama 3.2 Vision","text":"The latest additions to Meta's family of foundation LLMs include multimodal vision/language models (VLMs) in 11B and 90B sizes with high-resolution image inputs (1120x1120) and cross-attention with base completion and instruction-tuned chat variants:
Llama-3.2-11B-Vision
Llama-3.2-11B-Vision-Instruct
Llama-3.2-90B-Vision
Llama-3.2-90B-Vision-Instruct
While quantization and optimization efforts are underway, we have started by running the unquantized 11B model in a container based on HuggingFace Transformers that has been updated with the latest support for Llama-3.2-Vision, to get a jump start on trying out these exciting new multimodal models - thanks to Meta for continuing to release open Llama models!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
12.8GB
for llama-vision
container image Space for models (>25GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Request access to the gated models here with your HuggingFace API key.
"},{"location":"llama_vlm.html#code-example","title":"Code Example","text":"Today Llama-3.2-11B-Vision is able to be run on Jetson AGX Orin in FP16 via HuggingFace Transformers. Here's a simple code example from the model card for using it:
import time\nimport requests\nimport torch\n\nfrom PIL import Image\nfrom transformers import MllamaForConditionalGeneration, AutoProcessor\n\nmodel_id = \"meta-llama/Llama-3.2-11B-Vision\"\nmodel = MllamaForConditionalGeneration.from_pretrained(model_id, device_map=\"auto\", torch_dtype=torch.bfloat16)\nprocessor = AutoProcessor.from_pretrained(model_id)\n\nprompt = \"<|image|><|begin_of_text|>If I had to write a haiku for this one\"\nurl = \"https://llava-vl.github.io/static/images/view.jpg\"\nraw_image = Image.open(requests.get(url, stream=True).raw)\n\ninputs = processor(text=prompt, images=raw_image, return_tensors=\"pt\").to(model.device)\noutput = model.generate(**inputs, do_sample=False, max_new_tokens=32)\nprint(processor.decode(output[0]))  # decode the generated tokens to text\n
If I had to write a haiku for this one, it would be:\n\nA dock on a lake.\nA mountain in the distance.\nA long exposure.\n
Initial testing suggests that Llama-3.2-Vision has more conversational ability than VLMs typically retain after VQA alignment. This llama_vision.py script has interactive completion and image loading to avoid re-loading the model. It can be launched from the container like this:
jetson-containers run \\\n -e HUGGINGFACE_TOKEN=YOUR_API_KEY \\\n $(autotag llama-vision) \\\n python3 /opt/llama_vision.py \\\n --model \"meta-llama/Llama-3.2-11B-Vision\" \\\n --image \"/data/images/hoover.jpg\" \\\n --prompt \"I'm out in the\" \\\n --max-new-tokens 32 \\\n --interactive\n
After processing the initial image, it will ask you to submit another prompt or image:
total 4.8346s (39 tokens, 8.07 tokens/sec)\n\nEnter prompt or image path/URL:\n\n>> \n
We will update this page and container as support for the Llama-3.2-Vision architecture is added to quantization APIs like MLC and llama.cpp for GGUF, which will reduce the memory and latency.
"},{"location":"nerf.html","title":"Nerfstudio - Neural Reconstruction","text":"Nerfstudio is a tool for training and visualizing Neural Radiance Fields (NeRF) models for 3D volumetric reconstruction from a collection of images taken of a scene or environment. It's designed to be easy to use and starts generating imagery at the outset, which becomes further refined as training progresses. Neural reconstruction is being applied to edge devices and robotics like with FruitNeRF. This tutorial will guide you through the process of training high-quality NeRF models using nerfstudio on Jetson.
Special thanks to Johnny N\u00fa\u00f1ez Cano for porting the NeRF stack to Jetson! See NeRF in 2023: Theory and Practice by It-Jim for useful background info.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
17.6GB
for nerfstudio
container image Space for models and datasets (>5GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"nerf.html#start-container","title":"Start Container","text":"Use this command to automatically run, build, or pull a compatible container image for nerfstudio:
jetson-containers run $(autotag nerfstudio)\n
To mount your own directories into the container, use the -v
or --volume
flags: jetson-containers run -v /path/on/host:/path/in/container $(autotag nerfstudio)\n
Later a webserver will be reachable at http://localhost:7007
. The official docs include command-line usage and models [\u2197]
"},{"location":"nerf.html#training-your-first-model","title":"Training Your First Model","text":"It's recommended to train a test scene first by following these steps from the getting started tutorial:
# Download some test data\nns-download-data nerfstudio --capture-name=poster\n\n# Train model\nns-train nerfacto --data /data/nerfstudio/models\n
After it loads, you can navigate your browser to http://localhost:7007
(or substitute your Jetson's IP address)
The 3D visualizations will begin rendering after the first training step, and progressively become more refined. This process typically takes an hour on Jetson AGX Orin to complete the default setting of 30,000 training steps. The model checkpoints are saved under the mounted jetson-containers/data/nerfstudio
directory, and can be loaded again later to generate the same output quality as before.
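To revisit a finished run without retraining, nerfstudio's viewer can load a saved model from its config file (a minimal sketch; the run directory shown is a placeholder for whatever path your training created under the mounted data directory).
ns-viewer --load-config outputs/poster/nerfacto/YOUR_RUN_TIMESTAMP/config.yml\n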
"},{"location":"nerf.html#fruitnerf","title":"FruitNeRF","text":"The creators behind FruitNeRF fused NeRFs with segmentation for more accurate counting of fruits in 3D. By training a semantic neural radiance field on both RGB and segmentation labels from SAM, the model is robust to occlusions, double-counting, and mistaken or undesirable fruits.
Extended sampling of the volumetric grid can then extract object blobs for pose estimation, occupancy mapping, and navigation. Such approaches are promising for combining the strengths of NeRFs for 3D reconstruction, segmentation VITs for open-vocabulary classification, and Jetson's onboard compute for high-definition environmental scanning and perception while in the field.
"},{"location":"nerf.html#data-preparation","title":"Data Preparation","text":"We'll use the FruitNeRF datasets as an example of training NeRFs on custom data with nerfstudio. To train a NeRF model on a different scene, you first need to capture a set of images and corresponding camera poses (which can be estimated with the included photogrammetry and image registration tools like COLMAP and OpenSFM as shown here)
The images should be taken from different viewpoints of the scene you want to model, structured in the nerfstudio dataset format. For FruitNeRF, you can just download and extract pre-recorded real or synthetic data from here:
Real DataSynthetic data cd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Real.zip\nunzip FruitNeRF_Real.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
cd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Synthetic.zip\nunzip FruitNeRF_Synthetic.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
"},{"location":"nerf.html#training-a-fruitnerf-model","title":"Training a FruitNeRF Model","text":"This command will generate a NeRF on the first tree (there are multiple trees to try from each dataset). You can find the complete usage information and models on the official GitHub repo: github.com/meyerls/FruitNeRF
ns-train nerfacto \\\n --data /data/nerfstudio/FruitNeRF_Dataset/tree_01 \\\n --output-dir /data/nerfstudio/models \\\n --pipeline.datamanager.camera-res-scale-factor 0.5\n
The memory usage depends on the number of images in the dataset and their resolution, so if you're running low on memory try adjusting the scaling factors and image sampling parameters. The simulated data has a lower resolution and can run at full size.
"},{"location":"openvla.html","title":"OpenVLA - Vision/Language Action Models for Embodied Robotics","text":"Fine Tuning and Deployment Guide
The tutorial's goal is to provide optimized quantization and inference for deploying VLA models, along with reference fine-tuning workflows for adapting models for new robots, tasks, and environments. Rigorous performance and accuracy validation is applied in a self-contained sim environment with scenario generation and domain randomization (MimicGen). Future phases will include sim2real with Isaac Lab and ROS2 integration, study of related models like CrossFormer, and optimizations to the neural architecture for realtime performance.
\u2705 Quantization and inference optimizations for VLA models \u2705 Accuracy validation of the original OpenVLA-7B weights \u2705 Reference fine-tuning workflow with synthetic data generation \u2705 On-device training with LoRA's on Jetson AGX Orin and full fine-tuning on A100/H100 instances \u2705 85% accuracy on an example block-stacking task with domain randomization \u2705 Sample datasets and test models for reproducing results
Thank you to OpenVLA, Open X-Embodiment, MimicGen, Robosuite and many others with related work for sharing their promising research, models, and tools for advancing physical AI and robotics.
"},{"location":"openvla.html#vla-architecture","title":"VLA Architecture","text":"OpenVLA is a vision/language action model for embodied robotics and behavioral learning built on LLM/VLMs (this base model is a Prismatic VLM using Llama-7B, DINOv2, and SigLIP). Instead of image captioning or visual question/answering, VLA models generate action tokens from camera images and natural language instructions that are used for controlling the robot. Action tokens are discrete token ID's reserved from the text tokenizer's vocabulary that map to continuous values, normalized against the range of motion of each robot. These real-valued tokens are more efficient and accurate than the model outputting numerical data as text in JSON or Pydantic formats, where each digit, decimal point, separator, and whitespace takes an additional token to generate. Other hybrid vision/language models like Florence-2 have adopted similar approaches for continuous-domain prediction using Transformers.
Each action token generated by the model represents a degree-of-freedom of the output coordinate space (i.e. xyz, rotation pose), or a component of the robot that can be controlled (like the gripper). OpenVLA-7B was trained on the Open X-Embodiment dataset for manipulation, with a 7-DoF action space consisting of (delta xyz, delta roll/pitch/yaw, gripper)
. The position and rotation are relative changes to the end-effector (EEF) pose, with an external inverse kinematics (IK) solution like cuMotion solving joint constraints specific to each robotic arm. The gripper dimension is an absolute control between 0 (open) and 1 (closed) that does not recieve further scaling/normalization.
OpenVLA reserves 256 of the least-frequently used tokens out of the Llama-7B vocabulary for action values, which gives it 8-bit resolution over the controls. It has an input image resolution of 224x224 to stacked DINOv2/SigLIP vision encoders that are projected to ~275 input tokens (plus the text prompt), and outputs 7 tokens mapped to (\u0394pos, \u0394rotation, gripper)
coordinates.
"},{"location":"openvla.html#quantization","title":"Quantization","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models and datasets (>15GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Support for OpenVLA has been added to NanoLLM on top of its streaming VLM pipeline with INT4/FP8 quantization using MLC and vision encoders in FP16 with TensorRT. First we'll test the model on BridgeData V2, one of the top weighted datasets from the Open X-Embodiment collection. The model was trained on this data and is used to confirm that the quantization and inference are working correctly during deployment. This is what the dataset looks like, courtesy of their website:
The following command starts the container, downloads the dataset and model (if needed), quantizes it on the first run, and measures the accuracy of the action values against the groundtruth from the dataset using normalized mean-squared error (NRMSE) to unbias the varying ranges each dimension of the action space can have. We extracted a 100-episode subset of the original Bridge data here on HuggingFace Hub, so you don't need to download the entire ~400GB dataset just for these tests.
INT4FP8FP16 jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model openvla/openvla-7b \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp16.json\n
Quantization Accuracy Latency FPS FP16 95.3% 840 ms 1.19 FP8 95.2% 471 ms 2.12 INT4 90.1% 336 ms 2.97 These results were run on Jetson AGX Orin 64GB with JetPack 6, and we will see later that with our fine-tuned model the INT4 accuracy matches FP8/FP16.
Each frame, the 7D action vector predicted by the model is printed next to the groundtruth, along with the accuracy, latency, and framerate for that frame. The numbers printed after ~
are the averages of those so far, with the last value reported being the mean over the entire dataset processed.
# INT4\nstep 355 [-0.02692 0.00776 -0.00299 0.08160 0.07292 0.04791 0.99608] accuracy 0.8466 ~0.9017 time=336.2 ms fps=2.96 ~2.97\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP8\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9523 time=469.7 ms fps=2.13 ~2.12\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP16\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9531 time=842.3 ms fps=1.19 ~1.18\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n
The per-frame metrics and averages can be saved with the --save-stats
argument, and in the interest of time you can cap the number of episodes processed with --max-episodes
. As mentioned above, the Bridge dataset used here was included in the training data, and further below we run this again on data we generated that is not from the training dataset and has significant variation. This tool can also load other datasets in RLDS/TFDS format from Open X-Embodiment, and HDF5 from Robomimic/MimicGen. You can also create your own agents and scripts using the exposed APIs from the coding examples below.
"},{"location":"openvla.html#inference-api","title":"Inference API","text":"The code is simple for running VLA inference on camera streams using the NanoLLM library in the container:
VLA on Videofrom nano_llm import NanoLLM\nfrom nano_llm.plugins import VideoSource\n\n# load vision/language action model\nmodel = NanoLLM.from_pretrained(model, quantization='q4f16_ft')\ncamera = VideoSource(video_source, cuda_stream=0)\n\nassert(model.vla) # make sure this is a VLA\n\nwhile True:\n # returns a cudaImage, np.ndarray, or torch.Tensor on the GPU\n image = camera.capture()\n\n if image is None: # in case of timeout, keep trying\n continue\n\n # returns a np.ndarray or torch.Tensor with vla.dof elements\n # for OpenVLA, this is (\u0394x, \u0394y, \u0394z, \u0394roll, \u0394pitch, \u0394yaw, gripper)\n actions = model.vla.predict_action(\n image, \n instruction=\"pick up the nearest object\", \n action_space=\"normalized\",\n return_tensors='np',\n )\n\n # send the actions to your robot controller or IK solver\n ...\n
VLA models are also supported in Agent Studio, which includes the simulator components as well.
"},{"location":"openvla.html#online-validation","title":"Online Validation","text":"Given the challenging task domain, dynamic feedback loops, and computational demands for sim/training/inference, using VLAs for language-guided dexterous manipulation involves a significant increase in complexity over baseline usage of LLMs and VLMs. To go from predicting logits at the token level to actions consistently correct enough over an extended series of frames to form useful behaviors, it's important to cross-check outputs and measure accuracy at each stage of the training/inference workflow to be able to identify the source of potential regressions when they occur.
Unlike typical applications in supervised learning, the metrics for end-task completion and success aren't measured from static pre-recorded datasets that don't account for the temporal domain and feedback from physical interactions along with compounding errors - they require online validation, either in simulation or real-world tests.
Closing the Sim-to-Real Gap: Training Spot Quadruped Locomotion with NVIDIA Isaac Lab
During training the token classification accuracy is measured from the groundtruth action labels (i.e. how many action tokens were predicted exactly right), with the model optimizing to minimize this loss (as is normal for LLMs). Action accuracy in the continuous domain is also measured during training from the L1 error of the detokenized real-valued outputs. Continuous action accuracy trends slightly higher than token classification accuracy, as the latter does not provide any reward for being closer to the desired result. In practice, these should be >95% accurate at this level for completing tasks successfully in similar environments. To achieve that high degree of accuracy, it seems intentional in the work and related research to overfit the model by training it for many epochs (upwards of 30 epochs on the same 900K episodes for OpenVLA). Transformers are known to recall specific knowledge from few training examples, and are sensitive to overfitting and forgetting previously learned information. As such, LLMs are normally only trained for a few epochs at most to preserve their zero-shot capabilities and ability to generalize to out-of-distribution inputs. During the fine-tuning part of this project, we characterize the impacts on model accuracy and task success from the number of distinct training episodes versus the number of epochs over repeated data.
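As a rough illustration of these two metrics (not the project's actual evaluation code), token-level accuracy only rewards exact bin matches, while the continuous metric also gives credit for being close after detokenization.
import numpy as np\n\ndef token_accuracy(pred_bins, gt_bins):\n    \"\"\"Fraction of action tokens predicted exactly right.\"\"\"\n    return float(np.mean(np.asarray(pred_bins) == np.asarray(gt_bins)))\n\ndef continuous_accuracy(pred_actions, gt_actions, action_range=2.0):\n    \"\"\"1 - mean L1 error of detokenized actions, normalized by the action range.\"\"\"\n    err = np.abs(np.asarray(pred_actions) - np.asarray(gt_actions)) / action_range\n    return float(1.0 - np.mean(err))\n\nprint(token_accuracy([100, 128, 200], [100, 127, 200]))               # 0.667 - no partial credit\nprint(continuous_accuracy([-0.02, 0.01, 0.08], [-0.02, 0.01, 0.16]))  # close values still score well\n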
The actual task success rate doesn't get measured until the inference stage, when it is either connected to a simulator or physically tested in a series of time-consuming trials under similar conditions. We integrated MimicGen directly with the OpenVLA training scripts for an endless source of unseen data, but encountered gradient instabilities after the model had received a significant number of episodes.
"},{"location":"openvla.html#simulation-with-mimicgen","title":"Simulation with MimicGen","text":"MimicGen creates randomized episodes from as few as 10 teleoperated examples by utilizing scene graph information and task/subtask metadata about which objects in the environment are targets of the current subtask, in order to interpolate the original teloperated trajectories into their new random locations and poses. This generates large amounts of unique training data to improve robustness, without needing large amounts of human effort for the robot learning new skills and behaviors.
MimicGen is built on the Robomimic and Robosuite simulators and is able to run onboard Jetson headlessly alongside the VLA, simplifying the setup for reproducibility. The RoboCasa project is built on MimicGen and is being integrated with NVIDIA Omniverse and OSMO, and in future work we'd use Isaac Lab for scalability, more accurate physics, and photorealistic rendering.
MimicGen includes 12 tasks like block stacking, pick and place, assembly, and kitchen scenarios. And each type of task has variants increasing in difficulty as learning progresses, which would be interesting to compare curated approaches to the purely random sequencing that OpenVLA uses with Open X-Embodiment. In this phase of the tutorial, we focus on the block stacking task to understand the training requirements and runtime performance needed to master a new task with success rates of >75-80%, similar to the paper. This will help inform scaling to multiple behaviors and more complex scenarios that vary significantly from in-distribution examples like the MimicGen environments (as evidenced by the original OpenVLA weights scoring zero successes in them).
"},{"location":"openvla.html#data-generation","title":"Data Generation","text":"We built MimicGen containers for Jetson from a fork of the code with some patches for aarch64+igpu along with enhancements like generation of natural language labels with random variations for the relevant tasks, along with additional domain randomization for the colors/materials of objects (these environments were added as Stack_D2
, Stack_D3
, and Stack_D4
). For training OpenVLA, the images and labels are saved to disk, whereas later inference is done with online simulation to measure the task success rate. To that effect we integrated MimicGen with Agent Studio for interactively testing the models and quickly dropping in components like ASR for verbally commanding the robot.
Online Training
There's initial support for direct integration of MimicGen in this fork of OpenVLA for live simulation and validation during training, with endless episodes and no repeated epochs. The models experienced spiking gradients later into the LoRA's; future attempts should try lower learning rates or similarly integrate MimicGen into the full fine-tuning script using FSDP to increase the batch size on dGPU.
This command will generate the specified number of training episodes, saved in Robomimic HDF5 format. We provide the rendered datasets for these on HuggingFace Hub with 1000 and 2500 episodes. OpenVLA suggests only needing 10-150 episodes for fine-tuning and data-efficient adaptation, which perhaps performs similarly in comparable spaces, but we ultimately found insufficient for the MimicGen environments.
jetson-containers run $(autotag nano_llm) \\\n python3 -m mimicgen.generate \\\n --tasks Stack_D4 \\\n --episodes 100 \\\n --output /data/datasets/mimicgen \\\n --cameras agentview \\\n --camera-width 224 \\\n --camera-height 224\n
The HDF5 dataset will be saved to /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5
(which is in a mounted volume under your jetson-containers/data
directory outside of the container), along with a video of sample episodes that were rendered:
This video is actually of Stack_D2 to avoid subjecting everyone to flickering colors. Stack_D4 is used for training and generates blocks with random colors and positions each frame, along with language labels augmented through the random combination of various nouns, adjectives, and verbs that form the instruction (Stack the red block on the green block
, Put the little cube on top
). Stack_D3 randomizes colors/positions each frame, and instructions each episode. Stack_D2 does them all per-episode (which is typically used at runtime). Since OpenVLA uses a single frame at a time with no temporal aspect during training, applying domain randomization per-frame as opposed to per-episode is feasible and provides more variance in the dataset. The block-stacking episodes typically come out to be around ~110 frames each, and take around 10-15 seconds to generate per episode on Jetson AGX Orin with per-frame domain randomization, and 5 seconds per episode without domain randomization.
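As a rough sketch of the kind of language-label augmentation described above (the actual generation code lives in the MimicGen fork), instructions can be composed by randomly combining small pools of verbs, adjectives, and nouns.
import random\n\nverbs = ['Stack', 'Put', 'Place']\nadjectives = ['red', 'little', 'small']\nnouns = ['block', 'cube']\n\ndef random_instruction():\n    # e.g. 'Stack the red block on the green block' or 'Put the little cube on top'\n    if random.random() < 0.5:\n        return f'{random.choice(verbs)} the {random.choice(adjectives)} {random.choice(nouns)} on top'\n    return f'{random.choice(verbs)} the red {random.choice(nouns)} on the green {random.choice(nouns)}'\n\nprint(random_instruction())\n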
The agentview camera looks onward from the front of the scene towards the robot. There are others available like sideview and eye_in_hand (wrist view) - we tried using the onboard wrist camera, but found the model would too easily veer off track and get 'lost' offscreen. It may be possible for wrist-only to work should the dataset add examples of the robot recovering and returning to a wider vantage point. Other VIT-based embodied models like Octo and CrossFormer use both cameras, which is a future experiment for VLA's based on multi-image VLM's like VILA.
"},{"location":"openvla.html#rlds-conversion","title":"RLDS Conversion","text":"OpenVLA uses datasets in RLDS format (which is based on TFDS), so we provide a converter from HDF5. This extra step can also be time-consuming for a large number of epiodes, like those used here. This is one of the reasons we desire to run MimicGen online with training and performed the initial integration directly with OpenVLA. Unless you are generating different data, you can skip this and use the MimicGen datasets that we uploaded here in RLDS format.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.datasets \\\n --dataset /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5 \\\n --dataset-type mimicgen \\\n --convert rlds \\\n --remap-keys agentview:image \\\n --output /data/datasets/mimicgen/rlds/stack_d4_ep2500\n
This will create a set of tfrecord files under the output directory that are able to be loaded by the OpenVLA training scripts.
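To sanity-check the converted dataset before training, it can be loaded back with TensorFlow Datasets (a minimal sketch; the exact directory containing the generated dataset_info.json and the available feature keys depend on your export and are assumptions here).
import tensorflow_datasets as tfds\n\n# point at the version directory created by the converter (path/version are assumptions)\nbuilder = tfds.builder_from_directory('/data/datasets/mimicgen/rlds/stack_d4_ep2500/1.0.0')\nds = builder.as_dataset(split='train')\n\nfor episode in ds.take(1):\n    for step in episode['steps'].take(3):\n        print(step.keys())   # inspect the available features (image, action, language, ...)\n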
"},{"location":"openvla.html#fine-tuning","title":"Fine Tuning","text":"A primary objective of this project is to characterize the training needed to adapt the model to different robots and tasks. Our development primarily consisted of running test LoRA's onboard Jetson AGX Orin 64GB and debugging issues locally, and when the results were encouraging to perform a full fine-tuning with FDSP on multiple A100/H100's from spot instance providers like Brev.dev, Vast.ai, and RunPod. Full fine-tuning on 2x Jetson AGX Orin's was attempted with FDSP, but ran out of memory with all settings that we tried. We provide the test models we trained on HuggingFace for others to try in the inference + sim setup below. Below are the training GPU configurations that were used, along with their batch sizes that maximized memory usage:
Batch Size FPS $/hr Jetson AGX Orin 64GB 8 (LoRA) 1.57 - 2x A100 SXM4 80GB 48 13.07 ~$1.50 8x H100 NVL 94GB 256 92.4 ~$25 The rental fees are ballpark averages over the spot instances available with these GPUs at the time of writing, and become quite reasonable when used alongside a Jetson repurposed for training daily test LoRA's on a reduced amount of data. Training until convergence on Jetson and 2xA100 took roughly 24-36 hours depending on the amount of data and number of epochs. We kept to <5 epochs for the full fine-tunes in an attempt to prevent the aforementioned overfitting, instead opting to increase the number of episodes.
Below we provide the steps to run the OpenVLA LoRA training on Jetson, and for the dGPU systems refer to Fully Fine-Tuning OpenVLA. Typically you will launch a spot instance with your provider of choice in a CUDA or PyTorch container, then install the OpenVLA repo and its dependencies with pip, and download your dataset to the system before launching the command (or create a bundled container with it all included to save time). Here's the WandB Dashboard from the full fine-tuning runs that you can inspect, comparing a fewer number of episodes for more epochs, versus a larger number of episodes trained for fewer epochs:
"},{"location":"openvla.html#on-device-lora","title":"On-Device LoRA","text":"The OpenVLA repo provides working training scripts for LoRA/qLoRA and multi-node multi-GPU full fine-tunes using PyTorch FDSP. It was not difficult to go in a make changes and enhancements, some of which we have done for our purposes of on-device training in this fork. Overall we found the process to be more similar than not to training other vision DNNs, just with larger datasets and rigorous validation required of the data pipeline that all the coordinate spaces and transformations matched up at every step of the sim\u2192training\u2192inference workflow.
We built an OpenVLA container for JetPack that runs the LoRA training, which you can find the specific documentation about from the OpenVLA readme (it's also recommended to read their paper which includes many insights into the training process).
jetson-containers run $(autotag openvla) \\\n torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune.py \\\n --vla_path openvla/openvla-7b \\\n --data_root_dir /data/datasets/mimicgen/rlds \\\n --dataset_name stack_d4_ep2500 \\\n --run_root_dir /data/models/openvla \\\n --lora_rank 32 \\\n --batch_size 8 \\\n --grad_accumulation_steps 2 \\\n --learning_rate 5e-4 \\\n --image_aug False \\\n --save_steps 250 \\\n --epochs 5\n
This will start a TensorBoard server on port 6006 to monitor the training progress. Typically you would set the script running for more epochs than you intend to actually run, so that you can instead stop when the model converges (typically occurring with a loss below 0.5 and token accuracy >95%). This script was adapted so that if you interrupt training by pressing Ctrl+D
from the terminal, it will gracefully stop early and still merge the LoRA weights before exiting. If training is terminated otherwise, we added a merge.py
script that you should run afterwards to get the model ready for inference.
"},{"location":"openvla.html#validation","title":"Validation","text":"Now that we have trained our test model (or you can download one from here), let's re-validate it again like we did above on the original OpenVLA model, but this time on unseen data from MimicGen with a different random seed (dusty-nv/mimicgen-stack_d4-ep100
). These commands will download and run the fully fine-tuned checkpoint (on 2500 episodes for 4 epochs) that we released as dusty-nv/openvla-7b-mimicgen
. If you trained your own model, you can substitute the local path to the HF checkpoint.
INT4FP8FP16 jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model dusty-nv/openvla-7b-mimicgen \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_mimicgen_fp16.json\n
The results from this are collated in the next section along with the end-task success rates. Time to see it in action!
"},{"location":"openvla.html#inference-simulation","title":"Inference + Simulation","text":"To measure how well our model actually performs at completing the task, we spin up a MimicGen environment in Agent Studio that's connected to the VLA model. It counts the number of successful episodes by checking the reward issued by the sim, which is not used by the model but signals when the task was completed. We use a horizon of 200 frames for evaluation, after which it is deemed to be a failure.
INT4FP8FP16 jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-INT4\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP8\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP16\n
To start the benchmarking, connect the output of the AutoPrompt
node to the OpenVLA
node. On its own it will run forever - we did it for 100 episodes each, which can take several hours since the sim operates in lock step with the model (future experiments will train on actions accumulated from multiple timesteps and also reduce the model size to improve performance).
Quantization Train Accuracy Val Accuracy Task Success Avg Frames Latency FPS FP16 96.5% 85.4% 86% 132 827 ms 1.20 FP8 96.2% 85.1% 85% 131 467 ms 2.14 INT4 95.4% 84.4% 84% 138 335 ms 2.98 This is using the model fine-tuned on 2500 episodes for 4 epochs, and although the task may have been simple, this is evidence of achieving the sought-after success rates of ~85%. Quantization has a negligible ~1% impact while scaling performance almost linearly. The average number of frames is how long it took the robot to complete the task, and efficiency is another important end-metric to evaluate models by (consider that the source teleop episodes were ~110 frames long, and we realized after that these averages include failed episodes during evaluation). The training dataset dusty-nv/bridge_orig_ep2500
was used to measure the action Train Accuracy, while the previously unused and distinct dusty-nv/bridge_orig_ep100
was used for Validation Accuracy.
"},{"location":"openvla.html#episodes-vs-epochs","title":"Episodes vs Epochs","text":"Upon measuring the success rates of the other fine-tuned models that were trained on fewer episodes for more epochs, we can see the impact of increasing the size of the dataset:
Episodes Epochs Frames Task Success Avg Frames 500 10 550K 23% 186 1000 6 660K 48% 165 2500 4 1.1M 84% 138 That isn't to say that the 2500-episode model still isn't overfit - it is, having learned to always stack the smaller block and essentially ignore the instructions. More task diversity in the training is required, which we can do now that we have the workflow. And we want to branch out into real-world domains instead of test environments in simulation to prove model viability. The amount of variance and data required to achieve generalist models in the challenging problem space of manipulation does raise interesting questions about purely random task learning versus more guided and curated approaches that ramp up in complexity as training progresses.
"},{"location":"openvla.html#future-research","title":"Future Research","text":"Embodied agents are an exciting emerging area at the forefront of robotics and physical AI, with many promising avenues to investigate further. Follow-up publications to OpenVLA include Embodied-CoT and CrossFormer, along with others sure to be in development. In addition to proceeding to physical testing, these areas of interest we'll experiment with as research progresses:
Smaller LLM with higher-resolution vision encoder(s) Multiple image inputs - multiple cameras and/or timesteps Action states from previous frame(s) as input Training on consecutive actions for larger timesteps Similar test model for UGV's in sim Using Isaac Lab and Robocasa sim2real deployment with ROS2 Trade-offs of VLA vs VIT-based approaches "},{"location":"research.html","title":"Jetson AI Lab Research Group","text":"The Jetson AI Lab Research Group is a global collective for advancing open-source Edge ML, open to anyone to join and collaborate with others from the community and leverage each other's work. Our goal is using advanced AI for good in real-world applications in accessible and responsible ways. By coordinating together as a group, we can keep up with the rapidly evolving pace of AI and more quickly arrive at deploying intelligent multimodal agents and autonomous robots into the field.
There are virtual meetings that anyone is welcome to join, offline discussion on the Jetson Projects forum, and guidelines for upstreaming open-source contributions.
Next Meeting - 1/21
The next team meeting is on Tuesday, January 21st at 9am PST - see the invite below or click here to join the meeting in progress.
"},{"location":"research.html#topics-of-interest","title":"Topics of Interest","text":"These are some initial research topics for us to discuss and investigate. This list will vary over time as experiments evolve and the SOTA progresses:
\u2022 Controller LLMs for dynamic pipeline code generation \u2022 Fine-tuning LLM/VLM onboard Jetson AGX Orin 64GB \u2022 HomeAssistant.io integration for smart home [1] [2] \u2022 Continuous multi-image VLM streaming and change detection \u2022 Recurrent LLM architectures (Mamba, RWKV, etc) [1] \u2022 Lightweight low-memory streaming ASR/TTS models \u2022 Diffusion models for image processing and enhancement \u2022 Time Series Forecasting with Transformers [1] [2] \u2022 Guidance, grammars, and guardrails for constrained output \u2022 Inline LLM function calling / plugins from API definitions \u2022 ML DevOps, edge deployment, and orchestration \u2022 Robotics, IoT, and cyberphysical systems integration New topics can be raised to the group either during the meetings or on the forums (people are welcome to work on whatever they want of course)
"},{"location":"research.html#contribution-guidelines","title":"Contribution Guidelines","text":"When experiments are successful, ideally the results will be packaged in such a way that they are easily reusable for others to integrate into their own projects:
Open-source libraries & code on GitHub Models on HuggingFace Hub Containers provided by jetson-containers Discussions on the Jetson Projects forum Documentation & tutorials on Jetson AI Lab Hackster.io for hardware-centric builds Ongoing technical discussions are encouraged to occur on the forums or GitHub Issues, with status updates on projects given during the meetings.
"},{"location":"research.html#meeting-schedule","title":"Meeting Schedule","text":"We'll aim to meet monthly or bi-weekly as a team in virtual meetings that anyone is welcome to join and speak during. We'll discuss the latest updates and experiments that we want to explore. Please remain courteous to others during the calls. We'll stick around after for anyone who has questions or didn't get the chance to be heard.
Tuesday January 21st at 9am PST (1/21/25)
Microsoft Teams - Meeting Link Meeting ID: 264 770 145 196
Passcode: Uwbdgj
Outlook Invite: Jetson AI Lab Research Group.ics
The agenda will be listed here beforehand - post to the forum to add agenda items. The meetings will be recorded so anyone unable to attend live can watch them after.
"},{"location":"research.html#past-meetings","title":"Past Meetings","text":"Recordings Archive The meeting recordings are stored on Google Drive here: https://drive.google.com/drive/folders/18BC7o32jorx_LzZXx5wW0Io_nf1ZwO6X?usp=sharing
October 15, 2024 Topics Covered:
HuggingFace LeRobot (Chitoku Yato) Stanley H1 Humanoid (Dave Niewinski) October 1, 2024 Topics Covered:
ReMEmbR: Long-Horizon Memory for Navigation (Abrar Anwar) Diffusion Policies, Shape LLM, 3D Encoders September 17, 2024 Topics Covered:
NASA JPL - ROSA (Rob Royce & Shehryar Khattak) LeRobot Walkthrough (Chitoku Yato) VLM Agent in Isaac Sim/ROS (Kabilan Kb) September 3, 2024 Topics Covered:
Edge NeRF's and nerfstudio (Johnny N\u00fa\u00f1ez Cano) Review of OpenVLA results (Dustin Franklin) Oculus Interface for Jetson (Al Costa) TAO Toolkit 5.5 August 20, 2024 Topics Covered:
GenAI ROS Nodes for VLM (Khannah Shaltiel) Isaac Sim and Orin Nano with Hardware-in-the-Loop (Kabilan Kb) Miru Edge Deployment Infrastructure (Vedant Nair) August 6, 2024 Topics Covered:
OpenVLA Fine-Tuning Gemma-2-2b (Asier Arranz) Ultralytics YOLOv8 (Lakshantha Dissanayake) July 23, 2024 Topics Covered:
Llama-3 Function & Specs OpenVLA with MimicGen Phi-3 Vision via ONNX (Jambo Chen) OStream GenRunner (Kerry Shih) July 9, 2024 Topics Covered:
OpenVLA Quantization (openvla.github.io) visualnav-transformer (robodhruv/visualnav-transformer) Florence-2, Holoscan, Grammars (Nigel Nelson, florence-2-vision) LLaMa-Factory (hiyouga/LLaMA-Factory) June 25, 2024 Topics Covered:
Function Calling in Agent Studio Jetson Copilot (Chitoku Yato) Jetson Platform Services (Sammy Ochoa) On-device Fine-tuning (Nurgaliyev Shakhizat) June 11, 2024 Topics Covered:
Agent Studio HomeAssistant 2024.6 AWS IoT Greengrass (Romil Shah) May 29, 2024 Topics Covered:
OpenAI-style Tools with NousResearch/Hermes-2-Pro-Llama-3-8B Jetson Copilot with jetrag whisper_trt for Orin Nano May 15, 2024 Topics Covered:
VILA-1.5 on Video Sequences Voicecraft Container (Martin Cerven) JetBot / Nanosaur Updates for Orin Nano (Chitoku Yato & Raffaello Bonghi) Controller LLM & Advanced Function Calling (NousResearch/Hermes-2-Pro-Llama-3-8B
) RAG Samples with LlamaIndex (Chitoku Yato) May 1, 2024 Topics Covered:
Function Calling with Llama-3 Home Assistant / Wyoming (Mieszko Syty) Smart Sorting / Recycling (Alvaro Costa) April 17, 2024 Topics Covered:
Ollama Support for Jetson Devices Home Assistant Integration jetson-container
Updates Upgrading JetBot with Orin Nano April 3, 2024 Project Kickoffs:
Home Assistant Integration Agent Controller LLM ML DevOps, Containers, Core Inferencing "},{"location":"research.html#team-members","title":"Team Members","text":"Below are shown some of the sustaining members of the group who have been working on generative AI in edge computing:
Dustin Franklin, NVIDIA Principal Engineer | Pittsburgh, PA (jetson-inference, jetson-containers) Nurgaliyev Shakhizat Institute of Smart Systems and AI | Kazakhstan (Assistive Devices, Vision2Audio, HPC) Kris Kersey, Kersey Fabrications Embedded Software Engineer | Atlanta, GA (The OASIS Project, AR/VR, 3D Fabrication) Johnny N\u00fa\u00f1ez Cano PhD Researcher in CV/AI | Barcelona, Spain (Recurrent LLMs, Pose & Behavior Analysis) Doruk S\u00f6nmez, ConnectTech Intelligent Video Analytics Engineer | Turkey (NVIDIA DLI Certified Instructor, IVA, VLM) Akash James, Spark Cognition AI Architect, UC Berkeley Researcher | Oakland (NVIDIA AI Ambassador, Personal Assistants) Mieszko Syty, MS/1 Design AI/ML Engineer | Warsaw, Poland (LLM, Home Assistants, ML DevOps) Jim Benson, JetsonHacks DIY Extraordinaire | Los Angeles, CA (AI in Education, RACECAR/J) Chitoku Yato, NVIDIA Jetson AI DevTech | Santa Clara, CA (JetBot, JetRacer, MinDisk, Containers) Dana Sheahen, NVIDIA DLI Curriculum Developer | Santa Clara, CA (AI in Education, Jetson AI Fundamentals) Sammy Ochoa, NVIDIA Jetson AI DevTech | Austin, TX (Metropolis Microservices) John Welsh, NVIDIA (NanoOWL, NanoSAM, JetBot, JetRacer, torch2trt, trt_pose, Knowledge Distillation) Dave Niewinski Dave's Armoury | Waterloo, Ontario (GLaDOS, Fetch, Offroad La-Z-Boy, KUKA Bot) Gary Hilgemann, REBOTNIX CEO & AI Roboticist | L\u00fcnen, Germany (GUSTAV, SPIKE, VisionTools, GenAI) Elaine Wu, Seeed Studio AI & Robotics Partnerships | Shenzhen, China (reComputer, YOLOv8, LocalJARVIS, Voice Bot) Patty Delafuente, NVIDIA Data Scientist & UMBC PhD Student | MD (AI in Education, DLI Robotics Teaching Kit) Song Han, MIT HAN Lab NVIDIA Research | Cambridge, MA (Efficient Large Models, AWQ, VILA) Bryan Hughes, Mimzy AI Founder, Entrepreneur | SF Bay Area (Multimodal Assistants, AI at the Edge) Tianqi Chen, CMU Catalyst OctoML, CTO | Seattle, WA (MLC, Apache TVM, XGBoost) Michael Gr\u00fcner, RidgeRun Team Lead / Architect | Costa Rica (Embedded Vision & AI, Multimedia) Jesse Flot, CMU Robotics Academy Co-Director | Pittsburgh, PA (Applied AI & Robotics, Autonomy Foundations) Paul DeCarlo, Microsoft Professor | University of Houston (Azure IoT, Video Analytics, Microsoft JARVIS) Mike Hansen, Nabu Casa Voice Engineering Lead | Urbandale, Iowa (Home Assistant, Piper TTS, Wyoming) Lakshantha Dissanayake, Ultralytics Embedded CV Engineer | Vancouver, BC (YOLOv8, TensorRT, DeepStream) Kerry Shih, GenAI Nerds Founder, CEO | Los Angeles, CA (Gapi) Ziad Youssfi, CMU ECE Professor | Pittsburgh, PA (ML in Robotics & Embedded Systems) Walter Lucetti, Stereolabs Robotics & Vision Engineer | Paris, France (MyzharBot, ROS2, GStreamer) Raffaello Bonghi, NVIDIA AI & Robotics Engineer | Manchester, UK (Nanosaur, Panther, jetson-stats) Alvaro Costa, ANS Group AI & Robotics Lead | Epsom, UK (TeknTrash, StipraPOD) David Pearson, ConnectTech Embedded Systems Engineer | Ontario, CA (Edge AI Systems, Vision/Language Models) Jason Seawall, Numurus CEO | Seattle, WA (NEPI, Edge AI & Automation) Martin Cerven AI Researcher | Germany (Autonomous Robotics, Voicecraft) Romil Shah, Amazon GenAI IIoT @ AWS | San Jose, CA (aws-samples/genai-at-edge
) Kabilan Kb, Roboticist NVIDIA DLI Ambassador | Tamil Nadu, IN (ROS2 Tutorials, Autonomous Wheelchair)"},{"location":"ros.html","title":"ROS2 Nodes for Generative AI","text":"The ros2_nanollm
package provides ROS2 nodes for running optimized LLM's and VLM's locally inside a container. These are built on NanoLLM and ROS2 Humble for deploying generative AI models onboard your robot with Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm:humble
container image Space for models (>10GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"ros.html#running-the-live-demo","title":"Running the Live Demo","text":"Recommended
Before you start, please review NanoVLM and Live LLaVa demos. For primary documentation, view ROS2 NanoLLM.
Ensure you have a camera device connected
ls /dev/video*\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nano_llm:humble) \\\n ros2 launch ros2_nanollm camera_input_example.launch.py\n
This command will start the launch file of the container.
By default this will load the Efficient-Large-Model/Llama-3-VILA1.5-8B
VLM and publish the image captions and overlay to topics that can be subscribed to by your other nodes, or visualized with RViz or Foxglove. Refer to the ros2_nanollm
repo for documentation on the input/output topics that are exposed.
"},{"location":"ros.html#build-your-own-ros-nodes","title":"Build your own ROS Nodes","text":"To build your own ROS2 node using LLM or VLM, first create a ROS 2 workspace and package in a directory mounted to the container (following the ROS 2 Humble Documentation). Your src folder should then look like this:
\u2514\u2500\u2500 src \n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 launch \n \u2514\u2500\u2500 camera_input.launch.py\n \u251c\u2500\u2500 resource\n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 your-package-name\n \u2514\u2500\u2500 __init__.py \n \u2514\u2500\u2500 your-node-name_py.py\n \u251c\u2500\u2500 test\n \u2514\u2500\u2500 test_copyright.py\n \u2514\u2500\u2500 test_flake8.py\n \u2514\u2500\u2500 test_pep257.py\n \u251c\u2500\u2500 package.xml\n \u251c\u2500\u2500 setup.cfg\n \u251c\u2500\u2500 setup.py\n \u2514\u2500\u2500 README.md\n
We will create the launch folder, as well as the camera_input.launch.py and your-node-name_py.py files in later steps.
"},{"location":"ros.html#editing-the-setup","title":"Editing the Setup","text":"Let\u2019s begin by editing the setup.py
file. At the top of the file, add
from glob import glob \n
In the setup method, find the data_files=[]
line, and make sure it looks like this:
data_files=[\n ('share/ament_index/resource_index/packages',\n ['resource/' + package_name]),\n ('share/' + package_name, ['package.xml']),\n ('share/' + package_name, glob('launch/*.launch.py')),\n ],\n
Edit the maintainer line with your name. Edit the maintainer email to your email. Edit the description line to describe your package.
maintainer='kshaltiel', \nmaintainter_email='kshaltiel@nvidia.com', \ndescription='YOUR DESCRIPTION', \n
Find the console_scripts
line in the entry_points dictionary. Edit the entry inside it to be:
'your-node-name_py = your-package-name.your-node-name_py:main'\n
For example:
entry_points={\n 'console_scripts': [\n 'nano_llm_py = ros2_nanollm.nano_llm_py:main'\n ],\n },\n
All done for this file!"},{"location":"ros.html#creating-the-node","title":"Creating the Node","text":"Inside your package, under the folder that shares your package's name and contains the __init__.py
file, create a file named after your node. For NanoLLM, this file would be called nano_llm_py.py
.
Paste the following code into the empty file:
import rclpy \nfrom rclpy.node import Node\nfrom std_msgs.msg import String\nfrom sensor_msgs.msg import Image\nfrom cv_bridge import CvBridge\nfrom PIL import Image as im\nfrom MODEL_NAME import NECESSARY_MODULES\n\nclass Your_Model_Subscriber(Node):\n\n def __init__(self):\n super().__init__('your_model_subscriber')\n\n #EDIT PARAMETERS HERE \n self.declare_parameter('param1', \"param1_value\") \n self.declare_parameter('param2', \"param2_value\")\n\n # Subscriber for input query\n self.query_subscription = self.create_subscription(\n String,\n 'input_query',\n self.query_listener_callback,\n 10)\n self.query_subscription # prevent unused variable warning\n\n # Subscriber for input image\n self.image_subscription = self.create_subscription(\n Image,\n 'input_image',\n self.image_listener_callback,\n 10)\n self.image_subscription # prevent unused variable warning\n\n # To convert ROS image message to OpenCV image\n self.cv_br = CvBridge() \n\n #LOAD THE MODEL\n self.model = INSERT_MODEL.from_pretrained(\"PATH-TO-MODEL\")\n\n #chatHistory var \n self.chat_history = ChatHistory(self.model)\n\n ## PUBLISHER\n self.output_publisher = self.create_publisher(String, 'output', 10)\n self.query = \"Describe the image.\"\n\n def query_listener_callback(self, msg):\n self.query = msg.data\n\n def image_listener_callback(self, data): \n input_query = self.query\n\n # call model with input_query and input_image \n cv_img = self.cv_br.imgmsg_to_cv2(data, 'rgb8')\n PIL_img = im.fromarray(cv_img)\n\n # Parsing input text prompt\n prompt = input_query.strip(\"][()\")\n text = prompt.split(',')\n self.get_logger().info('Your query: %s' % text) #prints the query\n\n #chat history \n self.chat_history.append('user', image=PIL_img)\n self.chat_history.append('user', prompt, use_cache=True)\n embedding, _ = self.chat_history.embed_chat()\n\n #GENERATE OUTPUT\n output = self.model.generate(\n inputs=embedding,\n kv_cache=self.chat_history.kv_cache,\n min_new_tokens = 10,\n streaming = False, \n do_sample = True,\n )\n\n output_msg = String()\n output_msg.data = output\n self.output_publisher.publish(output_msg)\n self.get_logger().info(f\"Published output: {output}\")\n\ndef main(args=None):\n rclpy.init(args=args)\n\n your_model_subscriber = Your_Model_Subscriber()\n\n rclpy.spin(your_model_subscriber)\n\n # Destroy the node explicitly\n # (optional - otherwise it will be done automatically\n # when the garbage collector destroys the node object)\n your_model_subscriber.destroy_node()\n rclpy.shutdown()\n\nif __name__ == '__main__':\n main()\n
Edit the import statement at the top of the file to import the necessary modules from the model.
Next, edit the class name and name inside the __init__()
function to reflect the model that will be used.
Find the comment that reads #EDIT PARAMETERS HERE
. Declare all parameters except for the model name following the format in the file. Under the #LOAD THE MODEL
comment, include the path to the model.
Lastly, edit the generate method under the GENERATE OUTPUT
comment to include any additional parameters.
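For example, if you were wiring in NanoLLM, the edited sections of the node might look roughly like the sketch below; the parameter set and model name are illustrative assumptions based on the NanoLLM example later on this page, not the only valid choices.
# illustrative edits for a NanoLLM-based node (parameter names and model are assumptions)\nfrom nano_llm import NanoLLM, ChatHistory\n\nclass Nano_LLM_Subscriber(Node):\n\n def __init__(self):\n super().__init__('nano_llm_subscriber')\n\n #EDIT PARAMETERS HERE\n self.declare_parameter('api', 'mlc')\n self.declare_parameter('quantization', 'q4f16_ft')\n\n # (subscribers, publisher, and chat history as in the template above)\n\n #LOAD THE MODEL\n self.model = NanoLLM.from_pretrained(\n 'Efficient-Large-Model/Llama-3-VILA1.5-8B',\n api=self.get_parameter('api').value,\n quantization=self.get_parameter('quantization').value,\n )\n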
All done for this file!
"},{"location":"ros.html#creating-the-launch-file","title":"Creating the Launch File","text":"Inside your package, create the launch folder. Create your launch file inside of it.
mkdir launch\ncd launch \ntouch camera_input.launch.py\n
You can edit this file externally, and it will update within the container. Paste the following code into the empty file.
from launch import LaunchDescription\nfrom launch_ros.actions import Node\nfrom launch.substitutions import LaunchConfiguration\nfrom launch.actions import DeclareLaunchArgument\n\ndef generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'param1',\n default_value='param1_default',\n description='Description of param1'),\n DeclareLaunchArgument(\n 'param2',\n default_value='param2_default',\n description='Description of param2'),\n ]\n\n\n #Your model parameters \n param1 = LaunchConfiguration('param1')\n param2 = LaunchConfiguration('param2')\n\n #camera node for camera input\n cam2image_node = Node(\n package='image_tools',\n executable='cam2image',\n remappings=[('image', 'input_image')],\n )\n\n #model node\n model_node = Node(\n package='your-package-name', #make sure your package is named this\n executable='your-node-name_py', \n parameters=[{\n 'param1': param1, \n 'param2': param2,\n }]\n )\n\n final_launch_description = launch_args + [cam2image_node] + [model_node]\n\n return LaunchDescription(final_launch_description)\n
Find the required parameters for your model. You can view this by looking at the Model API for your specific model and taking note of how the model is called. For example, NanoLLM retrieves models through the following:
model = NanoLLM.from_pretrained(\n \"meta-llama/Llama-3-8b-hf\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n
The parameters for NanoLLM would be the model name, api, and quantization.
In the generate_launch_description
function, edit the DeclareLaunchArgument
to accommodate all parameters except the model name. For NanoLLM, this would look like:
def generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'api',\n default_value='mlc',\n description='The model backend to use'),\n DeclareLaunchArgument(\n 'quantization',\n default_value='q4f16_ft',\n description='The quantization method to use'),\n ]\n
Then edit the lines under #Your model Parameters
to match the parameters of your model, again excluding the model name. Lastly, fill in the code under the #model node
comment with your package name, the name of your node file, and all of your parameters, this time including the model.
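Continuing the NanoLLM example, the filled-in section might look like the sketch below; the package and executable names assume the ros2_nanollm package and the nano_llm_py node from earlier, so adjust them to your own names.
#Your model parameters\n api = LaunchConfiguration('api')\n quantization = LaunchConfiguration('quantization')\n\n #model node\n model_node = Node(\n package='ros2_nanollm',\n executable='nano_llm_py',\n parameters=[{\n 'model': 'Efficient-Large-Model/Llama-3-VILA1.5-8B',\n 'api': api,\n 'quantization': quantization,\n }]\n )\n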
All done for this file!
"},{"location":"tensorrt_llm.html","title":"TensorRT-LLM for Jetson","text":"TensorRT-LLM is a high-performance LLM inference library with advanced quantization, attention kernels, and paged KV caching. Initial support for building TensorRT-LLM from source for JetPack 6.1 has been included in the v0.12.0-jetson
branch of the TensorRT-LLM repo for Jetson AGX Orin.
We've provided pre-compiled TensorRT-LLM wheels and containers along with this guide for TensorRT-LLM Deployment on Jetson Orin
What you need
One of the following Jetson devices:
Jetson AGX Orin *Support for other Orin devices is currently undergoing testing.
Running one of the following versions of JetPack:
JetPack 6.1 (L4T r36.4)
Sufficient storage space (preferably with NVMe SSD).
18.5GB
for tensorrt_llm
container image Space for models (>10GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tensorrt_llm.html#building-tensorrt-llm-engine-for-llama","title":"Building TensorRT-LLM Engine for Llama","text":"You can find the steps for converting Llama to TensorRT-LLM under examples/llama
in the repo, and also in the documentation. This script will automate the process for Llama-7B with INT4 quantization applied, and run some generation and performance checks on the model:
jetson-containers run \\\n -e HUGGINGFACE_TOKEN=YOUR_API_KEY \\\n -e FORCE_BUILD=on \\\n dustynv/tensorrt_llm:0.12-r36.4.0 \\\n /opt/TensorRT-LLM/llama.sh\n
There are many such conversion procedures outlined in the TensorRT-LLM examples for different model architectures.
"},{"location":"tensorrt_llm.html#openai-api-endpoint","title":"OpenAI API Endpoint","text":"TensorRT-LLM has programming APIs for Python and C++ available, but it also includes an example server endpoint for the OpenAI protocol that makes it easy to substitute for other local or cloud model backends.
This will start the TensorRT-LLM container with the server and model that you built above:
jetson-containers run \\\n dustynv/tensorrt_llm:0.12-r36.4.0 \\\n python3 /opt/TensorRT-LLM/examples/apps/openai_server.py \\\n /data/models/tensorrt_llm/Llama-2-7b-chat-hf-gptq\n
Then you can make chat completion requests against it from practically any language or from any connected device. This example shows a simple way of testing it initially from another terminal with curl:
curl http://localhost:8000/v1/completions \\\n -H \"Content-Type: application/json\" \\\n -d '{\n \"model\": <model_name>,\n \"prompt\": \"Where is New York?\",\n \"max_tokens\": 16,\n \"temperature\": 0\n }'\n
Or the code included with openai_client.py will handle these requests using the standard openai-python
package that can be installed outside of the container or on another machine.
jetson-containers run \\\n --workdir /opt/TensorRT-LLM/examples/apps \\\n dustynv/tensorrt_llm:0.12-r36.4.0 \\\n python3 openai_client.py --prompt \"Where is New York?\" --api chat\n
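As a sketch of what such a client looks like with the openai-python package (the base URL assumes the server's default port of 8000 used above, and the model name here is an assumption; use the name the server reports for the engine you built):
from openai import OpenAI\n\nclient = OpenAI(base_url='http://localhost:8000/v1', api_key='none')\n\nreply = client.chat.completions.create(\n model='Llama-2-7b-chat-hf-gptq',  # assumed name; match the engine you built\n messages=[{'role': 'user', 'content': 'Where is New York?'}],\n max_tokens=16,\n)\nprint(reply.choices[0].message.content)\n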
The patches in the branch above for TensorRT-LLM 0.12 are a preview release for Jetson AGX Orin, and we'll continue validating and testing the various settings in TensorRT-LLM. If you need any support, please post to the Jetson Developer Forums.
"},{"location":"tips_ram-optimization.html","title":"RAM Optimization","text":"Running a LLM requires a huge RAM space.
Especially on Jetson Orin Nano, which only has 8GB of RAM, it is crucial to leave as much memory as possible available for models.
Here we share a couple of ways to optimize the system RAM usage.
"},{"location":"tips_ram-optimization.html#disabling-the-desktop-gui","title":"Disabling the Desktop GUI","text":"If you use your Jetson remotely through SSH, you can disable the Ubuntu desktop GUI. This will free up extra memory that the window manager and desktop uses (around ~800MB for Unity/GNOME).
You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:
$ sudo init 3 # stop the desktop\n# log your user back into the console (Ctrl+Alt+F1, F2, etc.)\n$ sudo init 5 # restart the desktop\n
If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:
To disable desktop on boot
sudo systemctl set-default multi-user.target\n
To enable desktop on boot
sudo systemctl set-default graphical.target\n
"},{"location":"tips_ram-optimization.html#disabling-misc-services","title":"Disabling misc services","text":"sudo systemctl disable nvargus-daemon.service\n
"},{"location":"tips_ram-optimization.html#mounting-swap","title":"Mounting Swap","text":"If you're building containers or working with large models, it's advisable to mount SWAP (typically correlated with the amount of memory in the board). Run these commands to disable ZRAM and create a swap file:
If you have NVMe SSD storage available, it's preferred to allocate the swap file on the NVMe SSD.
sudo systemctl disable nvzramconfig\nsudo fallocate -l 16G /ssd/16GB.swap\nsudo mkswap /ssd/16GB.swap\nsudo swapon /ssd/16GB.swap\n
Then add the following line to the end of /etc/fstab to make the change persistent:
/ssd/16GB.swap none swap sw 0 0\n
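Since the swap file was already enabled with swapon above, you can confirm it is active (and will be picked up from fstab after a reboot) with:
swapon --show\nfree -h\n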
"},{"location":"tips_ssd-docker.html","title":"Tips - SSD + Docker","text":"Once you have your Jetson set up by flashing the latest Jetson Linux (L4T) BSP on it or by flashing the SD card with the whole JetPack image, before embarking on testing out all the great generative AI application using jetson-containers
, you want to make sure you have a huge storage space for all the containers and the models you will download.
We are going to show how you can install an SSD on your Jetson and set it up for Docker.
"},{"location":"tips_ssd-docker.html#ssd","title":"SSD","text":""},{"location":"tips_ssd-docker.html#physical-installation","title":"Physical installation","text":" Unplug power and any peripherals from the Jetson developer kit. Physically install an NVMe SSD card on the carrier board of your Jetson developer kit, making sure to properly seat the connector and secure with the screw. Reconnect any peripherals, and then reconnect the power supply to turn on the Jetson developer kit. Once the system is up, verify that your Jetson identifies a new memory controller on PCI bus:
lspci\n
The output should look like the following:
0007:01:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1322 (rev 02)\n
"},{"location":"tips_ssd-docker.html#format-and-set-up-auto-mount","title":"Format and set up auto-mount","text":" Run lsblk
to find the device name.
lsblk\n
The output should look like the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT\nloop0 7:0 0 16M 1 loop \nmmcblk1 179:0 0 59.5G 0 disk \n\u251c\u2500mmcblk1p1 179:1 0 58G 0 part /\n\u251c\u2500mmcblk1p2 179:2 0 128M 0 part \n\u251c\u2500mmcblk1p3 179:3 0 768K 0 part \n\u251c\u2500mmcblk1p4 179:4 0 31.6M 0 part \n\u251c\u2500mmcblk1p5 179:5 0 128M 0 part \n\u251c\u2500mmcblk1p6 179:6 0 768K 0 part \n\u251c\u2500mmcblk1p7 179:7 0 31.6M 0 part \n\u251c\u2500mmcblk1p8 179:8 0 80M 0 part \n\u251c\u2500mmcblk1p9 179:9 0 512K 0 part \n\u251c\u2500mmcblk1p10 179:10 0 64M 0 part \n\u251c\u2500mmcblk1p11 179:11 0 80M 0 part \n\u251c\u2500mmcblk1p12 179:12 0 512K 0 part \n\u251c\u2500mmcblk1p13 179:13 0 64M 0 part \n\u2514\u2500mmcblk1p14 179:14 0 879.5M 0 part \nzram0 251:0 0 1.8G 0 disk [SWAP]\nzram1 251:1 0 1.8G 0 disk [SWAP]\nzram2 251:2 0 1.8G 0 disk [SWAP]\nzram3 251:3 0 1.8G 0 disk [SWAP]\nnvme0n1 259:0 0 238.5G 0 disk \n
Identify the device corresponding to your SSD. In this case, it is nvme0n1
.
Format the SSD, create a mount point, and mount it to the filesystem.
sudo mkfs.ext4 /dev/nvme0n1\n
You can choose any name for the mount point directory. We use /ssd
here, but in jetson-containers
' setup.md documentation, /mnt
is used.
sudo mkdir /ssd\n
sudo mount /dev/nvme0n1 /ssd\n
In order to ensure that the mount persists after boot, add an entry to the fstab
file:
First, identify the UUID for your SSD:
lsblk -f\n
Then, add a new entry to the fstab
file:
sudo vi /etc/fstab\n
Insert the following line, replacing the UUID with the value found from lsblk -f
:
UUID=************-****-****-****-******** /ssd/ ext4 defaults 0 2\n
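A quick sanity check of the new entry is to reload all fstab mounts and confirm the SSD shows up at /ssd; mount -a should return silently if the line is well-formed:
sudo mount -a\ndf -h /ssd\n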
Finally, change the ownership of the /ssd
directory.
sudo chown ${USER}:${USER} /ssd\n
"},{"location":"tips_ssd-docker.html#docker","title":"Docker","text":" Install nvidia-container
package.
Note: If you used an NVIDIA-supplied SD card image to flash your SD card, all necessary JetPack components (including nvidia-containers
) and Docker are already pre-installed, so this step can be skipped.
sudo apt update\nsudo apt install -y nvidia-container\n
JetPack 6.x users
If you flash Jetson Linux (L4T) R36.x (JetPack 6.x) on your Jetson using SDK Manager and install nvidia-container
using apt
, it no longer automatically installs Docker on JetPack 6.x.
Therefore, you need to run the following to manually install Docker and set it up.
sudo apt update\nsudo apt install -y nvidia-container curl\ncurl https://get.docker.com | sh && sudo systemctl --now enable docker\nsudo nvidia-ctk runtime configure --runtime=docker\n
Restart the Docker service and add your user to the docker
group, so that you don't need to use the command with sudo
.
sudo systemctl restart docker\nsudo usermod -aG docker $USER\nnewgrp docker\n
Add default runtime in /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert the \"default-runtime\": \"nvidia\"
line as following:
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\"\n}\n
Restart Docker
sudo systemctl daemon-reload && sudo systemctl restart docker\n
"},{"location":"tips_ssd-docker.html#migrate-docker-directory-to-ssd","title":"Migrate Docker directory to SSD","text":"Now that the SSD is installed and available on your device, you can use the extra storage capacity to hold the storage-demanding Docker directory.
Stop the Docker service.
sudo systemctl stop docker\n
Move the existing Docker folder
sudo du -csh /var/lib/docker/ && \\\n sudo mkdir /ssd/docker && \\\n sudo rsync -axPS /var/lib/docker/ /ssd/docker/ && \\\n sudo du -csh /ssd/docker/ \n
Edit /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert \"data-root\"
line like the following.
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\",\n \"data-root\": \"/ssd/docker\"\n}\n
Rename the old Docker data directory
sudo mv /var/lib/docker /var/lib/docker.old\n
Restart the docker daemon
sudo systemctl daemon-reload && \\\n sudo systemctl restart docker && \\\n sudo journalctl -u docker\n
"},{"location":"tips_ssd-docker.html#test-docker-on-ssd","title":"Test Docker on SSD","text":" [Terminal 1] First, open a terminal to monitor the disk usage while pulling a Docker image.
watch -n1 df \n
[Terminal 2] Next, open a new terminal and start Docker pull.
docker pull nvcr.io/nvidia/l4t-base:r35.2.1\n
[Terminal 1] Observe that the disk usage on /ssd
goes up as the container image is downloaded and extracted.
~$ docker image ls\nREPOSITORY TAG IMAGE ID CREATED SIZE\nnvcr.io/nvidia/l4t-base r35.2.1 dc07eb476a1d 7 months ago 713MB\n
"},{"location":"tips_ssd-docker.html#final-verification","title":"Final Verification","text":"Reboot your Jetson, and verify that you observe the following:
~$ sudo blkid | grep nvme\n/dev/nvme0n1: UUID=\"9fc06de1-7cf3-43e2-928a-53a9c03fc5d8\" TYPE=\"ext4\"\n\n~$ df -h\nFilesystem Size Used Avail Use% Mounted on\n/dev/mmcblk1p1 116G 18G 94G 16% /\nnone 3.5G 0 3.5G 0% /dev\ntmpfs 3.6G 108K 3.6G 1% /dev/shm\ntmpfs 734M 35M 699M 5% /run\ntmpfs 5.0M 4.0K 5.0M 1% /run/lock\ntmpfs 3.6G 0 3.6G 0% /sys/fs/cgroup\ntmpfs 734M 88K 734M 1% /run/user/1000\n/dev/nvme0n1 458G 824M 434G 1% /ssd\n\n~$ docker info | grep Root\n Docker Root Dir: /ssd/docker\n\n~$ sudo ls -l /ssd/docker/\ntotal 44\ndrwx--x--x 4 root root 4096 Mar 22 11:44 buildkit\ndrwx--x--- 2 root root 4096 Mar 22 11:44 containers\ndrwx------ 3 root root 4096 Mar 22 11:44 image\ndrwxr-x--- 3 root root 4096 Mar 22 11:44 network\ndrwx--x--- 13 root root 4096 Mar 22 16:20 overlay2\ndrwx------ 4 root root 4096 Mar 22 11:44 plugins\ndrwx------ 2 root root 4096 Mar 22 16:19 runtimes\ndrwx------ 2 root root 4096 Mar 22 11:44 swarm\ndrwx------ 2 root root 4096 Mar 22 16:20 tmp\ndrwx------ 2 root root 4096 Mar 22 11:44 trust\ndrwx-----x 2 root root 4096 Mar 22 16:19 volumes\n\n~$ sudo du -chs /ssd/docker/\n752M /ssd/docker/\n752M total\n\n~$ docker info | grep -e \"Runtime\" -e \"Root\"\n Runtimes: io.containerd.runtime.v1.linux nvidia runc io.containerd.runc.v2\n Default Runtime: nvidia\n Docker Root Dir: /ssd/docker\n
Your Jetson is now set up with the SSD!
"},{"location":"try.html","title":"Try","text":"Jump to NVIDIA Jetson Store.
"},{"location":"tutorial-intro.html","title":"Tutorial - Introduction","text":""},{"location":"tutorial-intro.html#overview","title":"Overview","text":"Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
"},{"location":"tutorial-intro.html#text-llm","title":"Text (LLM)","text":"text-generation-webui Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui Ollama Get started effortlessly deploying GGUF models for chat and web UI llamaspeak Talk live with Llama using Riva ASR/TTS, and chat about images with Llava! NanoLLM Optimized inferencing library for LLMs, multimodal agents, and speech. Small LLM (SLM) Deploy Small Language Models (SLM) with reduced memory usage and higher throughput. API Examples Learn how to write Python code for doing LLM inference using popular APIs."},{"location":"tutorial-intro.html#text-vision-vlm","title":"Text + Vision (VLM)","text":"Give your locally running LLM an access to vision!
LLaVA Different ways to run LLaVa vision/language model on Jetson for visual understanding. Live LLaVA Run multimodal models interactively on live video streams over a repeating set of prompts. NanoVLM Use mini vision/language models and the optimized multimodal pipeline for live streaming. Llama 3.2 Vision Run Meta's multimodal Llama-3.2-11B-Vision model on Orin with HuggingFace Transformers."},{"location":"tutorial-intro.html#vision-transformers","title":"Vision Transformers","text":"EfficientVIT MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction NanoOWL OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT NanoSAM NanoSAM, SAM model variant capable of running in real-time on Jetson SAM Meta's SAM, Segment Anything model TAM TAM, Track-Anything model, is an interactive tool for video object tracking and segmentation"},{"location":"tutorial-intro.html#image-generation","title":"Image Generation","text":"Flux + ComfyUI Set up and run the ComfyUI with Flux model for image generation on Jetson Orin. Stable Diffusion Run AUTOMATIC1111's stable-diffusion-webui
to generate images from prompts SDXL Ensemble pipeline consisting of a base model and refiner with enhanced image generation. nerfstudio Experience neural reconstruction and rendering with nerfstudio and onboard training."},{"location":"tutorial-intro.html#audio","title":"Audio","text":"Whisper OpenAI's Whisper, pre-trained model for automatic speech recognition (ASR) AudioCraft Meta's AudioCraft, to produce high-quality audio and music Voicecraft Interactive speech editing and zero shot TTS"},{"location":"tutorial-intro.html#rag-vector-database","title":"RAG & Vector Database","text":"NanoDB Interactive demo to witness the impact of Vector Database that handles multimodal data LlamaIndex Realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents LlamaIndex Reference application for building your own local AI assistants using LLM, RAG, and VectorDB"},{"location":"tutorial-intro.html#api-integrations","title":"API Integrations","text":"ROS2 Nodes Optimized LLM and VLM provided as ROS2 nodes for robotics Holoscan SDK Use the Holoscan-SDK to run high-throughput, low-latency edge AI pipelines Jetson Platform Services Quickly build microservice driven vision applications with Jetson Platform Services Gapi Workflows Integrating generative AI into real world environments Gapi Micro Services Wrapping models and code to participate in systems Ultralytics YOLOv8 Run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT."},{"location":"tutorial-intro.html#about-nvidia-jetson","title":"About NVIDIA Jetson","text":"Note
We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin Developer Kit Jetson Orin Nano Developer Kit GPU 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores RAM(CPU+GPU) 64GB 32GB 8GB Storage 64GB eMMC (+ NVMe SSD) microSD card (+ NVMe SSD)"},{"location":"tutorial_api-examples.html","title":"Tutorial - API Examples","text":"It's good to know the code for generating text with LLM inference, and ancillary things like tokenization, chat templates, and prompting. On this page we give Python examples of running various LLM APIs, and their benchmarks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35) JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for l4t-text-generation
container image Space for models (>10GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_api-examples.html#transformers","title":"Transformers","text":"The HuggingFace Transformers API is the de-facto API that models are released for, often serving as the reference implementation. It's not terribly fast, but it does have broad model support, and also supports quantization (AutoGPTQ, AWQ). This uses streaming:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer\nfrom threading import Thread\n\nmodel_name='meta-llama/Llama-2-7b-chat-hf'\nmodel = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nstreamer = TextIteratorStreamer(tokenizer)\n\nprompt = [{'role': 'user', 'content': 'Can I get a recipe for French Onion soup?'}]\ninputs = tokenizer.apply_chat_template(\n prompt,\n add_generation_prompt=True,\n return_tensors='pt'\n).to(model.device)\n\nThread(target=lambda: model.generate(inputs, max_new_tokens=256, streamer=streamer)).start()\n\nfor text in streamer:\n print(text, end='', flush=True)\n
To run this (it can be found here), you can mount a directory containing the script or your jetson-containers directory:
jetson-containers run --volume $PWD/packages/llm:/mount --workdir /mount \\\n $(autotag l4t-text-generation) \\\n python3 transformers/test.py\n
We use the l4t-text-generation
container because it includes the quantization libraries in addition to Transformers, for running the quanztized versions of the models like TheBloke/Llama-2-7B-Chat-GPTQ
"},{"location":"tutorial_api-examples.html#benchmarks","title":"Benchmarks","text":"The huggingface-benchmark.py
script will benchmark the models:
./run.sh --volume $PWD/packages/llm/transformers:/mount --workdir /mount \\\n $(./autotag l4t-text-generation) \\\n python3 huggingface-benchmark.py --model meta-llama/Llama-2-7b-chat-hf\n
* meta-llama/Llama-2-7b-chat-hf AVG = 20.7077 seconds, 6.2 tokens/sec memory=10173.45 MB\n* TheBloke/Llama-2-7B-Chat-GPTQ AVG = 12.3922 seconds, 10.3 tokens/sec memory=7023.36 MB\n* TheBloke/Llama-2-7B-Chat-AWQ AVG = 11.4667 seconds, 11.2 tokens/sec memory=4662.34 MB\n
"},{"location":"tutorial_api-examples.html#nanollm","title":"NanoLLM","text":"The NanoLLM
library uses the optimized MLC/TVM library for inference, like on the Benchmarks page:
> NanoLLM Reference Documentationfrom nano_llm import NanoLLM, ChatHistory, ChatTemplates\n\n# load model\nmodel = NanoLLM.from_pretrained(\n model='meta-llama/Meta-Llama-3-8B-Instruct', \n quantization='q4f16_ft', \n api='mlc'\n)\n\n# create the chat history\nchat_history = ChatHistory(model, system_prompt=\"You are a helpful and friendly AI assistant.\")\n\nwhile True:\n # enter the user query from terminal\n print('>> ', end='', flush=True)\n prompt = input().strip()\n\n # add user prompt and generate chat tokens/embeddings\n chat_history.append(role='user', msg=prompt)\n embedding, position = chat_history.embed_chat()\n\n # generate bot reply\n reply = model.generate(\n embedding, \n streaming=True, \n kv_cache=chat_history.kv_cache,\n stop_tokens=chat_history.template.stop,\n max_new_tokens=256,\n )\n\n # append the output stream to the chat history\n bot_reply = chat_history.append(role='bot', text='')\n\n for token in reply:\n bot_reply.text += token\n print(token, end='', flush=True)\n\n print('\\n')\n\n # save the inter-request KV cache \n chat_history.kv_cache = reply.kv_cache\n
This example keeps an interactive chat running with text being entered from the terminal. You can start it like this:
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat.example\n
Or for easy editing from the host device, copy the source into your own script and mount it into the container with the --volume
flag. And for authenticated models, request access through HuggingFace (like with Llama) and substitute your account's API token above.
"},{"location":"tutorial_audiocraft.html","title":"Tutorial - AudioCraft","text":"Let's run Meta's AudioCraft, to produce high-quality audio and music on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
10.7 GB
for audiocraft
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_audiocraft.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag audiocraft)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
"},{"location":"tutorial_audiocraft.html#run-jupyter-notebooks","title":"Run Jupyter notebooks","text":"AudioCraft repo comes with demo Jupyter notebooks.
On Jupyter Lab navigation pane on the left, double-click demos
folder.
"},{"location":"tutorial_audiocraft.html#audiogen-demo","title":"AudioGen demo","text":"Run cells with Shift + Enter
, first one will download models, which can take some time.
Info
You may encounter an error message like the following when executing the first cell, but you can keep going.
A matching Triton is not available, some optimizations will not be enabled.\nError caught was: No module named 'triton'\n
In the Audio Continuation cells, you can generate continuation based on text, while in Text-conditional Generation you can generate audio based just on text descriptions.
You can also use your own audio as prompt, and use text descriptions to generate continuation:
prompt_waveform, prompt_sr = torchaudio.load(\"../assets/sirens_and_a_humming_engine_approach_and_pass.mp3\") # you can upload your own audio\nprompt_duration = 2\nprompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\noutput = model.generate_continuation(prompt_waveform.expand(3, -1, -1), prompt_sample_rate=prompt_sr,descriptions=[\n 'Subway train blowing its horn', # text descriptions for continuation\n 'Horse neighing furiously',\n 'Cat hissing'\n], progress=True)\ndisplay_audio(output, sample_rate=16000)\n
"},{"location":"tutorial_audiocraft.html#musicgen-and-magnet-demos","title":"MusicGen and MAGNeT demos","text":"The two other jupyter notebooks are similar to AuidioGen, where you can generate continuation or generate audio, while using models trained to generate music.
"},{"location":"tutorial_comfyui_flux.html","title":"ComfyUI and Flux on Jetson Orin","text":"Hey there, fellow developer! \ud83d\udc4b I'm excited to share with you our latest project: Flux, an open-source model for image generation. Here at NVIDIA, we're pushing the boundaries to make Flux work seamlessly across all platforms, including our Jetson Orin devices. While we're still fine-tuning the model for the Jetson Orin Nano, we've already got it running smoothly on the Jetson AGX Orin.
In this tutorial, I'm going to walk you through every step needed to get Flux up and running on your Jetson Orin, even if you've just flashed your system. Follow along, and you should have no trouble getting everything set up. And hey, if something doesn't work out, reach out to me\u2014I\u2019ll keep this guide updated to make sure it's always on point.
So, let's dive in and get Flux running on your Jetson!
"},{"location":"tutorial_comfyui_flux.html#1-install-miniconda-and-create-a-python-310-environment","title":"1. Install Miniconda and Create a Python 3.10 Environment","text":"First things first, you'll need to install Miniconda on your Jetson Orin and create a Python 3.10 environment called comfyui
. This will ensure all dependencies are handled properly within an isolated environment.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh\nchmod +x Miniconda3-latest-Linux-aarch64.sh\n./Miniconda3-latest-Linux-aarch64.sh\n\nconda update conda\n\nconda create -n comfyui python=3.10\nconda activate comfyui\n
"},{"location":"tutorial_comfyui_flux.html#2-install-cuda-cudnn-and-tensorrt","title":"2. Install CUDA, cuDNN, and TensorRT","text":"Once your environment is set up, install CUDA 12.4 along with the necessary cuDNN and TensorRT libraries to ensure compatibility and optimal performance on your Jetson Orin.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install cuda-toolkit-12-4 cuda-compat-12-4\nsudo apt-get install cudnn python3-libnvinfer python3-libnvinfer-dev tensorrt\n
"},{"location":"tutorial_comfyui_flux.html#3-verify-and-configure-cuda","title":"3. Verify and Configure CUDA","text":"After installing CUDA, you'll want to verify that the correct version (12.4) is being used and make this change permanent in your environment.
ls -l /usr/local | grep cuda\nsudo ln -s /usr/local/cuda-12.4 /usr/local/cuda\n\nexport PATH=/usr/local/cuda/bin:$PATH\nnvcc --version\n\necho 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc\necho 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc\necho 'export CUDA_PATH=/usr/local/cuda' >> ~/.bashrc\nsource ~/.bashrc\n
"},{"location":"tutorial_comfyui_flux.html#4-compile-and-install-bitsandbytes-with-cuda-support","title":"4. Compile and Install bitsandbytes
with CUDA Support","text":"Now it\u2019s time to compile and install bitsandbytes
with CUDA support. This involves cloning the repository, configuring the build with CMake, compiling using all available cores, and installing the resulting package.
export BNB_CUDA_VERSION=124\nexport LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH\n\ngit clone https://github.com/timdettmers/bitsandbytes.git\ncd bitsandbytes\n\nmkdir -p build\ncd build\ncmake .. -DCOMPUTE_BACKEND=cuda -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4\nmake -j$(nproc)\n\ncd ..\npython setup.py install\n
Verify the installation by importing the package in Python:
python\n>>> import bitsandbytes as bnb\n>>> print(bnb.__version__)\n
"},{"location":"tutorial_comfyui_flux.html#5-install-pytorch-torchvision-and-torchaudio","title":"5. Install PyTorch, TorchVision, and TorchAudio","text":"Next up, install the essential libraries PyTorch
, torchvision
, and torchaudio
for Jetson Orin. You can always check for the latest links here.
pip install http://jetson.webredirect.org/jp6/cu124/+f/5fe/ee5f5d1a75229/torch-2.3.0-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/988/cb71323efff87/torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/0aa/a066463c02b4a/torchaudio-2.3.0+952ea74-cp310-cp310-linux_aarch64.whl\n
"},{"location":"tutorial_comfyui_flux.html#6-clone-the-comfyui-repository","title":"6. Clone the ComfyUI Repository","text":"Clone the ComfyUI repository from GitHub to get the necessary source code.
git clone https://github.com/comfyanonymous/ComfyUI.git\ncd ComfyUI\n
"},{"location":"tutorial_comfyui_flux.html#7-update-dependencies","title":"7. Update Dependencies","text":"Make sure all the necessary dependencies are installed by running the requirements.txt
file.
pip install -r requirements.txt\n
"},{"location":"tutorial_comfyui_flux.html#8-resolve-issues-with-numpy","title":"8. Resolve Issues with NumPy","text":"If you encounter issues with NumPy, downgrade to a version below 2.0 to avoid compatibility problems.
pip install \"numpy<2\"\n
"},{"location":"tutorial_comfyui_flux.html#9-run-comfyui","title":"9. Run ComfyUI","text":"Finally, run ComfyUI to ensure everything is set up correctly.
python main.py\n
Great! Now that you\u2019ve got ComfyUI up and running, let's load the workflow to start using the Flux model.
Download the workflow file using this link. And load it from the ComfyUI interface. You\u2019ll need to download the Flux Schnell model flux1-schnell.safetensors
and vae ae.safetensors
from Hugging Face and place the model in the models/unet
folder and vae in models/vae
within ComfyUI. Download clip_l.safetensors
and t5xxl_fp8_e4m3fn.safetensors
from Stability's Hugging Face and place them inside models/clip
folder. Alright, you're all set to launch your first run! Head over to the URL provided by ComfyUI (127.0.0.1:8188) on your Jetson AGX Orin, and hit that Queue Prompt button. The first time might take a little longer as the model loads, but after that, each generation should take around 21 seconds. Plus, you can queue up multiple prompts and let it generate images for hours!!
Happy generating! \ud83c\udf89
ASIER \ud83d\ude80
Some examples:
"},{"location":"tutorial_distillation.html","title":"CLIP model distillation","text":"See \"Jetson Introduction to Knowledge Distillation\" repo's README.md.
https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation
"},{"location":"tutorial_gapi_microservices.html","title":"Tutorial - Gapi Micro Services","text":"A Micro Service is a process that runs a wrapper python script that integrates your custom code/models so they can integrate into Gapi Workflows.
You can run a Micro Service wherever you like and connect it to a Gapi Server via the streaming, hybrid binary+json message protocol.
There are some out-of-the-box \u201cCommunity Micro Services\" that we integrate, test and pack into Docker images. When you run them, they auto integrate, load NVIDIA layers correctly and offer logging to the host system.
Even better, create your own! Just implement an on_message Python handler to process requests and respond. The rest is handled for you.
Think of a Micro Service as a simple wrapper to some code or model you have. It works the same as any other Node inside a Workflow. When it's your Micro Service Node's turn your on_message function will be called. Your script gets the rolling Transaction data for context and you then publish your data directly back into the flow.
"},{"location":"tutorial_gapi_microservices.html#running-the-community-micro-services","title":"Running the Community Micro Services","text":"Requirements for Community Micro Services
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space.
Literally 4KB for your own Micro Service Anywhere from ~4GB to ~10GB for each Community Micro Service #1 Login and go to the Micro Services tab\n#2 Follow the instructions in the blue box on that page to download your custom configuration\n#3 Then follow the instructions below that for installing the Micro Service you want\n
Example of instruction page:
"},{"location":"tutorial_gapi_microservices.html#congrats-you-can-go-through-the-workflow-tips-now","title":"Congrats! You Can Go Through the Workflow Tips Now","text":""},{"location":"tutorial_gapi_microservices.html#creating-your-own-micro-service","title":"Creating Your Own Micro Service","text":"The entire Micro Service zip file is just 4KB with 4 files:
message_handler.py: for you to respond message.py: for the streaming binary/json protocol gapi-ms: as entry point and handler) requirements.txt: defines just asyncio + websockets Full documentation here: Gapi Micro Service Docs. Synopsis below...
#1 Create logical Micro Service in UI and copy the key\n#2 Download the zip file from the UI\n#3 python gapi-ms.py ws://0.0.0.0:8090/gapi-ws [MICROSERVICE_KEY]\n#4 Refresh the UI to confirm it's online\n#5 Edit the message_handler.py to handle binary+json input and change the output\n#6 Add a Micro Service Node to a Workflow and tie it to your Micro Service. Hit Test.\n
"},{"location":"tutorial_gapi_workflows.html","title":"Tutorial - Gapi","text":"Gapi is an embeddable API gateway that creates streaming integrations between AI micro services and the systems that users leverage everyday.
The project's goal is to accelerate the speed of creating pilots and demos of Jetson AI Lab achievements into real world environments
\"On Device\" generative AI doesn't mean it has to live on an island!
Workflow engine with low code UI with dozens of open integrations and customizable clients for mobile web and desktop. Micro service framework for wrapping Jetson containers (Ollama, Whisper, Piper TTS, etc. are done, with more coming). Or wrap your own models/code and integrate it into Workflows. Real-time, hybrid, binary+json messaging smoothens intra-service calls and reduced latency. A fast path to proving generative AI value to stakeholders in their actual environment. "},{"location":"tutorial_gapi_workflows.html#gapi-server","title":"Gapi Server","text":"Embeddable API gateway software that runs in the background with a low code workflow UI for testing. The server is a message hub and state machine for workflow 'nodes' that talk to Micro Services. Think of it as connective-tissue for applications.
A Micro Service is a process that runs some wrapper python scripts that integrates custom code/models into Workflows using a streaming API.
What you need to run Gapi Server on Jetson
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin NX (8GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Docker
Sufficient storage space.
Gapi Server will run on other environments. Email us at support@GenAINerds.com if that's something you think is worthwhile.
Explaining the Steps:
1) On the Docker host, create working dir for persistant data 2) Download configuration files 3) Unzip 4) Pull Docker image, create container and start the process (will return console to you) Copy and Run the Steps:
mkdir ~/gapiData && cd ~/gapiData\ncurl -L https://raw.githubusercontent.com/genai-nerds/Gapi/main/gapiConfigs.zip -o gapiConfigs.zip\nunzip -q gapiConfigs.zip\ndocker run -d --name gapi --network host -v ~/gapiData:/opt/gapi/vdata genainerds/gapi:arm64 /bin/bash -c \"cd /opt/gapi/bin && ./startGapi.sh\"\necho \"You may need to hit Enter now. Afterwards the Docker container 'gapi' should be running\"\n
Troubleshooting:
Keep in mind all data read or written is in ~/gapiData Look at ~/gapiData/gapi.log to see what happened (if say the docker run command doesn't work) gapiServerConfig.json has all the initial setup NOTE: You will need to run some Micro Services before doing anything meaningful, so please review the mini tour below but don't do any of it in the UI untill you complete the setup (instructions at the bottom)
"},{"location":"tutorial_gapi_workflows.html#ui","title":"UI","text":""},{"location":"tutorial_gapi_workflows.html#tips-use-case-templates","title":"Tips & Use Case Templates","text":" When you login there will be an array of Tip Workflows that have notes and explain core concepts.
Tips:
Hello World: Basics plus it literally says hello Run a Local LLM: Play with Ollama graphically and connect it to other systems Streaming Speech to Text: PiperTTS Querying a Vector Database: Query a simple set of vectorized text documents Variables, Flow and Logic: Understand how to setup more robust workflows Calling Workflows from Outside Gapi: Configure Webhooks Workflows Calling Your Code: Micro Service Nodes that invoke your code Communications: 3rd party communications like Slack (IM), Twilio (SMS), SendGrid (EMAIL) "},{"location":"tutorial_gapi_workflows.html#workflows","title":"Workflows","text":"Workflows visually connect the execution and flow of data between Nodes.
A Transaction (or single firing) has \"Rolling Input\" data it accumulates as Node to Node steps each talk to Micro Services and APIs. All Nodes enjoy variables and flow control using familiar json and javascript concepts.
Each Node can append or reference the rolling data in the Transaction while making decisions along the way.
Watch live Transactions as they start from clients, webhooks and published messages from Micro Services with visual feedback and debugging.
"},{"location":"tutorial_gapi_workflows.html#apis-to-business-systems","title":"APIs to Business Systems","text":"Gapi can help smoothly integrate generative AI into systems that people already use everyday via APIs. It has the streaming API to Micro Services plus the state management and chops to handle the outward (webhook) style APIs to existing systems.
Our hope is to rally contributions by the community to keep growing the out-of-the-box/tested Nodes but there is a DIY one as well to manually map what you need into your Workflows.
Some of the out-of-the-box API Nodes: Slack (IM), Twilio (SMS), SendGrid (Email), Service Now (Ticketing), DIY Webhook
"},{"location":"tutorial_gapi_workflows.html#micro-services","title":"Micro Services","text":"There are community published Micro Services as well as custom ones you can make yourself. Gapi Server becomes most useful when leveraging them so please follow the How To below.
Current Community Micro Services:
Whisper Ollama Vector Text to Speech Img to Text Complete the Setup: How To Run and/or Create Micro Services
"},{"location":"tutorial_gapi_workflows.html#support-contribute","title":"Support / Contribute","text":"Gapi is a project from the GenAI Nerds and hosted on Github.
Ask a question, support@GenAINerds.com or Say hello, hello@GenAINerds.com Contribute/create tickets on Github "},{"location":"tutorial_holoscan.html","title":"Tutorial - Holoscan SDK","text":"The Holoscan SDK is part of NVIDIA Holoscan, the AI sensor processing platform that combines hardware systems for low-latency sensor and network connectivity, optimized libraries for data processing and AI, and core microservices to run streaming, imaging, and other applications, from embedded to edge to cloud.
"},{"location":"tutorial_holoscan.html#holoscan-sdk-core-concepts","title":"Holoscan SDK - Core Concepts","text":"A Holoscan SDK Application
is composed of Fragments
, each of which runs a graph of Operators
. The implementation of that graph is sometimes referred to as a pipeline or workflow, which can be visualized below:
"},{"location":"tutorial_holoscan.html#holoscan-sdk-getting-started-on-jetson","title":"Holoscan SDK - Getting Started on Jetson","text":"The best place to get started using the Holoscan SDK is the HoloHub repo. This is a central repository for the NVIDIA Holoscan AI sensor processing community to share apps and extensions.
So, let's walk through how to run the Surgical Tool Tracking example application from HoloHub!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
13.7 GB
for efficientvit
container image 850 Mb
for Tool Tracking ONNX model + example video Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_holoscan.html#launching-a-holoscan-compatible-container","title":"Launching a Holoscan-compatible Container","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build the Holoscan container.
Use the -v
option to mount HoloHub's ./build
and ./data
directories to the local Holoscan package directory so builds and data are cached across runs
jetson-containers run \\\n -v ./packages/holoscan/holohub/data:/opt/nvidia/holohub/data \\\n -v ./packages/holoscan/holohub/build:/opt/nvidia/holohub/build \\\n $(autotag holoscan)\n
"},{"location":"tutorial_holoscan.html#running-holohubs-endoscopy-tool-tracking-app","title":"Running HoloHub's Endoscopy Tool Tracking App","text":"An example application from HoloHub is the Endoscopy Tool Tracking application. This sample application demonstrates how the Holoscan SDK can be used to build an efficient pipeline that streams a video feed, preprocesses the data, runs inference using TensorRT, post-processes the data, and renders the video feed with the inference overlays.
"},{"location":"tutorial_holoscan.html#building-the-app","title":"Building The App","text":"The Holoscan SDK uses CMake to build C++ applications and also leverages CMake to pull and build app dependencies. So, regardless of whether an application is implemented using C++ or Python, many apps will still require that you \"build\" them first.
The Endoscopy Tool Tracking App has both a Python and C++ implementation. Building this app creates the C++ application program, pulls in an example video, and builds the TensorRT engine used for inference.
Go to the HoloHub directory
cd /opt/nvidia/holohub\n
Build the app using the 'run' script ./run build endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#running-the-python-app","title":"Running The Python App","text":"First, add the Holoscan SDK and the HoloHub build directory to your PYTHONPATH environment variable.
export HOLOHUB_BUILD_PATH=/opt/nvidia/holohub/build/endoscopy_tool_tracking\nexport PYTHONPATH=$PYTHONPATH:$HOLOHUB_BUILD_PATH/python/lib:/opt/nvidia/holoscan/python/lib\n
Next, run the application using Python! python3 /opt/nvidia/holohub/applications/endoscopy_tool_tracking/python/endoscopy_tool_tracking.python --data /opt/nvidia/holohub/data/endoscopy/\n
Shortly after launching the application, you will see the HoloViz window that visualizes the sample video and the model's outputs:
"},{"location":"tutorial_holoscan.html#running-the-c-app","title":"Running The C++ App","text":"The C++ app can be run using the run script by specifying the app name:
./run launch endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#next-steps","title":"Next Steps","text":"Congratulations! You've successfully run a Holoscan SDK application!
To dive deeper into the Holoscan SDK, explore the SDK's documentation on Core Concepts, Holoscan by Example, and Creating an Application.
"},{"location":"tutorial_jetson-copilot.html","title":"Tutorial - Jetson Copilot","text":"Jetson Copilot is a reference application for a local AI assistant, which demonstrates;
Running open-source LLMs (large language models) on device RAG (retrieval-augmented generation) to let LLM have access to your locally indexed knowledge What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6 GB
for jetrag
container image About 4 GB
for downloading some default models (llama3
and mxbai-embed-large
) Info
To run Jetson Copilot, you do not need to have jetson-containers
on your system. It uses the jetrag
container image that is being managed and built on jetson-containers
.
"},{"location":"tutorial_jetson-copilot.html#getting-started","title":"\ud83c\udfc3 Getting started","text":""},{"location":"tutorial_jetson-copilot.html#first-time-setup","title":"First time setup","text":"If this is your first time to run Jetson Copilot on your Jetson, first run setup.sh
to ensure you have all the necessary software installed and the environment set up.
git clone https://github.com/NVIDIA-AI-IOT/jetson-copilot/\ncd jetson-copilot\n./setup_environment.sh\n
It will install the following, if not yet.
Chromium web browser Docker "},{"location":"tutorial_jetson-copilot.html#how-to-start-jetson-copilot","title":"How to start Jetson Copilot","text":"cd jetson-copilot\n./launch_jetson_copilot.sh\n
This will start a Docker container and start a Ollama server and Streamlit app inside the container. It will shows the URL on the console in order to access the web app hosted on your Jetson.
With your web browser on Jetson, open the Local URL (http://localhost:8501
).Or on a PC connected on the same network as on your Jetson, access the Network URL.
Local URL: http://localhost:8501\nNetwork URL: http://10.110.50.252:8501 \n
Info
You will need the Internet connection on Jetson when it launches for the first time, as it will pull the container image (and download the default LLM and embedding model when web UI starts for the first time).
When you access the web UI for the first time, it will dowload the default LLM (llama3
) and the embedding model (mxbai-embed-large
).
Tips
If you are on Ubuntu Desktop, a frameless Chromium window will pop up to access the web app, to make it look like an independent application. You need to close the window as stopping the container on the console won't shutdown Chromium.
"},{"location":"tutorial_jetson-copilot.html#how-to-use-jetson-copilot","title":"\ud83d\udcd6 How to use Jetson Copilot","text":""},{"location":"tutorial_jetson-copilot.html#0-interact-with-the-plain-llama3-8b","title":"0. Interact with the plain Llama3 (8b)","text":"You can use Jetson Copilot just to interact with a LLM withut enabling RAG feature.
By default, Llama3 (8b) model is downloaded when running for the first time and use as the default LLM.
You will be surprized how much a model like Llama3 is capable, but may soon find limitations as it does not have information prior to its cutoff date nor know anything about your specific subject matter.
"},{"location":"tutorial_jetson-copilot.html#1-ask-jetson-related-question-using-pre-built-index","title":"1. Ask Jetson related question using pre-built index","text":"On the side panel, you can toggle \"Use RAG\" on to enable RAG pipeline. The LLM will have an access to a custom knowledge/index that is selected under \"Index\".
As a sample, a pre-build index \"_L4T_README
\" is provided. This is built on all the README text files that supplied in the \"L4T-README\" folder on the Jetson desktop.
It is mounted as /media/<USER_NAME>/L4T-README/
once you execute udisksctl mount -b /dev/disk/by-label/L4T-README
.
You can ask questions like:
What IP address does Jetson gets assigned when connected to a PC via a USB cable in USB Device Mode?\n
"},{"location":"tutorial_jetson-copilot.html#2-build-your-own-index-based-on-your-documents","title":"2. Build your own index based on your documents","text":"You can build your own index based on your local and/or online documents.
First, on the console (or on the desktop) create a directory under Documents
directory to store your documents.
cd jetson-copilot\nmkdir Documents/Jetson-Orin-Nano\ncd Documents/Jetson-Orin-Nano\nwget https://developer.nvidia.com/downloads/assets/embedded/secure/jetson/orin_nano/docs/jetson_orin_nano_devkit_carrier_board_specification_sp.pdf\n
Now back on the web UI, open the side bar, toggle on \"Use RAG\", then click on \"\u2795Build a new index\" to jump to a \"Build Index\" page.
Give a name for the Index you are to build. (e.g. \"JON Carrier Board\") Type in the field and hit Enter
key, then it will check and show what path will be created for your index.
And then from the drop select box under \"Local documents\", select the directory you created and saved your documents in. (e.g. /opt/jetson_copilot/Documents/Jetson-Orin-Nano
).
It will show the summary of files found in the selected directory.
If you want to rather only or additionally supply URLs for the online docuemnts to be ingested, fill the text area with one URL per a line. You can skip this if you are building your index only based on your local documents.
Info
On the sidebar, make sure mxbai-embed-large
is selected for the embedding model.
Use of OpenAI embedding models is not well supported and needs more testing.
Finally, hit \"Build Index\" button. It will show the progress in the drop-down \"status container\", so you can check the status by clicking on it. Once done, it will show the summary of your index and time it took.
You can go back to the home screen to now select the index you just built.
"},{"location":"tutorial_jetson-copilot.html#3-test-different-llm-or-embedding-model","title":"3. Test different LLM or Embedding model","text":"TODO
"},{"location":"tutorial_jetson-copilot.html#development","title":"\ud83c\udfd7\ufe0f Development","text":"Streamlit based web app is very easy to develop.
On web UI, at the top-right of the screen to choose \"Always rerun\" to automatically update your app every time you change the source codes.
See Streamlit Documentation for the detail.
"},{"location":"tutorial_jetson-copilot.html#manually-run-streamlit-app-inside-the-container","title":"Manually run streamlit app inside the container","text":"In case you make more fundamental changes, you can also manually run streamlit app.
cd jetson-copilot\n./launch_dev.sh\n
Once inside the container:
streamlit run app.py\n
"},{"location":"tutorial_jetson-copilot.html#directory-structure","title":"\ud83e\uddf1 Directory structure","text":"\u2514\u2500\u2500 jetson-copilot\n \u251c\u2500\u2500 launch_jetson_copilot.sh\n \u251c\u2500\u2500 setup_environment.sh\n \u251c\u2500\u2500 Documents \n \u2502 \u2514\u2500\u2500 your_abc_docs\n \u251c\u2500\u2500 Indexes\n \u2502 \u251c\u2500\u2500 _L4T_README\n \u2502 \u2514\u2500\u2500 your_abc_index\n \u251c\u2500\u2500 logs\n \u2502 \u251c\u2500\u2500 container.log\n \u2502 \u2514\u2500\u2500 ollama.log\n \u251c\u2500\u2500 ollama_models\n \u2514\u2500\u2500 Streamlit_app\n \u251c\u2500\u2500 app.py\n \u251c\u2500\u2500 build_index.py\n \u2514\u2500\u2500 download_model.py\n
The following directories inside the jetson-copilot
directory are mounted in the Docker container.
Directory Name Description Documents
Directory to store your documents to be indexed Indexes
Directory to store pre-built (or built-by-you) indexes for LLM to perform RAG on logs
Directory for the app to store log files ollama_models
Directory for the ollama server to store downloaded models Streamlit_app
Directory for the Python scripts that make up the web app"},{"location":"tutorial_jetson-copilot.html#troubleshooting","title":"\ud83d\udcab Troubleshooting","text":"If you run into any issues, please check the GitHub Issues of the Jetson Copilot repo.
"},{"location":"tutorial_jps.html","title":"Tutorial - Jetson Platform Services","text":"Jetson Platform Services (JPS) provides a platform to simplify the development, deployment, and management of Edge AI applications on NVIDIA Jetson. JPS is a modular and extensible architecture that lets developers distill large, complex applications into smaller modular microservices with APIs to integrate into other apps and services. At its core is a collection of AI services leveraging generative AI, deep learning, and analytics, which provide state-of-the-art capabilities including video analytics, video understanding and summarization, text-based prompting, zero-shot detection, and spatio-temporal analysis of object movement.
VLM Alert Workflow built with JPS
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
To get started with Jetson Platform Services, follow the quickstart guide to install and set up JPS. Then explore the reference workflows to learn how to use DeepStream, Analytics, Generative AI, and more with JPS:
1) Quick Start Guide 2) AI Powered Network Video Recorder 3) Zero Shot Detection with NanoOWL 4) Visual Language Model Alerts
The reference workflows demonstrate how to use the microservices provided in JPS to build full end-to-end systems on your Jetson.
VLM Alert Workflow Architecture
View the links below to learn more about Jetson Platform Services: VLM Alert Blog JPS Product Page JPS Documentation VLM Alert Demo Video
"},{"location":"tutorial_live-llava.html","title":"Tutorial - Live LLaVA","text":"Recommended
Follow the NanoVLM tutorial first to familiarize yourself with vision/language models, and see Agent Studio for an interactive pipeline editor built from live VLMs.
This multimodal agent runs a vision-language model on a live camera feed or video stream, repeatedly applying the same prompts to it:
It uses models like LLaVA or VILA that have been quantized to 4-bit precision. This runs an optimized multimodal pipeline from the NanoLLM
library, including running the CLIP/SigLIP vision encoder in TensorRT, event filters and alerts, and multimodal RAG (see the NanoVLM page for benchmarks)
"},{"location":"tutorial_live-llava.html#running-the-live-llava-demo","title":"Running the Live Llava Demo","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>10GB
) Follow the chat-based LLaVA and NanoVLM tutorials first.
Supported vision/language models:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GB The VideoQuery agent applies prompts to the incoming video feed with the VLM. Navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera (Chrome is recommended with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled)
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
This uses jetson_utils
for video I/O, and for options related to protocols and file formats, see Camera Streaming and Multimedia. In the example above, it captures a V4L2 USB webcam connected to the Jetson (under the device /dev/video0
) and outputs a WebRTC stream.
"},{"location":"tutorial_live-llava.html#processing-a-video-file-or-stream","title":"Processing a Video File or Stream","text":"The example above was running on a live camera, but you can also read and write a video file or network stream by substituting the path or URL to the --video-input
and --video-output
command-line arguments like this:
jetson-containers run \\\n -v /path/to/your/videos:/mount\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /mount/my_video.mp4 \\\n --video-output /mount/output.mp4 \\\n --prompt \"What does the weather look like?\"\n
This example processes a pre-recorded video (in MP4, MKV, AVI, or FLV format with H.264/H.265 encoding), but it can also input/output live network streams like RTP, RTSP, and WebRTC using Jetson's hardware-accelerated video codecs.
"},{"location":"tutorial_live-llava.html#nanodb-integration","title":"NanoDB Integration","text":"If you launch the VideoQuery agent with the --nanodb
flag along with a path to your NanoDB database, it will perform reverse-image search on the incoming feed against the database by re-using the CLIP embeddings generated by the VLM.
To enable this mode, first follow the NanoDB tutorial to download, index, and test the database. Then launch VideoQuery like this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output \\\n --nanodb /data/nanodb/coco/2017\n
You can also tag incoming images and add them to the database using the web UI, for one-shot recognition tasks:
"},{"location":"tutorial_live-llava.html#video-vila","title":"Video VILA","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The vision/video.py
example keeps a rolling history of frames:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_live-llava.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_live-llava.html#walkthrough-videos","title":"Walkthrough Videos","text":""},{"location":"tutorial_llamaindex.html","title":"Tutorial - LlamaIndex","text":"Let's use LlamaIndex, to realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents!
What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.5 GB
for llama-index
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_llamaindex.html#how-to-start-a-container-with-samples","title":"How to start a container with samples","text":"Use run.sh
and autotag
scripts to automatically pull or build a compatible container image.
jetson-containers run $(autotag llama-index:samples)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
You can follow along LlamaIndex_Local-Models_L4T.ipynb
(which is based on the official LlamaIndex tutorial).
"},{"location":"tutorial_llamaspeak.html","title":"Tutorial - llamaspeak","text":"Talk live with Llama using streaming ASR/TTS, and chat about images with Llava!
The NanoLLM
library provides optimized inference for LLM and speech models. It's recommended to run JetPack 6.0 to be able to run the latest containers. The WebChat
agent has responsive conversational abilities and multimodal support for chatting about images with vision/language models, including overlapping ASR/LLM/TTS generation and verbal interruptability.
"},{"location":"tutorial_llamaspeak.html#running-llamaspeak","title":"Running llamaspeak","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>10GB
) Start the Riva server first and test the ASR examples.
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --asr=riva --tts=piper\n
This will start llamaspeak with text LLM and ASR/TTS enabled. You can then navigate your browser to https://IP_ADDRESS:8050
The default port is 8050, but can be changed with --web-port
(and --ws-port
for the websocket port) During bot replies, the TTS model will pause output if you speak a few words in the mic to interrupt it. Request access to the Llama models on HuggingFace and substitute your account's API token above. The code and docs for the WebAgent
that runs llamaspeak can be found in the NanoLLM library. This block diagram shows the speech pipeline with interleaved model generation, user interruption, and streaming I/O:
"},{"location":"tutorial_llamaspeak.html#multimodality","title":"Multimodality","text":"If you load a multimodal vision/language model instead, you can drag images into the chat and ask questions about them:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model Efficient-Large-Model/VILA-7b \\\n --asr=riva --tts=piper\n
For more info about the supported vision/language models, see the NanoVLM page.
"},{"location":"tutorial_llamaspeak.html#function-calling","title":"Function Calling","text":"There's the ability to define functions from Python that the bot has access to and can invoke based on the chat flow:
This works by using the bot_function()
decorator and adding the API descriptions to the system prompt:
from nano_llm import NanoLLM, ChatHistory, BotFunctions, bot_function\nfrom datetime import datetime\n\n@bot_function\ndef DATE():\n \"\"\" Returns the current date. \"\"\"\n return datetime.now().strftime(\"%A, %B %-m %Y\")\n\n@bot_function\ndef TIME():\n \"\"\" Returns the current time. \"\"\"\n return datetime.now().strftime(\"%-I:%M %p\")\n\nsystem_prompt = \"You are a helpful and friendly AI assistant.\" + BotFunctions.generate_docs()\n
The system prompt can be autogenerated from the Python docstrings embedded in the functions themselves, and can include parameters that the bot can supply (for example, selectively saving relevant user info to a vector database for RAG like is shown in the video).
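For example, a hypothetical function that takes a parameter could be registered the same way with the bot_function decorator shown above; the in-memory list here merely stands in for the vector database mentioned in the paragraph:
from nano_llm import BotFunctions, bot_function\n\nuser_notes = []  # hypothetical stand-in for a vector database\n\n@bot_function\ndef SAVE(text: str):\n    \"\"\" Saves a note about the user so it can be retrieved later. \"\"\"\n    user_notes.append(text)\n\n# the docstring and type hint above are included in the generated function docs\nsystem_prompt = 'You are a helpful and friendly AI assistant. ' + BotFunctions.generate_docs()\n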
For more information about this topic, see the Function Calling section of the NanoLLM documentation.
"},{"location":"tutorial_llava.html","title":"Tutorial - LLaVA","text":"LLaVA is a popular multimodal vision/language model that you can run locally on Jetson to answer questions about image prompts and queries. Llava uses the CLIP vision encoder to transform images into the same embedding space as its LLM (which is the same as Llama architecture). Below we cover different methods to run Llava on Jetson, with increasingly optimized performance:
Chat with Llava using text-generation-webui
Run from the terminal with llava.serve.cli
Quantized GGUF models with llama.cpp
Optimized Multimodal Pipeline with NanoVLM
Llava-13B (Jetson AGX Orin) Quantization Tokens/sec Memory text-generation-webui
4-bit (GPTQ) 2.3 9.7 GB llava.serve.cli
FP16 (None) 4.2 27.7 GB llama.cpp
4-bit (Q4_K) 10.1 9.2 GB NanoVLM
4-bit (MLC) 21.1 8.7 GB In addition to Llava, the NanoVLM
pipeline supports VILA and mini vision models that run on Orin Nano as well.
"},{"location":"tutorial_llava.html#1-chat-with-llava-using-text-generation-webui","title":"1. Chat with Llava using text-generation-webui
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for text-generation-webui
container image Space for models CLIP model : 1.7GB
Llava-v1.5-13B-GPTQ model : 7.25GB
Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_llava.html#download-model","title":"Download Model","text":"jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 download-model.py --output=/data/models/text-generation-webui \\\n TheBloke/llava-v1.5-13B-GPTQ\n
"},{"location":"tutorial_llava.html#start-web-ui-with-multimodal-extension","title":"Start Web UI with Multimodal Extension","text":"jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 server.py --listen \\\n --model-dir /data/models/text-generation-webui \\\n --model TheBloke_llava-v1.5-13B-GPTQ \\\n --multimodal-pipeline llava-v1.5-13b \\\n --loader autogptq \\\n --disable_exllama \\\n --verbose\n
Go to Chat tab, drag and drop an image into the Drop Image Here area, and your question in the text area and hit Generate:
"},{"location":"tutorial_llava.html#result","title":"Result","text":""},{"location":"tutorial_llava.html#2-run-from-the-terminal-with-llavaservecli","title":"2. Run from the terminal with llava.serve.cli
","text":"What you need
One of the following Jetson:
Jetson AGX Orin 64GB Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1GB
for llava
container 14GB
for Llava-7B (or 26GB
for Llava-13B) This example uses the upstream Llava repo to run the original, unquantized Llava models from the command-line. It uses more memory due to using FP16 precision, and is provided mostly as a reference for debugging. See the Llava container readme for more info.
"},{"location":"tutorial_llava.html#llava-v15-7b","title":"llava-v1.5-7b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-7b \\\n --image-file /data/images/hoover.jpg\n
"},{"location":"tutorial_llava.html#llava-v15-13b","title":"llava-v1.5-13b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-13b \\\n --image-file /data/images/hoover.jpg\n
Unquantized 13B may run only on Jetson AGX Orin 64GB due to memory requirements.
"},{"location":"tutorial_llava.html#3-quantized-gguf-models-with-llamacpp","title":"3. Quantized GGUF models with llama.cpp
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
llama.cpp is one of the faster LLM APIs, and can apply a variety of quantization methods to Llava to reduce its memory usage and runtime. Despite its name, it uses CUDA. There are pre-quantized versions of Llava-1.5 available in GGUF format for 4-bit and 5-bit:
mys/ggml_llava-v1.5-7b mys/ggml_llava-v1.5-13b jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/hoover.jpg \\\n --prompt \"What does the sign say\"'\n
Quantization Bits Response Tokens/sec Memory Q4_K
4 The sign says \"Hoover Dam, Exit 9.\" 10.17 9.2 GB Q5_K
5 The sign says \"Hoover Dam exit 9.\" 9.73 10.4 GB A lower temperature like 0.1 is recommended for better quality (--temp 0.1
), and if you omit --prompt
it will describe the image:
jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/lake.jpg'\n\nIn this image, a small wooden pier extends out into a calm lake, surrounded by tall trees and mountains. The pier seems to be the only access point to the lake. The serene scene includes a few boats scattered across the water, with one near the pier and the others further away. The overall atmosphere suggests a peaceful and tranquil setting, perfect for relaxation and enjoying nature.\n
You can put your own images in the mounted jetson-containers/data
directory. The C++ code for llava-cli can be found here. The llama-cpp-python bindings also support Llava, however they are slower from Python (potentially due to the handling of the tokens).
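If you prefer to stay in Python, below is a minimal sketch using those llama-cpp-python bindings with the same GGUF weights; the file paths are placeholders for wherever huggingface-downloader cached the model and mmproj files on your system:
from llama_cpp import Llama\nfrom llama_cpp.llama_chat_format import Llava15ChatHandler\n\n# point these at the downloaded GGUF weights and CLIP projector\nchat_handler = Llava15ChatHandler(clip_model_path='/path/to/mmproj-model-f16.gguf')\nllm = Llama(\n    model_path='/path/to/ggml-model-q4_k.gguf',\n    chat_handler=chat_handler,\n    n_ctx=2048,        # enough context for the image embeddings\n    n_gpu_layers=-1,   # offload all layers to the GPU\n)\n\nresponse = llm.create_chat_completion(\n    temperature=0.1,\n    messages=[{\n        'role': 'user',\n        'content': [\n            {'type': 'image_url', 'image_url': {'url': 'file:///data/images/hoover.jpg'}},\n            {'type': 'text', 'text': 'What does the sign say?'},\n        ],\n    }],\n)\nprint(response['choices'][0]['message']['content'])\n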
"},{"location":"tutorial_llava.html#4-optimized-multimodal-pipeline-with-nanovlm","title":"4. Optimized Multimodal Pipeline with NanoVLM
","text":"What's Next
This section got too long and was moved to the NanoVLM page - check it out there for performance optimizations, mini VLMs, and live streaming!
"},{"location":"tutorial_minigpt4.html","title":"Tutorial - MiniGPT-4","text":"Give your locally running LLM access to vision by running MiniGPT-4 on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.8GB
for container image Space for pre-quantized MiniGPT-4 model Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_minigpt4.html#start-minigpt4-container-with-models","title":"Start minigpt4
container with models","text":"To start the MiniGPT4 container and webserver with the recommended models, run this command:
jetson-containers run $(autotag minigpt4) /bin/bash -c 'cd /opt/minigpt4.cpp/minigpt4 && python3 webui.py \\\n $(huggingface-downloader --type=dataset maknee/minigpt4-13b-ggml/minigpt4-13B-f16.bin) \\\n $(huggingface-downloader --type=dataset maknee/ggml-vicuna-v0-quantized/ggml-vicuna-13B-v0-q5_k.bin)'\n
Then, open your web browser and access http://<IP_ADDRESS>:7860
.
"},{"location":"tutorial_minigpt4.html#results","title":"Results","text":""},{"location":"tutorial_mmj.html","title":"Tutorial mmj","text":"First steps with Metropolis Microservices for Jetson
NVIDIA Metropolis Microservices for Jetson simplifies the development of vision AI applications, offering a suite of customizable, cloud-native tools. Before diving into this tutorial, ensure you've filled out the Metropolis Microservices for Jetson Early Access form to gain the necessary access to launch the services. This step is crucial as it enables you to utilize all the features and capabilities discussed in this guide.
Perfect for both newcomers and experts, this tutorial provides straightforward steps to kick-start your edge AI projects. Whether you're a student or an ecosystem partner working on a use case, this guide offers a straightforward start for every skill level.
"},{"location":"tutorial_mmj.html#0-install-nvidia-jetson-services","title":"0. Install NVIDIA Jetson Services:","text":"Ok, let's start by installing NVIDIA Jetson Services:
sudo apt install nvidia-jetson-services\n
Let's add some performance hacks that will be needed to run the demo faster and without streaming artifacts:
If you don't have the Orin at max performance, you can use these two commands (a reboot is needed afterwards):
sudo nvpmodel -m 0 \nsudo jetson_clocks\n
After these two commands, a reboot is needed if your Jetson wasn't already in high-performance mode. These are optional, but they fine-tune your network buffers to ensure smoother streaming by optimizing how much data can be sent and received:
sudo sysctl -w net.core.rmem_default=2129920\nsudo sysctl -w net.core.rmem_max=10000000\nsudo sysctl -w net.core.wmem_max=2000000\n
"},{"location":"tutorial_mmj.html#1-download-nvidia-cli-for-jetson","title":"1. Download NVIDIA CLI for Jetson","text":"Download NGC for ARM64 from the NGC for CLI site:
unzip ngccli_arm64.zip\nchmod u+x ngc-cli/ngc\necho \"export PATH=\\\"\\$PATH:$(pwd)/ngc-cli\\\"\" >> ~/.bash_profile && source ~/.bash_profile\nngc config set\n
Here it will ask for your API key and organization name. To get those, you need to log in to NGC and generate an API key here. Paste the API key and enter the organization name you are using. You can also press [Enter] to select the default values for the remaining options. After this, you should see the message:
Successfully saved NGC configuration to /home/jetson/.ngc/config\n
Then, login with the same API key:
sudo docker login nvcr.io -u \"\\$oauthtoken\" -p <NGC-API-KEY>\n
Now launch the Redis and Ingress services, as we need them for this tutorial.
sudo systemctl start jetson-redis\nsudo systemctl start jetson-ingress\n
"},{"location":"tutorial_mmj.html#2-download-and-launch-nvstreamer","title":"2. Download and launch NVStreamer","text":""},{"location":"tutorial_mmj.html#_1","title":"Tutorial mmj","text":"First, we need to install NVStreamer, an app that streams the videos that MMJ will run AI on. Follow this NVStreamer Link (In the top-left, click Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf nvstreamer.tar.gz\ncd nvstreamer\n
Launch it: sudo docker compose -f compose_nvstreamer.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#3-download-ai_nvr-and-launch","title":"3. Download AI_NVR and launch:","text":""},{"location":"tutorial_mmj.html#_2","title":"Tutorial mmj","text":"AI NVR (NGC) Link (Top-left -> Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf ai_nvr.tar.gz\nsudo cp ai_nvr/config/ai-nvr-nginx.conf /opt/nvidia/jetson/services/ingress/config/\ncd ai_nvr\nsudo docker compose -f compose_agx.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#4-download-some-sample-videos-and-upload-them-to-nvstreamer","title":"4. Download some sample videos and upload them to NVStreamer","text":"Download them from here.
unzip files.zip\n
Ok, now, this is important to understand: there are 2 web interfaces: The NVStreamer Dashboard, running at: http://localhost:31000 The Camera Management Dashboard, running at: http://localhost:30080/vst So, first we need to upload the file in the NVStreamer interface, which looks like this:
There, go to File Upload, and drag and drop the file in the upload squared area.
After uploading it, go to the Dashboard option in the left menu and copy the RTSP URL of the video you just uploaded; you will need it for the Camera Management Dashboard.
Now jump to the Camera Management Dashboard (http://localhost:30080/vst), it looks like this:
Go to the Camera Management option of the menu, then use the Add device manually option and paste the RTSP URL. Add the name of your video to the Name and Location text boxes, so it will be displayed on top of the stream.
Finally, click on the Live Streams option of the left menu, and you should be able to watch your video stream.
"},{"location":"tutorial_mmj.html#5-watch-rtsp-ai-processed-streaming-from-vlc","title":"5. Watch RTSP AI processed streaming from VLC","text":"Open VLC from another computer (localhost doesn't work here), and point it to your Jetson Orin's IP address (you should be on the same network, with no firewall blocking access).
The easiest way to get the Jetson's IP is by running:
ifconfig\n
And checking the IP of the interface (usually wlan0, inet IP). Then go to rtsp://[JETSON_IP]:8555/ds-test using VLC like this:
"},{"location":"tutorial_mmj.html#6-android-app","title":"6. Android app","text":"There is an Android app that allows you to track events and create areas of interest to monitor, you can find it on Google Play as AI NVR.
Here is a quick walkthrough where you can see how to:
Add the IP address of the Jetson Track current events Add new areas of interest Add tripwire to track the flux and direction of events
"},{"location":"tutorial_nano-llm.html","title":"NanoLLM - Optimized LLM Inference","text":"NanoLLM
is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends like Agent Studio.
It provides similar APIs to HuggingFace, backed by highly-optimized inference libraries and quantization tools:
NanoLLM Reference Documentationfrom nano_llm import NanoLLM\n\nmodel = NanoLLM.from_pretrained(\n \"meta-llama/Meta-Llama-3-8B-Instruct\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n api_token='hf_abc123def', # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n\nresponse = model.generate(\"Once upon a time,\", max_new_tokens=128)\n\nfor token in response:\n print(token, end='', flush=True)\n
"},{"location":"tutorial_nano-llm.html#containers","title":"Containers","text":"To test a chat session with Llama from the command-line, install jetson-containers
and run NanoLLM like this:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Llama CLIAgent Studio jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --prompt \"Can you tell me a joke about llamas?\"\n
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
If you haven't already, request access to the Llama models on HuggingFace and substitute your account's API token above.
"},{"location":"tutorial_nano-llm.html#resources","title":"Resources","text":"Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:
Benchmarks Benchmarking results for LLM, SLM, VLM using MLC/TVM backend. API Examples Python code examples for chat, completion, and multimodal. Documentation Reference documentation for the NanoLLM model and agent APIs. Llamaspeak Talk verbally with LLMs using low-latency ASR/TTS speech models. Small LLM (SLM) Focus on language models with reduced footprint (7B params and below) Live LLaVA Realtime live-streaming vision/language models on recurring prompts. Nano VLM Efficient multimodal pipeline with one-shot image tagging and RAG support. Agent Studio Rapidly design and experiment with creating your own automation agents. OpenVLA Robot learning with Vision/Language Action models and manipulation in simulator."},{"location":"tutorial_nano-vlm.html","title":"NanoVLM - Efficient Multimodal Pipeline","text":"We saw in the previous LLaVA tutorial how to run vision-language models through tools like text-generation-webui
and llama.cpp
. In a similar vein to the SLM page on Small Language Models, here we'll explore optimizing VLMs for reduced memory usage and higher performance that reaches interactive levels (like in Live Llava). These are great for fitting on Orin Nano and increasing the framerate.
There are 3 model families currently supported: Llava, VILA, and Obsidian (mini VLM)
"},{"location":"tutorial_nano-vlm.html#vlm-benchmarks","title":"VLM Benchmarks","text":"This FPS measures the end-to-end pipeline performance for continuous streaming like with Live Llava (on yes/no question)
"},{"location":"tutorial_nano-vlm.html#multimodal-chat","title":"Multimodal Chat","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>10GB
) Supported VLM models in NanoLLM
:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GB The optimized NanoLLM
library uses MLC/TVM for quantization and inference provides the highest performance. It efficiently manages the CLIP embeddings and KV cache. You can find Python code for the chat program used in this example here.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32\n
This starts an interactive console-based chat with Llava, and on the first run the model will automatically be downloaded from HuggingFace and quantized using MLC and W4A16 precision (which can take some time). See here for command-line options.
You'll end up at a >> PROMPT:
in which you can enter the path or URL of an image file, followed by your question about the image. You can follow-up with multiple questions about the same image. Llava does not understand multiple images in the same chat, so when changing images, first reset the chat history by entering clear
or reset
as the prompt. VILA supports multiple images (area of active research)
"},{"location":"tutorial_nano-vlm.html#automated-prompts","title":"Automated Prompts","text":"During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'what does the road sign say?' \\\n --prompt 'what kind of environment is it?' \\\n --prompt 'reset' \\\n --prompt '/data/images/lake.jpg' \\\n --prompt 'please describe the scene.' \\\n --prompt 'are there any hazards to be aware of?'\n
You can also use --prompt /data/prompts/images.json
to run the test sequence, the results of which are in the table below.
"},{"location":"tutorial_nano-vlm.html#results","title":"Results","text":"\u2022 \u00a0 The model responses are with 4-bit quantization enabled, and are truncated to 128 tokens for brevity. \u2022 \u00a0 These chat questions and images are from /data/prompts/images.json
(found in jetson-containers)
"},{"location":"tutorial_nano-vlm.html#json","title":"JSON","text":"When prompted, these models can also output in constrained JSON formats (which the LLaVA authors cover in their LLaVA-1.5 paper), and can be used to programatically query information about the image:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model liuhaotian/llava-v1.5-13b \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'extract any text from the image as json'\n\n{\n \"sign\": \"Hoover Dam\",\n \"exit\": \"2\",\n \"distance\": \"1 1/2 mile\"\n}\n
"},{"location":"tutorial_nano-vlm.html#web-ui","title":"Web UI","text":"To use this through a web browser instead, see the llamaspeak tutorial:
"},{"location":"tutorial_nano-vlm.html#live-streaming","title":"Live Streaming","text":"These models can also be used with the Live Llava agent for continuous streaming - just substitute the desired model name below:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
Then navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera. Using Chrome or Chromium is recommended for a stable WebRTC connection, with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled.
The Live Llava tutorial shows how to enable additional features like vector database integration, image tagging, and RAG.
"},{"location":"tutorial_nano-vlm.html#video-sequences","title":"Video Sequences","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. By manipulating the KV cache and dropping off the last frame from the chat history, we can keep the stream rolling continuously beyond the maximum context length of the model. The vision/video.py
example shows how to use this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_nano-vlm.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_nanodb.html","title":"Tutorial - NanoDB","text":"Let's run NanoDB's interactive demo to witness the impact of Vector Database that handles multimodal data.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
16GB
for container image 40GB
for MS COCO dataset Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_nanodb.html#how-to-start","title":"How to start","text":""},{"location":"tutorial_nanodb.html#download-coco","title":"Download COCO","text":"Just for an example, let's use MS COCO dataset:
cd jetson-containers\nmkdir -p data/datasets/coco/2017\ncd data/datasets/coco/2017\n\nwget http://images.cocodataset.org/zips/train2017.zip\nwget http://images.cocodataset.org/zips/val2017.zip\nwget http://images.cocodataset.org/zips/unlabeled2017.zip\n\nunzip train2017.zip\nunzip val2017.zip\nunzip unlabeled2017.zip\n
"},{"location":"tutorial_nanodb.html#download-index","title":"Download Index","text":"You can download a pre-indexed NanoDB that was already prepared over the COCO dataset from here:
cd jetson-containers/data\nwget https://nvidia.box.com/shared/static/icw8qhgioyj4qsk832r4nj2p9olsxoci.gz -O nanodb_coco_2017.tar.gz\ntar -xzvf nanodb_coco_2017.tar.gz\n
This allow you to skip the indexing process in the next step, and jump to starting the Web UI.
"},{"location":"tutorial_nanodb.html#indexing-data","title":"Indexing Data","text":"If you didn't download the NanoDB index for COCO from above, we need to build the index by scanning your dataset directory:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --scan /data/datasets/coco/2017 \\\n --path /data/nanodb/coco/2017 \\\n --autosave --validate \n
This will take a few hours on AGX Orin. Once the database has loaded and completed any start-up operations , it will drop down to a >
prompt from which the user can run search queries. You can quickly check the operation by typing your query on this prompt:
> a girl riding a horse\n\n* index=80110 /data/datasets/coco/2017/train2017/000000393735.jpg similarity=0.29991915822029114\n* index=158747 /data/datasets/coco/2017/unlabeled2017/000000189708.jpg similarity=0.29254037141799927\n* index=123846 /data/datasets/coco/2017/unlabeled2017/000000026239.jpg similarity=0.292171448469162\n* index=127338 /data/datasets/coco/2017/unlabeled2017/000000042508.jpg similarity=0.29118549823760986\n* index=77416 /data/datasets/coco/2017/train2017/000000380634.jpg similarity=0.28964102268218994\n* index=51992 /data/datasets/coco/2017/train2017/000000256290.jpg similarity=0.28929752111434937\n* index=228640 /data/datasets/coco/2017/unlabeled2017/000000520381.jpg similarity=0.28642547130584717\n* index=104819 /data/datasets/coco/2017/train2017/000000515895.jpg similarity=0.285491943359375\n
You can press Ctrl+C to exit. For more info about the various options available, see the NanoDB container documentation.
"},{"location":"tutorial_nanodb.html#interactive-web-ui","title":"Interactive Web UI","text":"Spin up the Gradio server:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --path /data/nanodb/coco/2017 \\\n --server --port=7860\n
Then navigate your browser to http://<IP_ADDRESS>:7860
, and you can enter text search queries as well as drag/upload images:
To use the dark theme, navigate to http://<IP_ADDRESS>:7860/?__theme=dark
instead"},{"location":"tutorial_ollama.html","title":"Tutorial - Ollama","text":"
Ollama is a popular open-source tool that allows users to easily run a large language models (LLMs) locally on their own computer, serving as an accessible entry point to LLMs for many.
It now offers out-of-the-box support for the Jetson platform with CUDA support, enabling Jetson users to seamlessly install Ollama with a single command and start using it immediately.
In this tutorial, we introduce two installation methods: (1) the default native installation using the official Ollama installer, and (2) the Docker container method, which allows users to avoid making changes to their existing system.
The ollama
client can run inside or outside container after starting the server. You can also run an Open WebUI server for supporting web clients. Supports the latest models like Llama-3 and Phi-3 Mini! "},{"location":"tutorial_ollama.html#ollama-server","title":"Ollama Server","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7GB
for ollama
container image Space for models (>5GB
) "},{"location":"tutorial_ollama.html#1-native-install","title":"(1) Native Install","text":"Ollama's official installer already support Jetson and can easily install CUDA-supporting Ollama.
curl -fsSL https://ollama.com/install.sh | sh \n
It create a service to run ollama serve
on start up, so you can start using ollama
command right away.
"},{"location":"tutorial_ollama.html#example-ollama-usage","title":"Example: Ollama usage","text":"ollama\n
"},{"location":"tutorial_ollama.html#example-run-a-model-on-cli","title":"Example: run a model on CLI","text":"ollama run llama3.2:3b\n
"},{"location":"tutorial_ollama.html#2-docker-container-for-ollama-using-jetson-containers","title":"(2) Docker container for ollama
using jetson-containers
","text":"# models cached under jetson-containers/data\njetson-containers run --name ollama $(autotag ollama)\n\n# models cached under your user's home directory\ndocker run --runtime nvidia --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.2.0\n
Running either of these will start the local Ollama server as a daemon in the background. It will save the models it downloads under your mounted jetson-containers/data/models/ollama
directory (or another directory that you override with OLLAMA_MODELS
)
Start the Ollama command-line chat client with your desired model (for example: llama3
, phi3
, mistral
)
# if running inside the same container as launched above\n/bin/ollama run phi3\n\n# if launching a new container for the client in another terminal\njetson-containers run $(autotag ollama) /bin/ollama run phi3\n
Or you can install Ollama's binaries for arm64 outside of container (without CUDA, which only the server needs)
# download the latest ollama release for arm64 into /bin\nsudo wget https://github.com/ollama/ollama/releases/download/$(git ls-remote --refs --sort=\"version:refname\" --tags https://github.com/ollama/ollama | cut -d/ -f3- | sed 's/-rc.*//g' | tail -n1)/ollama-linux-arm64 -O /bin/ollama\nsudo chmod +x /bin/ollama\n\n# use the client like normal outside container\n/bin/ollama run phi3\n
"},{"location":"tutorial_ollama.html#open-webui","title":"Open WebUI","text":"To run an Open WebUI server for client browsers to connect to, use the open-webui
container:
docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main\n
You can then navigate your browser to http://JETSON_IP:8080
, and create a fake account to login (these credentials are only local)
Ollama uses llama.cpp for inference, which various API benchmarks and comparisons are provided for on the Llava page. It gets roughly half of peak performance versus the faster APIs like NanoLLM, but is generally considered fast enough for text chat.
"},{"location":"tutorial_openwebui.html","title":"Tutorial - Open WebUI","text":"Open WebUI is a versatile, browser-based interface for running and managing large language models (LLMs) locally, offering Jetson developers an intuitive platform to experiment with LLMs on their devices.
It can work with Ollama as a backend as well as other backend that is compatible with OpenAI, which can also run well on Jetson.
"},{"location":"tutorial_openwebui.html#ollama-server","title":"Ollama Server","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7GB
for open-webui
container image sudo docker run -d --network=host \\\n -v ${HOME}/open-webui:/app/backend/data \\\n -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \\\n --name open-webui \\\n --restart always \\ \n ghcr.io/open-webui/open-webui:main\n
"},{"location":"tutorial_openwebui.html#ollama-backend","title":"Ollama backend","text":"If you have installed Ollama, you can just run the Open WebUI docker container without installing any other things.
sudo docker run -d --network=host \\\n -v ${HOME}/open-webui:/app/backend/data \\\n -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \\\n --name open-webui \\\n --restart always \\ \n ghcr.io/open-webui/open-webui:main\n
"},{"location":"tutorial_slm.html","title":"Tutorial - Small Language Models (SLM)","text":"Small Language Models (SLMs) represent a growing class of language models that have <7B parameters - for example StableLM, Phi-2, and Gemma-2B. Their smaller memory footprint and faster performance make them good candidates for deploying on Jetson Orin Nano. Some are very capable with abilities at a similar level as the larger models, having been trained on high-quality curated datasets.
This tutorial shows how to run optimized SLMs with quantization using the NanoLLM
library and MLC/TVM backend. You can run these models through tools like text-generation-webui
and llama.cpp as well, just not as fast - and since the focus of SLMs is reduced computational and memory requirements, here we'll use the most optimized path available. Those shown below have been profiled:
"},{"location":"tutorial_slm.html#slm-benchmarks","title":"SLM Benchmarks","text":"\u2022 \u00a0 The HuggingFace Open LLM Leaderboard is a collection of multitask benchmarks including reasoning & comprehension, math, coding, history, geography, ect. \u2022 \u00a0 The model's memory footprint includes 4-bit weights and KV cache at full context length (factor in extra for process overhead, library code, ect) \u2022 \u00a0 The Chat Model
is the instruction-tuned variant for chatting with in the commands below, as opposed to the base completion model.
Based on user interactions, the recommended models to try are stabilityai/stablelm-zephyr-3b
and princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
, for having output quality on par with Llama-2-7B and well-optimized neural architectures. These models have also been used as the base for various fine-tunes (for example Nous-Capybara-3B-V1.9
) and mini VLMs. Others may not be particularly coherent.
"},{"location":"tutorial_slm.html#chatting-with-slms","title":"Chatting with SLMs","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>5GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
The nano_llm.chat
program will automatically download and quantize models from HuggingFace like those listed in the table above:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT\n
\u2022 \u00a0 For models requiring authentication, use --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN>
\u2022 \u00a0 Press Ctrl+C twice in succession to exit (once will interrupt bot output)
This will enter into interactive mode where you chat back and forth using the keyboard (entering reset
will clear the chat history)
"},{"location":"tutorial_slm.html#automated-prompts","title":"Automated Prompts","text":"During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model stabilityai/stablelm-zephyr-3b \\\n --max-new-tokens 512 \\\n --prompt 'hi, how are you?' \\\n --prompt 'whats the square root of 900?' \\\n --prompt 'can I get a recipie for french onion soup?'\n
You can also load JSON files containing prompt sequences, like with --prompt /data/prompts/qa.json
(the output of which is below)
"},{"location":"tutorial_slm.html#results","title":"Results","text":"\u2022 \u00a0 The model responses are with 4-bit quantization, and are truncated to 256 tokens for brevity. \u2022 \u00a0 These chat questions are from /data/prompts/qa.json
(found in jetson-containers)
"},{"location":"tutorial_slm.html#nemotron-mini","title":"Nemotron Mini","text":"Nemotron-Mini-4B-Instruct is a 4B SLM tuned for on-device deployment, RAG, and function calling and is based on Minitron-4B (pruned and distilled from Nemotron4 15B). Inference on Jetson is available through HuggingFace Transformers and llama.cpp for quantization. Here's how to run a local OpenAI-compatible server with llama.cpp and 4-bit quantized GGUF:
jetson-containers run $(autotag llama_cpp) \\\n llama-server \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --host 0.0.0.0 \\\n --port 8080\n
For a quick test, you can navigate your browser to http://JETSON_IP:8080
, connect other clients like Open WebUI, or have applications send requests to your server's OpenAI chat completion endpoints (i.e. from openai-python, REST, JavaScript, ect)
You can more easily see the performance with the llama-cli
tool:
jetson-containers run $(autotag llama_cpp) \\\n llama-cli \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --ignore-eos \\\n -n 128 \\\n -p \"The meaning to life and the universe is\"\n
# Jetson AGX Orin\nllama_print_timings: load time = 1408.27 ms\nllama_print_timings: sample time = 70.05 ms / 128 runs ( 0.55 ms per token, 1827.32 tokens per second)\nllama_print_timings: prompt eval time = 120.08 ms / 9 tokens ( 13.34 ms per token, 74.95 tokens per second)\nllama_print_timings: eval time = 3303.93 ms / 127 runs ( 26.02 ms per token, 38.44 tokens per second)\nllama_print_timings: total time = 3597.17 ms / 136 tokens\n
The model can also be previewed at build.nvidia.com (example client requests for OpenAI API are also there)
"},{"location":"tutorial_slm.html#llama-32","title":"Llama 3.2","text":"Meta has released multilingual 1B and 3B SLMs in the latest additions to the Llama family with Llama-3.2-1B
and Llama-3.2-3B
. These can be run with INT4 quantization using the latest MLC container for Jetson (dustynv/mlc:0.1.2-r36.3.0
). After having requested access to the models from Meta with your HuggingFace API key, you can download, quantize, and benchmark them with these commands:
HUGGINGFACE_KEY=YOUR_API_KEY \\\nMLC_VERSION=0.1.2 \\\n jetson-containers/packages/llm/mlc/benchmark.sh \\\n meta-llama/Llama-3.2-1B\n
Llama-3.2-1B
\u00a0 Jetson Orin Nano 54.8 tokens/sec, Jetson AGX Orin 163.9 tokens/sec Llama-3.2-3B
\u00a0 Jetson Orin Nano 27.7 tokens/sec, Jetson AGX Orin 80.4 tokens/sec The Llama-3.2 SLMs use the same core Llama architecture as previous Llama releases (except tie_word_embeddings=True
), so it is already supported with quantization and full performance on edge devices. Thanks to Meta for continuing to advance open generative AI models with Llama.
"},{"location":"tutorial_stable-diffusion-xl.html","title":"Tutorial - Stable Diffusion XL","text":"Stable Diffusion XL is a newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. All told, SDXL 1.0 has 6.6 billion model parameters, in comparison to 0.98 billion for the original SD 1.5 model.
What you need
One of the following Jetson devices (SDXL requires >= ~13GB memory)
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image 12.4GB
for SDXL models Have followed the previous stable-diffusion-webui
tutorial and have the webserver container running.
"},{"location":"tutorial_stable-diffusion-xl.html#downloading-sdxl-models","title":"Downloading SDXL Models","text":"Stable Diffusion XL is supported through AUTOMATIC1111's stable-diffusion-webui
with some additional settings. First you need to download the SDXL models to your jetson-containers
data directory (which is automatically mounted into the container)
# run these outside of container, and replace CONTAINERS_DIR with the path to the jetson-containers repo on your device\nCONTAINERS_DIR=/path/to/your/jetson-containers\nMODEL_DIR=$CONTAINERS_DIR/data/models/stable-diffusion/models/Stable-diffusion/\n\nsudo chown -R $USER $MODEL_DIR\n\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors\n
It's assumed that you already have the stable-diffusion-webui
container and webserver running from the previous tutorial.
"},{"location":"tutorial_stable-diffusion-xl.html#sdxl-settings","title":"SDXL Settings","text":"After the models have finished downloading, click the \ud83d\udd04 button to refresh the model list in the web UI. Select sd_xl_base_1.0.safetensors
from the Stable Diffusion checkpoint drop-down:
Then under the Generation tab, expand the Refiner section, and select sd_xl_refiner_1.0.safetensors
from the drop-down:
Guidance on other relevant settings:
Change the width/height to 1024x1024. SDXL works best at higher resolutions, and using 512x512 often results in more simplistic/cartoonish content. Changing image resolutions impacts the actual scene contents of the image, not just the details. The refiner's Switch at
setting dictates the step at which the refiner takes over from the base model. At this point, additional subject content will no longer be added to the scene, but rather its details further refined in the image. Typical Sampling steps
are between 20-40 and Switch at
is between 0.7-0.9. This takes experimentation to find the best combination for the characteristics of your desired output. Extensive negative prompts are not as necessary in the same way as SD 1.5 was (e.g. low quality, jpeg artifacts, blurry, ect
) CFG Scale
controls how closely the model conforms to your prompt versus how creative it is. When you get a good image, remember to save your random seed and settings so you can re-generate it later!
"},{"location":"tutorial_stable-diffusion-xl.html#results","title":"Results","text":" photograph of a friendly robot alongside a person climbing a mountain (seed 1576166644, steps 25, switch @ 0.8, cfg scale 15)
a girl and a boy building a friendly robot in their basement workshop (seed 642273464, steps 25, switch @ 0.9, cfg scale 7)
small friendly robots playing games with people, in a futuristic Tokyo central park gardens with cherry blossoms and water, coy fish swimming in the water, sunshine (seed 642273464, steps 40, switch @ 0.8, cfg scale 7)
small friendly robots playing games with people in a futuristic New York City Central Park in autumn, water (seed 642273464, steps 25, switch @ 0.8, cfg scale 7)
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
"},{"location":"tutorial_stable-diffusion.html","title":"Tutorial - Stable Diffusion","text":"Let's run AUTOMATIC1111's stable-diffusion-webui
on NVIDIA Jetson to generate images from our prompts!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image 4.1GB
for SD 1.5 model "},{"location":"tutorial_stable-diffusion.html#setup-a-container-for-stable-diffusion-webui","title":"Setup a container for stable-diffusion-webui","text":"The jetson-containers project provides pre-built Docker images for stable-diffusion-webui
. You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
"},{"location":"tutorial_stable-diffusion.html#how-to-start","title":"How to start","text":"Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag stable-diffusion-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/stable-diffusion-webui && python3 launch.py \\\n --data=/data/models/stable-diffusion \\\n --enable-insecure-extension-access \\\n --xformers \\\n --listen \\\n --port=7860\n
You should see it downloading the model checkpoint on the first run.
Open your browser and access http://<IP_ADDRESS>:7860
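If you'd rather drive generation from a script, the webui also exposes a REST API when the launch command above includes the extra --api flag - a rough sketch using Python requests (the endpoint and payload fields follow recent AUTOMATIC1111 versions and may differ on older ones):
import base64, requests\n\n# hypothetical address - replace with your Jetson's IP; requires the webui to be started with --api\nurl = \"http://192.168.1.100:7860/sdapi/v1/txt2img\"\npayload = {\"prompt\": \"a photo of an astronaut riding a horse on mars\", \"steps\": 20, \"width\": 512, \"height\": 512, \"cfg_scale\": 7}\n\nr = requests.post(url, json=payload, timeout=300)\nr.raise_for_status()\n\n# the response contains the generated images as base64-encoded strings\nwith open(\"txt2img_out.png\", \"wb\") as f:\n    f.write(base64.b64decode(r.json()[\"images\"][0]))\n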
"},{"location":"tutorial_stable-diffusion.html#results-output-examples","title":"Results / Output Examples","text":""},{"location":"tutorial_stable-diffusion.html#stable-diffusion-xl","title":"Stable Diffusion XL","text":"To generate even higher-quality and detailed images, check out the next part of the tutorial that uses the latest Stable Diffusion XL models!
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
"},{"location":"tutorial_text-generation.html","title":"Tutorial - text-generation-webui","text":"Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui
on NVIDIA Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for container image Spaces for models "},{"location":"tutorial_text-generation.html#set-up-a-container-for-text-generation-webui","title":"Set up a container for text-generation-webui","text":"The jetson-containers project provides pre-built Docker images for text-generation-webui
along with all of the loader APIs built with CUDA enabled (llama.cpp, ExLlama, AutoGPTQ, Transformers, etc.). You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
"},{"location":"tutorial_text-generation.html#how-to-start","title":"How to start","text":"Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag text-generation-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/text-generation-webui && python3 server.py \\\n --model-dir=/data/models/text-generation-webui \\\n --chat \\\n --listen\n
Open your browser and access http://<IP_ADDRESS>:7860
.
"},{"location":"tutorial_text-generation.html#download-a-model-on-web-ui","title":"Download a model on web UI","text":"See the oobabooga documentation for instructions for downloading models - either from within the web UI, or using download-model.py
jetson-containers run --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \\\n 'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'\n
From within the web UI, select Model tab and navigate to \"Download model or LoRA\" section.
You can find text generation models on Hugging Face Hub, then enter the Hugging Face username/model path (which you can have copied to your clipboard from the Hub). Then click the Download button.
"},{"location":"tutorial_text-generation.html#gguf-models","title":"GGUF models","text":"The fastest oobabooga model loader to use is currently llama.cpp with 4-bit quantized GGUF models.
You can download a single model file for a particular quantization, like *.Q4_K_M.gguf
. Input the file name and hit the Download button (or fetch the file from a script, as sketched below the table).
Model Quantization Memory (MB) TheBloke/Llama-2-7b-Chat-GGUF
llama-2-7b-chat.Q4_K_M.gguf
5,268 TheBloke/Llama-2-13B-chat-GGUF
llama-2-13b-chat.Q4_K_M.gguf
8,609 TheBloke/LLaMA-30b-GGUF
llama-30b.Q4_K_S.gguf
19,045 TheBloke/Llama-2-70B-chat-GGUF
llama-2-70b-chat.Q4_K_M.gguf
37,655
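You can also fetch a single quantized file from the table above programmatically - a small sketch with the huggingface_hub package (assuming it is installed wherever you run it, and that /data/models/text-generation-webui is the mounted model directory used elsewhere in this tutorial):
from huggingface_hub import hf_hub_download\n\n# downloads one 4-bit GGUF file from the table above into the web UI's model directory\npath = hf_hub_download(\n    repo_id=\"TheBloke/Llama-2-7b-Chat-GGUF\",\n    filename=\"llama-2-7b-chat.Q4_K_M.gguf\",\n    local_dir=\"/data/models/text-generation-webui\",\n)\nprint(\"saved to\", path)\n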
Info
"},{"location":"tutorial_text-generation.html#model-selection-for-jetson-orin-nano","title":"Model selection for Jetson Orin Nano","text":"Jetson Orin Nano Developer Kit has only 8GB RAM for both CPU (system) and GPU, so you need to pick a model that fits in the RAM size - see the Model Size section below. The 7B models with 4-bit quantization are the ones to use on Jetson Orin Nano. Make sure you go through the RAM optimization steps before attempting to load such model on Jetson Orin Nano.
"},{"location":"tutorial_text-generation.html#load-a-model","title":"Load a model","text":"After you have downloaded a model, click the \ud83d\udd04 button to refresh your model list, and select the model you want to use.
For a GGUF model, remember to
Set n-gpu-layers
to 128
Set n_gqa
to 8
if you are using Llama-2-70B (on Jetson AGX Orin 64GB) Then click the Load button.
"},{"location":"tutorial_text-generation.html#chat-template","title":"Chat Template","text":"If you're using a Llama model fine-tuned for chat, like the models listed above (except for LLaMA-30b
), you need to use the oobabooga Instruct mode and set the template. On the Parameters tab, go to the Instruction Template sub-tab, then select Llama-v2
from the Instruction Template drop-down (or Vicuna, Guanaco, etc. if you are using one of those models)
For the base text completion models (like LLaMA-30b
), use the Default or Notebook tab.
Selecting the right chat template will make sure the model is being prompted correctly - you can also change the system prompt in the Context box to alter the agent's personality and behavior. There are a lot of other settings under the Generation tab, like the maximum length it should output per reply, and token sampling parameters like temperature
and top_p
for controlling randomness.
Then change back to the Chat tab, and under the mode section, make sure Instruct is selected (confusingly, not chat mode). Then you can start chatting with the LLM!
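The same GGUF checkpoints and sampling parameters can also be used directly from Python with llama-cpp-python instead of the web UI - a minimal sketch, assuming llama-cpp-python is installed with CUDA support and the model file below has already been downloaded:
from llama_cpp import Llama\n\n# n_gpu_layers=-1 offloads all layers to the GPU (similar intent to n-gpu-layers=128 in the UI)\nllm = Llama(model_path=\"/data/models/text-generation-webui/llama-2-7b-chat.Q4_K_M.gguf\", n_gpu_layers=-1, n_ctx=4096)\n\nout = llm.create_chat_completion(\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant running on a Jetson.\"},\n        {\"role\": \"user\", \"content\": \"What games do you like to play?\"},\n    ],\n    temperature=0.7,   # same sampling controls as the Generation tab\n    top_p=0.9,\n    max_tokens=256,\n)\nprint(out[\"choices\"][0][\"message\"][\"content\"])\n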
"},{"location":"tutorial_text-generation.html#results","title":"Results","text":""},{"location":"tutorial_text-generation.html#things-to-do-with-your-llm","title":"Things to do with your LLM","text":"Here are some common test prompts for coding, math, history ect. You can also ask it about geography, travel, nature, recipies, fixing things, general life advice, and practically everything else. Also Llama-2 is quite playful and likes to play games to test its logic abilities!
>> What games do you like to play?\n\nI'm a large language model, so I can play text-based games and answer questions on a wide variety of topics. Here are some game ideas:\n\n- 20 Questions: You or I can think of an object, and the other player tries to guess what it is by asking up to 20 yes-or-no questions. The goal is to guess the object in as few questions as possible.\n- Hangman: I can think of a word or phrase and you try to guess it by suggesting letters. Each incorrect guess will bring the hangman one step closer to being complete!\n- Word association: I'll give you a word and you have to come up with a word that is related to it. We can keep playing back and forth as long as you'd like!\n- Trivia: I can provide questions on a wide range of topics, from history to science to pop culture. See how many questions you can answer correctly!\n- Storytelling: I can start telling a story and then stop at a cliffhanger. You can then try to guess what happens next or even take over the storytelling and continue it in your own\n- Jokes: I love to tell jokes and make people laugh with my \"Dad humor\"! Knock knock! *giggles*\n
"},{"location":"tutorial_text-generation.html#model-size-tested","title":"Model size tested","text":"With llama.cpp, GGUF model, 4-bit quantization:
Model size Jetson AGX Orin 64GB Jetson AGX Orin 32GB Jetson Orin Nano 8GB 70B model \u2705 30B model \u2705 \u2705 13B model \u2705 \u2705 7B model \u2705 \u2705 \u2705 Want to explore using Python APIs to run LLMs directly? See jetson-containers
for its LLM related packages and containers.
Limited to 7B model (4-bit quantized).\u00a0\u21a9
"},{"location":"tutorial_ultralytics.html","title":"Tutorial - Ultralytics YOLOv8","text":"Let's run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT.
Ultralytics YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Nano (4GB)
Running one of the following versions of JetPack:
JetPack 4 (L4T r32.x) JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
"},{"location":"tutorial_ultralytics.html#how-to-start","title":"How to start","text":"Execute the below commands according to the JetPack version to pull the corresponding Docker container and run on Jetson.
JetPack 4JetPack 5JetPack 6 t=ultralytics/ultralytics:latest-jetson-jetpack4\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack5\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack6\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
"},{"location":"tutorial_ultralytics.html#convert-model-to-tensorrt-and-run-inference","title":"Convert model to TensorRT and run inference","text":"The YOLOv8n model in PyTorch format is converted to TensorRT to run inference with the exported model.
Example
PythonCLI from ultralytics import YOLO\n\n# Load a YOLOv8n PyTorch model\nmodel = YOLO(\"yolov8n.pt\")\n\n# Export the model\nmodel.export(format=\"engine\") # creates 'yolov8n.engine'\n\n# Load the exported TensorRT model\ntrt_model = YOLO(\"yolov8n.engine\")\n\n# Run inference\nresults = trt_model(\"https://ultralytics.com/images/bus.jpg\")\n
# Export a YOLOv8n PyTorch model to TensorRT format\nyolo export model=yolov8n.pt format=engine # creates 'yolov8n.engine'\n\n# Run inference with the exported model\nyolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'\n
Manufacturing Sports Wildlife Vehicle Spare Parts Detection Football Player Detection Tiger pose Detection Note
Visit the Export page to access additional arguments when exporting models to different model formats. Note that the default arguments require inference using fixed image dimensions when dynamic=False
. To change the input source for inference, please refer to Model Prediction page.
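For example, a sketch of passing explicit export arguments from Python (argument names follow the Ultralytics export documentation; adjust the values for your model and memory budget):
from ultralytics import YOLO\n\nmodel = YOLO(\"yolov8n.pt\")\n\n# fixed 640x640 input, FP16 TensorRT engine (dynamic=False keeps the image dimensions fixed)\nmodel.export(format=\"engine\", imgsz=640, half=True, dynamic=False)\n\ntrt_model = YOLO(\"yolov8n.engine\")\nresults = trt_model.predict(source=\"https://ultralytics.com/images/bus.jpg\", imgsz=640)\n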
"},{"location":"tutorial_ultralytics.html#benchmarks","title":"Benchmarks","text":"Benchmarks of the YOLOv8 variants with TensorRT were run by Seeed Studio on their reComputer systems:
Xavier NX 8GBOrin NX 16GBAGX Orin 32GB Model PyTorch FP32 FP16 INT8 YOLOv8n 32 63 120 167 YOLOv8s 25 26 69 112 YOLOv8m 11 11 33 56 YOLOv8l 6 6 20 38 Model PyTorch FP32 FP16 INT8 YOLOv8n 56 115 204 256 YOLOv8s 53 67 128 196 YOLOv8m 26 31 63 93 YOLOv8l 16 20 42 69 Model PyTorch FP32 FP16 INT8 YOLOv8n 77 192 323 385 YOLOv8s 67 119 213 303 YOLOv8m 40 56 105 145 YOLOv8l 27 38 73.5 114 FP32/FP16/INT8 with TensorRT (frames per second) Original post with the benchmarks are found here "},{"location":"tutorial_ultralytics.html#further-reading","title":"Further reading","text":"To learn more, visit our comprehensive guide on running Ultralytics YOLOv8 on NVIDIA Jetson including benchmarks!
Note
Ultralytics YOLOv8 models are offered under AGPL-3.0 License
which is an OSI-approved open-source license and is ideal for students and enthusiasts, promoting open collaboration and knowledge sharing. See the LICENSE file for more details.
"},{"location":"tutorial_ultralytics.html#one-click-run-ultralytics-yolo-on-jetson-orin-by-seeed-studio-jetson-examples","title":"One-Click Run Ultralytics YOLO on Jetson Orin - by Seeed Studio jetson-examples","text":""},{"location":"tutorial_ultralytics.html#quickstart","title":"Quickstart \u26a1","text":" Install the package:
pip install jetson-examples\n
Restart your reComputer:
sudo reboot\n
Run Ultralytics YOLO on Jetson with one command:
reComputer run ultralytics-yolo\n
Enter http://127.0.0.1:5001
or http://device_ip:5001
in your browser to access the Web UI.
For more details, please read: Jetson-Example: Run Ultralytics YOLO Platform Service on NVIDIA Jetson Orin .
"},{"location":"tutorial_ultralytics.html#follow-ultralytics-to-stay-updated","title":"Follow Ultralytics to stay updated!","text":""},{"location":"tutorial_voicecraft.html","title":"Tutorial - VoiceCraft","text":"Let's run VoiceCraft, a Zero-Shot Speech Editing and Text-to-Speech in the Wild!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
15.6 GB
for voicecraft
container image Space for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_voicecraft.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag voicecraft)\n
The container has a default run command (CMD
) that will automatically start the Gradio app.
Open your browser and access http://<IP_ADDRESS>:7860
.
"},{"location":"tutorial_voicecraft.html#gradio-app","title":"Gradio app","text":"VoiceCraft repo comes with Gradio demo app.
Select which models you want to use, I recommend using 330M_TTSEnhanced on 32GB AGX Orin Click load, if you run it for the first time, models are downloaded from huggingface, otherwise are loaded from /data
folder, where are saved to from previous runs Upload audio file of your choice (MP3/wav) Click transcribe, it will use whisper to get transcription along with start/end time of each word spoken Now you can edit the sentence, or use TTS. Click Run to generate output.
Warning
For TTS it's okay to use only first few seconds of audio as prompt, since it consumes a lot of memory. On AGX 32GB Orin the maximal TTS length of generated audio is around ~16 seconds in headless mode.
"},{"location":"tutorial_voicecraft.html#resources","title":"Resources","text":"If you want to know how it works under the hood, you can read following papers:
VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild High Fidelity Neural Audio Compression Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers "},{"location":"tutorial_whisper.html","title":"Tutorial - Whisper","text":"Let's run OpenAI's Whisper, pre-trained model for automatic speech recognition on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1 GB
for whisper
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_whisper.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag whisper)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server, with SSL enabled.
Open your browser and access https://<IP_ADDRESS>:8888
.
Attention
Note it is https
(not http
).
HTTPS (SSL) connection is needed to allow ipywebrtc
widget to have access to your microphone (for record-and-transcribe.ipynb
).
You will see a warning message like this.
Press \"Advanced\" button and then click on \"Proceed to (unsafe)\" link to proceed to the Jupyter Lab web interface.
The default password for Jupyter Lab is nvidia
.
"},{"location":"tutorial_whisper.html#run-jupyter-notebooks","title":"Run Jupyter notebooks","text":"Whisper repo comes with demo Jupyter notebooks, which you can find under /notebooks/
directory.
jetson-containers
also adds one convenient notebook (record-and-transcribe.ipynb
) to record your audio sample on Jupyter notebook in order to run transcribe on your recorded audio.
"},{"location":"tutorial_whisper.html#record-and-transcribeipynb","title":"record-and-transcribe.ipynb
","text":"This notebook is to let you record your own audio sample using your PC's microphone and apply Whisper's medium
model to transcribe the audio sample.
It uses Jupyter notebook/lab's ipywebrtc
extension to record an audio sample on your web browser.
Attention
When you click the \u23fa botton, your web browser may show a pop-up to ask you to allow it to use your microphone. Be sure to allow the access.
Final check Once done, if you click on the \"\u26a0 Not secure\" part in the URL bar, you should see something like this.
"},{"location":"tutorial_whisper.html#result","title":"Result","text":"Once you go through all the steps, you should see the transcribe result in text like this.
"},{"location":"vit/index.html","title":"Vision Transformers & Computer Vision","text":"You can quickly run these realtime Vision Transformers (ViT) and computer vision models onboard your Jetson:
"},{"location":"vit/index.html#efficient-vit","title":"Efficient ViT","text":""},{"location":"vit/index.html#nanosam","title":"NanoSAM","text":""},{"location":"vit/index.html#nanoowl","title":"NanoOWL","text":""},{"location":"vit/index.html#sam","title":"SAM","text":""},{"location":"vit/index.html#tam","title":"TAM","text":""},{"location":"vit/index.html#ultralytics-yolov8","title":"Ultralytics YOLOv8","text":""},{"location":"vit/tutorial_efficientvit.html","title":"Tutorial - EfficientViT","text":"Let's run MIT Han Lab's EfficientViT on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
10.9 GB
for efficientvit
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_efficientvit.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag efficientvit)\n
"},{"location":"vit/tutorial_efficientvit.html#usage-of-efficientvit","title":"Usage of EfficientViT","text":"The official EfficientViT repo shows the complete usage information: https://github.com/mit-han-lab/efficientvit#usage
"},{"location":"vit/tutorial_efficientvit.html#run-examplebenchmark","title":"Run example/benchmark","text":"Inside the container, a small benchmark script benchmark.py
is added under /opt/efficientvit
directory by the jetson-container build process.
It is to test EfficientViT-L2-SAM in bounding box mode, so we can use this as an example and verify the output.
"},{"location":"vit/tutorial_efficientvit.html#download-l2pt-model","title":"Download l2.pt
model","text":"mkdir -p /data/models/efficientvit/sam/\ncd /data/models/efficientvit/sam/\nwget https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt\n
The downloaded checkpoint file is stored on the /data/
directory that is mounted from the Docker host.
"},{"location":"vit/tutorial_efficientvit.html#run-benchmark-script","title":"Run benchmark script","text":"cd /opt/efficientvit\npython3 ./benchmark.py\n
At the end you should see a summary like the following.
AVERAGE of 2 runs:\n encoder --- 0.062 sec\n latency --- 0.083 sec\nMemory consumption : 3419.68 MB\n
"},{"location":"vit/tutorial_efficientvit.html#check-the-outputresult","title":"Check the output/result","text":"The output image file (of the last inference result) is stored as /data/benchmarks/efficientvit_sam_demo.png
.
It is stored under /data/
directory that is mounted from the Docker host. So you can go back to your host machine, and check jetson-containers/data/benchmark/
directory.
You should find the output like this.
"},{"location":"vit/tutorial_nanoowl.html","title":"Tutorial - NanoOWL","text":"Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7.2 GB
for container image Spaces for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_nanoowl.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run --workdir /opt/nanoowl $(autotag nanoowl)\n
"},{"location":"vit/tutorial_nanoowl.html#how-to-run-the-tree-prediction-live-camera-example","title":"How to run the tree prediction (live camera) example","text":" Ensure you have a camera device connected
ls /dev/video*\n
If no video device is found, exit from the container and check if you can see a video device on the host side.
Launch the demo
cd examples/tree_demo\npython3 tree_demo.py ../../data/owl_image_encoder_patch32.engine\n
Info
If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.
python3 -m nanoowl.build_image_encoder_engine \\\n data/owl_image_encoder_patch32.engine\n
Second, open your browser to http://<ip address>:7860
Type whatever prompt you like to see what works!
Here are some examples
Example: [a face [a nose, an eye, a mouth]]
Example: [a face (interested, yawning / bored)]
Example: (indoors, outdoors)
"},{"location":"vit/tutorial_nanoowl.html#result","title":"Result","text":""},{"location":"vit/tutorial_nanosam.html","title":"Tutorial - NanoSAM","text":"Let's run NVIDIA's NanoSAM to check out the performance gain by distillation.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.3GB
for container image Spaces for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_nanosam.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nanosam)\n
"},{"location":"vit/tutorial_nanosam.html#run-examples","title":"Run examples","text":"Inside the container, you can move to /opt/nanosam
directory, to go through all the examples demonstrated on the repo.
cd /opt/nanosam\n
To run the \"Example 1 - Segment with bounding box\":
python3 examples/basic_usage.py \\\n --image_encoder=\"data/resnet18_image_encoder.engine\" \\\n --mask_decoder=\"data/mobile_sam_mask_decoder.engine\"\n
The result is saved under /opt/nanosam/data/basic_usage_out.jpg
.
To check on your host machine, you can copy that into /data
directory of the container where that is mounted from the host.
cp data/basic_usage_out.jpg /data/\n
Then you can go to your host system, and find the file under jetson-containers/data/basic_usage_out.jpg
"},{"location":"vit/tutorial_nanosam.html#results","title":"Results","text":""},{"location":"vit/tutorial_sam.html","title":"Tutorial - SAM (Segment Anything)","text":"Let's run Meta's SAM
on NVIDIA Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image Spaces for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_sam.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag sam)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
"},{"location":"vit/tutorial_sam.html#run-jupyter-notebook","title":"Run Jupyter notebook","text":"In Jupyter Lab, navigate to notebooks
and open automatic_mask_generator_example.py
notebook.
Create a new cell at the top, insert the model download command below and run the cell.
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\n
Then go through executing all the cells below Set-up.
"},{"location":"vit/tutorial_sam.html#results","title":"Results","text":" The biggest vit_h
(2.4GB) model may not ran due to OOM, but vit_l
(1.1GB) runs on Jetson Orin Nano.\u00a0\u21a9
"},{"location":"vit/tutorial_tam.html","title":"Tutorial - SAM (Segment Anything)","text":"Let's run TAM
to perform Segment Anything on videos on NVIDIA Jetson.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image Spaces for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_tam.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag tam)\n
The container has a default run command (CMD
) that will automatically start TAM's web server.
Open your browser and access http://<IP_ADDRESS>:12212
.
"},{"location":"vit/tutorial_tam.html#tam-web-ui","title":"TAM web UI","text":"Check out the official tutorial to learn how to operate the web UI.
"},{"location":"vit/tutorial_tam.html#results","title":"Results","text":""},{"location":"vit/tutorial_tam.html#troubleshooting","title":"Troubleshooting","text":""},{"location":"vit/tutorial_tam.html#filenotfounderror-errno-2-no-such-file-or-directory-checkpointse2fgvi-hq-cvpr22pth","title":"FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/E2FGVI-HQ-CVPR22.pth'
","text":"You may find the TAM app fails to download a checkpoint file E2FGVI-HQ-CVPR22.pth
.
Downloading checkpoints from Google Drive... tips: If you cannot see the progress bar, please try to download it manuall and put it in the checkpointes directory. E2FGVI-HQ-CVPR22.pth: https://github.com/MCG-NKU/E2FGVI(E2FGVI-HQ model)\nAccess denied with the following error:\n\n Cannot retrieve the public link of the file. You may need to change\n the permission to 'Anyone with the link', or have had many accesses. \n\nYou may still be able to access the file from the browser:\n\n https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \n
You can manually download the checkpoint file on your Docker host machine.
cd jetson-containers/\npip install gdown\nsource ~/.profile\ngdown https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \nmv E2FGVI-HQ-CVPR22.pth ./data/models/tam/\n
And you can try running the TAM container.
jetson-containers run $(autotag tam)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"agent_studio.html","title":"Agent Studio","text":"Rapidly design and experiment with creating your own automation agents, personal assistants, and edge AI systems in an interactive sandbox for connecting multimodal LLMs, speech and vision transformers, vector databases, prompt templating, and function calling to live sensors and I/O. Optimized for deployment onboard Jetson with on-device compute, low-latency streaming, and unified memory.
\u2022 Edge LLM inference with quantization and KV caching (NanoLLM) \u2022 Realtime vision/language models (ala Live Llava and Video VILA) \u2022 Speech recognition and synthesis (Whisper ASR, Piper TTS, Riva) \u2022 Multimodal vector database from NanoDB \u2022 Audio and video streaming (WebRTC, RTP, RTSP, V4L2) \u2022 Performance monitoring and profiling \u2022 Native bot-callable functions and agent tools \u2022 Extensible plugins with auto-generated UI controls \u2022 Save, load, and export pipeline presets \u2022 Native bot-callable functions and agent tools"},{"location":"agent_studio.html#running-agent-studio","title":"Running Agent Studio","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>5GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
This will start the server running on your device. You can then navigate your browser to https://IP_ADDRESS:8050
You can load a preset at startup with the --load
flag (like --load /data/nano_llm/presets/xyz.json
) The default port is 8050, but can be changed with --web-port
(and --ws-port
for the websocket port) Request access to the Llama models on HuggingFace and substitute your account's API token above. "},{"location":"agent_studio.html#dev-mode","title":"Dev Mode","text":"To make code changes without rebuilding the container, clone the NanoLLM sources and then mount them over /opt/NanoLLM
git clone https://github.com/dusty-nv/NanoLLM\njetson-containers run \\\n -v ${PWD}/NanoLLM:/opt/NanoLLM \\\n $(autotag nano_llm)\n
You can then edit the source from outside the container. And in the terminal that starts, you can install other packages from apt/pip/ect.
"},{"location":"agent_studio.html#plugins","title":"Plugins","text":"The models, I/O, and logic blocks share a lightweight plugin interface, exposing a process()
function and any runtime properties:
Custom Pluginfrom nano_llm import Plugin\n\nclass TextFilter(Plugin):\n def __init__(self, search: str='bad word', replace: str='censored', **kwargs):\n \"\"\"\n A simple plugin performing text replacement. Write docs because they're extracted for the UI,\n and for the tool descriptions that go to the bot for function calling (and use the type hints) \n\n Args:\n search: The string to look for an replace in incoming text.\n replace: What to replace occurances of the search string with.\n \"\"\"\n super().__init__(inputs=['text'], outputs=['text'])\n\n # set these attributes and register them as configurable\n self.add_parameters(search=search, replace=replace)\n\n def process(self, input, **kwargs):\n \"\"\" Substitute the text, and the return value will be sent to the plugins connected to this. \"\"\"\n return input.replace(self.search, self.replace)\n
Typically each plugin runs asynchronously its own thread and queue of data, although some lightweight routines are executed inline. Usually they're in the same process, but could make requests to microservices. Websockets are used for clients and the web UI.
Below are descriptions of commonly-used components. Help text for these is extracted from their Python docs and shown in the UI, along with settings that you can control. In the node editor, click on a plugin to open its grid widget or configuration window.
"},{"location":"agent_studio.html#llm","title":"LLM","text":"NanoLLMAutoPromptUserPromptTextStream > nano_llm.plugins.NanoLLM
(source)
Load quantized LLM/VLM with MLC (speed), AWQ (quality), or HF Transformers (compatability)\n\n - Inputs\n * str, list[str], np.ndarray, torch.Tensor, cudaImage, PIL.Image \n\n - Commands \n * /reset /refresh /generate (these inputs can control the chat)\n\n - Outputs\n * delta The latest detokenized text to be generated\n * partial All of the text generated in the response so far\n * final The complete response sent at the end of the request\n * words Similar to 'delta', but with at least one word at a time\n * history A list of the messages in the chat history (as dictionaries)\n * tools Enable the bot to call functions in plugins connected on this channel\n
> nano_llm.plugins.AutoPrompt
(source)
Apply a template each time new data comes in, for example '<image> Describe the image' \n would tag each incoming image along with that text prompt (with images, usually for VLM)\n\n AutoPrompt can be used to compose structured messages that reference prior inputs, \n like `Image 1: <image> Image 2: <image>` or using <text> for last text recieved: \n\n The three most recent replies were:\n * <text>\n * <text>\n * <text> \n\nThese most recent inputs are used in newest to oldest order from a LIFO queue.\n
> nano_llm.plugins.UserPrompt
(source)
This plugin is for inputting text on the keyboard, either from the terminal or text box in the UI.\n\n It can also load prompts from a text or json file, and those files can reference other files to load.\n
> nano_llm.plugins.TextStream
(source)
Simple plugin for viewing any stream of text from the system in a text box in the UI. \n\n For sources emanating from ASR or LLM, can apply color highlighting to partial/final responses.\n
"},{"location":"agent_studio.html#speech","title":"Speech","text":"Whisper ASRPiper TTSVAD FilterRiva > nano_llm.plugins.WhisperASR
(source)
Whisper streaming speech-to-text with TensorRT (github.com/NVIDIA-AI-IOT/whisper_trt)\n\n Supports these Whisper models: 'tiny' (39M), 'base' (74M), 'small' (244M)\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * final The text transcript of the entire segment spoken in the input\n * partial Rolling partial transcript of the words spoken so far in the audio segment\n
The Whisper plugin is designed to be used in a pipeline with the VAD Filter to reduce erroneous output from background noise:
> nano_llm.plugins.PiperTTS
(source)
Piper text-to-speech using CUDA and onnxruntime (github.com/rhasspy/piper)\n\n Populates a list of the available Piper models that it can download, along with the speaker voices \n for multi-speaker models. The default model is 'en_US-libritts-high'. Has controls for voice speed.\n\n - Inputs\n * str either individual words, sentences, or an entire passage to synthesize\n\n - Outputs\n * audio np.ndarray (int16) at 16KHz sample rate (for low-quality models) or 22.5KHz (for medium/high)\n
This PiperTTS preset is available, which uses RateLimit to stream the output at 1x realtime factor (so it can be muted later)
> nano_llm.plugins.VADFilter
(source)
Voice activity detection model using Silero. Drops incoming audio unless it exceeds the VAD threshold.\n\n Use this on incoming audio before ASR plugins to reduce spurious transcripts produced from background noise.\n\n - Inputs\n * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)\n\n - Outputs\n * audio np.ndarray or torch.Tensor (only when VAD confidence > threshold)\n
Riva ASR / TTS services - these run in another container over GRPC. See these resources for setup:
NGC - riva-embedded quickstart guide
jetson-containers - riva container setup
JetsonHacks - Speech AI on NVIDIA Jetson Tutorial
"},{"location":"agent_studio.html#audio","title":"Audio","text":"AudioInputDevice
USB/I2S soundcard input via PulseAudio/PyAudio (for microphone directly attached to system) AudioOutputDevice
USB/I2S soundcard output via PulseAudio/PyAudio (for speakers directly attached to system) AudioRecorder
Save an audio stream to WAV file on the server WebAudioIn
Recieve audio samples streamed from the client over websockets WebAudioOut
Transmit audio samples to the client over websockets"},{"location":"agent_studio.html#video","title":"Video","text":"VideoSource
Capture images from camera device (V4L2/CSI), network stream (RTP, RTSP), video file (MP4, MKV, AVI, FLV) VideoOutput
Output H264/H265-encoded video to network stream (RTP, RTSP, WebRTC), display, or file (MP4, MKV, AVI, FLV) VideoOverlay
Draw text on top of video streams for HUD or OSD-style display RateLimit
Throttle transmission to a specified rate (can also be used with audio)"},{"location":"agent_studio.html#database","title":"Database","text":"NanoDB
Optimized in-memory multimodal vectorDB for txt2img/img2img similarity search and image tagging [[\u2197]](tutorial_nanodb.md) Deduplicate
Filter incoming text against previous entries with embedding model and cosine similarity over a timeout period EventFilter
Look for keywords or search strings in streams of text, keeping track of the begin/end times that these occur."},{"location":"agent_studio.html#tips-tricks","title":"Tips & Tricks","text":"Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recreated in the dynamic environment of Agent Studio without needing to touch any code. Here are some practical tips as you go about trying different combinations of model pipelines:
"},{"location":"agent_studio.html#ui","title":"UI","text":" To add an element to the agent pipeline, either use the menus in the node editor title bar, or right click inside the editor. Inserting elements via the context menu (right click) will position them at your cursor, so you needn't reposition them. To remove a component from the pipeline, right click on it and then select the little X button in the top right. To remove a connection between components, first select the link and then right click for the X button to appear. Click on a node in the graph to open its grid widget if it implements one. Click again to open its settings dialog. "},{"location":"agent_studio.html#save-load","title":"Save & Load","text":" Under the Agent menu, there is a Save button which will serialize the current pipeline to JSON. These presets are saved in a mounted directory on the host, under jetson-containers/data/nano_llm/presets
Some commonly-used subgraphs are already included to quickly add and combine (like ASR, TTS, and video VLM) You can load these via the Agent menu, by right-clicking inside the node editor, or with the --load
flag at startup. "},{"location":"agent_studio.html#memory","title":"Memory","text":" As you are adding models, keep an eye on the system resources in the top right to make sure you don't run out of memory. Due to loading times, the models are cached in memory even after you remove them from the node editor. To actually free the memory, use the Clear Cache
button in the Agent menu. "},{"location":"agent_studio.html#performance","title":"Performance","text":" You can monitor the system's CPU and GPU usage in the top right corner. Various statistics are refreshed in the nodes to see performance of each component. Use the RateLimiter
plugin to throttle data sources and balance resources. The drop_inputs
setting that some plugins expose will always keep them up with the latest request. "},{"location":"agent_studio.html#templating","title":"Templating","text":" \ud83e\udd16 Have fun bot building! If you need help, reach out on the Jetson Forums or GitHub Issues.
"},{"location":"benchmarks.html","title":"Benchmarks","text":"Below are AI inferencing benchmarks for Jetson Orin Nano Super and Jetson AGX Orin.
"},{"location":"benchmarks.html#jetson-orin-nano-super","title":"Jetson Orin Nano Super","text":"LLM / SLMVision / Language ModelsVision Transformers Model Jetson Orin Nano (original) Jetson Orin Nano Super Perf Gain (X) Llama 3.1 8B 14 19.14 1.37 Llama 3.2 3B 27.7 43.07 1.55 Qwen2.5 7B 14.2 21.75 1.53 Gemma 2 2B 21.5 34.97 1.63 Gemma 2 9B 7.2 9.21 1.28 Phi 3.5 3B 24.7 38.1 1.54 SmolLM2 41 64.5 1.57 For running these benchmarks, this script will launch a series of containers that download/build/run the models with MLC and INT4 quantization.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\nbash jetson-containers/packages/llm/mlc/benchmarks.sh\n
Model Jetson Orin Nano (original) Jetson Orin Nano Super Perf Gain (X) VILA 1.5 3B 0.7 1.06 1.51 VILA 1.5 8B 0.574 0.83 1.45 LLAVA 1.6 7B 0.412 0.57 1.38 Qwen2 VL 2B 2.8 4.4 1.57 InternVL2.5 4B 2.5 5.1 2.04 PaliGemma2 3B 13.7 21.6 1.58 SmolVLM 2B 8.1 12.9 1.59
Model Jetson Orin Nano (original) Jetson Orin Nano Super Perf Gain (X) clip-vit-base-patch32 196 314 1.60 clip-vit-base-patch16 95 161 1.69 DINOv2-base-patch14 75 126 1.68 SAM2 base 4.42 6.34 1.43 Grounding DINO 4.11 6.23 1.52 vit-base-patch16-224 98 158 1.61 vit-base-patch32-224 171 273 1.60"},{"location":"benchmarks.html#jetson-agx-orin","title":"Jetson AGX Orin","text":"Large Language Models (LLM)Small Language Models (SLM)Vision Language Models (VLM)Vision Transformers (ViT)Stable DiffusionRivaVector Database For running LLM benchmarks, see the MLC
container documentation.
Small language models are generally defined as having fewer than 7B parameters (Llama-7B shown for reference) For more data and info about running these models, see the SLM
tutorial and MLC
container documentation.
This measures the end-to-end pipeline performance for continuous streaming like with Live Llava. For more data and info about running these models, see the NanoVLM
tutorial.
VIT performance data from [1] [2] [3]
For running Riva benchmarks, see ASR Performance and TTS Performance.
For running vector database benchmarks, see the NanoDB
container documentation.
"},{"location":"community_articles.html","title":"Community Projects","text":"Below, you'll find a collection of guides, tutorials, and articles contributed by the community showcasing the implementation of generative AI on the Jetson platform.
"},{"location":"community_articles.html#github-jetbot-voice-activated-copilot-tools-with-ros2-riva-and-nanollm-9-21-2024","title":"GitHub Jetbot Voice-Activated Copilot Tools with ROS2, RIVA, and NanoLLM (9-21-2024) ","text":"Jen Hung Ho created ROS2 nodes for ASR/TTS and LLM/VLM on Jetson that can can be used to control JetBot, including customizable voice commands and the execution of advanced actions. Check it out on GitHub under Jen-Hung-Ho/ros2_jetbot_tools
and Jen-Hung-Ho/ros2_jetbot_voice
and on the forums here.
"},{"location":"community_articles.html#hackster-ai-powered-photorealistic-talking-avatar-4-26-2024","title":"Hackster AI-Powered Photorealistic Talking Avatar (4-26-2024) ","text":"Nurgaliyev Shakhizat creates an interactive talking avatar using ASR, TTS, LLM, and Audio2Face with NVIDIA Omniverse and Unreal Engine:
"},{"location":"community_articles.html#hackster-an-eye-for-an-item-4-26-2024","title":"Hackster An Eye for an Item (4-26-2024) ","text":"Allen Tao builds an indoor robot with Isaac ROS that maps your home and remembers where things are using SLAM and NanoDB!
"},{"location":"community_articles.html#hackster-escalator-people-tracker-4-2-2024","title":"Hackster Escalator People Tracker (4-2-2024) ","text":"George Profenza & team install local CV solution for detecting and tracking people in large retail spaces to drive live Generative AI graphics:
"},{"location":"community_articles.html#hackster-edge-style-fashion-preview-at-the-edge-4-1-2024","title":"Hackster Edge Style: Fashion Preview at the Edge (4-1-2024) ","text":"AI-powered app from Andrei Ciobanu shows virtual try-ons with customer images, enhancing retail shopping using Jetson Orin for speed and privacy:
"},{"location":"community_articles.html#hackster-cooking-meals-with-a-local-ai-assistant-on-jetson-axg-orin-4-1-2024","title":"Hackster Cooking meals with a local AI assistant on Jetson AXG Orin (4-1-2024) ","text":"Dimiter Kendri builds a multimodal, multi AI agent, fully local, conversational chatbot with multi agent research capabilities via speech queries:
"},{"location":"community_articles.html#hackster-realtime-language-segment-anything-on-jetson-orin-3-4-2024","title":"Hackster Realtime Language-Segment-Anything on Jetson Orin (3-4-2024) ","text":"Huy Mai enables Segment Anything (SAM) with natural language prompting using GroundingDINO for object detection:
"},{"location":"community_articles.html#github-japanese-nmt-translation-for-stable-diffusion-2-23-2024","title":"GitHub Japanese NMT Translation for Stable Diffusion (2-23-2024) ","text":"Toshihiko Aoki has created a prompt generator for stable-diffusion-webui
that translates Japanese queries into English using a fine-tuned GPT-2 NMT model before feeding them into Stable Diffusion. Check out the full guide on GitHub under to-aoki/ja-tiny-sd-webui
, including the training dataset and LoRA building!
Your browser does not support the video tag."},{"location":"community_articles.html#hackster-clearwater-underwater-image-enhancement-with-generative-ai-2-16-2024","title":"Hackster ClearWater: Underwater Image Enhancement with Generative AI (2-16-2024) ","text":"Vy Pham has created a novel denoising pipeline using a custom trained Transformer-based diffusion model and GAN upscaler for image enhancement, running on Jetson AGX Orin. It runs interactively in a Streamlit web UI for photo capturing and the processing of images and videos. Great work!
Your browser does not support the video tag."},{"location":"community_articles.html#hackster-ai-powered-application-for-the-blind-and-visually-impaired-12-13-2023","title":"Hackster AI-Powered Application for the Blind and Visually Impaired (12-13-2023) ","text":"Nurgaliyev Shakhizat demonstrates a locally-hosted Blind Assistant Device running on Jetson AGX Orin 64GB Developer Kit for realtime image-to-speech translation:
\u00a0 Find more resources about this project here: [Hackster] [GitHub]
"},{"location":"community_articles.html#daves-armoury-bringing-glados-to-life-with-robotics-and-ai-2-8-2024","title":"Dave's Armoury Bringing GLaDOS to life with Robotics and AI (2-8-2024) ","text":"See how DIY robotics legend Dave Niewinski from davesarmoury.com brings GLaDOS to life using Jetson AGX Orin, running LLMs onboard alongside object + depth tracking, and RIVA ASR/TTS with a custom-trained voice model for speech recognition and synthesis! Using Unitree Z1 arm with 3D printing and StereoLabs ZED2.
\u00a0 Find more resources about this project here: [Forums] [GitHub]
"},{"location":"community_articles.html#hackster-seeed-studios-local-voice-chatbot-puts-a-speech-recognizing-llama-2-llm-on-your-jetson-2-7-2024","title":"Hackster Seeed Studio's Local Voice Chatbot Puts a Speech-Recognizing LLaMa-2 LLM on Your Jetson (2-7-2024) ","text":"Seeed Studio has announced the launch of the Local Voice Chatbot, an NVIDIA Riva- and LLaMa-2-based large language model (LLM) chatbot with voice recognition capabilities \u2014 running entirely locally on NVIDIA Jetson devices, including the company's own reComputer range. Follow the step-by-step guide on the Seeed Studio wiki.
"},{"location":"community_articles.html#youtube-genai-nerds-react-insider-look-at-nvidias-newest-generative-ai-2-6-2024","title":"YouTube GenAI Nerds React - Insider Look at NVIDIA's Newest Generative AI (2-6-2024) ","text":"Watch this panel about the latest trends & tech in edge AI, featuring Kerry Shih from OStream, Jim Benson from JetsonHacks, and Dusty from NVIDIA.
"},{"location":"community_articles.html#nvidia-bringing-generative-ai-to-life-with-nvidia-jetson-11-7-2023","title":"NVIDIA Bringing Generative AI to Life with NVIDIA Jetson (11-7-2023) ","text":"Watch this webinar about deploying LLMs, VLMs, ViTs, and vector databases onboard Jetson Orin for building next-generation applications using Generative AI:
"},{"location":"community_articles.html#jetsonhacks-jetson-ai-labs-generative-ai-playground-10-31-2023","title":"JetsonHacks Jetson AI Labs \u2013 Generative AI Playground (10-31-2023) ","text":"JetsonHacks publishes an insightful video that walks developers through the typical steps for running generative AI models on Jetson following this site's tutorials. The video shows the interaction with the LLaVA model.
"},{"location":"community_articles.html#hackster-vision2audio-giving-the-blind-an-understanding-through-ai-10-15-2023","title":"Hackster Vision2Audio - Giving the blind an understanding through AI (10-15-2023) ","text":"Nurgaliyev Shakhizat demonstrates Vision2Audio running on Jetson AGX Orin 64GB Developer Kit to harness the power of LLaVA to help visually impaired people:
"},{"location":"community_articles.html#nvidia-generative-ai-models-at-the-edge-10-19-2023","title":"NVIDIA Generative AI Models at the Edge (10-19-2023) ","text":"Follow this walkthrough of the Jetson AI Lab tutorials along with coverage of the latest features and advances coming to JetPack 6 and beyond:
\u00a0 Technical Blog - https://developer.nvidia.com/blog/bringing-generative-ai-to-life-with-jetson/
"},{"location":"community_articles.html#medium-how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-10-02-2023","title":"Medium How to set up your Jetson device for LLM inference and fine-tuning (10-02-2023) ","text":"Michael Yuan's guide demonstrating how to set up the Jetson AGX Orin 64GB Developer Kit specifically for large language model (LLM) inference, highlighting the crucial role of GPUs and the cost-effectiveness of the Jetson AGX Orin for LLM tasks.
https://medium.com/@michaelyuan_88928/how-to-set-up-your-jetson-device-for-llm-inference-and-fine-tuning-682e36444d43
"},{"location":"community_articles.html#hackster-getting-started-with-ai-on-nvidia-jetson-agx-orin-dev-kit-09-16-2023","title":"Hackster Getting Started with AI on Nvidia Jetson AGX Orin Dev Kit (09-16-2023) ","text":"Nurgaliyev Shakhizat demonstrates llamaspeak
on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
"},{"location":"community_articles.html#hackster-new-ai-tool-is-generating-a-lot-of-buzz-09-13-2023","title":"Hackster New AI Tool Is Generating a Lot of Buzz (09-13-2023) ","text":"Nick Bild provides an insightful introduction to the Jetson Generative AI Playground:
https://www.hackster.io/news/new-ai-tool-is-generating-a-lot-of-buzz-3cc5f23a3598
"},{"location":"community_articles.html#jetsonhacks-use-these-jetson-docker-containers-tutorial-09-04-2023","title":"JetsonHacks Use These! Jetson Docker Containers Tutorial (09-04-2023) ","text":"JetsonHacks has a in-depth tutorial on how to use jetson-containers
and even show text-generation-webui
and stable-diffusion-webui
containers in action!
"},{"location":"community_articles.html#hackster-llama-2-llms-w-nvidia-jetson-and-textgeneration-web-ui-08-17-2023","title":"Hackster LLaMa 2 LLMs w/ NVIDIA Jetson and textgeneration-web-ui (08-17-2023) ","text":"Paul DeCarlo demonstrates 13B and 70B parameter LLama 2 models running locally on Jetson AGX Orin 64GB Developer Kit in this Hackster post:
"},{"location":"community_articles.html#hackster-running-a-chatgpt-like-llm-llama2-on-a-nvidia-jetson-cluster-08-14-2023","title":"Hackster Running a ChatGPT-Like LLM-LLaMA2 on a Nvidia Jetson Cluster (08-14-2023) ","text":"Discover how to run a LLaMA-2 7B model on an NVIDIA Jetson cluster in this insightful tutorial by Nurgaliyev Shakhizat:
"},{"location":"community_articles.html#jetsonhacks-speech-ai-on-nvidia-jetson-tutorial-08-07-2023","title":"JetsonHacks Speech AI on NVIDIA Jetson Tutorial (08-07-2023) ","text":"JetsonHacks gives a nice introduction to NVIDIA RIVA SDK and demonstrate its automated speech recognition (ASR) capability on Jetson Orin Nano Developer Kit.
"},{"location":"community_articles.html#hackster-llm-based-multimodal-ai-w-azure-open-ai-nvidia-jetson-07-12-2023","title":"Hackster LLM based Multimodal AI w/ Azure Open AI & NVIDIA Jetson (07-12-2023) ","text":"Learn how to harness the power of Multimodal AI by running Microsoft JARVIS on an Jetson AGX Orin 64GB Developer Kit, enabling a wide range of AI tasks with ChatGPT-like capabilities, image generation, and more, in this comprehensive guide by Paul DeCarlo.
"},{"location":"community_articles.html#hackster-how-to-run-a-chatgpt-like-llm-on-nvidia-jetson-board-06-13-2023","title":"Hackster How to Run a ChatGPT-Like LLM on NVIDIA Jetson board (06-13-2023) ","text":"Nurgaliyev Shakhizat explores voice AI assistant on Jetson using FastChat and VoskAPI.
"},{"location":"cosmos.html","title":"Cosmos - World Foundation Models","text":"Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Cosmos is purpose built for physical AI. The Cosmos repository will enable end users to run the Cosmos models, run inference scripts and generate videos.
Special thanks to Johnny N\u00fa\u00f1ez Cano for porting the Cosmos and Transformer Engine Jetson! See Cosmos Official page by Nvidia. See Transformer Engine by Nvidia.
What you need
One of the following Jetson devices:
Jetson Thor (XGB) Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
12.26GB
for cosmos
container image Space for models and datasets (>50GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
WARNING
Transformer Engine:
Cosmos is optimized for NVIDIA ADA GPU architecture generations and later due running in FP8. Jetson AGX Orin is based on Ampere. Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later. "},{"location":"cosmos.html#start-container","title":"Start Container","text":"Use this command to automatically run, build, or pull a compatible container image for cosmos:
jetson-containers run $(autotag cosmos)\n
To mount your own directories into the container, use the -v
or --volume
flags:
jetson-containers run -v /path/on/host:/path/in/container $(autotag cosmos)\n
Recommendation (This download all models outside docker container):
git clone --recursive https://github.com/NVIDIA/Cosmos.git\ncd Cosmos\njetson-containers run -it -v $(pwd):/workspace $(autotag cosmos)\n
"},{"location":"cosmos.html#follow-the-instructions-from-cosmos-repository","title":"Follow the instructions from Cosmos repository.","text":"Here is the summarized steps to run the Cosmos models:
Generate a Hugging Face access token. Set the access token to 'Read' permission (default is 'Fine-grained').
huggingface-cli login\n
Download Models:
PYTHONPATH=$(pwd) python3 cosmos1/scripts/download_diffusion.py --model_sizes 7B 14B --model_types Text2World Video2World\n
Run Demo:
PROMPT=\"A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \\\nThe robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \\\nA glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \\\nsuggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \\\nThe camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \\\nfield that keeps the focus on the robot while subtly blurring the background for a cinematic effect.\"\n
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \\\n --checkpoint_dir checkpoints \\\n --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \\\n --prompt \"$PROMPT\" \\\n --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \\\n --offload_tokenizer \\\n --offload_diffusion_transformer \\\n --offload_text_encoder_model \\\n --offload_prompt_upsampler \\\n --offload_guardrail_models\n
It will generate a video file in the outputs
directory.
Another example:
PROMPT=\"The video showcases a vibrant, magical garden where flowers bloom dynamically, opening and moving as though responding to a gentle rhythm in nature. \\\nColorful butterflies glide gracefully through the air, and a small, clear stream winds its way through the scene, reflecting the warm glow of sunlight. \\\nA curious rabbit hops along a winding path, leading the viewer to a hidden alcove where a tree with golden, shimmering leaves stands, its branches moving slightly as if alive with energy. \\\nThe entire scene radiates tranquility and wonder, inviting viewers to immerse themselves in the beauty of nature and magic combined.\"\n
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \\\n --checkpoint_dir checkpoints \\\n --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \\\n --prompt \"$PROMPT\" \\\n --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \\\n --offload_tokenizer \\\n --offload_diffusion_transformer \\\n --offload_text_encoder_model \\\n --offload_prompt_upsampler \\\n --offload_guardrail_models\n
"},{"location":"getting-started.html","title":"Getting started","text":""},{"location":"hello_ai_world.html","title":"Hello AI World","text":"Hello AI World is an in-depth tutorial series for DNN-based inference and training of image classification, object detection, semantic segmentation, and more. It is built on the jetson-inference
library using TensorRT for optimized performance on Jetson.
It's highly recommended to familiarize yourself with the concepts of machine learning and computer vision before diving into the more advanced topics of generative AI here on the Jetson AI Lab. Many of these models will prove useful to have during your development.
HELLO AI WORLD >> https://github.com/dusty-nv/jetson-inference
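For a quick taste of what the library provides, below is a minimal object detection loop using its Python bindings, closely mirroring the detectnet sample from the repository (treat the model name and the camera/display URIs as placeholders to adjust for your own setup):
#!/usr/bin/env python3\n# Minimal jetson-inference detection loop (sketch based on the repo's detectnet example)\nfrom jetson_inference import detectNet\nfrom jetson_utils import videoSource, videoOutput\n\nnet = detectNet(\"ssd-mobilenet-v2\", threshold=0.5)   # pre-trained detection model\ncamera = videoSource(\"/dev/video0\")                  # V4L2 camera (adjust for CSI/RTSP)\ndisplay = videoOutput(\"display://0\")                 # OpenGL window\n\nwhile display.IsStreaming():\n    img = camera.Capture()\n    if img is None:                       # capture timeout, keep polling\n        continue\n    detections = net.Detect(img)          # TensorRT-accelerated inference with overlay\n    display.Render(img)\n    display.SetStatus(\"detectNet | {:.0f} FPS\".format(net.GetNetworkFPS()))\n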
"},{"location":"initial_setup_jon.html","title":"Initial Setup Guide for Jetson Orin Nano Developer Kit","text":"Note
This guide is to supplement the official Jetson Orin Nano Developer Kit Getting Started Guide.
The NVIDIA\u00ae Jetson Orin Nano\u2122 Developer Kit is a perfect kit to start your journey of local generative AI evaluation and development.
With the December 2024 software update (JetPack 6.1 (rev.1)), this advanced edge computer delivers up to 70% more performance, making it an even more powerful platform for the era of generative AI.
This guide explains the complete flow from opening the box, updating the firmware if needed, flashing the latest JetPack 6.1 (rev. 1) image on SD card, and the initial software setup, so that you will be ready for tutorials listed on this site and other AI projects.
"},{"location":"initial_setup_jon.html#check-your-inventory","title":"Check your inventory","text":"The following item is needed or highly desired to set up your Jetson Orin Nano Developer Kit. If you don't have them in your inventory, you want to arrange them and return to this guide once they are available.
What does not come in the box - what you need/want to prepare
"},{"location":"initial_setup_jon.html#storage","title":"Storage","text":" microSD card (64GB or bigger) NVMe SSD (Optional, but highly recommended for following tutorials on this site) "},{"location":"initial_setup_jon.html#mean-to-access-terminal","title":"Mean to access terminal","text":"You need either of the following set:
DisplayPort cable, DisplayPort capable monitor and a USB keyboard DisplayPort to HDMI cable and HDMI capable monitor (or TV) and a USB keyboard USB to TTL Serial cable (Advanced) "},{"location":"initial_setup_jon.html#alternative-method-sdk-manager","title":"\ud83d\udef8 Alternative method : SDK Manager","text":"In case you have an x86 PC running Ubuntu 22.04 or 20.04, then you can flash your Jetson Orin Nano Developer Kit with the latest firmware and JetPack all at once using NVIDIA SDK Manager.
Also, if you wish to not use a microSD card but rather use a large NVMe SSD for the OS and data, then you need to use SDK Manager to flash the latest JetPack on the NVMe SSD.
flowchart LR\n A(start) --> B{Want to ditch microSD<br>and only use NVMe SSD?}\n B --[YES] --> S[\ud83d\udef8 SDK Manager method]\n B --[No] --> C{Have x86 PC running<br> Ubuntu 22.04/20.04?}\n C --[YES] --> S\n C --[No] --> U[\ud83d\ude80 microSD-only method]\n\n style S stroke-width:3px, fill:#f3e9f2,stroke:#b544c4\n style U stroke-width:3px, fill:#d2e9e5,stroke:#0e7a71
Click the button below to jump to a page that explains the alternative setup method using SDK Manager if you want to set your Jetson Orin Nano Developer Kit with an NVMe SSD or just want to flash all at once with your Ubuntu PC.
\ud83d\udef8 SDK Manager method
Otherwise, continue reading on this page for the microSD-only setup method.
"},{"location":"initial_setup_jon.html#overall-flow-microsd-only-method","title":"Overall flow (microSD-only method)","text":"Jetson Orin Nano Initial Setup Flowchart (microSD-only method)
flowchart\n A(start) --> B{1\ufe0f\u20e3 Check if Jetson UEFI Firmware<br>is newer than version 36.0}\n B --[YES] --> O[6\ufe0f\u20e3 Boot with JetPack 6.1 rev.1 microSD card<br> to schedule firmware update]\n B --[No] --> C[2\ufe0f\u20e3 Boot with JetPack 5.1.3 microSD card<br>to schedule firmware update]\n C --> D[3\ufe0f\u20e3 Reboot] --> E{{Firmware update to 5.0 during reboot}}\n E --> F[4\ufe0f\u20e3 Run QSPI updater] --> G[5\ufe0f\u20e3 Reboot] --> H{{QSPI update during reboot - Firmware 36.4.0}}\n H --> O\n O --> P[7\ufe0f\u20e3 Reboot]\n P --> Q{{Firmware update to 36.4.2 during reboot}}\n Q --> R[8\ufe0f\u20e3 Unlock super performance] \n R --> S(\ud83d\udc4d Start developing on JetPack 6.x) \n\n\n style C fill:#fee\n style D fill:#DEE,stroke:#333\n style E stroke-width:2px,stroke-dasharray: 5 5\n style F stroke-width:4px\n style G fill:#DEE,stroke:#333\n style H stroke-width:2px,stroke-dasharray: 5 5\n style O fill:#fee\n style P fill:#DEE,stroke:#333\n style Q stroke-width:2px,stroke-dasharray: 5 5\n style R fill:#f2d5ff
Note that it will undergo a total of three (3) reboot cycles.
"},{"location":"initial_setup_jon.html#walk-through-video","title":"Walk-through Video","text":"Click here to expand and watch video"},{"location":"initial_setup_jon.html#1-check-if-jetson-uefi-firmware-version-360","title":"1\ufe0f\u20e3 Check if Jetson UEFI Firmware version > 36.0
","text":"Your Jetson Orin Nano Developer Kit may have the latest firmware (\"Jetson UEFI firmware\" on QSPI-NOR flash memory) flashed at the factory.
If not, we need to go through a set of procedures to upgrade to the latest firmware. (Luckily, we can now do this all just on Jetson, meaning we don't need to use a host Ubuntu PC any more!)
So let's first check the version of your Jetson UEFI Firmware. You can take one of the following methods.
Monitor-attached Headless\ud83d\ude01I'm feeling lucky Connect your monitor and USB keyboard to your developer kit. Turn on the developer kit by plugging in the bundled DC power supply Repeatedly press Esc key on the keyboard, especially after NVIDIA logo boot splash screen first appears on the monitor You should see UEFI setup menu screen Check the third line from the top (below \"Not specified\"), which should be the version number of Jetson UEFI firmware Below image shows that this Jetson Orin Nano Developer Kit unit has the old UEFI firmware, indicating that the unit first needs its firmware/QSPI updated in order to run JetPack 6.x.
Connect USB to TTL Serial cable onto the following pins on J14
\"button\" header of carrier board located under the Jetson module. On your PC, run your console monitor program and open the USB serial port. Power on the developer kit by plugging in the bundled DC power supply On the PC console, repeatedly press Esc key on the keyboard, especially after NVIDIA logo boot splash screen first appears on the monitor You should see UEFI setup menu screen Check the third line from the top (below \"Not specified\"), which should be the version number of Jetson UEFI firmware Warning
Avoid leaving the device in a black screen state for extended periods or attempting to boot repeatedly with an SD card containing an incompatible JetPack version.
There is a chance that accumulated boot failures may trigger the L4T launcher to always boot into recovery kernel.
If this happens, even after switching to the proper JetPack 5.1.3 SD card, it won't boot from the SD card until we manually change the L4T Launcher setting.
Although we don't recommend it for the above-stated reason, you could skip to 6\ufe0f\u20e3 Boot with JetPack 6.x SD card, and try your luck to see if your Jetson just boots with the JetPack 6.x SD card.
If the Ubuntu desktop does not appear within 3 minutes and the screen remains black (as shown below), it indicates that the unit is running an older UEFI firmware version that is incompatible with the JetPack 6.x SD card.
Turn the unit off, and skip to the next section (2\ufe0f\u20e3 Boot with JetPack 5.1.3 SD card to schedule firmware update).
"},{"location":"initial_setup_jon.html#determine-qspi-update-is-necessary-or-not","title":"Determine QSPI update is necessary or not","text":"Attention
Select the appropriate tab below based on your firmware version you found in the above step.
If you found your Jetson Orin Nano needs its firmware updated to run JetPack 6.x, click the \" Firmware < 36.0\" tab, and then additional steps 2 to 5 will appear for you to follow.
If you know your Jetson Orin Nano has the latest firmware, stay on the \" Firmware 36.x\" tab, and skip to the next section (6\ufe0f\u20e3 Boot with JetPack 6.x SD card)
Firmware < 36.0 Firmware 36.x Your Jetson Orin Nano needs its firmware updated in order to make JetPack 6.x SD card work.
Perform the following steps (2 to 5).
Your Jetson Orin Nano has the latest firmware that is ready for JetPack 6.x SD card.
Skip to the next section (6\ufe0f\u20e3 Boot with JetPack 6.x SD card)
"},{"location":"initial_setup_jon.html#2-boot-with-jetpack-513-sd-card-to-schedule-firmware-update","title":"2\ufe0f\u20e3 Boot with JetPack 5.1.3 SD card to schedule firmware update","text":"First, we need to run JetPack 5.1.3 in order to let its nvidia-l4t-bootloader
package get its bootloader/firmware updater activated, so that the firmware update automatically runs the next time it reboots.
Download SD card image on to your PC
On your PC, download JetPack 5.1.3 image for Jetson Orin Nano Developer Kit from the official JetPack 5.1.3 page or from the below direct link button.
Warning
NVIDIA updated the JetPack 5.1.3 image on 5/28/2024, as the old version had an issue and the following process did not work. So please download and use the latest image (the new file name is JP513-orin-nano-sd-card-image_b29.zip
).
Jetson Orin Nano Developer KitJetPack 5.1.3 image
Use Balena Etcher to flash image to SD card
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the flashed microSD card into the slot on Jetson module
Power-on
Turn on the Jetson Orin Nano Developer Kit with JetPack 5.1.3 SD card inserted by plugging in the DC power supply.
Complete the initial software setup (oem-config
)
Info
There is a chance that JetPack 5.1.3 SD card does not show any desktop UI on the attached display (the display remains black).
It would still schedule the UEFI firmware update in the background, so when you power cycle the unit, most likely it will perform the firmware update on the next boot up.
So if this happens, skip to the next step (3\ufe0f\u20e3-2).
Ensure firmware update is scheduled.
Once Jetson boots into Jetson Linux system, a background service automatically runs to schedule a firmware update (if needed) to be performed during the next boot-up process.
Once you see the following, or just wait about 5 minutes after powering on to ensure the scheduling is done, reboot.
GUI CUI
$ sudo systemctl status nv-l4t-bootloader-config\n[sudo] password for jetson: \n\u25cf nv-l4t-bootloader-config.service - Configure bootloader service\n Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Fri 2024-05-03 13:36:13 PDT; 1min 57s ago\n Process: 11439 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)\nMain PID: 11439 (code=exited, status=0/SUCCESS)\n
"},{"location":"initial_setup_jon.html#3-reboot-and-observe-firmware-update-to-50","title":"3\ufe0f\u20e3 Reboot and observe firmware update to 5.0
","text":" Reboot
Reboot your Jetson Orin Nano Developer Kit.
GUI CUI On the Ubuntu desktop click the power icon () and select \"Restart...\".
$ sudo reboot\n
Observe firmware update
You should see the following during the boot up process.
With monitor Headless (serial)
Once done, you will boot into JetPack 5.1.3 (again), with underlying firmware updated to 5.0-35550185
.
"},{"location":"initial_setup_jon.html#4-run-qspi-updater-package-to-schedule-qspi-update","title":"4\ufe0f\u20e3 Run QSPI Updater package to schedule QSPI update","text":"Now that your UEFI firmware is updated to 35.5.0 ( = JetPack 5.1.3 ), it is capable of updating the entire QSPI content to make it ready for JetPack 6.x.
We will run a special tool so that the entire QSPI update is scheduled to run automatically on the next boot-up.
Double-check your firmware version is up to date (35.5.0
= JetPack 5.1.3)
Once it reboots back into Jetson Linux system, on Jetson terminal, run the following:
sudo nvbootctrl dump-slots-info\n
You should see something like the following, with the Current version indicating 35.5.0
.
Current version: 35.5.0\nCapsule update status: 0\nCurrent bootloader slot: A\nActive bootloader slot: A\nnum_slots: 2\nslot: 0, status: normal\nslot: 1, status: normal\n
Install QSPI Updater Debian package to trigger the entire QSPI update
On Jetson terminal, run the following:
sudo apt-get install nvidia-l4t-jetson-orin-nano-qspi-updater\n
Installing the nvidia-l4t-jetson-orin-nano-qspi-updater
automatically runs its script to schedule the entire QSPI update to be performed during the next boot process, so that the firmware is ready for JetPack 6.x.
"},{"location":"initial_setup_jon.html#5-reboot-observe-qspi-update-and-power-off","title":"5\ufe0f\u20e3 Reboot, observe QSPI update, and power off","text":" Reboot
Once the QSPI update is scheduled, reboot your Jetson Orin Nano Developer Kit.
Observe update
You can observe the QSPI update during the boot up process.
Power off
Once the update is done, it reboots and tries to boot, however, it will get stuck UNLESS you change the SD card to a JetPack 6.x one.
Therefore you should just power off the developer kit simply by disconnecting the DC power supply.
Attention
This part may look very confusing as neither the attached monitor nor the debug UART shows any explicit message on what action to take next.
What is going on here is that the Jetson's firmware (inside the QSPI-NOR flash memory) is now updated and ready for the JetPack 6.x SD card; however, it is now incompatible with the JetPack 5.1.3 SD card left in the Jetson module's slot, so after the reboot it gets stuck in the boot process.
So there is no issue with this boot halt (or endless rebooting). Simply power off the device and insert the new SD card.
"},{"location":"initial_setup_jon.html#6-boot-with-jetpack-6x-sd-card","title":"6\ufe0f\u20e3 Boot with JetPack 6.x SD card","text":"Once we know the onboard firmware is up-to-date and ready for JetPack 6.x, we can boot Jetson Orin Nano Developer Kit with a microSD card for JetPack 6.
Download SD card image on to your PC
On your PC, download the latest JetPack 6.x image for Jetson Orin Nano Developer Kit from the official JetPack page or from the below direct link button.
Jetson Orin Nano Developer KitJetPack 6.1 (rev. 1) image
Use Balena Etcher to flash image to SD card
Insert your microSD card into your PC's SD card slot, and use Balena Etcher to flash the SD card with the image you just downloaded.
If you don't have Balena Etcher on your PC, download from Balena official site.
Insert the JetPack 6.x microSD card into the slot on Jetson module
Power-on by plugging the DC power supply
Complete the initial software setup (oem-config
)
Ensure firmware update is scheduled.
JetPack 6.1 (rev 1) SD card will automatically schedule another firmware/bootloader update in the next (3rd and final) reboot cycle.
GUI CUI
$ sudo systemctl status nv-l4t-bootloader-config\n[sudo] password for jetson: \n\u25cf nv-l4t-bootloader-config.service - Configure bootloader service\n Loaded: loaded (/etc/systemd/system/nv-l4t-bootloader-config.service; enabled; vendor preset: enabled)\n Active: inactive (dead) since Fri 2024-05-03 13:36:13 PDT; 1min 57s ago\n Process: 11439 ExecStart=/opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -v (code=exited, status=0/SUCCESS)\nMain PID: 11439 (code=exited, status=0/SUCCESS)\n
"},{"location":"initial_setup_jon.html#7-reboot-and-observe-firmware-update-to-3642","title":"7\ufe0f\u20e3 Reboot and observe firmware update to 36.4.2
","text":" Delete old power profile
Attention
If your Jetson Orin Developer Kit was previously running JetPack 6.0 or JetPack 6.1, execute the following command after the final login and reboot your device. This ensures that the MAXN performance mode becomes available on your system.
We need to delete the old power config in preparation for Super config introduction.
sudo rm -rf /etc/nvpmodel.conf\n
Reboot your Jetson Orin Nano Developer Kit on JetPack 6.1 (rev.1) SD card to trigger the final firmware update (to 36.4.2
).
GUI CUI On the Ubuntu desktop click the power icon () and select \"Restart...\".
$ sudo reboot\n
Observe firmware update
You should see the following during the boot up process.
With monitor Headless (serial)
Once done, you will boot into JetPack 6.1 (rev.1) again, with the underlying firmware updated to 36.4.2
, which unlocks Super performance.
"},{"location":"initial_setup_jon.html#8-unlock-super-performance","title":"8\ufe0f\u20e3 Unlock Super Performance","text":"Attention
If your Jetson Orin Developer Kit was previously running JetPack 6.0 or JetPack 6.1, execute the following command after the final login and reboot your device. This ensures that the MAXN performance mode becomes available on your system.
sudo rm -rf /etc/nvpmodel.conf\n
"},{"location":"initial_setup_jon.html#switch-to-maxn-mode","title":"Switch to MAXN mode","text":"Note that the default power mode is 15W. To switch to the new power mode and unlock the increased performance, follow below steps:
Click on the current power mode (15W) by clicking the NVIDIA icon on the right side of the Ubuntu desktop\u2019s top bar. Select Power mode from the menu. Choose MAXN to enable maximum performance.
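If you prefer to make this switch from a script instead of the desktop menu, the nvpmodel command-line tool can be used for the same purpose. The short sketch below is illustrative only; the numeric ID of the MAXN mode is an assumption that varies between devices and JetPack releases, so verify it against /etc/nvpmodel.conf (or the power-mode menu) before using it:
import subprocess\n\n# Query the currently active power mode (prints something like \"NV Power Mode: 15W\").\nsubprocess.run([\"sudo\", \"nvpmodel\", \"-q\"], check=True)\n\n# Switch to the MAXN power mode.\n# NOTE: mode ID 0 is an assumption -- confirm the MAXN mode ID for your unit\n# in /etc/nvpmodel.conf before running this.\nsubprocess.run([\"sudo\", \"nvpmodel\", \"-m\", \"0\"], check=True)\n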
"},{"location":"initial_setup_jon.html#8-start-developing-on-jetpack-6x","title":"8\ufe0f\u20e3 Start developing on JetPack 6.x","text":"\ud83c\udf8a Congratulations! Your Jetson Orin Nano Developer Kit is set up with JetPack 6.x SD card and you are ready to develop on JetPack 6.x.
"},{"location":"initial_setup_jon.html#next-step","title":"Next step","text":""},{"location":"initial_setup_jon.html#nvme-ssd-installation","title":"NVMe SSD installation","text":"Take a look at this page for installing NVMe SSD and setting up Docker with it.
"},{"location":"initial_setup_jon_sdkm.html","title":"\ud83d\udef8 Initial Setup using SDK Manager","text":"Note
This guide is to supplement the official Jetson Orin Nano Developer Kit Getting Started Guide.
The NVIDIA\u00ae Jetson Orin Nano\u2122 Developer Kit is a perfect kit to start your journey of local generative AI evaluation and development.
This guide explains the alternative method for setting up Jetson Orin Nano Developer Kit by flashing both the latest firmware (QSPI image) and the latest JetPack all at once, using a host x86 PC.
"},{"location":"initial_setup_jon_sdkm.html#check-your-toolbox","title":"Check your toolbox","text":"The following item is needed or highly desired for flashing your Jetson Orin Nano Developer Kit using your host PC.
If you don't have them, check the default microSD-only setup instruction or you want to arrange them first then return to this guide once they are available.
What not come in the box - What you need/want to prepare
"},{"location":"initial_setup_jon_sdkm.html#host-pc","title":"Host PC","text":" x86 PC running Ubuntu 22.04 or Ubuntu 20.04 Source: NVIDIA SDK Manager Host Operating System Compatibility Matrix
"},{"location":"initial_setup_jon_sdkm.html#storage","title":"Storage","text":"Either of the following.
microSD card (64GB or bigger) NVMe SSD (Recommended for better performance)
"},{"location":"initial_setup_jon_sdkm.html#flashing-supply","title":"Flashing supply","text":" USB cable (USB-C plug for Jetson Orin Nano Developer Kit side, other end depends on your PC) jumper pin (or metal paper clip) "},{"location":"initial_setup_jon_sdkm.html#mean-to-access-terminal","title":"Mean to access terminal","text":"You need either of the following set:
DisplayPort cable, DisplayPort capable monitor and a USB keyboard DisplayPort to HDMI cable and HDMI capable monitor (or TV) and a USB keyboard USB to TTL Serial cable (Advanced) "},{"location":"initial_setup_jon_sdkm.html#default-method-microsd-card-only-setup-method","title":"\ud83d\ude80 Default method : microSD card only setup method","text":"In case you do NOT have an x86 PC running Ubuntu 22.04 or 20.04, you can fall back to the default \"microSD card only\" setup method that does not require any host PC.
The decision process can look like this.
flowchart LR\n A(start) --> B{Want to ditch microSD<br>and only use NVMe SSD?}\n B --[YES] --> S[\ud83d\udef8 SDK Manager method]\n B --[No] --> C{Have x86 PC running<br> Ubuntu 22.04/20.04?}\n C --[YES] --> S\n C --[No] --> U[\ud83d\ude80 microSD-only method]\n\n style S stroke-width:3px, fill:#f3e9f2,stroke:#b544c4\n style U stroke-width:3px, fill:#d2e9e5,stroke:#0e7a71
Click the button below to jump to the default setup page.
\ud83d\ude80 microSD-only method
Otherwise, continue reading on this page for the SDK Manager method.
"},{"location":"initial_setup_jon_sdkm.html#overall-flow-sdk-manager-method","title":"Overall flow (SDK Manager method)","text":"Jetson Orin Nano Initial Setup Flowchart (microSD-only method)
flowchart\n A(start) --> B[0\ufe0f\u20e3 Install SDK Manager]\n B --> C[1\ufe0f\u20e3 Connect Target Hardware] \n C --> D[2\ufe0f\u20e3 Select Software Components to Install]\n D --> E[3\ufe0f\u20e3 Download on Host PC]\n E --> F[4\ufe0f\u20e3 Flash]\n F --> G[5\ufe0f\u20e3 Boot and initial setup]\n G --> P(\ud83d\udc4d Start developing on JetPack 6.x)
"},{"location":"initial_setup_jon_sdkm.html#0-install-sdk-manager","title":"0\ufe0f\u20e3 Install SDK Manager","text":"Reference
NVIDIA SDK Manager Documentation : Download and Install Download and Run SDK Manager
On your Ubuntu 22.04 / 20.04 PC, open a terminal and execute the following command.
Select the tab that matches your Ubuntu version.
Ubuntu 22.04 Ubuntu 20.04 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install sdkmanager\n
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install sdkmanager\n
Then launch SDK Manager with the following command.
sdkmanager\n
First time using SDK Manager When using SDK Manager for the first time, log in with your NVIDIA Developer credentials.
Remember to tick the checkbox for \"Stay logged in\" and click \"LOGIN\" button.
A browser window opens. Input your email address that is registered as NVIDIA Developer.
Two factor authentication prompt may pop up.
Once authenticated, the browser windows opens back the SDK Manager app window.
SDK Manager continues to log you in.
SDK Manager launched and the login completed.
"},{"location":"initial_setup_jon_sdkm.html#1-connect-the-target-hardware","title":"1\ufe0f\u20e3 Connect the Target Hardware","text":"Once SDK Manager is launched, connect your Jetson developer kit to your Ubuntu PC and power it on in Forced Recovery mode.
Connect your Jetson Orin Nano Developer Kit to your PC with a USB cable.Note that the USB cable goes into the USB-C port of the carrier board of the developer kit.
While shorting pin 9
and pin 10
of J14
header located below the Jetson module using a jumper pin, insert the DC power supply plug into the DC jack of the carrier board to power it on.
You can use a paper clip like a photo below as well.
Select \"Jetson Orin Nano [8GB developer kit version]\" and hit \"OK\"
Uncheck \"Host Machine\" and it should now look like this
Click \"Continue\" button to proceed to the next step.
"},{"location":"initial_setup_jon_sdkm.html#2-select-software-components-to-install","title":"2\ufe0f\u20e3 Select Software Components to Install]","text":" Leave the only \"Jetson Linux\" component checked, and uncheck everything.
Remember to also check \"I accept the terms and conditions of the license agreements\".
Click \"Continue\" button to proceed to the next step.
It will prompt for the sudo
command password.
"},{"location":"initial_setup_jon_sdkm.html#3-download-on-host-pc","title":"3\ufe0f\u20e3 Download on Host PC","text":" It will start downloading the \"BSP\" package and \"RootFS\" package.
Once downloads are complete, it will untar the package and start generating the images to flash in the background.
Once images are ready, SDK it will open the prompt for flashing.
"},{"location":"initial_setup_jon_sdkm.html#4-flash","title":"4\ufe0f\u20e3 Flash","text":" On the flashing prompt, select \"Runtime\" for \"OEM Configuration\".
On the flashing prompt, select \"NVMe\" if you want to flash Jetson Linux (BSP) to NVMe SSD.
Click \"Flash\" and the prompt popup will change like this.
Flash progress.
DetailsTerminals
Flash successfully completes.
DetailsTerminals
"},{"location":"initial_setup_jon_sdkm.html#5-boot-and-initial-setup","title":"5\ufe0f\u20e3 Boot and Initial Setup","text":" If still plugged, remove the jumper from header (that was used to put it in Forced Recovery mode)
Connect the DisplayPort cable or adapter and USB keyboard and mouse to Jetson Orin Nano Developer Kit, or hook up the USB to TTL Serial cable.
Unplug the power supply and plug it back in to power cycle the device.
Jetson should now boot into the Jetson Linux (BSP) of your selected JetPack version from the storage of your choice.
Complete the initial software setup (oem-config
)
"},{"location":"initial_setup_jon_sdkm.html#switch-to-maxn-mode","title":"Switch to MAXN mode","text":"Note that the default power mode is 15W. To switch to the new power mode and unlock the increased performance, follow below steps:
Click on the current power mode (15W) by clicking the NVIDIA icon on the right side of the Ubuntu desktop\u2019s top bar. Select Power mode from the menu. Choose MAXN to enable maximum performance.
\ud83c\udf8a Congratulations! Your Jetson Orin Nano Developer Kit is set up with JetPack 6.x image and you are ready to develop on JetPack 6.x.
"},{"location":"initial_setup_jon_sdkm.html#next-step","title":"Next step","text":"Check this page for \"SSD\" and/or \"Docker\" set up.
"},{"location":"lerobot.html","title":"HuggingFace LeRobot","text":"Let's run HuggingFace LeRobot
to train Transformer-based action diffusion policies and ACT onboard NVIDIA Jetson. These models learn to predict actions for a particular task from visual inputs and prior trajectories, typically collected during teleoperation or in simulation.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 GA (L4T r36.3) JetPack 6.1 (L4T r36.4)
Sufficient storage space (preferably with NVMe SSD).
16.5GB
for lerobot
container image Space for models (>2GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"lerobot.html#work-with-real-world-robots-before-starting-containers","title":"Work with Real-World Robots - Before starting containers","text":"This section gives the guide on how you can work through the LeRobot official example of Getting Started with Real-World Robots (7_get_started_with_real_robot.md
) on your Jetson.
Tip
It's recommended to work on your Jetson in monitor-attached mode.
lerobot
is designed to show camera view in windows and playback TTS audio while capturing dataset, so it is more convenient to setup your Jetson with its monitor (and speakers) attached to Jetson.d
"},{"location":"lerobot.html#a-check-jetson-containers-location","title":"a. Check jetson-container
's location","text":"Through out the course of all the workflows of lerobot
, we will be generating a lot of data, especially for capturing dataset.
We will clone the lerobot
directory on host and mount the directory in the container to keep all the data persistant, but first make sure your jetson-containers
directory is placed on your SSD, not on your eMMC or microSD card.
If you have created the jetson-containers
directory on eMMC or microSD card (likely the case if you first set up your Jetson device without SSD first and later added SSD), then use the rsync
command to move the entire directory under SSD mount point.
rsync -aHAX /home/jetson/jetson-containers/ /ssd/jetson-containers/\n
Then run the installer again.
bash /ssd/jetson-containers/install.sh\n
"},{"location":"lerobot.html#b-create-lerobot-dir-on-host","title":"b. Create lerobot
dir on host","text":"As described above, we will setup the lerobot
directory under data
directory of jetson-containers
for monting it inside the container so that generated data persist.
cd jetson-containers\n./packages/robots/lerobot/clone_lerobot_dir_under_data.sh\n./packages/robots/lerobot/copy_overlay_files_in_data_lerobot.sh\n
"},{"location":"lerobot.html#c-pulseaudio-setup","title":"c. PulseAudio setup","text":"LeRobot's dataset capture flow (control_robot.py
) utilizes Speech Dispatcher to use espeak TTS, in order to give operators audio queues for notifying the status and signaling the next operation. It's actually very helpful.
Speech Dispatcher utilizes Pulse Audio, so rather than just sharing the /dev/snd
device when docker run
(which is good for ALSA), we need to add the following arguments.
--device /dev/snd \\\n -e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native \\\n -v ${XDG_RUNTIME_DIR}/pulse:${XDG_RUNTIME_DIR}/pulse \\\n
This is already added to run.sh
of jetson-containers
, however, we need to edit /etc/pulse/default.pa
in order to allow the root user access to the socket file.
sudo vi /etc/pulse/default.pa\n
Find the section loading module-native-protomocl-unix
and add auth-anonymous=1
### Load several protocols\n.ifexists module-esound-protocol-unix.so\nload-module module-esound-protocol-unix auth-anonymous=1\n.endif\nload-module module-nativ\n
Then restart PulseAudio service to make the config take effect.
pulseaudio --kill\npulseaudio --start\n
For troubleshootings or details, please check the docs.md
of speech-dispatcher
package.
"},{"location":"lerobot.html#d-set-udev-rule-for-acm-devices","title":"d. Set udev rule for ACM devices","text":"It is more convenient if the lerobot programs can always find the device of leader and follower arm with unique names.
For that, we set an udev rule so that arms always get assigned the same device name as following. This is first done on Jetson host side.
/dev/ttyACM_kochleader
: Leader arm /dev/ttyACM_kochfollower
: Follower arm First only connect the leader arm to Jetson and record the serial ID by running the following:
ll /dev/serial/by-id/\n
The output should look like this.
lrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_BA98C8C350304A46462E3120FF121B06-if00 -> ../../ttyACM1\n
Then edit the first line of ./99-usb-serial.rules
like the following.
You can find the template of this file under ./packages/robots/lerobot
directory.
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"00000000000000000000000000000000\", SYMLINK+=\"ttyACM_kochfollower\"\n
Now disconnect the leader arm, and then only connect the follower arm to Jetson.
Repeat the same steps to record the serial to edit the second line of 99-usb-serial.rules
file.
$ ll /dev/serial/by-id/\nlrwxrwxrwx 1 root root 13 Sep 24 13:07 usb-ROBOTIS_OpenRB-150_483F88DC50304A46462E3120FF0C081A-if00 -> ../../ttyACM0\n$ vi ./packages/robots/lerobot\n
You should have ./99-usb-serial.rules
now looking like this:
SUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"BA98C8C350304A46462E3120FF121B06\", SYMLINK+=\"ttyACM_kochleader\"\nSUBSYSTEM==\"tty\", ATTRS{idVendor}==\"2f5d\", ATTRS{idProduct}==\"2202\", ATTRS{serial}==\"483F88DC50304A46462E3120FF0C081A\", SYMLINK+=\"ttyACM_kochfollower\"\n
Finally copy this under /etc/udev/rules.d/
(of host), and restart Jetson.
sudo cp ./99-usb-serial.rules /etc/udev/rules.d/\nsudo reboot\n
After reboot, check if we now have achieved the desired fixed simlinks names for the arms.
ls -l /dev/ttyACM*\n
You should get something like this:
crw-rw---- 1 root dialout 166, 0 Sep 24 17:20 /dev/ttyACM0\ncrw-rw---- 1 root dialout 166, 1 Sep 24 16:13 /dev/ttyACM1\nlrwxrwxrwx 1 root root 7 Sep 24 17:20 /dev/ttyACM_kochfollower -> ttyACM0\nlrwxrwxrwx 1 root root 7 Sep 24 16:13 /dev/ttyACM_kochleader -> ttyACM1\n
"},{"location":"lerobot.html#e-optional-csi-cameras","title":"e. (Optional) CSI cameras","text":"If you plan to use CSI cameras (not USB webcams) for data capture, you will use the new --csi2webcam
options of jetson-containers
, which exposes V4L2loopback devices that performs like USB webcams (MJPEG) for CSI cameras using Jetson's hardware JPEG encoder.
This feature require some packages to be installed.
sudo apt update && sudo apt install v4l2loopback-dkms v4l-utils\n
"},{"location":"lerobot.html#f-increse-the-swap-file-size","title":"f. Increse the swap file size","text":"You may ran out of memory when are setting up to perform ACT model training.
swapoff -a -v\nsudo rm /swfile\nsudo systemctl disable nvzramconfi\nsudo fallocate -l 8G /ssd/8GB.swap\nsudo chmod 600 /ssd/8GB.swap\nsudo mkswap /ssd/8GB.swap\nsudo echo \"/ssd/8GB.swap swap swap defaults 0 0\" >> /etc/fstab\nsudo reboot\n
"},{"location":"lerobot.html#g-starting-the-lerobot-container","title":"g. Starting the lerobot
container","text":"USB webcamsCSI cameras cd jetson-containers\n./run.sh \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
cd jetson-containers\n./run.sh \\\n --csi2webcam --csi-capture-res='1640x1232@30' --csi-output-res='640x480@30' \\\n -v ${PWD}/data/lerobot/:/opt/lerobot/ \\\n $(./autotag lerobot)\n
"},{"location":"lerobot.html#work-with-real-world-robots-once-in-container","title":"Work with Real-World Robots - Once in container","text":"JupyerLab tip
Inside the lerobot
container, JupyterLab server process starts.
You can access with http://localhost:8888/
(or http://<IP_ADDRESS>:8888/
from other PC on the same network).
In the notebooks
, there are some Jupyter notebooks for each segment of the official tutorial Getting Started with Real-World Robots (7_get_started_with_real_robot.md
).
Please note that some of them (like notebooks/7-2_real-robot_configure-motors.ipynb
) can be used as a real work notebook to execute python codes and scritps convniently inside the notebook along with instructions (rather than switching to console).
However, keep in mind that you are encouraged to always check the original official tutorial, and some operation like training is much better executed on console.
Bash history tip
Inside the container, on the console, you can press Up key to scroll through some of the frequently used commands pre-registered in bash history.
"},{"location":"lerobot.html#q-setup-audio","title":"q. Setup audio","text":"Check if PulseAudio is available.
pactl info\n
If you need to set the default audio output device, use set-default-sink
.
pactl list short sinks\npactl set-default-sink [SINK_NAME_OR_INDEX]\n
"},{"location":"lerobot.html#1-order-and-assemble-your-koch-v11","title":"1. Order and Assemble your Koch v1.1","text":"You can order the Koch v1.1 kits from ROBOTIS. (Note: they don't come with 3d printed parts)
TODO:
Document Jetson unique hardware setup Share custom 3D print models "},{"location":"lerobot.html#2-configure-motors-calibrate-arms-teleoperate-your-koch-v11","title":"2. Configure motors, calibrate arms, teleoperate your Koch v1.1","text":"Follow the Jupyter notebook 7-2_real-robot_configure-motors.ipynb
.
"},{"location":"lerobot.html#3-record-your-dataset-and-visualize-it","title":"3. Record your Dataset and Visualize it","text":"You should mostly operate on the container's terminal.
Follow the official document's section.
Camera config tip
The official document demonstrates the two camera positions, one at the top (\"phone\") and the other at directly in front facing the arm (\"laptop\").
In our trials, this camera placement worked, but we needed to make the camera zoom-up to the scene so that they capture better spacial resolution.
Another thing worth experimenting is the wrist cam. More to come later.
Tip
Following commands are registered in Bash history inside the lerobot
container.
wandb login\nexport HF_USER=\npython lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/koch_test_$(date +%Y%m%d_%H%M%S) \\\n --tags tutorial \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10\n
Tip
If you plan to perfom training on a different machine, scp
the dataset directory.
To another JetsonTo other PC scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/data/${HF_USER}/\n
scp -r data/lerobot/data/${HF_USER}/koch_test_01/ <USER>@<IP>:/home/<USER>/lerobot/data/${HF_USER}/\n
"},{"location":"lerobot.html#4-train-a-policy-on-your-data","title":"4. Train a policy on your data","text":"You should operate on ther container's terminal.
Follow the official document's section.
Tip
wandb login\nDATA_DIR=data python lerobot/scripts/train.py \\\n dataset_repo_id=${HF_USER}/koch_test \\\n policy=act_koch_real \\\n env=koch_real \\\n hydra.run.dir=outputs/train/act_koch_test \\\n hydra.job.name=act_koch_test \\\n device=cuda \\\n wandb.enable=true\n
Tip
If you perform the training on other Jetson or PC, scp
the outputs directory content back to the orinal Jetson that has the leader and follower arm attached.
scp -r outputs/train/act_koch_test_01/ <USER>@<IP>:/ssd/jetson-containers/data/lerobot/outputs/train/ \n
"},{"location":"lerobot.html#5-evaluate-your-policy","title":"5. Evaluate your policy","text":"You should operate on the container's terminal.
Follow the official document's section.
Tip for a. Use koch.yaml
and our record
function
Modify the command in the bash history to add -p
arugment to points to the policy checkpoint.
python lerobot/scripts/control_robot.py record \\\n --robot-path lerobot/configs/robot/koch.yaml \\\n --fps 30 \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test_01 \\\n --tags tutorial eval \\\n --warmup-time-s 5 \\\n --episode-time-s 30 \\\n --reset-time-s 30 \\\n --num-episodes 10 \\\n -p outputs/train/act_koch_test/checkpoints/last/pretrained_model \\\n --run-compute-stats 0\n
Tip for Visualize evaluation afterwards
python lerobot/scripts/visualize_dataset.py \\\n --root data \\\n --repo-id ${HF_USER}/eval_koch_test\n
If everything goes well, you should see
"},{"location":"lerobot.html#basic-walkthrough","title":"Basic Walkthrough","text":"This is from the lerobot top README.md.
"},{"location":"lerobot.html#visualize-datasets","title":"Visualize Datasets","text":"Outside of container, first launch the rerun.io visualization tool that LeRobot uses [\u2197]
pip3 install rerun-sdk\nrerun\n
Then, start the docker container to playback one of these LeRobot datasets.
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/visualize_dataset.py \\\n --repo-id lerobot/pusht \\\n --episode-index 0\n
"},{"location":"lerobot.html#evaluate-a-pretrained-diffusion-policy","title":"Evaluate a Pretrained Diffusion Policy","text":"This will download and run a pre-trained diffusion model on the PushT environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/eval.py \\\n -p lerobot/diffusion_pusht \\\n eval.n_episodes=10 \\\n eval.batch_size=10\n
"},{"location":"lerobot.html#train-your-own-act-policy","title":"Train your own ACT Policy","text":"Next, train ACT on the Aloha manipulation environment [\u2197]
jetson-containers run -w /opt/lerobot $(autotag lerobot) \\\n python3 lerobot/scripts/train.py \\\n policy=act \\\n env=aloha \\\n env.task=AlohaInsertion-v0 \\\n dataset_repo_id=lerobot/aloha_sim_insertion_human \n
See Trossen Robotics for dual-arm ALOHA kits, and Robotis for the low-cost Koch v1.1 kit used in the LeRobot tutorial:
HuggingFace LeRobot - Get Started with Real-World Robots (YouTube Playlist)
"},{"location":"llama_vlm.html","title":"Llama 3.2 Vision","text":"The latest additions to Meta's family of foundation LLMs include multimodal vision/language models (VLMs) in 11B and 90B sizes with high-resolution image inputs (1120x1120) and cross-attention with base completion and instruction-tuned chat variants:
Llama-3.2-11B-Vision
Llama-3.2-11B-Vision-Instruct
Llama-3.2-90B-Vision
Llama-3.2-90B-Vision-Instruct
While quantization and optimization efforts are underway, we have started with running the unquantized 11B model in a container based on HuggingFace Transformers that has been updated with the latest support for Llama-3.2-Vision a jump start on trying out these exciting new multimodal models - thanks to Meta for continuing to release open Llama models!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
12.8GB
for llama-vision
container image Space for models (>25GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Request access to the gated models here with your HuggingFace API key.
"},{"location":"llama_vlm.html#code-example","title":"Code Example","text":"Today Llama-3.2-11B-Vision is able to be run on Jetson AGX Orin in FP16 via HuggingFace Transformers. Here's a simple code example from the model card for using it:
import time\nimport requests\nimport torch\n\nfrom PIL import Image\nfrom transformers import MllamaForConditionalGeneration, AutoProcessor\n\nmodel_id = \"meta-llama/Llama-3.2-11B-Vision\"\nmodel = MllamaForConditionalGeneration.from_pretrained(model_id, device_map=\"auto\", torch_dtype=torch.bfloat16)\nprocessor = AutoProcessor.from_pretrained(model_id)\n\nprompt = \"<|image|><|begin_of_text|>If I had to write a haiku for this one\"\nurl = \"https://llava-vl.github.io/static/images/view.jpg\"\nraw_image = Image.open(requests.get(url, stream=True).raw)\n\ninputs = processor(text=prompt, images=raw_image, return_tensors=\"pt\").to(model.device)\noutput = model.generate(**inputs, do_sample=False, max_new_tokens=32)\n
If I had to write a haiku for this one, it would be:\n\nA dock on a lake.\nA mountain in the distance.\nA long exposure.\n
Initial testing seems that Llama-3.2-Vision has more conversational abilities than VLMs typically retain after VQA alignment. This llama_vision.py script has interactive completion and image loading to avoid re-loading the model. It can be launched from the container like this:
jetson-containers run \\\n -e HUGGINGFACE_TOKEN=YOUR_API_KEY \\\n $(autotag llama-vision) \\\n python3 /opt/llama_vision.py \\\n --model \"meta-llama/Llama-3.2-11B-Vision\" \\\n --image \"/data/images/hoover.jpg\" \\\n --prompt \"I'm out in the\" \\\n --max-new-tokens 32 \\\n --interactive\n
After processing the initial image, it will ask you to submit another prompt or image:
total 4.8346s (39 tokens, 8.07 tokens/sec)\n\nEnter prompt or image path/URL:\n\n>> \n
We will update this page and container as support for the Llama-3.2-Vision architecture is added to quantization APIs like MLC and llama.cpp for GGUF, which will reduce the memory and latency.
"},{"location":"nerf.html","title":"Nerfstudio - Neural Reconstruction","text":"Nerfstudio is a tool for training and visualizing Neural Radiance Fields (NeRF) models for 3D volumetric reconstruction from a collection of images taken of a scene or environment. It's designed to be easy to use and starts generating imagery at the outset, which becomes further refined as training progresses. Neural reconstruction is being applied to edge devices and robotics like with FruitNeRF. This tutorial will guide you through the process of training high-quality NeRF models using nerfstudio on Jetson.
Special thanks to Johnny N\u00fa\u00f1ez Cano for porting the NeRF stack to Jetson! See NeRF in 2023: Theory and Practice by It-Jim for useful background info.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
17.6GB
for nerfstudio
container image Space for models and datasets (>5GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"nerf.html#start-container","title":"Start Container","text":"Use this command to automatically run, build, or pull a compatible container image for nerfstudio:
jetson-containers run $(autotag nerfstudio)\n
To mount your own directories into the container, use the -v
or --volume
flags: jetson-containers run -v /path/on/host:/path/in/container $(autotag nerfstudio)\n
Later a webserver will be reachable at http://localhost:7007
. The official docs include command-line usage and models [\u2197]
"},{"location":"nerf.html#training-your-first-model","title":"Training Your First Model","text":"It's recommended to train a test scene first by following these steps from the getting started tutorial:
# Download some test data\nns-download-data nerfstudio --capture-name=poster\n\n# Train model\nns-train nerfacto --data /data/nerfstudio/models\n
After it loads, you can navigate your browser to http://localhost:7007
(or substitute your Jetson's IP address)
The 3D visualizations will begin rendering after the first training step, and progressively become more refined. This process typically takes an hour on Jetson AGX Orin to complete the default setting of 30,000 training steps. The model checkpoints are saved under the mounted jetson-containers/data/nerfstudio
directory, and generate the same output quality as before.
"},{"location":"nerf.html#fruitnerf","title":"FruitNeRF","text":"The creators behind FruitNeRF fused NeRFs with segmentation for more accurate counting of fruits in 3D. By training a semantic neural radiance field on both RGB and segmentation labels from SAM, the model is robust to occlusions, double-counting, and mistaken or undesirable fruits.
Extended sampling of the volumetric grid can then blob objects for pose estimation, occupancy mapping, and navigation. Such approaches are promising for combining the strengths of NeRFs for 3D reconstruction, segmentation VITs for open-vocabulary classification, and Jetson's onboard compute for high-definition environmental scanning and perception while in the field.
"},{"location":"nerf.html#data-preparation","title":"Data Preparation","text":"We'll use the FruitNeRF datasets as an example of training NeRFs on custom data with nerfstudio. To train a NeRF model on a different scene, you first need to capture a set of images and corresponding camera poses (which can be estimated with the included photogrammetry and image registration tools like COLMAP and OpenSFM as shown here)
The images should be taken from different viewpoints of the scene you want to model, structured in the nerfstudio dataset format. For FruitNeRF, you can just download and extract pre-recorded real or synthetic data from here:
Real DataSynthetic data cd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Real.zip\nunzip FruitNeRF_Real.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
cd /data/nerfstudio\nwget https://zenodo.org/records/10869455/files/FruitNeRF_Synthetic.zip\nunzip FruitNeRF_Synthetic.zip\nmv FruitNeRF_Dataset/tree_01/semantics_sam FruitNeRF_Dataset/tree_01/semantics\n
"},{"location":"nerf.html#training-a-fruitnerf-model","title":"Training a FruitNeRF Model","text":"This command will generate a NeRF on the first tree (there are multiple trees to try from each dataset). You can find the complete usage information and models on the official GitHub repo: github.com/meyerls/FruitNeRF
ns-train nerfacto \\\n --data /data/nerfstudio/FruitNeRF_Dataset/tree_01 \\\n --output-dir /data/nerfstudio/models \\\n --pipeline.datamanager.camera-res-scale-factor 0.5\n
The memory usage depends on the number of images in the dataset and their resolution, so if you're running low on memory try adjusting the scaling factors and image sampling parameters. The simulated data has a lower resolution and can run at full size.
"},{"location":"openvla.html","title":"OpenVLA - Vision/Language Action Models for Embodied Robotics","text":"Fine Tuning and Deployment Guide
The tutorials's goal is to provide optimized quantization and inference for deploying VLA models, along with reference fine-tuning workflows for adapting models for new robots, tasks, and environments. Rigorous performance and accuracy validation is applied in a self-contained sim environment with scenario generation and domain randomization (MimicGen). Future phases will include sim2real with Isaac Lab and ROS2 integration, study of related models like CrossFormer and optimizations to the neural architecture for realtime performance.
\u2705 Quantization and inference optimizations for VLA models \u2705 Accuracy validation of the original OpenVLA-7B weights \u2705 Reference fine-tuning workflow with synthetic data generation \u2705 On-device training with LoRA's on Jetson AGX Orin and full fine-tuning on A100/H100 instances \u2705 85% accuracy on an example block-stacking task with domain randomization \u2705 Sample datasets and test models for reproducing results
Thank you to OpenVLA, Open X-Embodiment, MimicGen, Robosuite and many others with related work for sharing their promising research, models, and tools for advancing physical AI and robotics.
"},{"location":"openvla.html#vla-architecture","title":"VLA Architecture","text":"OpenVLA is a vision/language action model for embodied robotics and behavioral learning built on LLM/VLMs (this base model is a Prismatic VLM using Llama-7B, DINOv2, and SigLIP). Instead of image captioning or visual question/answering, VLA models generate action tokens from camera images and natural language instructions that are used for controlling the robot. Action tokens are discrete token ID's reserved from the text tokenizer's vocabulary that map to continuous values, normalized against the range of motion of each robot. These real-valued tokens are more efficient and accurate than the model outputting numerical data as text in JSON or Pydantic formats, where each digit, decimal point, separator, and whitespace takes an additional token to generate. Other hybrid vision/language models like Florence-2 have adopted similar approaches for continuous-domain prediction using Transformers.
Each action token generated by the model represents a degree-of-freedom of the output coordinate space (i.e. xyz, rotation pose), or a component of the robot that can be controlled (like the gripper). OpenVLA-7B was trained on the Open X-Embodiment dataset for manipulation, with a 7-DoF action space consisting of (delta xyz, delta roll/pitch/yaw, gripper)
. The position and rotation are relative changes to the end-effector (EEF) pose, with an external inverse kinematics (IK) solution like cuMotion solving joint constraints specific to each robotic arm. The gripper dimension is an absolute control between 0 (open) and 1 (closed) that does not recieve further scaling/normalization.
OpenVLA reserves 256 of the least-frequently used tokens out of the Llama-7B vocabulary for action values, which gives it 8-bit resolution over the controls. It has an input image resolution of 224x224 to stacked DINOv2/SigLIP vision encoders that are projected to ~275 input tokens (plus the text prompt), and outputs 7 tokens mapped to (\u0394pos, \u0394rotation, gripper)
coordinates.
"},{"location":"openvla.html#quantization","title":"Quantization","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models and datasets (>15GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Support for OpenVLA has been added to NanoLLM on top of its streaming VLM pipeline with INT4/FP8 quantization using MLC and vision encoders in FP16 with TensorRT. First we'll test the model on BridgeData V2, one of the top weighted datasets from the Open X-Embodiment collection. The model was trained on this data and is used to confirm that the quantization and inference are working correctly during deployment. This is what the dataset looks like, courtesy of their website:
The following command starts the container, downloads the dataset and model (if needed), quantizes it on the first run, and measures the accuracy of the action values against the groundtruth from the dataset using normalized mean-squared error (NRMSE) to unbias the varying ranges each dimension of the action space can have. We extracted a 100-episode subset of the original Bridge data here on HuggingFace Hub, so you don't need to download the entire ~400GB dataset just for these tests.
INT4FP8FP16 jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q4f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api mlc \\\n --model openvla/openvla-7b \\\n --quantization q8f16_ft \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.vla --api hf \\\n --model openvla/openvla-7b \\\n --dataset dusty-nv/bridge_orig_ep100 \\\n --dataset-type rlds \\\n --max-episodes 10 \\\n --save-stats /data/benchmarks/openvla_bridge_fp16.json\n
Quantization Accuracy Latency FPS FP16 95.3% 840 ms 1.19 FP8 95.2% 471 ms 2.12 INT4 90.1% 336 ms 2.97 These results were run on Jetson AGX Orin 64GB with JetPack 6, and we will see later with our fine-tuned model the INT4 accuracy match FP8/FP16.
Each frame, the 7D action vector predicted by the model is printed along with the groundtruth, along with the accuracy, latency, and framerate for that frame. The numbers printed after ~
are the averages of those so far, with the last value reported being the mean over the entire dataset processed.
# INT4\nstep 355 [-0.02692 0.00776 -0.00299 0.08160 0.07292 0.04791 0.99608] accuracy 0.8466 ~0.9017 time=336.2 ms fps=2.96 ~2.97\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP8\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9523 time=469.7 ms fps=2.13 ~2.12\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n\n# FP16\nstep 355 [-0.02392 0.00767 -0.00310 0.08160 0.07692 0.03217 0.99608] accuracy 0.9982 ~0.9531 time=842.3 ms fps=1.19 ~1.18\ngt 355 [-0.02387 0.00760 -0.00318 0.15965 0.07707 0.03281 1.00000]\n
The per-frame metrics and averages can be saved with the --save-stats
argument, and in the interest of time you can cap the number of episodes processed with --max-episodes
. As mentioned above, the Bridge dataset used here was included in OpenVLA's training data; further below we run this again on newly generated data with significant variation that was not in the training dataset. This tool can also load other datasets in RLDS/TFDS format from Open X-Embodiment, and HDF5 from Robomimic/MimicGen. You can also create your own agents and scripts using the APIs exposed in the coding examples below.
"},{"location":"openvla.html#inference-api","title":"Inference API","text":"The code is simple for running VLA inference on camera streams using the NanoLLM library in the container:
VLA on Videofrom nano_llm import NanoLLM\nfrom nano_llm.plugins import VideoSource\n\n# load vision/language action model\nmodel = NanoLLM.from_pretrained(model, quantization='q4f16_ft')\ncamera = VideoSource(video_source, cuda_stream=0)\n\nassert(model.vla) # make sure this is a VLA\n\nwhile True:\n # returns a cudaImage, np.ndarray, or torch.Tensor on the GPU\n image = camera.capture()\n\n if image is None: # in case of timeout, keep trying\n continue\n\n # returns a np.ndarray or torch.Tensor with vla.dof elements\n # for OpenVLA, this is (\u0394x, \u0394y, \u0394z, \u0394roll, \u0394pitch, \u0394yaw, gripper)\n actions = model.vla.predict_action(\n image, \n instruction=\"pick up the nearest object\", \n action_space=\"normalized\",\n return_tensors='np',\n )\n\n # send the actions to your robot controller or IK solver\n ...\n
VLA models are also supported in Agent Studio, which includes the simulator components as well.
"},{"location":"openvla.html#online-validation","title":"Online Validation","text":"Given the challenging task domain, dynamic feedback loops, and computational demands for sim/training/inference, using VLAs for language-guided dexterous manipulation involves a significant increase in complexity over baseline usage of LLMs and VLMs. To go from predicting logits at the token level to actions consistently correct enough over an extended series of frames to form useful behaviors, it's important to cross-check outputs and measure accuracy at each stage of the training/inference workflow to be able to identify the source of potential regressions when they occur.
Unlike typical applications of supervised learning, the metrics for end-task completion and success can't be measured from static pre-recorded datasets, which don't account for the temporal domain, feedback from physical interactions, or compounding errors - they require online validation, either in simulation or in real-world tests.
Closing the Sim-to-Real Gap: Training Spot Quadruped Locomotion with NVIDIA Isaac Lab
During training, the token classification accuracy is measured against the groundtruth action labels (i.e. how many action tokens were predicted exactly right), with the model optimizing to minimize this loss (as is normal for LLMs). Action accuracy in the continuous domain is also measured during training from the L1 error of the detokenized real-valued outputs. Continuous action accuracy trends slightly higher than token classification accuracy, as the latter does not give any reward for being closer to the desired result. In practice, these should be >95% accurate at this level for completing tasks successfully in similar environments. To achieve that high degree of accuracy, it seems intentional in the work and related research to overfit the model by training it for many epochs (upwards of 30 epochs on the same 900K episodes for OpenVLA). Transformers are known to recall specific knowledge from few training examples, and are sensitive to overfitting and forgetting previously learned information. As such, LLMs are normally trained for only a few epochs at most to preserve their zero-shot capabilities and ability to generalize to out-of-distribution inputs. During the fine-tuning part of this project, we characterize the impacts on model accuracy and task success from the number of distinct training episodes versus the number of epochs over repeated data.
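As a rough illustration of the difference between those two metrics (the OpenVLA training code computes them from the model logits and its own action tokenizer, so this is conceptual only - an off-by-one bin gets no credit under token accuracy but nearly full credit in the continuous domain):
import numpy as np\n\ndef action_metrics(pred_bins, gt_bins, bin_centers):\n    '''Token classification accuracy vs. continuous accuracy of the detokenized values.'''\n    pred_bins, gt_bins = np.asarray(pred_bins), np.asarray(gt_bins)\n    token_acc = float(np.mean(pred_bins == gt_bins))            # exact token matches only\n    l1 = np.abs(bin_centers[pred_bins] - bin_centers[gt_bins])  # L1 error of detokenized actions\n    continuous_acc = float(np.mean(1.0 - l1 / np.ptp(bin_centers)))\n    return token_acc, continuous_acc\n\nbin_centers = np.linspace(-1.0, 1.0, 256)\nrng = np.random.default_rng(0)\ngt = rng.integers(0, 256, size=(1000, 7))\npred = np.clip(gt + rng.integers(-1, 2, size=gt.shape), 0, 255)  # predictions off by at most one bin\nprint(action_metrics(pred, gt, bin_centers))  # continuous accuracy trends higher than token accuracy\n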
The actual task success rate doesn't get measured until the inference stage, when the model is either connected to a simulator or physically tested in a series of time-consuming trials under similar conditions. We integrated MimicGen directly with the OpenVLA training scripts for an endless source of unseen data, but encountered gradient instabilities after the model had seen a significant number of episodes.
"},{"location":"openvla.html#simulation-with-mimicgen","title":"Simulation with MimicGen","text":"MimicGen creates randomized episodes from as few as 10 teleoperated examples by utilizing scene graph information and task/subtask metadata about which objects in the environment are targets of the current subtask, in order to interpolate the original teloperated trajectories into their new random locations and poses. This generates large amounts of unique training data to improve robustness, without needing large amounts of human effort for the robot learning new skills and behaviors.
MimicGen is built on the Robomimic and Robosuite simulators and is able to run headlessly onboard Jetson alongside the VLA, simplifying the setup for reproducibility. The RoboCasa project is built on MimicGen and is being integrated with NVIDIA Omniverse and OSMO, and in future work we'd use Isaac Lab for scalability, more accurate physics, and photorealistic rendering.
MimicGen includes 12 tasks like block stacking, pick and place, assembly, and kitchen scenarios, and each type of task has variants that increase in difficulty as learning progresses - it would be interesting to compare such curated approaches to the purely random sequencing that OpenVLA uses with Open X-Embodiment. In this phase of the tutorial, we focus on the block stacking task to understand the training requirements and runtime performance needed to master a new task with success rates of >75-80%, similar to the paper. This will help inform scaling to multiple behaviors and more complex scenarios that vary significantly from in-distribution examples like the MimicGen environments (as evidenced by the original OpenVLA weights scoring zero successes in them).
"},{"location":"openvla.html#data-generation","title":"Data Generation","text":"We built MimicGen containers for Jetson from a fork of the code with some patches for aarch64+igpu along with enhancements like generation of natural language labels with random variations for the relevant tasks, along with additional domain randomization for the colors/materials of objects (these environments were added as Stack_D2
, Stack_D3
, and Stack_D4
). For training OpenVLA, the images and labels are saved to disk, whereas inference is later done with online simulation to measure the task success rate. To that end, we integrated MimicGen with Agent Studio for interactively testing the models and quickly dropping in components like ASR for verbally commanding the robot.
Online Training
There's initial support for direct integration of MimicGen in this fork of OpenVLA for live simulation and validation during training, with endless episodes and no repeated epochs. The models experienced spiking gradients later into the LoRA runs, so this should be tried again with lower learning rates, or by similarly integrating MimicGen into the full fine-tuning script that uses FSDP to increase the batch size on dGPU.
This command will generate the specified number of training episodes, saved in Robomimic HDF5 format. We provide the rendered datasets for these on HuggingFace Hub with 1000 and 2500 episodes. OpenVLA suggests needing only 10-150 episodes for fine-tuning and data-efficient adaptation, which may hold in comparable spaces, but we ultimately found it insufficient for the MimicGen environments.
jetson-containers run $(autotag nano_llm) \\\n python3 -m mimicgen.generate \\\n --tasks Stack_D4 \\\n --episodes 100 \\\n --output /data/datasets/mimicgen \\\n --cameras agentview \\\n --camera-width 224 \\\n --camera-height 224\n
The HDF5 dataset will be saved to /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5
(which is in a mounted volume under your jetson-containers/data
directory outside of the container), along with a video of sample episodes that were rendered:
This video is actually of Stack_D2 to avoid subjecting everyone to flickering colors. Stack_D4 is used for training and generates blocks with random colors and positions each frame, along with language labels augmented through the random combination of various nouns, adjectives, and verbs that form the instruction (Stack the red block on the green block
, Put the little cube on top
). Stack_D3 randomizes colors/positions each frame, and instructions each episode. Stack_D2 does them all per-episode (which is typically what's used at runtime). Since OpenVLA uses a single frame at a time with no temporal aspect during training, applying domain randomization per-frame as opposed to per-episode is feasible and provides more variance in the dataset. The block-stacking episodes typically come out to around ~110 frames each, and take around 10-15 seconds per episode to generate on Jetson AGX Orin with per-frame domain randomization, and 5 seconds per episode without domain randomization.
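To illustrate the kind of label augmentation involved, a sketch along these lines composes the instructions from randomized word choices - the actual word lists live in the Stack_D2/D3/D4 environments in the fork, so the ones below are stand-ins:
import random\n\n# Stand-in word lists -- the Stack_D2/D3/D4 environments define their own\nVERBS = ['stack', 'put', 'place', 'set']\nADJECTIVES = ['', 'little ', 'small ']\nNOUNS = ['block', 'cube']\nTARGETS = ['on the big block', 'on the other block', 'on top']\n\ndef random_instruction(rng=random):\n    '''Compose a randomized natural-language label for a block-stacking episode.'''\n    return (rng.choice(VERBS).capitalize() + ' the ' +\n            rng.choice(ADJECTIVES) + rng.choice(NOUNS) + ' ' +\n            rng.choice(TARGETS))\n\nfor _ in range(3):\n    print(random_instruction())  # e.g. 'Put the little cube on top'\n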
The agentview camera looks onward from the front of the scene towards the robot. Others are available, like sideview and eye_in_hand (wrist view) - we tried using the onboard wrist camera, but found the model would too easily veer off track and get 'lost' offscreen. It may be possible for wrist-only to work if the dataset added examples of the robot recovering and returning to a wider vantage point. Other ViT-based embodied models like Octo and CrossFormer use both cameras, which is a future experiment for VLAs based on multi-image VLMs like VILA.
"},{"location":"openvla.html#rlds-conversion","title":"RLDS Conversion","text":"OpenVLA uses datasets in RLDS format (which is based on TFDS), so we provide a converter from HDF5. This extra step can also be time-consuming for a large number of epiodes, like those used here. This is one of the reasons we desire to run MimicGen online with training and performed the initial integration directly with OpenVLA. Unless you are generating different data, you can skip this and use the MimicGen datasets that we uploaded here in RLDS format.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.datasets \\\n --dataset /data/datasets/mimicgen/demo_src_stack_task_D4/demo.hdf5 \\\n --dataset-type mimicgen \\\n --convert rlds \\\n --remap-keys agentview:image \\\n --output /data/datasets/mimicgen/rlds/stack_d4_ep2500\n
This will create a set of tfrecord files under the output directory that can be loaded by the OpenVLA training scripts.
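As a quick sanity check of the conversion, the tfrecords can be loaded back with tensorflow_datasets - the nested steps/observation/action layout follows the RLDS convention, but the exact feature keys depend on how the converter named them:
import tensorflow_datasets as tfds\n\n# Point the builder at the converted RLDS directory from the command above\nbuilder = tfds.builder_from_directory('/data/datasets/mimicgen/rlds/stack_d4_ep2500')\ndataset = builder.as_dataset(split='train')\n\n# Print the shapes of the first few steps of the first episode\nfor episode in dataset.take(1):\n    for step in episode['steps'].take(3):\n        print({k: v.shape for k, v in step['observation'].items()}, step['action'].numpy())\n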
"},{"location":"openvla.html#fine-tuning","title":"Fine Tuning","text":"A primary objective of this project is to characterize the training needed to adapt the model to different robots and tasks. Our development primarily consisted of running test LoRA's onboard Jetson AGX Orin 64GB and debugging issues locally, and when the results were encouraging to perform a full fine-tuning with FDSP on multiple A100/H100's from spot instance providers like Brev.dev, Vast.ai, and RunPod. Full fine-tuning on 2x Jetson AGX Orin's was attempted with FDSP, but ran out of memory with all settings that we tried. We provide the test models we trained on HuggingFace for others to try in the inference + sim setup below. Below are the training GPU configurations that were used, along with their batch sizes that maximized memory usage:
Batch Size FPS $/hr Jetson AGX Orin 64GB 8 (LoRA) 1.57 - 2x A100 SXM4 80GB 48 13.07 ~$1.50 8x H100 NVL 94GB 256 92.4 ~$25 The rental fees are ballpark averages over the spot instances available with these GPUs at the time of writing, and become quite reasonable when used alongside a Jetson repurposed for training daily test LoRAs on a reduced amount of data. Training until convergence on Jetson and 2x A100 took roughly 24-36 hours depending on the amount of data and number of epochs. We kept to <5 epochs for the full fine-tunes in an attempt to prevent the aforementioned overfitting, instead opting to increase the number of episodes.
Below we provide the steps to run the OpenVLA LoRA training on Jetson; for dGPU systems, refer to Fully Fine-Tuning OpenVLA. Typically you will launch a spot instance with your provider of choice in a CUDA or PyTorch container, then install the OpenVLA repo and its dependencies with pip, and download your dataset to the system before launching the command (or create a bundled container with it all included to save time). Here's the WandB Dashboard from the full fine-tuning runs that you can inspect, comparing a smaller number of episodes trained for more epochs versus a larger number of episodes trained for fewer epochs:
"},{"location":"openvla.html#on-device-lora","title":"On-Device LoRA","text":"The OpenVLA repo provides working training scripts for LoRA/qLoRA and multi-node multi-GPU full fine-tunes using PyTorch FDSP. It was not difficult to go in a make changes and enhancements, some of which we have done for our purposes of on-device training in this fork. Overall we found the process to be more similar than not to training other vision DNNs, just with larger datasets and rigorous validation required of the data pipeline that all the coordinate spaces and transformations matched up at every step of the sim\u2192training\u2192inference workflow.
We built an OpenVLA container for JetPack that runs the LoRA training; you can find the specific documentation for it in the OpenVLA readme (it's also recommended to read their paper, which includes many insights into the training process).
jetson-containers run $(autotag openvla) \\\n torchrun --standalone --nnodes 1 --nproc-per-node 1 vla-scripts/finetune.py \\\n --vla_path openvla/openvla-7b \\\n --data_root_dir /data/datasets/mimicgen/rlds \\\n --dataset_name stack_d4_ep2500 \\\n --run_root_dir /data/models/openvla \\\n --lora_rank 32 \\\n --batch_size 8 \\\n --grad_accumulation_steps 2 \\\n --learning_rate 5e-4 \\\n --image_aug False \\\n --save_steps 250 \\\n --epochs 5\n
This will start a TensorBoard server on port 6006 to monitor the training progress. Typically you would set the script running for more epochs than you intend to actually run, so that you can instead stop when the model converges (typically occurring with a loss below 0.5 and token accuracy >95%). This script was adapted so that if you interrupt training by pressing Ctrl+D
from the terminal, it will gracefully stop early and still merge the LoRA weights before exiting. If training is terminated otherwise, we added a merge.py
script that you should run afterwards to get the model ready for inference.
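For reference, merging LoRA adapter weights into the base model generally looks like the following with HuggingFace PEFT - a minimal sketch only, since the merge.py in the fork handles the OpenVLA specifics, and the adapter/output paths here are placeholders:
import torch\nfrom transformers import AutoModelForVision2Seq, AutoProcessor\nfrom peft import PeftModel\n\n# Load the base model in BF16, attach the trained LoRA adapter, and fold it into the weights\nbase = AutoModelForVision2Seq.from_pretrained('openvla/openvla-7b', torch_dtype=torch.bfloat16, trust_remote_code=True)\nmerged = PeftModel.from_pretrained(base, '/data/models/openvla/my-lora-run').merge_and_unload()\n\n# Save the merged checkpoint (and its processor) for inference\nmerged.save_pretrained('/data/models/openvla/my-merged-checkpoint')\nAutoProcessor.from_pretrained('openvla/openvla-7b', trust_remote_code=True).save_pretrained('/data/models/openvla/my-merged-checkpoint')\n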
"},{"location":"openvla.html#validation","title":"Validation","text":"Now that we have trained our test model (or you can download one from here), let's re-validate it again like we did above on the original OpenVLA model, but this time on unseen data from MimicGen with a different random seed (dusty-nv/mimicgen-stack_d4-ep100
). These commands will download and run the fully fine-tuned checkpoint (on 2500 episodes for 4 epochs) that we released to (dusty-nv/openvla-7b-mimicgen
. If you trained your own model, you can substitute the local path to the HF checkpoint.
INT4FP8FP16 jetson-containers run $(autotag nano_llm) \\\n    python3 -m nano_llm.vision.vla --api mlc \\\n    --model dusty-nv/openvla-7b-mimicgen \\\n    --quantization q4f16_ft \\\n    --dataset dusty-nv/mimicgen-stack_d4-ep100 \\\n    --dataset-type rlds \\\n    --max-episodes 10 \\\n    --save-stats /data/benchmarks/openvla_mimicgen_int4.json\n
jetson-containers run $(autotag nano_llm) \\\n    python3 -m nano_llm.vision.vla --api mlc \\\n    --model dusty-nv/openvla-7b-mimicgen \\\n    --quantization q8f16_ft \\\n    --dataset dusty-nv/mimicgen-stack_d4-ep100 \\\n    --dataset-type rlds \\\n    --max-episodes 10 \\\n    --save-stats /data/benchmarks/openvla_mimicgen_fp8.json\n
jetson-containers run $(autotag nano_llm) \\\n    python3 -m nano_llm.vision.vla --api hf \\\n    --model dusty-nv/openvla-7b-mimicgen \\\n    --dataset dusty-nv/mimicgen-stack_d4-ep100 \\\n    --dataset-type rlds \\\n    --max-episodes 10 \\\n    --save-stats /data/benchmarks/openvla_mimicgen_fp16.json\n
The results from this are collated in the next section along with the end-task success rates. Time to see it in action!
"},{"location":"openvla.html#inference-simulation","title":"Inference + Simulation","text":"To measure how well our model actually performs at completing the task, we spin up a MimicGen environment in Agent Studio that's connected to the VLA model. It counts the number of successful episodes by checking the reward issued by the sim, which is not used by the model but signals when the task was completed. We use a horizon of 200 frames for evaluation, after which it is deemed to be a failure.
INT4FP8FP16 jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-INT4\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP8\n
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.studio --load OpenVLA-MimicGen-FP16\n
To start the benchmarking, connect the output of the AutoPrompt
node to the OpenVLA
node. On its own it will run forever - we did it for 100 episodes each, which can take several hours since the sim operates in lock step with the model (future experiments will train on actions accumulated from multiple timesteps and also reduce the model size to improve performance).
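The success-counting logic itself boils down to something like the sketch below - Agent Studio wires this up through its node graph, so make_mimicgen_env and the observation key here are placeholders for whatever your environment provides:
MAX_STEPS = 200   # evaluation horizon in frames\nEPISODES = 100\n\ndef evaluate(model, make_mimicgen_env, instruction='stack the block'):\n    '''Count episodes where the sim issues a success reward within the horizon.'''\n    successes, frames_to_success = 0, []\n    for _ in range(EPISODES):\n        env = make_mimicgen_env()              # placeholder environment factory\n        obs = env.reset()\n        for step in range(MAX_STEPS):\n            action = model.vla.predict_action(obs['agentview_image'], instruction=instruction)\n            obs, reward, done, info = env.step(action)\n            if reward > 0:                     # the reward only signals completion -- the model never sees it\n                successes += 1\n                frames_to_success.append(step + 1)\n                break\n    return successes / EPISODES, frames_to_success\n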
Quantization Train Accuracy Val Accuracy Task Success Avg Frames Latency FPS FP16 96.5% 85.4% 86% 132 827 ms 1.20 FP8 96.2% 85.1% 85% 131 467 ms 2.14 INT4 95.4% 84.4% 84% 138 335 ms 2.98 This is using the model fine-tuned on 2500 episodes for 4 epochs, and although the task may have been simple, it is evidence of achieving the sought-after success rates of ~85%. Quantization has a negligible ~1% impact while scaling performance almost linearly. The average number of frames is how long it took the robot to complete the task - efficiency being another important end-metric to evaluate models by (consider that the source teleop episodes were ~110 frames long, and we realized afterwards that these averages include failed episodes during evaluation). The 2500-episode MimicGen training dataset (stack_d4_ep2500) was used to measure the action Train Accuracy, while the previously unused and distinct dusty-nv/mimicgen-stack_d4-ep100
was used for Validation Accuracy.
"},{"location":"openvla.html#episodes-vs-epochs","title":"Episodes vs Epochs","text":"Upon measuring the success rates of the other fine-tuned models that were trained on fewer episodes for more epochs, we can see the impact of increasing the size of the dataset:
Episodes Epochs Frames Task Success Avg Frames 500 10 550K 23% 186 1000 6 660K 48% 165 2500 4 1.1M 84% 138 That isn't to say that the 2500-episode model still isn't overfit - it is, having learned to always stack the smaller block and essentially ignore the instructions. More task diversity in the training data is required, which we can now add since we have the workflow. And we want to branch out into real-world domains instead of test environments in simulation to prove model viability. The amount of variance and data required to achieve generalist models in the challenging problem space of manipulation does raise interesting questions about purely random task learning versus more guided and curated approaches that ramp up in complexity as training progresses.
"},{"location":"openvla.html#future-research","title":"Future Research","text":"Embodied agents are an exciting emerging area at the forefront of robotics and physical AI, with many promising avenues to investigate further. Follow-up publications to OpenVLA include Embodied-CoT and CrossFormer, along with others sure to be in development. In addition to proceeding to physical testing, these areas of interest we'll experiment with as research progresses:
Smaller LLM with higher-resolution vision encoder(s) Multiple image inputs - multiple cameras and/or timesteps Action states from previous frame(s) as input Training on consecutive actions for larger timesteps Similar test model for UGV's in sim Using Isaac Lab and Robocasa sim2real deployment with ROS2 Trade-offs of VLA vs VIT-based approaches "},{"location":"research.html","title":"Jetson AI Lab Research Group","text":"The Jetson AI Lab Research Group is a global collective for advancing open-source Edge ML, open to anyone to join and collaborate with others from the community and leverage each other's work. Our goal is using advanced AI for good in real-world applications in accessible and responsible ways. By coordinating together as a group, we can keep up with the rapidly evolving pace of AI and more quickly arrive at deploying intelligent multimodal agents and autonomous robots into the field.
There are virtual meetings that anyone is welcome to join, offline discussion on the Jetson Projects forum, and guidelines for upstreaming open-source contributions.
Next Meeting - 1/21
The next team meeting is on Tuesday, January 21st at 9am PST - see the invite below or click here to join the meeting in progress.
"},{"location":"research.html#topics-of-interest","title":"Topics of Interest","text":"These are some initial research topics for us to discuss and investigate. This list will vary over time as experiments evolve and the SOTA progresses:
\u2022 Controller LLMs for dynamic pipeline code generation \u2022 Fine-tuning LLM/VLM onboard Jetson AGX Orin 64GB \u2022 HomeAssistant.io integration for smart home [1] [2] \u2022 Continuous multi-image VLM streaming and change detection \u2022 Recurrent LLM architectures (Mamba, RWKV, etc) [1] \u2022 Lightweight low-memory streaming ASR/TTS models \u2022 Diffusion models for image processing and enhancement \u2022 Time Series Forecasting with Transformers [1] [2] \u2022 Guidance, grammars, and guardrails for constrained output \u2022 Inline LLM function calling / plugins from API definitions \u2022 ML DevOps, edge deployment, and orchestration \u2022 Robotics, IoT, and cyberphysical systems integration New topics can be raised to the group either during the meetings or on the forums (people are welcome to work on whatever they want of course)
"},{"location":"research.html#contribution-guidelines","title":"Contribution Guidelines","text":"When experiments are successful, ideally the results will be packaged in such a way that they are easily reusable for others to integrate into their own projects:
Open-source libraries & code on GitHub Models on HuggingFace Hub Containers provided by jetson-containers Discussions on the Jetson Projects forum Documentation & tutorials on Jetson AI Lab Hackster.io for hardware-centric builds Ongoing technical discussions are encouraged to occur on the forums or GitHub Issues, with status updates on projects given during the meetings.
"},{"location":"research.html#meeting-schedule","title":"Meeting Schedule","text":"We'll aim to meet monthly or bi-weekly as a team in virtual meetings that anyone is welcome to join and speak during. We'll discuss the latest updates and experiments that we want to explore. Please remain courteous to others during the calls. We'll stick around after for anyone who has questions or didn't get the chance to be heard.
Tuesday January 21st at 9am PST (1/21/25)
Microsoft Teams - Meeting Link Meeting ID: 264 770 145 196
Passcode: Uwbdgj
Outlook Invite: Jetson AI Lab Research Group.ics
The agenda will be listed here beforehand - post to the forum to add agenda items. The meetings will be recorded so anyone unable to attend live can watch them after.
"},{"location":"research.html#past-meetings","title":"Past Meetings","text":"Recordings Archive The meeting recordings are stored on Google Drive here: https://drive.google.com/drive/folders/18BC7o32jorx_LzZXx5wW0Io_nf1ZwO6X?usp=sharing
October 15, 2024 Topics Covered:
HuggingFace LeRobot (Chitoku Yato) Stanley H1 Humanoid (Dave Niewinski) October 1, 2024 Topics Covered:
ReMEmbR: Long-Horizon Memory for Navigation (Abrar Anwar) Diffusion Policies, Shape LLM, 3D Encoders September 17, 2024 Topics Covered:
NASA JPL - ROSA (Rob Royce & Shehryar Khattak) LeRobot Walkthrough (Chitoku Yato) VLM Agent in Isaac Sim/ROS (Kabilan Kb) September 3, 2024 Topics Covered:
Edge NeRF's and nerfstudio (Johnny N\u00fa\u00f1ez Cano) Review of OpenVLA results (Dustin Franklin) Oculus Interface for Jetson (Al Costa) TAO Toolkit 5.5 August 20, 2024 Topics Covered:
GenAI ROS Nodes for VLM (Khannah Shaltiel) Isaac Sim and Orin Nano with Hardware-in-the-Loop (Kabilan Kb) Miru Edge Deployment Infrastructure (Vedant Nair) August 6, 2024 Topics Covered:
OpenVLA Fine-Tuning Gemma-2-2b (Asier Arranz) Ultralytics YOLOv8 (Lakshantha Dissanayake) July 23, 2024 Topics Covered:
Llama-3 Function & Specs OpenVLA with MimicGen Phi-3 Vision via ONNX (Jambo Chen) OStream GenRunner (Kerry Shih) July 9, 2024 Topics Covered:
OpenVLA Quantization (openvla.github.io) visualnav-transformer (robodhruv/visualnav-transformer) Florence-2, Holoscan, Grammars (Nigel Nelson, florence-2-vision) LLaMa-Factory (hiyouga/LLaMA-Factory) June 25, 2024 Topics Covered:
Function Calling in Agent Studio Jetson Copilot (Chitoku Yato) Jetson Platform Services (Sammy Ochoa) On-device Fine-tuning (Nurgaliyev Shakhizat) June 11, 2024 Topics Covered:
Agent Studio HomeAssistant 2024.6 AWS IoT Greengrass (Romil Shah) May 29, 2024 Topics Covered:
OpenAI-style Tools with NousResearch/Hermes-2-Pro-Llama-3-8B Jetson Copilot with jetrag whisper_trt for Orin Nano May 15, 2024 Topics Covered:
VILA-1.5 on Video Sequences Voicecraft Container (Martin Cerven) JetBot / Nanosaur Updates for Orin Nano (Chitoku Yato & Raffaello Bonghi) Controller LLM & Advanced Function Calling (NousResearch/Hermes-2-Pro-Llama-3-8B
) RAG Samples with LlamaIndex (Chitoku Yato) May 1, 2024 Topics Covered:
Function Calling with Llama-3 Home Assistant / Wyoming (Mieszko Syty) Smart Sorting / Recycling (Alvaro Costa) April 17, 2024 Topics Covered:
Ollama Support for Jetson Devices Home Assistant Integration jetson-container
Updates Upgrading JetBot with Orin Nano April 3, 2024 Project Kickoffs:
Home Assistant Integration Agent Controller LLM ML DevOps, Containers, Core Inferencing "},{"location":"research.html#team-members","title":"Team Members","text":"Below are shown some of the sustaining members of the group who have been working on generative AI in edge computing:
Dustin Franklin, NVIDIA Principal Engineer | Pittsburgh, PA (jetson-inference, jetson-containers) Nurgaliyev Shakhizat Institute of Smart Systems and AI | Kazakhstan (Assistive Devices, Vision2Audio, HPC) Kris Kersey, Kersey Fabrications Embedded Software Engineer | Atlanta, GA (The OASIS Project, AR/VR, 3D Fabrication) Johnny N\u00fa\u00f1ez Cano PhD Researcher in CV/AI | Barcelona, Spain (Recurrent LLMs, Pose & Behavior Analysis) Doruk S\u00f6nmez, ConnectTech Intelligent Video Analytics Engineer | Turkey (NVIDIA DLI Certified Instructor, IVA, VLM) Akash James, Spark Cognition AI Architect, UC Berkeley Researcher | Oakland (NVIDIA AI Ambassador, Personal Assistants) Mieszko Syty, MS/1 Design AI/ML Engineer | Warsaw, Poland (LLM, Home Assistants, ML DevOps) Jim Benson, JetsonHacks DIY Extraordinaire | Los Angeles, CA (AI in Education, RACECAR/J) Chitoku Yato, NVIDIA Jetson AI DevTech | Santa Clara, CA (JetBot, JetRacer, MinDisk, Containers) Dana Sheahen, NVIDIA DLI Curriculum Developer | Santa Clara, CA (AI in Education, Jetson AI Fundamentals) Sammy Ochoa, NVIDIA Jetson AI DevTech | Austin, TX (Metropolis Microservices) John Welsh, NVIDIA (NanoOWL, NanoSAM, JetBot, JetRacer, torch2trt, trt_pose, Knowledge Distillation) Dave Niewinski Dave's Armoury | Waterloo, Ontario (GLaDOS, Fetch, Offroad La-Z-Boy, KUKA Bot) Gary Hilgemann, REBOTNIX CEO & AI Roboticist | L\u00fcnen, Germany (GUSTAV, SPIKE, VisionTools, GenAI) Elaine Wu, Seeed Studio AI & Robotics Partnerships | Shenzhen, China (reComputer, YOLOv8, LocalJARVIS, Voice Bot) Patty Delafuente, NVIDIA Data Scientist & UMBC PhD Student | MD (AI in Education, DLI Robotics Teaching Kit) Song Han, MIT HAN Lab NVIDIA Research | Cambridge, MA (Efficient Large Models, AWQ, VILA) Bryan Hughes, Mimzy AI Founder, Entrepreneur | SF Bay Area (Multimodal Assistants, AI at the Edge) Tianqi Chen, CMU Catalyst OctoML, CTO | Seattle, WA (MLC, Apache TVM, XGBoost) Michael Gr\u00fcner, RidgeRun Team Lead / Architect | Costa Rica (Embedded Vision & AI, Multimedia) Jesse Flot, CMU Robotics Academy Co-Director | Pittsburgh, PA (Applied AI & Robotics, Autonomy Foundations) Paul DeCarlo, Microsoft Professor | University of Houston (Azure IoT, Video Analytics, Microsoft JARVIS) Mike Hansen, Nabu Casa Voice Engineering Lead | Urbandale, Iowa (Home Assistant, Piper TTS, Wyoming) Lakshantha Dissanayake, Ultralytics Embedded CV Engineer | Vancouver, BC (YOLOv8, TensorRT, DeepStream) Kerry Shih, GenAI Nerds Founder, CEO | Los Angeles, CA (Gapi) Ziad Youssfi, CMU ECE Professor | Pittsburgh, PA (ML in Robotics & Embedded Systems) Walter Lucetti, Stereolabs Robotics & Vision Engineer | Paris, France (MyzharBot, ROS2, GStreamer) Raffaello Bonghi, NVIDIA AI & Robotics Engineer | Manchester, UK (Nanosaur, Panther, jetson-stats) Alvaro Costa, ANS Group AI & Robotics Lead | Epsom, UK (TeknTrash, StipraPOD) David Pearson, ConnectTech Embedded Systems Engineer | Ontario, CA (Edge AI Systems, Vision/Language Models) Jason Seawall, Numurus CEO | Seattle, WA (NEPI, Edge AI & Automation) Martin Cerven AI Researcher | Germany (Autonomous Robotics, Voicecraft) Romil Shah, Amazon GenAI IIoT @ AWS | San Jose, CA (aws-samples/genai-at-edge
) Kabilan Kb, Roboticist NVIDIA DLI Ambassador | Tamil Nadu, IN (ROS2 Tutorials, Autonomous Wheelchair)"},{"location":"ros.html","title":"ROS2 Nodes for Generative AI","text":"The ros2_nanollm
package provides ROS2 nodes for running optimized LLM's and VLM's locally inside a container. These are built on NanoLLM and ROS2 Humble for deploying generative AI models onboard your robot with Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm:humble
container image Space for models (>10GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"ros.html#running-the-live-demo","title":"Running the Live Demo","text":"Recommended
Before you start, please review NanoVLM and Live LLaVa demos. For primary documentation, view ROS2 NanoLLM.
Ensure you have a camera device connected
ls /dev/video*\n
Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nano_llm:humble) \\\n ros2 launch ros2_nanollm camera_input_example.launch.py\n
This command will start the launch file of the container.
By default this will load the Efficient-Large-Model/Llama-3-VILA1.5-8B
VLM and publish the image captions and overlay to topics that can be subscribed to by your other nodes, or visualized with RViz or Foxglove. Refer to the ros2_nanollm
repo for documentation on the input/output topics that are exposed.
"},{"location":"ros.html#build-your-own-ros-nodes","title":"Build your own ROS Nodes","text":"To build your own ROS2 node using LLM or VLM, first create a ROS 2 workspace and package in a directory mounted to the container (following the ROS 2 Humble Documentation). Your src folder should then look like this:
\u2514\u2500\u2500 src \n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 launch \n \u2514\u2500\u2500 camera_input.launch.py\n \u251c\u2500\u2500 resource\n \u2514\u2500\u2500 your-package-name\n \u251c\u2500\u2500 your-package-name\n \u2514\u2500\u2500 __init__.py \n \u2514\u2500\u2500 your-node-name_py.py\n \u251c\u2500\u2500 test\n \u2514\u2500\u2500 test_copyright.py\n \u2514\u2500\u2500 test_flake8.py\n \u2514\u2500\u2500 test_pep257.py\n \u251c\u2500\u2500 package.xml\n \u251c\u2500\u2500 setup.cfg\n \u251c\u2500\u2500 setup.py\n \u2514\u2500\u2500 README.md\n
We will create the launch folder, as well as the camera_input.launch.py and your-node-name_py.py files in later steps.
"},{"location":"ros.html#editing-the-setup","title":"Editing the Setup","text":"Let\u2019s begin by editing the setup.py
file. At the top of the file, add
from glob import glob \n
In the setup method, find the data_files=[]
line, and make sure it looks like this:
data_files=[\n ('share/ament_index/resource_index/packages',\n ['resource/' + package_name]),\n ('share/' + package_name, ['package.xml']),\n ('share/' + package_name, glob('launch/*.launch.py')),\n ],\n
Edit the maintainer line with your name. Edit the maintainer email to your email. Edit the description line to describe your package.
maintainer='kshaltiel', \nmaintainer_email='kshaltiel@nvidia.com', \ndescription='YOUR DESCRIPTION', \n
Find the console_scripts
line in the entry_points section. Edit its contents to be:
'your-node-name_py = your-package-name.your-node-name_py:main'\n
For example:
entry_points={\n 'console_scripts': [\n 'nano_llm_py = ros2_nanollm.nano_llm_py:main'\n ],\n },\n
All done for this file!"},{"location":"ros.html#creating-the-node","title":"Creating the Node","text":"Inside your package, under the folder that shares your package's name and contains the __init__.py
file, create a file named after your node. For NanoLLM, this file would be called nano_llm_py.py
.
Paste the following code into the empty file:
import rclpy \nfrom rclpy.node import Node\nfrom std_msgs.msg import String\nfrom sensor_msgs.msg import Image\nfrom cv_bridge import CvBridge\nfrom PIL import Image as im\nfrom MODEL_NAME import NECESSARY_MODULES\n\nclass Your_Model_Subscriber(Node):\n\n    def __init__(self):\n        super().__init__('your_model_subscriber')\n\n        #EDIT PARAMETERS HERE \n        self.declare_parameter('param1', \"param1_value\") \n        self.declare_parameter('param2', \"param2_value\")\n\n        # Subscriber for input query\n        self.query_subscription = self.create_subscription(\n            String,\n            'input_query',\n            self.query_listener_callback,\n            10)\n        self.query_subscription # prevent unused variable warning\n\n        # Subscriber for input image\n        self.image_subscription = self.create_subscription(\n            Image,\n            'input_image',\n            self.image_listener_callback,\n            10)\n        self.image_subscription # prevent unused variable warning\n\n        # To convert ROS image message to OpenCV image\n        self.cv_br = CvBridge() \n\n        #LOAD THE MODEL\n        self.model = INSERT_MODEL.from_pretrained(\"PATH-TO-MODEL\")\n\n        #chatHistory var \n        self.chat_history = ChatHistory(self.model)\n\n        ## PUBLISHER\n        self.output_publisher = self.create_publisher(String, 'output', 10)\n        self.query = \"Describe the image.\"\n\n    def query_listener_callback(self, msg):\n        self.query = msg.data\n\n    def image_listener_callback(self, data): \n        input_query = self.query\n\n        # call model with input_query and input_image \n        cv_img = self.cv_br.imgmsg_to_cv2(data, 'rgb8')\n        PIL_img = im.fromarray(cv_img)\n\n        # Parsing input text prompt\n        prompt = input_query.strip(\"][()\")\n        text = prompt.split(',')\n        self.get_logger().info('Your query: %s' % text) #prints the query\n\n        #chat history \n        self.chat_history.append('user', image=PIL_img)\n        self.chat_history.append('user', prompt, use_cache=True)\n        embedding, _ = self.chat_history.embed_chat()\n\n        #GENERATE OUTPUT\n        output = self.model.generate(\n            inputs=embedding,\n            kv_cache=self.chat_history.kv_cache,\n            min_new_tokens = 10,\n            streaming = False, \n            do_sample = True,\n        )\n\n        output_msg = String()\n        output_msg.data = output\n        self.output_publisher.publish(output_msg)\n        self.get_logger().info(f\"Published output: {output}\")\n\ndef main(args=None):\n    rclpy.init(args=args)\n\n    your_model_subscriber = Your_Model_Subscriber()\n\n    rclpy.spin(your_model_subscriber)\n\n    # Destroy the node explicitly\n    # (optional - otherwise it will be done automatically\n    # when the garbage collector destroys the node object)\n    your_model_subscriber.destroy_node()\n    rclpy.shutdown()\n\nif __name__ == '__main__':\n    main()\n
Edit the import statement at the top of the file to import the necessary modules from the model.
Next, edit the class name and the node name inside the __init__()
function to reflect the model that will be used.
Find the comment that reads #EDIT PARAMETERS HERE
. Declare all parameters except for the model name following the format in the file. Under the #LOAD THE MODEL
comment, include the path to the model.
Lastly, edit the generate method under the GENERATE OUTPUT
comment to include any additional parameters.
All done for this file!
"},{"location":"ros.html#creating-the-launch-file","title":"Creating the Launch File","text":"Inside your package, create the launch folder. Create your launch file inside of it.
mkdir launch\ncd launch \ntouch camera_input.launch.py\n
You can edit this file externally, and it will update within the container. Paste the following code into the empty file.
from launch import LaunchDescription\nfrom launch_ros.actions import Node\nfrom launch.substitutions import LaunchConfiguration\nfrom launch.actions import DeclareLaunchArgument\n\ndef generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'param1',\n default_value='param1_default',\n description='Description of param1'),\n DeclareLaunchArgument(\n 'param2',\n default_value='param2_default',\n description='Description of param2'),\n ]\n\n\n #Your model parameters \n param1 = LaunchConfiguration('param1')\n param2 = LaunchConfiguration('param2')\n\n #camera node for camera input\n cam2image_node = Node(\n package='image_tools',\n executable='cam2image',\n remappings=[('image', 'input_image')],\n )\n\n #model node\n model_node = Node(\n package='your-package-name', #make sure your package is named this\n executable='your-node-name_py', \n parameters=[{\n 'param1': param1, \n 'param2': param2,\n }]\n )\n\n final_launch_description = launch_args + [cam2image_node] + [model_node]\n\n return LaunchDescription(final_launch_description)\n
Find the required parameters for your model. You can view these by looking at the Model API for your specific model and taking note of how the model is called. For example, NanoLLM retrieves models through the following:
model = NanoLLM.from_pretrained(\n \"meta-llama/Llama-3-8b-hf\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n
The parameters for NanoLLM would be the model name, api, and quantization.
In the generate_launch_description
function, edit the DeclareLaunchArgument
to accommodate all parameters except the model name. For NanoLLM, this would look like:
def generate_launch_description():\n launch_args = [\n DeclareLaunchArgument( \n 'api',\n default_value='mlc',\n description='The model backend to use'),\n DeclareLaunchArgument(\n 'quantization',\n default_value='q4f16_ft',\n description='The quantization method to use'),\n ]\n
Then edit the lines under #Your model Parameters
to match the parameters of your model, again excluding the model name. Lastly, fill in the code under the #model node
comment with your package name, the name of your node file, and all of your parameters, this time including the model.
All done for this file!
"},{"location":"tensorrt_llm.html","title":"TensorRT-LLM for Jetson","text":"TensorRT-LLM is a high-performance LLM inference library with advanced quantization, attention kernels, and paged KV caching. Initial support for building TensorRT-LLM from source for JetPack 6.1 has been included in the v0.12.0-jetson
branch of the TensorRT-LLM repo for Jetson AGX Orin.
We've provided pre-compiled TensorRT-LLM wheels and containers along with this guide for TensorRT-LLM Deployment on Jetson Orin
What you need
One of the following Jetson devices:
Jetson AGX Orin *Support for other Orin devices is currently undergoing testing.
Running one of the following versions of JetPack:
JetPack 6.1 (L4T r36.4)
Sufficient storage space (preferably with NVMe SSD).
18.5GB
for tensorrt_llm
container image Space for models (>10GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tensorrt_llm.html#building-tensorrt-llm-engine-for-llama","title":"Building TensorRT-LLM Engine for Llama","text":"You can find the steps for converting Llama to TensorRT-LLM under examples/llama
in the repo, and also in the documentation. This script will automate the process for Llama-7B with INT4 quantization applied, and run some generation and performance checks on the model:
jetson-containers run \\\n -e HUGGINGFACE_TOKEN=YOUR_API_KEY \\\n -e FORCE_BUILD=on \\\n dustynv/tensorrt_llm:0.12-r36.4.0 \\\n /opt/TensorRT-LLM/llama.sh\n
There are many such conversion procedures outlined in the TensorRT-LLM examples for different model architectures.
"},{"location":"tensorrt_llm.html#openai-api-endpoint","title":"OpenAI API Endpoint","text":"TensorRT-LLM has programming APIs for Python and C++ available, but it also includes an example server endpoint for the OpenAI protocol that makes it easy to substitute for other local or cloud model backends.
This will start the TensorRT-LLM container with the server and model that you built above:
jetson-containers run \\\n dustynv/tensorrt_llm:0.12-r36.4.0 \\\n python3 /opt/TensorRT-LLM/examples/apps/openai_server.py \\\n /data/models/tensorrt_llm/Llama-2-7b-chat-hf-gptq\n
Then you can make chat completion requests against it from practically any language or from any connected device. This example shows a simple way of testing it initially from another terminal with curl:
curl http://localhost:8000/v1/completions \\\n -H \"Content-Type: application/json\" \\\n -d '{\n \"model\": <model_name>,\n \"prompt\": \"Where is New York?\",\n \"max_tokens\": 16,\n \"temperature\": 0\n }'\n
Or the code included with openai_client.py will handle these requests using the standard openai-python
package that can be installed outside of the container or on another machine.
jetson-containers run \\\n --workdir /opt/TensorRT-LLM/examples/apps \\\n dustynv/tensorrt_llm:0.12-r36.4.0 \\\n python3 openai_client.py --prompt \"Where is New York?\" --api chat\n
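For example, a chat completion request with the openai-python package from another machine on the network could look like this (the host and served model name below are placeholders):
from openai import OpenAI\n\n# Point the client at the TensorRT-LLM server started above\nclient = OpenAI(base_url='http://JETSON_IP:8000/v1', api_key='none')\n\nresponse = client.chat.completions.create(\n    model='Llama-2-7b-chat-hf-gptq',   # placeholder: use the model name the server reports\n    messages=[{'role': 'user', 'content': 'Where is New York?'}],\n    max_tokens=64,\n)\nprint(response.choices[0].message.content)\n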
The patches in the branch above for TensorRT-LLM 0.12 are a preview release for Jetson AGX Orin, and we'll continue validating and testing the various settings in TensorRT-LLM. If you need any support, please post to the Jetson Developer Forums.
"},{"location":"tips_ram-optimization.html","title":"RAM Optimization","text":"Running a LLM requires a huge RAM space.
Especially if you are on Jetson Orin Nano that only has 8GB of RAM, it is crucial to leave as much RAM space available for models.
Here we share a couple of ways to optimize the system RAM usage.
"},{"location":"tips_ram-optimization.html#disabling-the-desktop-gui","title":"Disabling the Desktop GUI","text":"If you use your Jetson remotely through SSH, you can disable the Ubuntu desktop GUI. This will free up extra memory that the window manager and desktop uses (around ~800MB for Unity/GNOME).
You can disable the desktop temporarily, run commands in the console, and then re-start the desktop when desired:
$ sudo init 3 # stop the desktop\n# log your user back into the console (Ctrl+Alt+F1, F2, etc.)\n$ sudo init 5 # restart the desktop\n
If you wish to make this persistent across reboots, you can use the following commands to change the boot-up behavior:
To disable desktop on boot
sudo systemctl set-default multi-user.target\n
To enable desktop on boot
sudo systemctl set-default graphical.target\n
"},{"location":"tips_ram-optimization.html#disabling-misc-services","title":"Disabling misc services","text":"sudo systemctl disable nvargus-daemon.service\n
"},{"location":"tips_ram-optimization.html#mounting-swap","title":"Mounting Swap","text":"If you're building containers or working with large models, it's advisable to mount SWAP (typically correlated with the amount of memory in the board). Run these commands to disable ZRAM and create a swap file:
If you have NVMe SSD storage available, it's preferred to allocate the swap file on the NVMe SSD.
sudo systemctl disable nvzramconfig\nsudo fallocate -l 16G /ssd/16GB.swap\nsudo mkswap /ssd/16GB.swap\nsudo swapon /ssd/16GB.swap\n
Then add the following line to the end of /etc/fstab to make the change persistent:
/ssd/16GB.swap none swap sw 0 0\n
"},{"location":"tips_ssd-docker.html","title":"Tips - SSD + Docker","text":"Once you have your Jetson set up by flashing the latest Jetson Linux (L4T) BSP on it or by flashing the SD card with the whole JetPack image, before embarking on testing out all the great generative AI application using jetson-containers
, you want to make sure you have a huge storage space for all the containers and the models you will download.
We are going to show how you can install SSD on your Jetson, and set it up for Docker.
"},{"location":"tips_ssd-docker.html#ssd","title":"SSD","text":""},{"location":"tips_ssd-docker.html#physical-installation","title":"Physical installation","text":" Unplug power and any peripherals from the Jetson developer kit. Physically install an NVMe SSD card on the carrier board of your Jetson developer kit, making sure to properly seat the connector and secure with the screw. Reconnect any peripherals, and then reconnect the power supply to turn on the Jetson developer kit. Once the system is up, verify that your Jetson identifies a new memory controller on PCI bus:
lspci\n
The output should look like the following:
0007:01:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1322 (rev 02)\n
"},{"location":"tips_ssd-docker.html#format-and-set-up-auto-mount","title":"Format and set up auto-mount","text":" Run lsblk
to find the device name.
lsblk\n
The output should look like the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT\nloop0 7:0 0 16M 1 loop \nmmcblk1 179:0 0 59.5G 0 disk \n\u251c\u2500mmcblk1p1 179:1 0 58G 0 part /\n\u251c\u2500mmcblk1p2 179:2 0 128M 0 part \n\u251c\u2500mmcblk1p3 179:3 0 768K 0 part \n\u251c\u2500mmcblk1p4 179:4 0 31.6M 0 part \n\u251c\u2500mmcblk1p5 179:5 0 128M 0 part \n\u251c\u2500mmcblk1p6 179:6 0 768K 0 part \n\u251c\u2500mmcblk1p7 179:7 0 31.6M 0 part \n\u251c\u2500mmcblk1p8 179:8 0 80M 0 part \n\u251c\u2500mmcblk1p9 179:9 0 512K 0 part \n\u251c\u2500mmcblk1p10 179:10 0 64M 0 part \n\u251c\u2500mmcblk1p11 179:11 0 80M 0 part \n\u251c\u2500mmcblk1p12 179:12 0 512K 0 part \n\u251c\u2500mmcblk1p13 179:13 0 64M 0 part \n\u2514\u2500mmcblk1p14 179:14 0 879.5M 0 part \nzram0 251:0 0 1.8G 0 disk [SWAP]\nzram1 251:1 0 1.8G 0 disk [SWAP]\nzram2 251:2 0 1.8G 0 disk [SWAP]\nzram3 251:3 0 1.8G 0 disk [SWAP]\nnvme0n1 259:0 0 238.5G 0 disk \n
Identify the device corresponding to your SSD. In this case, it is nvme0n1
.
Format the SSD, create a mount point, and mount it to the filesystem.
sudo mkfs.ext4 /dev/nvme0n1\n
You can choose any name for the mount point directory. We use /ssd
here, but in jetson-containers
' setup.md documentation, /mnt
is used.
sudo mkdir /ssd\n
sudo mount /dev/nvme0n1 /ssd\n
In order to ensure that the mount persists after boot, add an entry to the fstab
file:
First, identify the UUID for your SSD:
lsblk -f\n
Then, add a new entry to the fstab
file:
sudo vi /etc/fstab\n
Insert the following line, replacing the UUID with the value found from lsblk -f
:
UUID=************-****-****-****-******** /ssd/ ext4 defaults 0 2\n
Finally, change the ownership of the /ssd
directory.
sudo chown ${USER}:${USER} /ssd\n
"},{"location":"tips_ssd-docker.html#docker","title":"Docker","text":" Install nvidia-container
package.
Note: If you used an NVIDIA-supplied SD card image to flash your SD card, all necessary JetPack components (including nvidia-containers
) and Docker are already pre-installed, so this step can be skipped.
sudo apt update\nsudo apt install -y nvidia-container\n
JetPack 6.x users
If you flash **Jetson Linux (L4T) R36.x (JetPack 6.x) on your Jetson using SDK Manager, and install nvidia-container
using apt
, on JetPack 6.x it no longer automatically installs Docker.
Therefore, you need to run the following to manually install Docker and set it up.
sudo apt update\nsudo apt install -y nvidia-container curl\ncurl https://get.docker.com | sh && sudo systemctl --now enable docker\nsudo nvidia-ctk runtime configure --runtime=docker\n
Restart the Docker service and add your user to the docker
group, so that you don't need to use the command with sudo
.
sudo systemctl restart docker\nsudo usermod -aG docker $USER\nnewgrp docker\n
Add default runtime in /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert the \"default-runtime\": \"nvidia\"
line as following:
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\"\n}\n
Restart Docker
sudo systemctl daemon-reload && sudo systemctl restart docker\n
"},{"location":"tips_ssd-docker.html#migrate-docker-directory-to-ssd","title":"Migrate Docker directory to SSD","text":"Now that the SSD is installed and available on your device, you can use the extra storage capacity to hold the storage-demanding Docker directory.
Stop the Docker service.
sudo systemctl stop docker\n
Move the existing Docker folder
sudo du -csh /var/lib/docker/ && \\\n sudo mkdir /ssd/docker && \\\n sudo rsync -axPS /var/lib/docker/ /ssd/docker/ && \\\n sudo du -csh /ssd/docker/ \n
Edit /etc/docker/daemon.json
sudo vi /etc/docker/daemon.json\n
Insert \"data-root\"
line like the following.
{\n \"runtimes\": {\n \"nvidia\": {\n \"path\": \"nvidia-container-runtime\",\n \"runtimeArgs\": []\n }\n },\n \"default-runtime\": \"nvidia\",\n \"data-root\": \"/ssd/docker\"\n}\n
Rename the old Docker data directory
sudo mv /var/lib/docker /var/lib/docker.old\n
Restart the docker daemon
sudo systemctl daemon-reload && \\\n sudo systemctl restart docker && \\\n sudo journalctl -u docker\n
"},{"location":"tips_ssd-docker.html#test-docker-on-ssd","title":"Test Docker on SSD","text":" [Terminal 1] First, open a terminal to monitor the disk usage while pulling a Docker image.
watch -n1 df \n
[Terminal 2] Next, open a new terminal and start Docker pull.
docker pull nvcr.io/nvidia/l4t-base:r35.2.1\n
[Terminal 1] Observe that the disk usage on /ssd
goes up as the container image is downloaded and extracted.
~$ docker image ls\nREPOSITORY TAG IMAGE ID CREATED SIZE\nnvcr.io/nvidia/l4t-base r35.2.1 dc07eb476a1d 7 months ago 713MB\n
"},{"location":"tips_ssd-docker.html#final-verification","title":"Final Verification","text":"Reboot your Jetson, and verify that you observe the following:
~$ sudo blkid | grep nvme\n/dev/nvme0n1: UUID=\"9fc06de1-7cf3-43e2-928a-53a9c03fc5d8\" TYPE=\"ext4\"\n\n~$ df -h\nFilesystem Size Used Avail Use% Mounted on\n/dev/mmcblk1p1 116G 18G 94G 16% /\nnone 3.5G 0 3.5G 0% /dev\ntmpfs 3.6G 108K 3.6G 1% /dev/shm\ntmpfs 734M 35M 699M 5% /run\ntmpfs 5.0M 4.0K 5.0M 1% /run/lock\ntmpfs 3.6G 0 3.6G 0% /sys/fs/cgroup\ntmpfs 734M 88K 734M 1% /run/user/1000\n/dev/nvme0n1 458G 824M 434G 1% /ssd\n\n~$ docker info | grep Root\n Docker Root Dir: /ssd/docker\n\n~$ sudo ls -l /ssd/docker/\ntotal 44\ndrwx--x--x 4 root root 4096 Mar 22 11:44 buildkit\ndrwx--x--- 2 root root 4096 Mar 22 11:44 containers\ndrwx------ 3 root root 4096 Mar 22 11:44 image\ndrwxr-x--- 3 root root 4096 Mar 22 11:44 network\ndrwx--x--- 13 root root 4096 Mar 22 16:20 overlay2\ndrwx------ 4 root root 4096 Mar 22 11:44 plugins\ndrwx------ 2 root root 4096 Mar 22 16:19 runtimes\ndrwx------ 2 root root 4096 Mar 22 11:44 swarm\ndrwx------ 2 root root 4096 Mar 22 16:20 tmp\ndrwx------ 2 root root 4096 Mar 22 11:44 trust\ndrwx-----x 2 root root 4096 Mar 22 16:19 volumes\n\n~$ sudo du -chs /ssd/docker/\n752M /ssd/docker/\n752M total\n\n~$ docker info | grep -e \"Runtime\" -e \"Root\"\n Runtimes: io.containerd.runtime.v1.linux nvidia runc io.containerd.runc.v2\n Default Runtime: nvidia\n Docker Root Dir: /ssd/docker\n
Your Jetson is now set up with the SSD!
"},{"location":"try.html","title":"Try","text":"Jump to NVIDIA Jetson Store.
"},{"location":"tutorial-intro.html","title":"Tutorial - Introduction","text":""},{"location":"tutorial-intro.html#overview","title":"Overview","text":"Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
"},{"location":"tutorial-intro.html#text-llm","title":"Text (LLM)","text":"text-generation-webui Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui Ollama Get started effortlessly deploying GGUF models for chat and web UI llamaspeak Talk live with Llama using Riva ASR/TTS, and chat about images with Llava! NanoLLM Optimized inferencing library for LLMs, multimodal agents, and speech. Small LLM (SLM) Deploy Small Language Models (SLM) with reduced memory usage and higher throughput. API Examples Learn how to write Python code for doing LLM inference using popular APIs."},{"location":"tutorial-intro.html#text-vision-vlm","title":"Text + Vision (VLM)","text":"Give your locally running LLM an access to vision!
LLaVA Different ways to run LLaVa vision/language model on Jetson for visual understanding. Live LLaVA Run multimodal models interactively on live video streams over a repeating set of prompts. NanoVLM Use mini vision/language models and the optimized multimodal pipeline for live streaming. Llama 3.2 Vision Run Meta's multimodal Llama-3.2-11B-Vision model on Orin with HuggingFace Transformers."},{"location":"tutorial-intro.html#vision-transformers","title":"Vision Transformers","text":"EfficientVIT MIT Han Lab's EfficientViT, Multi-Scale Linear Attention for High-Resolution Dense Prediction NanoOWL OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT NanoSAM NanoSAM, SAM model variant capable of running in real-time on Jetson SAM Meta's SAM, Segment Anything model TAM TAM, Track-Anything model, is an interactive tool for video object tracking and segmentation"},{"location":"tutorial-intro.html#image-generation","title":"Image Generation","text":"Cosmos Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. Flux + ComfyUI Set up and run the ComfyUI with Flux model for image generation on Jetson Orin. Stable Diffusion Run AUTOMATIC1111's stable-diffusion-webui
to generate images from prompts SDXL Ensemble pipeline consisting of a base model and refiner with enhanced image generation. nerfstudio Experience neural reconstruction and rendering with nerfstudio and onboard training."},{"location":"tutorial-intro.html#audio","title":"Audio","text":"Whisper OpenAI's Whisper, pre-trained model for automatic speech recognition (ASR) AudioCraft Meta's AudioCraft, to produce high-quality audio and music Voicecraft Interactive speech editing and zero shot TTS"},{"location":"tutorial-intro.html#rag-vector-database","title":"RAG & Vector Database","text":"NanoDB Interactive demo to witness the impact of Vector Database that handles multimodal data LlamaIndex Realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents LlamaIndex Reference application for building your own local AI assistants using LLM, RAG, and VectorDB"},{"location":"tutorial-intro.html#api-integrations","title":"API Integrations","text":"ROS2 Nodes Optimized LLM and VLM provided as ROS2 nodes for robotics Holoscan SDK Use the Holoscan-SDK to run high-throughput, low-latency edge AI pipelines Jetson Platform Services Quickly build microservice driven vision applications with Jetson Platform Services Gapi Workflows Integrating generative AI into real world environments Gapi Micro Services Wrapping models and code to participate in systems Ultralytics YOLOv8 Run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT."},{"location":"tutorial-intro.html#about-nvidia-jetson","title":"About NVIDIA Jetson","text":"Note
We are mainly targeting Jetson Orin generation devices for deploying the latest LLMs and generative AI models.
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin Developer Kit Jetson Orin Nano Developer Kit GPU 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores RAM(CPU+GPU) 64GB 32GB 8GB Storage 64GB eMMC (+ NVMe SSD) microSD card (+ NVMe SSD)"},{"location":"tutorial_api-examples.html","title":"Tutorial - API Examples","text":"It's good to know the code for generating text with LLM inference, and ancillary things like tokenization, chat templates, and prompting. On this page we give Python examples of running various LLM APIs, and their benchmarks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 5 (L4T r35) JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for l4t-text-generation
container image Space for models (>10GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_api-examples.html#transformers","title":"Transformers","text":"The HuggingFace Transformers API is the de-facto API that models are released for, often serving as the reference implementation. It's not terribly fast, but it does have broad model support, and also supports quantization (AutoGPTQ, AWQ). This uses streaming:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer\nfrom threading import Thread\n\nmodel_name='meta-llama/Llama-2-7b-chat-hf'\nmodel = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nstreamer = TextIteratorStreamer(tokenizer)\n\nprompt = [{'role': 'user', 'content': 'Can I get a recipe for French Onion soup?'}]\ninputs = tokenizer.apply_chat_template(\n prompt,\n add_generation_prompt=True,\n return_tensors='pt'\n).to(model.device)\n\nThread(target=lambda: model.generate(inputs, max_new_tokens=256, streamer=streamer)).start()\n\nfor text in streamer:\n print(text, end='', flush=True)\n
To run this (it can be found here), you can mount a directory containing the script or your jetson-containers directory:
jetson-containers run --volume $PWD/packages/llm:/mount --workdir /mount \\\n $(autotag l4t-text-generation) \\\n python3 transformers/test.py\n
We use the l4t-text-generation
container because it includes the quantization libraries in addition to Transformers, for running the quantized versions of the models like TheBloke/Llama-2-7B-Chat-GPTQ
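As a rough sketch (assuming the AutoGPTQ backend bundled in that container), loading one of those pre-quantized checkpoints uses the same Transformers API as the FP16 model above:
from transformers import AutoModelForCausalLM, AutoTokenizer\n\n# illustrative example: a GPTQ checkpoint carries its quantization config, which from_pretrained() picks up automatically\nmodel_name = 'TheBloke/Llama-2-7B-Chat-GPTQ'\n\nmodel = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda')\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\ninputs = tokenizer('Once upon a time,', return_tensors='pt').to(model.device)\noutputs = model.generate(**inputs, max_new_tokens=64)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n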
"},{"location":"tutorial_api-examples.html#benchmarks","title":"Benchmarks","text":"The huggingface-benchmark.py
script will benchmark the models:
./run.sh --volume $PWD/packages/llm/transformers:/mount --workdir /mount \\\n $(./autotag l4t-text-generation) \\\n python3 huggingface-benchmark.py --model meta-llama/Llama-2-7b-chat-hf\n
* meta-llama/Llama-2-7b-chat-hf AVG = 20.7077 seconds, 6.2 tokens/sec memory=10173.45 MB\n* TheBloke/Llama-2-7B-Chat-GPTQ AVG = 12.3922 seconds, 10.3 tokens/sec memory=7023.36 MB\n* TheBloke/Llama-2-7B-Chat-AWQ AVG = 11.4667 seconds, 11.2 tokens/sec memory=4662.34 MB\n
"},{"location":"tutorial_api-examples.html#nanollm","title":"NanoLLM","text":"The NanoLLM
library uses the optimized MLC/TVM library for inference, like on the Benchmarks page:
> NanoLLM Reference Documentationfrom nano_llm import NanoLLM, ChatHistory, ChatTemplates\n\n# load model\nmodel = NanoLLM.from_pretrained(\n model='meta-llama/Meta-Llama-3-8B-Instruct', \n quantization='q4f16_ft', \n api='mlc'\n)\n\n# create the chat history\nchat_history = ChatHistory(model, system_prompt=\"You are a helpful and friendly AI assistant.\")\n\nwhile True:\n # enter the user query from terminal\n print('>> ', end='', flush=True)\n prompt = input().strip()\n\n # add user prompt and generate chat tokens/embeddings\n chat_history.append(role='user', msg=prompt)\n embedding, position = chat_history.embed_chat()\n\n # generate bot reply\n reply = model.generate(\n embedding, \n streaming=True, \n kv_cache=chat_history.kv_cache,\n stop_tokens=chat_history.template.stop,\n max_new_tokens=256,\n )\n\n # append the output stream to the chat history\n bot_reply = chat_history.append(role='bot', text='')\n\n for token in reply:\n bot_reply.text += token\n print(token, end='', flush=True)\n\n print('\\n')\n\n # save the inter-request KV cache \n chat_history.kv_cache = reply.kv_cache\n
This example keeps an interactive chat running with text being entered from the terminal. You can start it like this:
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat.example\n
Or for easy editing from the host device, copy the source into your own script and mount it into the container with the --volume
flag. And for authenticated models, request access through HuggingFace (like with Llama) and substitute your account's API token above.
"},{"location":"tutorial_audiocraft.html","title":"Tutorial - AudioCraft","text":"Let's run Meta's AudioCraft, to produce high-quality audio and music on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
10.7 GB
for audiocraft
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_audiocraft.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
scripts to automatically pull or build a compatible container image.
jetson-containers run $(autotag audiocraft)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
"},{"location":"tutorial_audiocraft.html#run-jupyter-notebooks","title":"Run Jupyter notebooks","text":"AudioCraft repo comes with demo Jupyter notebooks.
In the Jupyter Lab navigation pane on the left, double-click the demos
folder.
"},{"location":"tutorial_audiocraft.html#audiogen-demo","title":"AudioGen demo","text":"Run cells with Shift + Enter
; the first one will download models, which can take some time.
Info
You may encounter an error message like the following when executing the first cell, but you can keep going.
A matching Triton is not available, some optimizations will not be enabled.\nError caught was: No module named 'triton'\n
In the Audio Continuation cells, you can generate continuation based on text, while in Text-conditional Generation you can generate audio based just on text descriptions.
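For reference, plain text-conditional generation with the AudioCraft Python API looks roughly like this (a sketch assuming the facebook/audiogen-medium checkpoint used by the notebook):
from audiocraft.models import AudioGen\nfrom audiocraft.data.audio import audio_write\n\n# load the pretrained AudioGen model and set how many seconds to generate per description\nmodel = AudioGen.get_pretrained('facebook/audiogen-medium')\nmodel.set_generation_params(duration=5)\n\n# one audio clip is generated per text description\nwav = model.generate(['dog barking', 'sirens of an emergency vehicle'])\n\nfor idx, one_wav in enumerate(wav):\n    # save each clip as {idx}.wav with loudness normalization\n    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')\n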
You can also use your own audio as prompt, and use text descriptions to generate continuation:
prompt_waveform, prompt_sr = torchaudio.load(\"../assets/sirens_and_a_humming_engine_approach_and_pass.mp3\") # you can upload your own audio\nprompt_duration = 2\nprompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]\noutput = model.generate_continuation(prompt_waveform.expand(3, -1, -1), prompt_sample_rate=prompt_sr,descriptions=[\n 'Subway train blowing its horn', # text descriptions for continuation\n 'Horse neighing furiously',\n 'Cat hissing'\n], progress=True)\ndisplay_audio(output, sample_rate=16000)\n
"},{"location":"tutorial_audiocraft.html#musicgen-and-magnet-demos","title":"MusicGen and MAGNeT demos","text":"The two other jupyter notebooks are similar to AuidioGen, where you can generate continuation or generate audio, while using models trained to generate music.
"},{"location":"tutorial_comfyui_flux.html","title":"ComfyUI and Flux on Jetson Orin","text":"Hey there, fellow developer! \ud83d\udc4b I'm excited to share with you our latest project: Flux, an open-source model for image generation. Here at NVIDIA, we're pushing the boundaries to make Flux work seamlessly across all platforms, including our Jetson Orin devices. While we're still fine-tuning the model for the Jetson Orin Nano, we've already got it running smoothly on the Jetson AGX Orin.
In this tutorial, I'm going to walk you through every step needed to get Flux up and running on your Jetson Orin, even if you've just flashed your system. Follow along, and you should have no trouble getting everything set up. And hey, if something doesn't work out, reach out to me\u2014I\u2019ll keep this guide updated to make sure it's always on point.
So, let's dive in and get Flux running on your Jetson!
"},{"location":"tutorial_comfyui_flux.html#1-install-miniconda-and-create-a-python-310-environment","title":"1. Install Miniconda and Create a Python 3.10 Environment","text":"First things first, you'll need to install Miniconda on your Jetson Orin and create a Python 3.10 environment called comfyui
. This will ensure all dependencies are handled properly within an isolated environment.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh\nchmod +x Miniconda3-latest-Linux-aarch64.sh\n./Miniconda3-latest-Linux-aarch64.sh\n\nconda update conda\n\nconda create -n comfyui python=3.10\nconda activate comfyui\n
"},{"location":"tutorial_comfyui_flux.html#2-install-cuda-cudnn-and-tensorrt","title":"2. Install CUDA, cuDNN, and TensorRT","text":"Once your environment is set up, install CUDA 12.4 along with the necessary cuDNN and TensorRT libraries to ensure compatibility and optimal performance on your Jetson Orin.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb\nsudo dpkg -i cuda-keyring_1.1-1_all.deb\nsudo apt-get update\nsudo apt-get -y install cuda-toolkit-12-4 cuda-compat-12-4\nsudo apt-get install cudnn python3-libnvinfer python3-libnvinfer-dev tensorrt\n
"},{"location":"tutorial_comfyui_flux.html#3-verify-and-configure-cuda","title":"3. Verify and Configure CUDA","text":"After installing CUDA, you'll want to verify that the correct version (12.4) is being used and make this change permanent in your environment.
ls -l /usr/local | grep cuda\nsudo ln -s /usr/local/cuda-12.4 /usr/local/cuda\n\nexport PATH=/usr/local/cuda/bin:$PATH\nnvcc --version\n\necho 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc\necho 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc\necho 'export CUDA_PATH=/usr/local/cuda' >> ~/.bashrc\nsource ~/.bashrc\n
"},{"location":"tutorial_comfyui_flux.html#4-compile-and-install-bitsandbytes-with-cuda-support","title":"4. Compile and Install bitsandbytes
with CUDA Support","text":"Now it\u2019s time to compile and install bitsandbytes
with CUDA support. This involves cloning the repository, configuring the build with CMake, compiling using all available cores, and installing the resulting package.
export BNB_CUDA_VERSION=124\nexport LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH\n\ngit clone https://github.com/timdettmers/bitsandbytes.git\ncd bitsandbytes\n\nmkdir -p build\ncd build\ncmake .. -DCOMPUTE_BACKEND=cuda -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.4\nmake -j$(nproc)\n\ncd ..\npython setup.py install\n
Verify the installation by importing the package in Python:
python\n>>> import bitsandbytes as bnb\n>>> print(bnb.__version__)\n
"},{"location":"tutorial_comfyui_flux.html#5-install-pytorch-torchvision-and-torchaudio","title":"5. Install PyTorch, TorchVision, and TorchAudio","text":"Next up, install the essential libraries PyTorch
, torchvision
, and torchaudio
for Jetson Orin. You can always check for the latest links here.
pip install http://jetson.webredirect.org/jp6/cu124/+f/5fe/ee5f5d1a75229/torch-2.3.0-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/988/cb71323efff87/torchvision-0.18.0a0+6043bc2-cp310-cp310-linux_aarch64.whl\npip install http://jetson.webredirect.org/jp6/cu124/+f/0aa/a066463c02b4a/torchaudio-2.3.0+952ea74-cp310-cp310-linux_aarch64.whl\n
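Before moving on, it's worth a quick sanity check that the wheels can see the GPU from inside the comfyui environment (a minimal sketch):
import torch, torchvision, torchaudio\n\nprint('PyTorch:', torch.__version__)\nprint('torchvision:', torchvision.__version__)\nprint('torchaudio:', torchaudio.__version__)\nprint('CUDA available:', torch.cuda.is_available())\nif torch.cuda.is_available():\n    print('Device:', torch.cuda.get_device_name(0))\n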
"},{"location":"tutorial_comfyui_flux.html#6-clone-the-comfyui-repository","title":"6. Clone the ComfyUI Repository","text":"Clone the ComfyUI repository from GitHub to get the necessary source code.
git clone https://github.com/comfyanonymous/ComfyUI.git\ncd ComfyUI\n
"},{"location":"tutorial_comfyui_flux.html#7-update-dependencies","title":"7. Update Dependencies","text":"Make sure all the necessary dependencies are installed by running the requirements.txt
file.
pip install -r requirements.txt\n
"},{"location":"tutorial_comfyui_flux.html#8-resolve-issues-with-numpy","title":"8. Resolve Issues with NumPy","text":"If you encounter issues with NumPy, downgrade to a version below 2.0 to avoid compatibility problems.
pip install \"numpy<2\"\n
"},{"location":"tutorial_comfyui_flux.html#9-run-comfyui","title":"9. Run ComfyUI","text":"Finally, run ComfyUI to ensure everything is set up correctly.
python main.py\n
Great! Now that you\u2019ve got ComfyUI up and running, let's load the workflow to start using the Flux model.
Download the workflow file using this link. And load it from the ComfyUI interface. You\u2019ll need to download the Flux Schnell model flux1-schnell.safetensors
and vae ae.safetensors
from Hugging Face and place the model in the models/unet
folder and vae in models/vae
within ComfyUI. Download clip_l.safetensors
and t5xxl_fp8_e4m3fn.safetensors
from Stability's Hugging Face and place them inside models/clip
folder. Alright, you're all set to launch your first run! Head over to the URL provided by ComfyUI (127.0.0.1:8188) on your Jetson AGX Orin, and hit that Queue Prompt button. The first time might take a little longer as the model loads, but after that, each generation should take around 21 seconds. Plus, you can queue up multiple prompts and let it generate images for hours!!
Happy generating! \ud83c\udf89
ASIER \ud83d\ude80
Some examples:
"},{"location":"tutorial_distillation.html","title":"CLIP model distillation","text":"See \"Jetson Introduction to Knowledge Distillation\" repo's README.md.
https://github.com/NVIDIA-AI-IOT/jetson-intro-to-distillation
"},{"location":"tutorial_gapi_microservices.html","title":"Tutorial - Gapi Micro Services","text":"A Micro Service is a process that runs a wrapper python script that integrates your custom code/models so they can integrate into Gapi Workflows.
You can run a Micro Service wherever you like and connect it to a Gapi Server via the streaming, hybrid binary+json message protocol.
There are some out-of-the-box \u201cCommunity Micro Services\" that we integrate, test and pack into Docker images. When you run them, they auto integrate, load NVIDIA layers correctly and offer logging to the host system.
Even better, create your own! Just implement an on_message Python handler to process requests and respond. The rest is handled for you.
Think of a Micro Service as a simple wrapper to some code or model you have. It works the same as any other Node inside a Workflow. When it's your Micro Service Node's turn your on_message function will be called. Your script gets the rolling Transaction data for context and you then publish your data directly back into the flow.
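Purely as an illustrative sketch (the real signature comes from the message_handler.py template in the Micro Service zip described below, not from this guide), an on_message handler is conceptually just:
# hypothetical sketch only -- follow the message_handler.py template from the downloaded zip for the actual signature\nasync def on_message(message, transaction):\n    \"\"\"Called when it's this Micro Service Node's turn in a Workflow.\"\"\"\n    prompt = transaction.get('prompt', '')   # read the rolling Transaction data for context\n    result = prompt.upper()                  # run your own code or model here\n    return {'result': result}                # published back into the Workflow's rolling data\n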
"},{"location":"tutorial_gapi_microservices.html#running-the-community-micro-services","title":"Running the Community Micro Services","text":"Requirements for Community Micro Services
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space.
Literally 4KB for your own Micro Service Anywhere from ~4GB to ~10GB for each Community Micro Service #1 Login and go to the Micro Services tab\n#2 Follow the instructions in the blue box on that page to download your custom configuration\n#3 Then follow the instructions below that for installing the Micro Service you want\n
Example of instruction page:
"},{"location":"tutorial_gapi_microservices.html#congrats-you-can-go-through-the-workflow-tips-now","title":"Congrats! You Can Go Through the Workflow Tips Now","text":""},{"location":"tutorial_gapi_microservices.html#creating-your-own-micro-service","title":"Creating Your Own Micro Service","text":"The entire Micro Service zip file is just 4KB with 4 files:
message_handler.py: for you to respond message.py: for the streaming binary/json protocol gapi-ms: the entry point and handler requirements.txt: defines just asyncio + websockets Full documentation here: Gapi Micro Service Docs. Synopsis below...
#1 Create logical Micro Service in UI and copy the key\n#2 Download the zip file from the UI\n#3 python gapi-ms.py ws://0.0.0.0:8090/gapi-ws [MICROSERVICE_KEY]\n#4 Refresh the UI to confirm it's online\n#5 Edit the message_handler.py to handle binary+json input and change the output\n#6 Add a Micro Service Node to a Workflow and tie it to your Micro Service. Hit Test.\n
"},{"location":"tutorial_gapi_workflows.html","title":"Tutorial - Gapi","text":"Gapi is an embeddable API gateway that creates streaming integrations between AI micro services and the systems that users leverage everyday.
The project's goal is to accelerate the creation of pilots and demos that bring Jetson AI Lab achievements into real-world environments.
\"On Device\" generative AI doesn't mean it has to live on an island!
Workflow engine with low code UI with dozens of open integrations and customizable clients for mobile web and desktop. Micro service framework for wrapping Jetson containers (Ollama, Whisper, Piper TTS, etc. are done, with more coming). Or wrap your own models/code and integrate it into Workflows. Real-time, hybrid, binary+json messaging smoothens intra-service calls and reduced latency. A fast path to proving generative AI value to stakeholders in their actual environment. "},{"location":"tutorial_gapi_workflows.html#gapi-server","title":"Gapi Server","text":"Embeddable API gateway software that runs in the background with a low code workflow UI for testing. The server is a message hub and state machine for workflow 'nodes' that talk to Micro Services. Think of it as connective-tissue for applications.
A Micro Service is a process that runs wrapper Python scripts that integrate custom code/models into Workflows using a streaming API.
What you need to run Gapi Server on Jetson
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin NX (8GB) Jetson Orin Nano (8GB) Jetson Orin Nano (4GB)
Docker
Sufficient storage space.
Gapi Server will run on other environments. Email us at support@GenAINerds.com if that's something you think is worthwhile.
Explaining the Steps:
1) On the Docker host, create a working dir for persistent data 2) Download configuration files 3) Unzip 4) Pull Docker image, create container and start the process (will return console to you) Copy and Run the Steps:
mkdir ~/gapiData && cd ~/gapiData\ncurl -L https://raw.githubusercontent.com/genai-nerds/Gapi/main/gapiConfigs.zip -o gapiConfigs.zip\nunzip -q gapiConfigs.zip\ndocker run -d --name gapi --network host -v ~/gapiData:/opt/gapi/vdata genainerds/gapi:arm64 /bin/bash -c \"cd /opt/gapi/bin && ./startGapi.sh\"\necho \"You may need to hit Enter now. Afterwards the Docker container 'gapi' should be running\"\n
Troubleshooting:
Keep in mind all data read or written is in ~/gapiData Look at ~/gapiData/gapi.log to see what happened (if, say, the docker run command doesn't work) gapiServerConfig.json has all the initial setup NOTE: You will need to run some Micro Services before doing anything meaningful, so please review the mini tour below but don't do any of it in the UI until you complete the setup (instructions at the bottom)
"},{"location":"tutorial_gapi_workflows.html#ui","title":"UI","text":""},{"location":"tutorial_gapi_workflows.html#tips-use-case-templates","title":"Tips & Use Case Templates","text":" When you login there will be an array of Tip Workflows that have notes and explain core concepts.
Tips:
Hello World: Basics plus it literally says hello Run a Local LLM: Play with Ollama graphically and connect it to other systems Streaming Speech to Text: PiperTTS Querying a Vector Database: Query a simple set of vectorized text documents Variables, Flow and Logic: Understand how to setup more robust workflows Calling Workflows from Outside Gapi: Configure Webhooks Workflows Calling Your Code: Micro Service Nodes that invoke your code Communications: 3rd party communications like Slack (IM), Twilio (SMS), SendGrid (EMAIL) "},{"location":"tutorial_gapi_workflows.html#workflows","title":"Workflows","text":"Workflows visually connect the execution and flow of data between Nodes.
A Transaction (or single firing) has \"Rolling Input\" data it accumulates as Node to Node steps each talk to Micro Services and APIs. All Nodes enjoy variables and flow control using familiar json and javascript concepts.
Each Node can append or reference the rolling data in the Transaction while making decisions along the way.
Watch live Transactions as they start from clients, webhooks and published messages from Micro Services with visual feedback and debugging.
"},{"location":"tutorial_gapi_workflows.html#apis-to-business-systems","title":"APIs to Business Systems","text":"Gapi can help smoothly integrate generative AI into systems that people already use everyday via APIs. It has the streaming API to Micro Services plus the state management and chops to handle the outward (webhook) style APIs to existing systems.
Our hope is to rally contributions by the community to keep growing the out-of-the-box/tested Nodes but there is a DIY one as well to manually map what you need into your Workflows.
Some of the out-of-the-box API Nodes: Slack (IM), Twilio (SMS), SendGrid (Email), Service Now (Ticketing), DIY Webhook
"},{"location":"tutorial_gapi_workflows.html#micro-services","title":"Micro Services","text":"There are community published Micro Services as well as custom ones you can make yourself. Gapi Server becomes most useful when leveraging them so please follow the How To below.
Current Community Micro Services:
Whisper Ollama Vector Text to Speech Img to Text Complete the Setup: How To Run and/or Create Micro Services
"},{"location":"tutorial_gapi_workflows.html#support-contribute","title":"Support / Contribute","text":"Gapi is a project from the GenAI Nerds and hosted on Github.
Ask a question, support@GenAINerds.com or Say hello, hello@GenAINerds.com Contribute/create tickets on Github "},{"location":"tutorial_holoscan.html","title":"Tutorial - Holoscan SDK","text":"The Holoscan SDK is part of NVIDIA Holoscan, the AI sensor processing platform that combines hardware systems for low-latency sensor and network connectivity, optimized libraries for data processing and AI, and core microservices to run streaming, imaging, and other applications, from embedded to edge to cloud.
"},{"location":"tutorial_holoscan.html#holoscan-sdk-core-concepts","title":"Holoscan SDK - Core Concepts","text":"A Holoscan SDK Application
is composed of Fragments
, each of which runs a graph of Operators
. The implementation of that graph is sometimes referred to as a pipeline or workflow, which can be visualized below:
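To make those terms concrete in code, here is a minimal sketch of a single-fragment Python application with two custom Operators, modeled on the SDK's ping-style hello-world examples (the operator names here are illustrative):
from holoscan.core import Application, Operator, OperatorSpec\nfrom holoscan.conditions import CountCondition\n\nclass PingTxOp(Operator):\n    \"\"\"Emits an incrementing integer on its 'out' port.\"\"\"\n    def setup(self, spec: OperatorSpec):\n        spec.output('out')\n\n    def compute(self, op_input, op_output, context):\n        self.index = getattr(self, 'index', 0) + 1\n        op_output.emit(self.index, 'out')\n\nclass PingRxOp(Operator):\n    \"\"\"Receives values on its 'in' port and prints them.\"\"\"\n    def setup(self, spec: OperatorSpec):\n        spec.input('in')\n\n    def compute(self, op_input, op_output, context):\n        print('received:', op_input.receive('in'))\n\nclass MyPingApp(Application):\n    def compose(self):\n        tx = PingTxOp(self, CountCondition(self, 10), name='tx')  # run 10 times, then stop\n        rx = PingRxOp(self, name='rx')\n        self.add_flow(tx, rx)  # connect the graph: tx -> rx\n\nif __name__ == '__main__':\n    MyPingApp().run()\n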
"},{"location":"tutorial_holoscan.html#holoscan-sdk-getting-started-on-jetson","title":"Holoscan SDK - Getting Started on Jetson","text":"The best place to get started using the Holoscan SDK is the HoloHub repo. This is a central repository for the NVIDIA Holoscan AI sensor processing community to share apps and extensions.
So, let's walk through how to run the Endoscopy Tool Tracking example application from HoloHub!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
13.7 GB
for holoscan
container image 850 Mb
for Tool Tracking ONNX model + example video Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_holoscan.html#launching-a-holoscan-compatible-container","title":"Launching a Holoscan-compatible Container","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build the Holoscan container.
Use the -v
option to mount HoloHub's ./build
and ./data
directories to the local Holoscan package directory so builds and data are cached across runs
jetson-containers run \\\n -v ./packages/holoscan/holohub/data:/opt/nvidia/holohub/data \\\n -v ./packages/holoscan/holohub/build:/opt/nvidia/holohub/build \\\n $(autotag holoscan)\n
"},{"location":"tutorial_holoscan.html#running-holohubs-endoscopy-tool-tracking-app","title":"Running HoloHub's Endoscopy Tool Tracking App","text":"An example application from HoloHub is the Endoscopy Tool Tracking application. This sample application demonstrates how the Holoscan SDK can be used to build an efficient pipeline that streams a video feed, preprocesses the data, runs inference using TensorRT, post-processes the data, and renders the video feed with the inference overlays.
"},{"location":"tutorial_holoscan.html#building-the-app","title":"Building The App","text":"The Holoscan SDK uses CMake to build C++ applications and also leverages CMake to pull and build app dependencies. So, regardless of whether an application is implemented using C++ or Python, many apps will still require that you \"build\" them first.
The Endoscopy Tool Tracking App has both a Python and C++ implementation. Building this app creates the C++ application program, pulls in an example video, and builds the TensorRT engine used for inference.
Go to the HoloHub directory
cd /opt/nvidia/holohub\n
Build the app using the 'run' script ./run build endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#running-the-python-app","title":"Running The Python App","text":"First, add the Holoscan SDK and the HoloHub build directory to your PYTHONPATH environment variable.
export HOLOHUB_BUILD_PATH=/opt/nvidia/holohub/build/endoscopy_tool_tracking\nexport PYTHONPATH=$PYTHONPATH:$HOLOHUB_BUILD_PATH/python/lib:/opt/nvidia/holoscan/python/lib\n
Next, run the application using Python! python3 /opt/nvidia/holohub/applications/endoscopy_tool_tracking/python/endoscopy_tool_tracking.python --data /opt/nvidia/holohub/data/endoscopy/\n
Shortly after launching the application, you will see the HoloViz window that visualizes the sample video and the model's outputs:
"},{"location":"tutorial_holoscan.html#running-the-c-app","title":"Running The C++ App","text":"The C++ app can be run using the run script by specifying the app name:
./run launch endoscopy_tool_tracking\n
"},{"location":"tutorial_holoscan.html#next-steps","title":"Next Steps","text":"Congratulations! You've successfully run a Holoscan SDK application!
To dive deeper into the Holoscan SDK, explore the SDK's documentation on Core Concepts, Holoscan by Example, and Creating an Application.
"},{"location":"tutorial_jetson-copilot.html","title":"Tutorial - Jetson Copilot","text":"Jetson Copilot is a reference application for a local AI assistant, which demonstrates;
Running open-source LLMs (large language models) on device RAG (retrieval-augmented generation) to let LLM have access to your locally indexed knowledge What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6 GB
for jetrag
container image About 4 GB
for downloading some default models (llama3
and mxbai-embed-large
) Info
To run Jetson Copilot, you do not need to have jetson-containers
on your system. It uses the jetrag
container image that is being managed and built on jetson-containers
.
"},{"location":"tutorial_jetson-copilot.html#getting-started","title":"\ud83c\udfc3 Getting started","text":""},{"location":"tutorial_jetson-copilot.html#first-time-setup","title":"First time setup","text":"If this is your first time to run Jetson Copilot on your Jetson, first run setup.sh
to ensure you have all the necessary software installed and the environment set up.
git clone https://github.com/NVIDIA-AI-IOT/jetson-copilot/\ncd jetson-copilot\n./setup_environment.sh\n
It will install the following, if not yet.
Chromium web browser Docker "},{"location":"tutorial_jetson-copilot.html#how-to-start-jetson-copilot","title":"How to start Jetson Copilot","text":"cd jetson-copilot\n./launch_jetson_copilot.sh\n
This will start a Docker container and start a Ollama server and Streamlit app inside the container. It will shows the URL on the console in order to access the web app hosted on your Jetson.
With your web browser on Jetson, open the Local URL (http://localhost:8501
).Or on a PC connected on the same network as on your Jetson, access the Network URL.
Local URL: http://localhost:8501\nNetwork URL: http://10.110.50.252:8501 \n
Info
You will need the Internet connection on Jetson when it launches for the first time, as it will pull the container image (and download the default LLM and embedding model when web UI starts for the first time).
When you access the web UI for the first time, it will dowload the default LLM (llama3
) and the embedding model (mxbai-embed-large
).
Tips
If you are on Ubuntu Desktop, a frameless Chromium window will pop up to access the web app, to make it look like an independent application. You need to close the window as stopping the container on the console won't shutdown Chromium.
"},{"location":"tutorial_jetson-copilot.html#how-to-use-jetson-copilot","title":"\ud83d\udcd6 How to use Jetson Copilot","text":""},{"location":"tutorial_jetson-copilot.html#0-interact-with-the-plain-llama3-8b","title":"0. Interact with the plain Llama3 (8b)","text":"You can use Jetson Copilot just to interact with a LLM withut enabling RAG feature.
By default, Llama3 (8b) model is downloaded when running for the first time and use as the default LLM.
You will be surprized how much a model like Llama3 is capable, but may soon find limitations as it does not have information prior to its cutoff date nor know anything about your specific subject matter.
"},{"location":"tutorial_jetson-copilot.html#1-ask-jetson-related-question-using-pre-built-index","title":"1. Ask Jetson related question using pre-built index","text":"On the side panel, you can toggle \"Use RAG\" on to enable RAG pipeline. The LLM will have an access to a custom knowledge/index that is selected under \"Index\".
As a sample, a pre-build index \"_L4T_README
\" is provided. This is built on all the README text files that supplied in the \"L4T-README\" folder on the Jetson desktop.
It is mounted as /media/<USER_NAME>/L4T-README/
once you execute udisksctl mount -b /dev/disk/by-label/L4T-README
.
You can ask questions like:
What IP address does Jetson gets assigned when connected to a PC via a USB cable in USB Device Mode?\n
"},{"location":"tutorial_jetson-copilot.html#2-build-your-own-index-based-on-your-documents","title":"2. Build your own index based on your documents","text":"You can build your own index based on your local and/or online documents.
First, on the console (or on the desktop) create a directory under Documents
directory to store your documents.
cd jetson-copilot\nmkdir Documents/Jetson-Orin-Nano\ncd Documents/Jetson-Orin-Nano\nwget https://developer.nvidia.com/downloads/assets/embedded/secure/jetson/orin_nano/docs/jetson_orin_nano_devkit_carrier_board_specification_sp.pdf\n
Now back on the web UI, open the side bar, toggle on \"Use RAG\", then click on \"\u2795Build a new index\" to jump to a \"Build Index\" page.
Give a name for the Index you are to build. (e.g. \"JON Carrier Board\") Type in the field and hit Enter
key, then it will check and show what path will be created for your index.
And then from the drop select box under \"Local documents\", select the directory you created and saved your documents in. (e.g. /opt/jetson_copilot/Documents/Jetson-Orin-Nano
).
It will show the summary of files found in the selected directory.
If you want to rather only or additionally supply URLs for the online docuemnts to be ingested, fill the text area with one URL per a line. You can skip this if you are building your index only based on your local documents.
Info
On the sidebar, make sure mxbai-embed-large
is selected for the embedding model.
Use of OpenAI embedding models is not well supported and needs more testing.
Finally, hit \"Build Index\" button. It will show the progress in the drop-down \"status container\", so you can check the status by clicking on it. Once done, it will show the summary of your index and time it took.
You can go back to the home screen to now select the index you just built.
"},{"location":"tutorial_jetson-copilot.html#3-test-different-llm-or-embedding-model","title":"3. Test different LLM or Embedding model","text":"TODO
"},{"location":"tutorial_jetson-copilot.html#development","title":"\ud83c\udfd7\ufe0f Development","text":"Streamlit based web app is very easy to develop.
On web UI, at the top-right of the screen to choose \"Always rerun\" to automatically update your app every time you change the source codes.
See Streamlit Documentation for the detail.
"},{"location":"tutorial_jetson-copilot.html#manually-run-streamlit-app-inside-the-container","title":"Manually run streamlit app inside the container","text":"In case you make more fundamental changes, you can also manually run streamlit app.
cd jetson-copilot\n./launch_dev.sh\n
Once in container;
streamlit run app.py\n
"},{"location":"tutorial_jetson-copilot.html#directory-structure","title":"\ud83e\uddf1 Directory structure","text":"\u2514\u2500\u2500 jetson-copilot\n \u251c\u2500\u2500 launch_jetson_copilot.sh\n \u251c\u2500\u2500 setup_environment.sh\n \u251c\u2500\u2500 Documents \n \u2502 \u2514\u2500\u2500 your_abc_docs\n \u251c\u2500\u2500 Indexes\n \u2502 \u251c\u2500\u2500 _L4T_README\n \u2502 \u2514\u2500\u2500 your_abc_index\n \u251c\u2500\u2500 logs\n \u2502 \u251c\u2500\u2500 container.log\n \u2502 \u2514\u2500\u2500 ollama.log\n \u251c\u2500\u2500 ollama_models\n \u2514\u2500\u2500 Streamlit_app\n \u251c\u2500\u2500 app.py\n \u251c\u2500\u2500 build_index.py\n \u2514\u2500\u2500 download_model.py\n
Following directories inside the jetson-copilot
directory are mounted in the Docker container.
Directory Name Description Docuemtns
Directory to store your documents to be indexed Indexes
Directory to store pre-built (or built-by-you) indexes for LLM to perform RAG on logs
Directory for the app to store log files ollama_models
Directory for the ollama server to store download models stremlit_app
Directory for Python scripts to make up the web app"},{"location":"tutorial_jetson-copilot.html#troubleshooting","title":"\ud83d\udcab Troubleshooting","text":"If you find any issue, please check GitHub Issues of the Jetson Copilot repo.
"},{"location":"tutorial_jps.html","title":"Tutorial - Jetson Platform Services","text":"Jetson Plaform Services (JPS) provide a platform to simplify development, deployment and management of Edge AI applications on NVIDIA Jetson. JPS is a modular & extensible architecture for developers to distill large complex applications into smaller modular microservice with APIs to integrate into other apps & services. At its core are a collection of AI services leveraging generative AI, deep learning, and analytics, which provide state of the art capabilities including video analytics, video understanding and summarization, text based prompting, zero shot detection and spatio temporal analysis of object movement.
VLM Alert Workflow built with JPS
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
To get started with Jetson Platform Services, follow the quickstart guide to install and setup JPS. Then explore the reference workflows to learn how to use DeepStream, Analytics, Generative AI and more with JPS:
1) Quick Start Guide 2) AI Powered Network Video Recorder 3) Zero Shot Detection with NanoOWL 4) Visual Language Model Alerts
The reference workflows demonstrate how to use the microservices provided in JPS to build full end to end systems on your Jetson.
VLM Alert Workflow Architecture
View the links below to learn more about Jetson Platform Services: VLM Alert Blog JPS Product Page JPS Documentation VLM Alert Demo Video
"},{"location":"tutorial_live-llava.html","title":"Tutorial - Live LLaVA","text":"Recommended
Follow the NanoVLM tutorial first to familiarize yourself with vision/language models, and see Agent Studio for in interactive pipeline editor built from live VLMs.
This multimodal agent runs a vision-language model on a live camera feed or video stream, repeatedly applying the same prompts to it:
It uses models like LLaVA or VILA and has been quantized with 4-bit precision. This runs an optimized multimodal pipeline from the NanoLLM
library, including running the CLIP/SigLIP vision encoder in TensorRT, event filters and alerts, and multimodal RAG (see the NanoVLM page for benchmarks)
"},{"location":"tutorial_live-llava.html#running-the-live-llava-demo","title":"Running the Live Llava Demo","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>10GB
) Follow the chat-based LLaVA and NanoVLM tutorials first.
Supported vision/language models:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GB The VideoQuery agent applies prompts to the incoming video feed with the VLM. Navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera (Chrome is recommended with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled)
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
This uses jetson_utils
for video I/O, and for options related to protocols and file formats, see Camera Streaming and Multimedia. In the example above, it captures a V4L2 USB webcam connected to the Jetson (under the device /dev/video0
) and outputs a WebRTC stream.
"},{"location":"tutorial_live-llava.html#processing-a-video-file-or-stream","title":"Processing a Video File or Stream","text":"The example above was running on a live camera, but you can also read and write a video file or network stream by substituting the path or URL to the --video-input
and --video-output
command-line arguments like this:
jetson-containers run \\\n -v /path/to/your/videos:/mount\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /mount/my_video.mp4 \\\n --video-output /mount/output.mp4 \\\n --prompt \"What does the weather look like?\"\n
This example processes and pre-recorded video (in MP4, MKV, AVI, FLV formats with H.264/H.265 encoding), but it also can input/output live network streams like RTP, RTSP, and WebRTC using Jetson's hardware-accelerated video codecs.
"},{"location":"tutorial_live-llava.html#nanodb-integration","title":"NanoDB Integration","text":"If you launch the VideoQuery agent with the --nanodb
flag along with a path to your NanoDB database, it will perform reverse-image search on the incoming feed against the database by re-using the CLIP embeddings generated by the VLM.
To enable this mode, first follow the NanoDB tutorial to download, index, and test the database. Then launch VideoQuery like this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output \\\n --nanodb /data/nanodb/coco/2017\n
You can also tag incoming images and add them to the database using the web UI, for one-shot recognition tasks:
"},{"location":"tutorial_live-llava.html#video-vila","title":"Video VILA","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. The vision/video.py
example keeps a rolling history of frames:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_live-llava.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_live-llava.html#walkthrough-videos","title":"Walkthrough Videos","text":""},{"location":"tutorial_llamaindex.html","title":"Tutorial - LlamaIndex","text":"Let's use LlamaIndex, to realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents!
What you need
One of the following Jetson devices:
Jetson AGX Orin 64GB Developer Kit Jetson AGX Orin (32GB) Developer Kit Jetson Orin Nano 8GB Developer Kit
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.5 GB
for llama-index
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_llamaindex.html#how-to-start-a-container-with-samples","title":"How to start a container with samples","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag llama-index:samples)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
You can follow along LlamaIndex_Local-Models_L4T.ipynb
(which is based on the official LlamaIndex tutorial).
"},{"location":"tutorial_llamaspeak.html","title":"Tutorial - llamaspeak","text":"Talk live with Llama using streaming ASR/TTS, and chat about images with Llava!
The NanoLLM
library provides optimized inference for LLM and speech models. It's recommended to run JetPack 6.0 to be able to run the latest containers. The WebChat
agent has responsive conversational abilities and multimodal support for chatting about images with vision/language models, including overlapping ASR/LLM/TTS generation and verbal interruptability.
"},{"location":"tutorial_llamaspeak.html#running-llamaspeak","title":"Running llamaspeak","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>10GB
) Start the Riva server first and test the ASR examples.
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --asr=riva --tts=piper\n
This will start llamaspeak with text LLM and ASR/TTS enabled. You can then navigate your browser to https://IP_ADDRESS:8050
The default port is 8050, but can be changed with --web-port
(and --ws-port
for the websocket port) During bot replies, the TTS model will pause output if you speak a few words in the mic to interrupt it. Request access to the Llama models on HuggingFace and substitute your account's API token above. The code and docs for the WebAgent
that runs llamaspeak can be found in the NanoLLM library. This block diagram shows the speech pipeline with interleaved model generation, user interruption, and streaming I/O:
"},{"location":"tutorial_llamaspeak.html#multimodality","title":"Multimodality","text":"If you load a multimodal vision/language model instead, you can drag images into the chat and ask questions about them:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.web_chat --api=mlc \\\n --model Efficient-Large-Model/VILA-7b \\\n --asr=riva --tts=piper\n
For more info about the supported vision/language models, see the NanoVLM page.
"},{"location":"tutorial_llamaspeak.html#function-calling","title":"Function Calling","text":"There's the ability to define functions from Python that the bot has access to and can invoke based on the chat flow:
This works by using the bot_function()
decorator and adding the API description's to the system prompt:
from nano_llm import NanoLLM, ChatHistory, BotFunctions, bot_function\nfrom datetime import datetime\n\n@bot_function\ndef DATE():\n \"\"\" Returns the current date. \"\"\"\n return datetime.now().strftime(\"%A, %B %-m %Y\")\n\n@bot_function\ndef TIME():\n \"\"\" Returns the current time. \"\"\"\n return datetime.now().strftime(\"%-I:%M %p\")\n\nsystem_prompt = \"You are a helpful and friendly AI assistant.\" + BotFunctions.generate_docs()\n
The system prompt can be autogenerated from the Python docstrings embedded in the functions themselves, and can include parameters that the bot can supply (for example, selectively saving relevant user info to a vector database for RAG like is shown in the video).
For more information about this topic, see the Function Calling section of the NanoLLM documentation.
"},{"location":"tutorial_llava.html","title":"Tutorial - LLaVA","text":"LLaVA is a popular multimodal vision/language model that you can run locally on Jetson to answer questions about image prompts and queries. Llava uses the CLIP vision encoder to transform images into the same embedding space as its LLM (which is the same as Llama architecture). Below we cover different methods to run Llava on Jetson, with increasingly optimized performance:
Chat with Llava using text-generation-webui
Run from the terminal with llava.serve.cli
Quantized GGUF models with llama.cpp
Optimized Multimodal Pipeline with NanoVLM
Llava-13B (Jetson AGX Orin) Quantization Tokens/sec Memory text-generation-webui
4-bit (GPTQ) 2.3 9.7 GB llava.serve.cli
FP16 (None) 4.2 27.7 GB llama.cpp
4-bit (Q4_K) 10.1 9.2 GB NanoVLM
4-bit (MLC) 21.1 8.7 GB In addition to Llava, the NanoVLM
pipeline supports VILA and mini vision models that run on Orin Nano as well.
"},{"location":"tutorial_llava.html#1-chat-with-llava-using-text-generation-webui","title":"1. Chat with Llava using text-generation-webui
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for text-generation-webui
container image Space for models CLIP model : 1.7GB
Llava-v1.5-13B-GPTQ model : 7.25GB
Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_llava.html#download-model","title":"Download Model","text":"jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 download-model.py --output=/data/models/text-generation-webui \\\n TheBloke/llava-v1.5-13B-GPTQ\n
"},{"location":"tutorial_llava.html#start-web-ui-with-multimodal-extension","title":"Start Web UI with Multimodal Extension","text":"jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \\\n python3 server.py --listen \\\n --model-dir /data/models/text-generation-webui \\\n --model TheBloke_llava-v1.5-13B-GPTQ \\\n --multimodal-pipeline llava-v1.5-13b \\\n --loader autogptq \\\n --disable_exllama \\\n --verbose\n
Go to Chat tab, drag and drop an image into the Drop Image Here area, and your question in the text area and hit Generate:
"},{"location":"tutorial_llava.html#result","title":"Result","text":""},{"location":"tutorial_llava.html#2-run-from-the-terminal-with-llavaservecli","title":"2. Run from the terminal with llava.serve.cli
","text":"What you need
One of the following Jetson:
Jetson AGX Orin 64GB Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1GB
for llava
container 14GB
for Llava-7B (or 26GB
for Llava-13B) This example uses the upstream Llava repo to run the original, unquantized Llava models from the command-line. It uses more memory due to using FP16 precision, and is provided mostly as a reference for debugging. See the Llava container readme for more info.
"},{"location":"tutorial_llava.html#llava-v15-7b","title":"llava-v1.5-7b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-7b \\\n --image-file /data/images/hoover.jpg\n
"},{"location":"tutorial_llava.html#llava-v15-13b","title":"llava-v1.5-13b","text":"jetson-containers run $(autotag llava) \\\n python3 -m llava.serve.cli \\\n --model-path liuhaotian/llava-v1.5-13b \\\n --image-file /data/images/hoover.jpg\n
Unquantized 13B may run only on Jetson AGX Orin 64GB due to memory requirements.
"},{"location":"tutorial_llava.html#3-quantized-gguf-models-with-llamacpp","title":"3. Quantized GGUF models with llama.cpp
","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
llama.cpp is one of the faster LLM API's, and can apply a variety of quantization methods to Llava to reduce its memory usage and runtime. Despite its name, it uses CUDA. There are pre-quantized versions of Llava-1.5 available in GGUF format for 4-bit and 5-bit:
mys/ggml_llava-v1.5-7b mys/ggml_llava-v1.5-13b jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/hoover.jpg \\\n --prompt \"What does the sign say\"'\n
Quantization Bits Response Tokens/sec Memory Q4_K
4 The sign says \"Hoover Dam, Exit 9.\" 10.17 9.2 GB Q5_K
5 The sign says \"Hoover Dam exit 9.\" 9.73 10.4 GB A lower temperature like 0.1 is recommended for better quality (--temp 0.1
), and if you omit --prompt
it will describe the image:
jetson-containers run --workdir=/opt/llama.cpp/bin $(autotag llama_cpp:gguf) \\\n /bin/bash -c './llava-cli \\\n --model $(huggingface-downloader mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf) \\\n --mmproj $(huggingface-downloader mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf) \\\n --n-gpu-layers 999 \\\n --image /data/images/lake.jpg'\n\nIn this image, a small wooden pier extends out into a calm lake, surrounded by tall trees and mountains. The pier seems to be the only access point to the lake. The serene scene includes a few boats scattered across the water, with one near the pier and the others further away. The overall atmosphere suggests a peaceful and tranquil setting, perfect for relaxation and enjoying nature.\n
You can put your own images in the mounted jetson-containers/data
directory. The C++ code for llava-cli can be found here. The llama-cpp-python bindings also support Llava, however they are slower from Python (potentially handling of the tokens)
"},{"location":"tutorial_llava.html#4-optimized-multimodal-pipeline-with-nanovlm","title":"4. Optimized Multimodal Pipeline with NanoVLM
","text":"What's Next
This section got too long and was moved to the NanoVLM page - check it out there for performance optimizations, mini VLMs, and live streaming!
"},{"location":"tutorial_minigpt4.html","title":"Tutorial - MiniGPT-4","text":"Give your locally running LLM an access to vision, by running MiniGPT-4 on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
5.8GB
for container image Space for pre-quantized MiniGPT-4 model Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_minigpt4.html#start-minigpt4-container-with-models","title":"Start minigpt4
container with models","text":"To start the MiniGPT4 container and webserver with the recommended models, run this command:
jetson-containers run $(autotag minigpt4) /bin/bash -c 'cd /opt/minigpt4.cpp/minigpt4 && python3 webui.py \\\n $(huggingface-downloader --type=dataset maknee/minigpt4-13b-ggml/minigpt4-13B-f16.bin) \\\n $(huggingface-downloader --type=dataset maknee/ggml-vicuna-v0-quantized/ggml-vicuna-13B-v0-q5_k.bin)'\n
Then, open your web browser and access http://<IP_ADDRESS>:7860
.
"},{"location":"tutorial_minigpt4.html#results","title":"Results","text":""},{"location":"tutorial_mmj.html","title":"Tutorial mmj","text":"17# First steps with Metropolis Microservices for Jetson
NVIDIA Metropolis Microservices for Jetson simplifies the development of vision AI applications, offering a suite of customizable, cloud-native tools. Before diving into this tutorial, ensure you've filled out the Metropolis Microservices for Jetson Early Access form to gain the necessary access to launch the services. This step is crucial as it enables you to utilize all the features and capabilities discussed in this guide.
Perfect for both newcomers and experts, this tutorial provides straightforward steps to kick-start your edge AI projects. Whether you're a student or an ecosystem partner working on a use case, this guide offers a straightforward start for every skill level.
"},{"location":"tutorial_mmj.html#0-install-nvidia-jetson-services","title":"0. Install NVIDIA Jetson Services:","text":"Ok, let's start by installing NVIDIA Jetson Services:
sudo apt install nvidia-jetson-services\n
Let's add some performance hacks that will be needed to run the demo faster and without streaming artifacts:
If you don't have the Orin at max performance, you can use these two commands, a reboot is needed after:
sudo nvpmodel -m 0 \nsudo jetson_clocks\n
After these two commands, a reboot is needed if your Jetson wasn't already in high-performance mode. These are optional, but they fine-tune your network buffers to ensure smoother streaming by optimizing how much data can be sent and received:
sudo sysctl -w net.core.rmem_default=2129920\nsudo sysctl -w net.core.rmem_max=10000000\nsudo sysctl -w net.core.wmem_max=2000000\n
"},{"location":"tutorial_mmj.html#1-download-nvidia-cli-for-jetson","title":"1. Download NVIDIA CLI for Jetson","text":"Download NGC for ARM64 from the NGC for CLI site:
unzip ngccli_arm64.zip\nchmod u+x ngc-cli/ngc\necho \"export PATH=\\\"\\$PATH:$(pwd)/ngc-cli\\\"\" >> ~/.bash_profile && source ~/.bash_profile\nngc config set\n
Here it will ask for your API Key, and the organization name, to get those you need to login into NGC and generate an API key here. You should then paste the API key and use the organization name you are using. You can also press [Enter] to select the default values for the remaining options. After this, you should get the message:
Successfully saved NGC configuration to /home/jetson/.ngc/config\n
Then, login with the same API key:
sudo docker login nvcr.io -u \"\\$oauthtoken\" -p <NGC-API-KEY>\n
Now launch the Redis and Ingress services, as we need them for this tutorial.
sudo systemctl start jetson-redis\nsudo systemctl start jetson-ingress\n
"},{"location":"tutorial_mmj.html#2-download-and-launch-nvstreamer","title":"2. Download and launch NVStreamer","text":""},{"location":"tutorial_mmj.html#_1","title":"Tutorial mmj","text":"First, we need to install NVStreamer, an app that streams the videos MMJs will need to run AI on them. Follow this NVStreamer Link (In the top-left, click Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf nvstreamer.tar.gz\ncd nvstreamer\n
Launch it: sudo docker compose -f compose_nvstreamer.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#3-download-ai_nvr-and-launch","title":"3. Download AI_NVR and launch:","text":""},{"location":"tutorial_mmj.html#_2","title":"Tutorial mmj","text":"AI NVR (NGC) Link (Top-left -> Download files.zip)
unzip files.zip\nrm files.zip\ntar -xvf ai_nvr.tar.gz\nsudo cp ai_nvr/config/ai-nvr-nginx.conf /opt/nvidia/jetson/services/ingress/config/\ncd ai_nvr\nsudo docker compose -f compose_agx.yaml up -d --force-recreate\n
"},{"location":"tutorial_mmj.html#4-download-some-sample-videos-and-upload-them-to-nvstreamer","title":"4. Download some sample videos and upload them to NVStreamer","text":"Download them from here.
unzip files.zip\n
Ok, now, this is important to understand: there are 2 web interfaces: The NVStreamer Dashboard, running at: http://localhost:31000 The Camera Management Dashboard, running at: http://localhost:30080/vst So, first we need to upload the file in the NVStreamer interface, which looks like this:
There, go to File Upload, and drag and drop the file into the square upload area.
After uploading it, go to the Dashboard option in the left menu and copy the RTSP URL of the video you just uploaded; you will need it for the Camera Management Dashboard.
Now jump to the Camera Management Dashboard (http://localhost:30080/vst), it looks like this:
Go to the Camera Management option in the menu, then use the Add device manually option and paste the RTSP URL. Add the name of your video to the Name and Location text boxes so it will be displayed on top of the stream.
Finally, click on the Live Streams option in the left menu, and you should be able to watch your video stream.
"},{"location":"tutorial_mmj.html#5-watch-rtsp-ai-processed-streaming-from-vlc","title":"5. Watch RTSP AI processed streaming from VLC","text":"Open VLC from another computer (localhost doesn't work here), and point to your Jetson Orin's IP address (you should be in the same network, or not having a firewal to access).
The easiest way to get the Jetson's IP address is by running:
ifconfig\n
And checking the inet IP of the interface (usually wlan0). Then go to rtsp://[JETSON_IP]:8555/ds-test using VLC like this:
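If you prefer a programmatic check instead of VLC, a minimal OpenCV sketch like the one below can open the same RTSP stream from another machine (this is just an illustrative example; opencv-python must be installed and JETSON_IP replaced with your Jetson's address):
import cv2\n\n# open the AI-processed RTSP stream served by the Jetson (replace JETSON_IP)\ncap = cv2.VideoCapture('rtsp://JETSON_IP:8555/ds-test')\n\nwhile cap.isOpened():\n    ok, frame = cap.read()\n    if not ok:\n        break\n    cv2.imshow('ds-test', frame)\n    if cv2.waitKey(1) == 27:  # press Esc to quit\n        break\n\ncap.release()\ncv2.destroyAllWindows()\n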
"},{"location":"tutorial_mmj.html#6-android-app","title":"6. Android app","text":"There is an Android app that allows you to track events and create areas of interest to monitor, you can find it on Google Play as AI NVR.
Here is a quick walkthrough where you can see how to:
Add the IP address of the Jetson Track current events Add new areas of interest Add tripwire to track the flux and direction of events
"},{"location":"tutorial_nano-llm.html","title":"NanoLLM - Optimized LLM Inference","text":"NanoLLM
is a lightweight, high-performance library using optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends like Agent Studio.
It provides similar APIs to HuggingFace, backed by highly-optimized inference libraries and quantization tools:
NanoLLM Reference Documentationfrom nano_llm import NanoLLM\n\nmodel = NanoLLM.from_pretrained(\n \"meta-llama/Meta-Llama-3-8B-Instruct\", # HuggingFace repo/model name, or path to HF model checkpoint\n api='mlc', # supported APIs are: mlc, awq, hf\n api_token='hf_abc123def', # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)\n quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights\n)\n\nresponse = model.generate(\"Once upon a time,\", max_new_tokens=128)\n\nfor token in response:\n print(token, end='', flush=True)\n
"},{"location":"tutorial_nano-llm.html#containers","title":"Containers","text":"To test a chat session with Llama from the command-line, install jetson-containers
and run NanoLLM like this:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Llama CLIAgent Studio jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api mlc \\\n --model meta-llama/Meta-Llama-3-8B-Instruct \\\n --prompt \"Can you tell me a joke about llamas?\"\n
jetson-containers run \\\n --env HUGGINGFACE_TOKEN=hf_abc123def \\\n $(autotag nano_llm) \\\n python3 -m nano_llm.studio\n
If you haven't already, request access to the Llama models on HuggingFace and substitute your account's API token above.
"},{"location":"tutorial_nano-llm.html#resources","title":"Resources","text":"Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:
Benchmarks Benchmarking results for LLM, SLM, VLM using MLC/TVM backend. API Examples Python code examples for chat, completion, and multimodal. Documentation Reference documentation for the NanoLLM model and agent APIs. Llamaspeak Talk verbally with LLMs using low-latency ASR/TTS speech models. Small LLM (SLM) Focus on language models with reduced footprint (7B params and below) Live LLaVA Realtime live-streaming vision/language models on recurring prompts. Nano VLM Efficient multimodal pipeline with one-shot image tagging and RAG support. Agent Studio Rapidly design and experiment with creating your own automation agents. OpenVLA Robot learning with Vision/Language Action models and manipulation in simulator."},{"location":"tutorial_nano-vlm.html","title":"NanoVLM - Efficient Multimodal Pipeline","text":"We saw in the previous LLaVA tutorial how to run vision-language models through tools like text-generation-webui
and llama.cpp
. In a similar vein to the SLM page on Small Language Models, here we'll explore optimizing VLMs for reduced memory usage and higher performance that reaches interactive levels (like in Live Llava). These are great for fitting on Orin Nano and increasing the framerate.
There are 3 model families currently supported: Llava, VILA, and Obsidian (mini VLM)
"},{"location":"tutorial_nano-vlm.html#vlm-benchmarks","title":"VLM Benchmarks","text":"This FPS measures the end-to-end pipeline performance for continuous streaming like with Live Llava (on yes/no question)
"},{"location":"tutorial_nano-vlm.html#multimodal-chat","title":"Multimodal Chat","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>10GB
) Supported VLM models in NanoLLM
:
liuhaotian/llava-v1.5-7b
, liuhaotian/llava-v1.5-13b
, liuhaotian/llava-v1.6-vicuna-7b
, liuhaotian/llava-v1.6-vicuna-13b
Efficient-Large-Model/VILA-2.7b
,Efficient-Large-Model/VILA-7b
, Efficient-Large-Model/VILA-13b
Efficient-Large-Model/VILA1.5-3b
,Efficient-Large-Model/Llama-3-VILA1.5-8B
, Efficient-Large-Model/VILA1.5-13b
VILA-2.7b
, VILA1.5-3b
, VILA-7b
, Llava-7b
, and Obsidian-3B
can run on Orin Nano 8GB The optimized NanoLLM
library uses MLC/TVM for quantization and inference, which provides the highest performance. It efficiently manages the CLIP embeddings and KV cache. You can find Python code for the chat program used in this example here.
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32\n
This starts an interactive console-based chat with Llava, and on the first run the model will automatically be downloaded from HuggingFace and quantized using MLC and W4A16 precision (which can take some time). See here for command-line options.
You'll end up at a >> PROMPT:
at which you can enter the path or URL of an image file, followed by your question about the image. You can follow up with multiple questions about the same image. Llava does not understand multiple images in the same chat, so when changing images, first reset the chat history by entering clear
or reset
as the prompt. VILA supports multiple images (area of active research)
"},{"location":"tutorial_nano-vlm.html#automated-prompts","title":"Automated Prompts","text":"During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'what does the road sign say?' \\\n --prompt 'what kind of environment is it?' \\\n --prompt 'reset' \\\n --prompt '/data/images/lake.jpg' \\\n --prompt 'please describe the scene.' \\\n --prompt 'are there any hazards to be aware of?'\n
You can also use --prompt /data/prompts/images.json
to run the test sequence, the results of which are in the table below.
"},{"location":"tutorial_nano-vlm.html#results","title":"Results","text":"\u2022 \u00a0 The model responses are with 4-bit quantization enabled, and are truncated to 128 tokens for brevity. \u2022 \u00a0 These chat questions and images are from /data/prompts/images.json
(found in jetson-containers)
"},{"location":"tutorial_nano-vlm.html#json","title":"JSON","text":"When prompted, these models can also output in constrained JSON formats (which the LLaVA authors cover in their LLaVA-1.5 paper), and can be used to programatically query information about the image:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model liuhaotian/llava-v1.5-13b \\\n --prompt '/data/images/hoover.jpg' \\\n --prompt 'extract any text from the image as json'\n\n{\n \"sign\": \"Hoover Dam\",\n \"exit\": \"2\",\n \"distance\": \"1 1/2 mile\"\n}\n
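If you are consuming these replies from your own code, keep in mind the reply is still just text, so parse it defensively. Here is a minimal illustrative sketch using only the Python standard library (the reply string is the example output above, not a live call):
import json\n\n# example reply text captured from the model (see the output above)\nreply = '{ \"sign\": \"Hoover Dam\", \"exit\": \"2\", \"distance\": \"1 1/2 mile\" }'\n\ntry:\n    info = json.loads(reply)\n    print(info.get('sign'), info.get('distance'))\nexcept json.JSONDecodeError:\n    print('model reply was not valid JSON:', reply)\n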
"},{"location":"tutorial_nano-vlm.html#web-ui","title":"Web UI","text":"To use this through a web browser instead, see the llamaspeak tutorial:
"},{"location":"tutorial_nano-vlm.html#live-streaming","title":"Live Streaming","text":"These models can also be used with the Live Llava agent for continuous streaming - just substitute the desired model name below:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.agents.video_query --api=mlc \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-context-len 256 \\\n --max-new-tokens 32 \\\n --video-input /dev/video0 \\\n --video-output webrtc://@:8554/output\n
Then navigate your browser to https://<IP_ADDRESS>:8050
after launching it with your camera. Using Chrome or Chromium is recommended for a stable WebRTC connection, with chrome://flags#enable-webrtc-hide-local-ips-with-mdns
disabled.
The Live Llava tutorial shows how to enable additional features like vector database integration, image tagging, and RAG.
"},{"location":"tutorial_nano-vlm.html#video-sequences","title":"Video Sequences","text":"The VILA-1.5 family of models can understand multiple images per query, enabling video search/summarization, action & behavior analysis, change detection, and other temporal-based vision functions. By manipulating the KV cache and dropping off the last frame from the chat history, we can keep the stream rolling continuously beyond the maximum context length of the model. The vision/video.py
example shows how to use this:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.vision.video \\\n --model Efficient-Large-Model/VILA1.5-3b \\\n --max-images 8 \\\n --max-new-tokens 48 \\\n --video-input /data/my_video.mp4 \\\n --video-output /data/my_output.mp4 \\\n --prompt 'What changes occurred in the video?'\n
"},{"location":"tutorial_nano-vlm.html#python-code","title":"Python Code","text":"For a simplified code example of doing live VLM streaming from Python, see here in the NanoLLM docs.
You can use this to implement customized prompting techniques and integrate with other vision pipelines. This code applies the same set of prompts to the latest image from the video feed. See here for the version that does multi-image queries on video sequences.
"},{"location":"tutorial_nanodb.html","title":"Tutorial - NanoDB","text":"Let's run NanoDB's interactive demo to witness the impact of Vector Database that handles multimodal data.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
16GB
for container image 40GB
for MS COCO dataset Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_nanodb.html#how-to-start","title":"How to start","text":""},{"location":"tutorial_nanodb.html#download-coco","title":"Download COCO","text":"Just for an example, let's use MS COCO dataset:
cd jetson-containers\nmkdir -p data/datasets/coco/2017\ncd data/datasets/coco/2017\n\nwget http://images.cocodataset.org/zips/train2017.zip\nwget http://images.cocodataset.org/zips/val2017.zip\nwget http://images.cocodataset.org/zips/unlabeled2017.zip\n\nunzip train2017.zip\nunzip val2017.zip\nunzip unlabeled2017.zip\n
"},{"location":"tutorial_nanodb.html#download-index","title":"Download Index","text":"You can download a pre-indexed NanoDB that was already prepared over the COCO dataset from here:
cd jetson-containers/data\nwget https://nvidia.box.com/shared/static/icw8qhgioyj4qsk832r4nj2p9olsxoci.gz -O nanodb_coco_2017.tar.gz\ntar -xzvf nanodb_coco_2017.tar.gz\n
This allows you to skip the indexing process in the next step and jump to starting the Web UI.
"},{"location":"tutorial_nanodb.html#indexing-data","title":"Indexing Data","text":"If you didn't download the NanoDB index for COCO from above, we need to build the index by scanning your dataset directory:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --scan /data/datasets/coco/2017 \\\n --path /data/nanodb/coco/2017 \\\n --autosave --validate \n
This will take a few hours on AGX Orin. Once the database has loaded and completed any start-up operations, it will drop down to a >
prompt from which the user can run search queries. You can quickly check the operation by typing your query at this prompt:
> a girl riding a horse\n\n* index=80110 /data/datasets/coco/2017/train2017/000000393735.jpg similarity=0.29991915822029114\n* index=158747 /data/datasets/coco/2017/unlabeled2017/000000189708.jpg similarity=0.29254037141799927\n* index=123846 /data/datasets/coco/2017/unlabeled2017/000000026239.jpg similarity=0.292171448469162\n* index=127338 /data/datasets/coco/2017/unlabeled2017/000000042508.jpg similarity=0.29118549823760986\n* index=77416 /data/datasets/coco/2017/train2017/000000380634.jpg similarity=0.28964102268218994\n* index=51992 /data/datasets/coco/2017/train2017/000000256290.jpg similarity=0.28929752111434937\n* index=228640 /data/datasets/coco/2017/unlabeled2017/000000520381.jpg similarity=0.28642547130584717\n* index=104819 /data/datasets/coco/2017/train2017/000000515895.jpg similarity=0.285491943359375\n
You can press Ctrl+C to exit. For more info about the various options available, see the NanoDB container documentation.
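The similarity numbers above are essentially cosine similarities between the embedding of your text query and the embeddings of the indexed images. As a purely illustrative sketch of that metric (not NanoDB's internal code), given two embedding vectors as numpy arrays:
import numpy as np\n\ndef cosine_similarity(a, b):\n    # 1.0 means identical direction, around 0.0 means unrelated, -1.0 means opposite\n    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))\n\nquery_embedding = np.array([0.1, 0.3, 0.5])   # placeholder vectors, for illustration only\nimage_embedding = np.array([0.2, 0.1, 0.6])\nprint(cosine_similarity(query_embedding, image_embedding))\n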
"},{"location":"tutorial_nanodb.html#interactive-web-ui","title":"Interactive Web UI","text":"Spin up the Gradio server:
jetson-containers run $(autotag nanodb) \\\n python3 -m nanodb \\\n --path /data/nanodb/coco/2017 \\\n --server --port=7860\n
Then navigate your browser to http://<IP_ADDRESS>:7860
, and you can enter text search queries as well as drag/upload images:
To use the dark theme, navigate to http://<IP_ADDRESS>:7860/?__theme=dark
instead"},{"location":"tutorial_ollama.html","title":"Tutorial - Ollama","text":"
Ollama is a popular open-source tool that allows users to easily run a large language models (LLMs) locally on their own computer, serving as an accessible entry point to LLMs for many.
It now offers out-of-the-box support for the Jetson platform with CUDA support, enabling Jetson users to seamlessly install Ollama with a single command and start using it immediately.
In this tutorial, we introduce two installation methods: (1) the default native installation using the official Ollama installer, and (2) the Docker container method, which allows users to avoid making changes to their existing system.
The ollama
client can run inside or outside container after starting the server. You can also run an Open WebUI server for supporting web clients. Supports the latest models like Llama-3 and Phi-3 Mini! "},{"location":"tutorial_ollama.html#ollama-server","title":"Ollama Server","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7GB
for ollama
container image Space for models (>5GB
) "},{"location":"tutorial_ollama.html#1-native-install","title":"(1) Native Install","text":"Ollama's official installer already support Jetson and can easily install CUDA-supporting Ollama.
curl -fsSL https://ollama.com/install.sh | sh \n
It create a service to run ollama serve
on start up, so you can start using ollama
command right away.
"},{"location":"tutorial_ollama.html#example-ollama-usage","title":"Example: Ollama usage","text":"ollama\n
"},{"location":"tutorial_ollama.html#example-run-a-model-on-cli","title":"Example: run a model on CLI","text":"ollama run llama3.2:3b\n
"},{"location":"tutorial_ollama.html#2-docker-container-for-ollama-using-jetson-containers","title":"(2) Docker container for ollama
using jetson-containers
","text":"# models cached under jetson-containers/data\njetson-containers run --name ollama $(autotag ollama)\n\n# models cached under your user's home directory\ndocker run --runtime nvidia --rm --network=host -v ~/ollama:/ollama -e OLLAMA_MODELS=/ollama dustynv/ollama:r36.2.0\n
Running either of these will start the local Ollama server as a daemon in the background. It will save the models it downloads under your mounted jetson-containers/data/models/ollama
directory (or another directory that you override with OLLAMA_MODELS
)
Start the Ollama command-line chat client with your desired model (for example: llama3
, phi3
, mistral
)
# if running inside the same container as launched above\n/bin/ollama run phi3\n\n# if launching a new container for the client in another terminal\njetson-containers run $(autotag ollama) /bin/ollama run phi3\n
Or you can install Ollama's binaries for arm64 outside of container (without CUDA, which only the server needs)
# download the latest ollama release for arm64 into /bin\nsudo wget https://github.com/ollama/ollama/releases/download/$(git ls-remote --refs --sort=\"version:refname\" --tags https://github.com/ollama/ollama | cut -d/ -f3- | sed 's/-rc.*//g' | tail -n1)/ollama-linux-arm64 -O /bin/ollama\nsudo chmod +x /bin/ollama\n\n# use the client like normal outside container\n/bin/ollama run phi3\n
"},{"location":"tutorial_ollama.html#open-webui","title":"Open WebUI","text":"To run an Open WebUI server for client browsers to connect to, use the open-webui
container:
docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main\n
You can then navigate your browser to http://JETSON_IP:8080
, and create a fake account to login (these credentials are only local)
Ollama uses llama.cpp for inference, which various API benchmarks and comparisons are provided for on the Llava page. It gets roughly half of peak performance versus the faster APIs like NanoLLM, but is generally considered fast enough for text chat.
"},{"location":"tutorial_openwebui.html","title":"Tutorial - Open WebUI","text":"Open WebUI is a versatile, browser-based interface for running and managing large language models (LLMs) locally, offering Jetson developers an intuitive platform to experiment with LLMs on their devices.
It can work with Ollama as a backend as well as other backend that is compatible with OpenAI, which can also run well on Jetson.
"},{"location":"tutorial_openwebui.html#ollama-server","title":"Ollama Server","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7GB
for open-webui
container image sudo docker run -d --network=host \\\n -v ${HOME}/open-webui:/app/backend/data \\\n -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \\\n --name open-webui \\\n --restart always \\ \n ghcr.io/open-webui/open-webui:main\n
"},{"location":"tutorial_openwebui.html#ollama-backend","title":"Ollama backend","text":"If you have installed Ollama, you can just run the Open WebUI docker container without installing any other things.
sudo docker run -d --network=host \\\n -v ${HOME}/open-webui:/app/backend/data \\\n -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \\\n --name open-webui \\\n --restart always \\ \n ghcr.io/open-webui/open-webui:main\n
"},{"location":"tutorial_slm.html","title":"Tutorial - Small Language Models (SLM)","text":"Small Language Models (SLMs) represent a growing class of language models that have <7B parameters - for example StableLM, Phi-2, and Gemma-2B. Their smaller memory footprint and faster performance make them good candidates for deploying on Jetson Orin Nano. Some are very capable with abilities at a similar level as the larger models, having been trained on high-quality curated datasets.
This tutorial shows how to run optimized SLMs with quantization using the NanoLLM
library and MLC/TVM backend. You can run these models through tools like text-generation-webui
and llama.cpp as well, just not as fast - and since the focus of SLMs is reduced computational and memory requirements, here we'll use the most optimized path available. Those shown below have been profiled:
"},{"location":"tutorial_slm.html#slm-benchmarks","title":"SLM Benchmarks","text":"\u2022 \u00a0 The HuggingFace Open LLM Leaderboard is a collection of multitask benchmarks including reasoning & comprehension, math, coding, history, geography, ect. \u2022 \u00a0 The model's memory footprint includes 4-bit weights and KV cache at full context length (factor in extra for process overhead, library code, ect) \u2022 \u00a0 The Chat Model
is the instruction-tuned variant for chatting with in the commands below, as opposed to the base completion model.
Based on user interactions, the recommended models to try are stabilityai/stablelm-zephyr-3b
and princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT
, for having output quality on par with Llama-2-7B and well-optimized neural architectures. These models have also been used as the base for various fine-tunes (for example Nous-Capybara-3B-V1.9
) and mini VLMs. Others may not be particularly coherent.
"},{"location":"tutorial_slm.html#chatting-with-slms","title":"Chatting with SLMs","text":"What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
22GB
for nano_llm
container image Space for models (>5GB
) Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
The nano_llm.chat
program will automatically download and quantize models from HuggingFace like those listed in the table above:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT\n
\u2022 \u00a0 For models requiring authentication, use --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN>
\u2022 \u00a0 Press Ctrl+C twice in succession to exit (once will interrupt bot output)
This will enter into interactive mode where you chat back and forth using the keyboard (entering reset
will clear the chat history)
"},{"location":"tutorial_slm.html#automated-prompts","title":"Automated Prompts","text":"During testing, you can specify prompts on the command-line that will run sequentially:
jetson-containers run $(autotag nano_llm) \\\n python3 -m nano_llm.chat --api=mlc \\\n --model stabilityai/stablelm-zephyr-3b \\\n --max-new-tokens 512 \\\n --prompt 'hi, how are you?' \\\n --prompt 'whats the square root of 900?' \\\n --prompt 'can I get a recipie for french onion soup?'\n
You can also load JSON files containing prompt sequences, like with --prompt /data/prompts/qa.json
(the output of which is below)
"},{"location":"tutorial_slm.html#results","title":"Results","text":"\u2022 \u00a0 The model responses are with 4-bit quantization, and are truncated to 256 tokens for brevity. \u2022 \u00a0 These chat questions are from /data/prompts/qa.json
(found in jetson-containers)
"},{"location":"tutorial_slm.html#nemotron-mini","title":"Nemotron Mini","text":"Nemotron-Mini-4B-Instruct is a 4B SLM tuned for on-device deployment, RAG, and function calling and is based on Minitron-4B (pruned and distilled from Nemotron4 15B). Inference on Jetson is available through HuggingFace Transformers and llama.cpp for quantization. Here's how to run a local OpenAI-compatible server with llama.cpp and 4-bit quantized GGUF:
jetson-containers run $(autotag llama_cpp) \\\n llama-server \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --host 0.0.0.0 \\\n --port 8080\n
For a quick test, you can navigate your browser to http://JETSON_IP:8080
, connect other clients like Open WebUI, or have applications send requests to your server's OpenAI chat completion endpoints (i.e. from openai-python, REST, JavaScript, ect)
You can more easily see the performance with the llama-cli
tool:
jetson-containers run $(autotag llama_cpp) \\\n llama-cli \\\n --hf-repo Obenlia/Nemotron-Mini-4B-Instruct-Q4_K_M-GGUF \\\n --hf-file nemotron-mini-4b-instruct-q4_k_m.gguf \\\n --gpu-layers 34 \\\n --seed 42 \\\n --ignore-eos \\\n -n 128 \\\n -p \"The meaning to life and the universe is\"\n
# Jetson AGX Orin\nllama_print_timings: load time = 1408.27 ms\nllama_print_timings: sample time = 70.05 ms / 128 runs ( 0.55 ms per token, 1827.32 tokens per second)\nllama_print_timings: prompt eval time = 120.08 ms / 9 tokens ( 13.34 ms per token, 74.95 tokens per second)\nllama_print_timings: eval time = 3303.93 ms / 127 runs ( 26.02 ms per token, 38.44 tokens per second)\nllama_print_timings: total time = 3597.17 ms / 136 tokens\n
The model can also be previewed at build.nvidia.com (example client requests for OpenAI API are also there)
"},{"location":"tutorial_slm.html#llama-32","title":"Llama 3.2","text":"Meta has released multilingual 1B and 3B SLMs in the latest additions to the Llama family with Llama-3.2-1B
and Llama-3.2-3B
. These can be run with INT4 quantization using the latest MLC container for Jetson (dustynv/mlc:0.1.2-r36.3.0
). After having requested access to the models from Meta with your HuggingFace API key, you can download, quantize, and benchmark them with these commands:
HUGGINGFACE_KEY=YOUR_API_KEY \\\nMLC_VERSION=0.1.2 \\\n jetson-containers/packages/llm/mlc/benchmark.sh \\\n meta-llama/Llama-3.2-1B\n
Llama-3.2-1B
\u00a0 Jetson Orin Nano 54.8 tokens/sec, Jetson AGX Orin 163.9 tokens/sec Llama-3.2-3B
\u00a0 Jetson Orin Nano 27.7 tokens/sec, Jetson AGX Orin 80.4 tokens/sec The Llama-3.2 SLMs use the same core Llama architecture as previous Llama releases (except tie_word_embeddings=True
), so it is already supported with quantization and full performance on edge devices. Thanks to Meta for continuing to advance open generative AI models with Llama.
"},{"location":"tutorial_stable-diffusion-xl.html","title":"Tutorial - Stable Diffusion XL","text":"Stable Diffusion XL is a newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities. All told, SDXL 1.0 has 6.6 billion model parameters, in comparison to 0.98 billion for the original SD 1.5 model.
What you need
One of the following Jetson devices (SDXL requires >= ~13GB memory)
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image 12.4GB
for SDXL models Have followed the previous stable-diffusion-webui
tutorial and have the webserver container running.
"},{"location":"tutorial_stable-diffusion-xl.html#downloading-sdxl-models","title":"Downloading SDXL Models","text":"Stable Diffusion XL is supported through AUTOMATIC1111's stable-diffusion-webui
with some additional settings. First you need to download the SDXL models to your jetson-containers
data directory (which is automatically mounted into the container)
# run these outside of container, and replace CONTAINERS_DIR with the path to the jetson-containers repo on your device\nCONTAINERS_DIR=/path/to/your/jetson-containers\nMODEL_DIR=$CONTAINERS_DIR/data/models/stable-diffusion/models/Stable-diffusion/\n\nsudo chown -R $USER $MODEL_DIR\n\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors\nwget -P $MODEL_DIR https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors\n
It's assumed that you already have the stable-diffusion-webui
container and webserver running from the previous tutorial.
"},{"location":"tutorial_stable-diffusion-xl.html#sdxl-settings","title":"SDXL Settings","text":"After the models have finished downloading, click the \ud83d\udd04 button to refresh the model list in the web UI. Select sd_xl_base_1.0.safetensors
from the Stable Diffusion checkpoint drop-down:
Then under the Generation tab, expand the Refiner section, and select sd_xl_refiner_1.0.safetensors
from the drop-down:
Guidance on other relevant settings:
Change the width/height to 1024x1024. SDXL works best at higher resolutions, and using 512x512 often results in more simplistic/cartoonish content. Changing image resolutions impacts the actual scene contents of the image, not just the details. The refiner's Switch at
setting dictates the step at which the refiner takes over from the base model. At this point, additional subject content will no longer be added to the scene, but rather its details further refined in the image. Typical Sampling steps
are between 20-40 and Switch at
is between 0.7-0.9. This takes experimentation to find the best combination for the characteristics of your desired output. Extensive negative prompts are not as necessary in the same way as SD 1.5 was (e.g. low quality, jpeg artifacts, blurry, ect
) CFG Scale
controls how closely the model conforms to your prompt versus how creative it is. When you get a good image, remember to save your random seed and settings so you can re-generate it later!
"},{"location":"tutorial_stable-diffusion-xl.html#results","title":"Results","text":" photograph of a friendly robot alongside a person climbing a mountain (seed 1576166644, steps 25, switch @ 0.8, cfg scale 15)
a girl and a boy building a friendly robot in their basement workshop (seed 642273464, steps 25, switch @ 0.9, cfg scale 7)
small friendly robots playing games with people, in a futuristic Tokyo central park gardens with cherry blossoms and water, coy fish swimming in the water, sunshine (seed 642273464, steps 40, switch @ 0.8, cfg scale 7)
small friendly robots playing games with people in a futuristic New York City Central Park in autumn, water (seed 642273464, steps 25, switch @ 0.8, cfg scale 7)
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
"},{"location":"tutorial_stable-diffusion.html","title":"Tutorial - Stable Diffusion","text":"Let's run AUTOMATIC1111's stable-diffusion-webui
on NVIDIA Jetson to generate images from our prompts!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image 4.1GB
for SD 1.5 model "},{"location":"tutorial_stable-diffusion.html#setup-a-container-for-stable-diffusion-webui","title":"Setup a container for stable-diffusion-webui","text":"The jetson-containers project provides pre-built Docker images for stable-diffusion-webui
. You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
"},{"location":"tutorial_stable-diffusion.html#how-to-start","title":"How to start","text":"Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag stable-diffusion-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/stable-diffusion-webui && python3 launch.py \\\n --data=/data/models/stable-diffusion \\\n --enable-insecure-extension-access \\\n --xformers \\\n --listen \\\n --port=7860\n
You should see it downloading the model checkpoint on the first run.
Open your browser and access http://<IP_ADDRESS>:7860
"},{"location":"tutorial_stable-diffusion.html#results-output-examples","title":"Results / Output Examples","text":""},{"location":"tutorial_stable-diffusion.html#stable-diffusion-xl","title":"Stable Diffusion XL","text":"To generate even higher-quality and detailed images, check out the next part of the tutorial that uses the latest Stable Diffusion XL models!
Want to explore using Python APIs to run diffusion models directly? See jetson-containers/stable-diffusion
.
"},{"location":"tutorial_text-generation.html","title":"Tutorial - text-generation-webui","text":"Interact with a local AI assistant by running a LLM with oobabooga's text-generaton-webui
on NVIDIA Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.2GB
for container image Spaces for models "},{"location":"tutorial_text-generation.html#set-up-a-container-for-text-generation-webui","title":"Set up a container for text-generation-webui","text":"The jetson-containers project provides pre-built Docker images for text-generation-webui
along with all of the loader API's built with CUDA enabled (llama.cpp, ExLlama, AutoGPTQ, Transformers, ect). You can clone the repo to use its utilities that will automatically pull/start the correct container for you, or you can do it manually.
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
Info
JetsonHacks provides an informative walkthrough video on jetson-containers, showcasing the usage of both the stable-diffusion-webui
and text-generation-webui
. You can find the complete article with detailed instructions here.
"},{"location":"tutorial_text-generation.html#how-to-start","title":"How to start","text":"Use jetson-containers run
and autotag
tools to automatically pull or build a compatible container image:
jetson-containers run $(autotag text-generation-webui)\n
The container has a default run command (CMD
) that will automatically start the webserver like this:
cd /opt/text-generation-webui && python3 server.py \\\n --model-dir=/data/models/text-generation-webui \\\n --chat \\\n --listen\n
Open your browser and access http://<IP_ADDRESS>:7860
.
"},{"location":"tutorial_text-generation.html#download-a-model-on-web-ui","title":"Download a model on web UI","text":"See the oobabooga documentation for instructions for downloading models - either from within the web UI, or using download-model.py
jetson-containers run --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \\\n 'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'\n
From within the web UI, select Model tab and navigate to \"Download model or LoRA\" section.
You can find text generation models on Hugging Face Hub, then enter the Hugging Face username/model path (which you can have copied to your clipboard from the Hub). Then click the Download button.
"},{"location":"tutorial_text-generation.html#gguf-models","title":"GGUF models","text":"The fastest oobabooga model loader to use is currently llama.cpp with 4-bit quantized GGUF models.
You can download a single model file for a particular quantization, like *.Q4_K_M.bin
. Input the file name and hit Download button.
Model Quantization Memory (MB) TheBloke/Llama-2-7b-Chat-GGUF
llama-2-7b-chat.Q4_K_M.gguf
5,268 TheBloke/Llama-2-13B-chat-GGUF
llama-2-13b-chat.Q4_K_M.gguf
8,609 TheBloke/LLaMA-30b-GGUF
llama-30b.Q4_K_S.gguf
19,045 TheBloke/Llama-2-70B-chat-GGUF
llama-2-70b-chat.Q4_K_M.gguf
37,655
Info
"},{"location":"tutorial_text-generation.html#model-selection-for-jetson-orin-nano","title":"Model selection for Jetson Orin Nano","text":"Jetson Orin Nano Developer Kit has only 8GB RAM for both CPU (system) and GPU, so you need to pick a model that fits in the RAM size - see the Model Size section below. The 7B models with 4-bit quantization are the ones to use on Jetson Orin Nano. Make sure you go through the RAM optimization steps before attempting to load such model on Jetson Orin Nano.
"},{"location":"tutorial_text-generation.html#load-a-model","title":"Load a model","text":"After you have downloaded a model, click the \ud83d\udd04 button to refresh your model list, and select the model you want to use.
For a GGUF model, remember to
Set n-gpu-layers
to 128
Set n_gqa
to 8
if you using Llama-2-70B (on Jetson AGX Orin 64GB) Then click the Load button.
"},{"location":"tutorial_text-generation.html#chat-template","title":"Chat Template","text":"If you're using a Llama model fine-tuned for chat, like the models listed above (except for LLaMA-30b
), you need to use the oobabooga Instruct mode and set the template. On the Parameters tab, go to the Instruction Template sub-tab, then select Llama-v2
from the Instruction Template drop-down (or Vicuna, Guanaco, ect if you are using one of those models)
For the base text completion models (like LLaMA-30b
), use the Default or Notebook tab.
Selecting the right chat template will make sure the model is being prompted correctly - you can also change the system prompt in the Context box to alter the agent's personality and behavior. There are a lot of other settings under the Generation tab, like the maximum length it should output per reply, and token sampling parameters like temperature
and top_p
for controlling randomness.
Then change back to the Chat tab, and under the mode section, make sure Instruct is selected (confusingly, not chat mode). Then you can start chatting with the LLM!
"},{"location":"tutorial_text-generation.html#results","title":"Results","text":""},{"location":"tutorial_text-generation.html#things-to-do-with-your-llm","title":"Things to do with your LLM","text":"Here are some common test prompts for coding, math, history ect. You can also ask it about geography, travel, nature, recipies, fixing things, general life advice, and practically everything else. Also Llama-2 is quite playful and likes to play games to test its logic abilities!
>> What games do you like to play?\n\nI'm a large language model, so I can play text-based games and answer questions on a wide variety of topics. Here are some game ideas:\n\n- 20 Questions: You or I can think of an object, and the other player tries to guess what it is by asking up to 20 yes-or-no questions. The goal is to guess the object in as few questions as possible.\n- Hangman: I can think of a word or phrase and you try to guess it by suggesting letters. Each incorrect guess will bring the hangman one step closer to being complete!\n- Word association: I'll give you a word and you have to come up with a word that is related to it. We can keep playing back and forth as long as you'd like!\n- Trivia: I can provide questions on a wide range of topics, from history to science to pop culture. See how many questions you can answer correctly!\n- Storytelling: I can start telling a story and then stop at a cliffhanger. You can then try to guess what happens next or even take over the storytelling and continue it in your own\n- Jokes: I love to tell jokes and make people laugh with my \"Dad humor\"! Knock knock! *giggles*\n
"},{"location":"tutorial_text-generation.html#model-size-tested","title":"Model size tested","text":"With llama.cpp, GGUF model, 4-bit quantization:
Model size Jetson AGX Orin 64GB Jetson AGX Orin 32GB Jetson Orin Nano 8GB 70B model \u2705 30B model \u2705 \u2705 13B model \u2705 \u2705 7B model \u2705 \u2705 \u2705 Want to explore using Python APIs to run LLMs directly? See jetson-containers
for its LLM related packages and containers.
Limited to 7B model (4-bit quantized).\u00a0\u21a9
"},{"location":"tutorial_ultralytics.html","title":"Tutorial - Ultralytics YOLOv8","text":"Let's run Ultralytics YOLOv8 on Jetson with NVIDIA TensorRT.
Ultralytics YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB) Jetson Nano (4GB)
Running one of the following versions of JetPack:
JetPack 4 (L4T r32.x) JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
"},{"location":"tutorial_ultralytics.html#how-to-start","title":"How to start","text":"Execute the below commands according to the JetPack version to pull the corresponding Docker container and run on Jetson.
JetPack 4JetPack 5JetPack 6 t=ultralytics/ultralytics:latest-jetson-jetpack4\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack5\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
t=ultralytics/ultralytics:latest-jetson-jetpack6\nsudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t\n
"},{"location":"tutorial_ultralytics.html#convert-model-to-tensorrt-and-run-inference","title":"Convert model to TensorRT and run inference","text":"The YOLOv8n model in PyTorch format is converted to TensorRT to run inference with the exported model.
Example
PythonCLI from ultralytics import YOLO\n\n# Load a YOLOv8n PyTorch model\nmodel = YOLO(\"yolov8n.pt\")\n\n# Export the model\nmodel.export(format=\"engine\") # creates 'yolov8n.engine'\n\n# Load the exported TensorRT model\ntrt_model = YOLO(\"yolov8n.engine\")\n\n# Run inference\nresults = trt_model(\"https://ultralytics.com/images/bus.jpg\")\n
# Export a YOLOv8n PyTorch model to TensorRT format\nyolo export model=yolov8n.pt format=engine # creates 'yolov8n.engine'\n\n# Run inference with the exported model\nyolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'\n
Manufacturing Sports Wildlife Vehicle Spare Parts Detection Football Player Detection Tiger pose Detection Note
Visit the Export page to access additional arguments when exporting models to different model formats. Note that the default arguments require inference using fixed image dimensions when dynamic=False
. To change the input source for inference, please refer to Model Prediction page.
"},{"location":"tutorial_ultralytics.html#benchmarks","title":"Benchmarks","text":"Benchmarks of the YOLOv8 variants with TensorRT were run by Seeed Studio on their reComputer systems:
Xavier NX 8GBOrin NX 16GBAGX Orin 32GB Model PyTorch FP32 FP16 INT8 YOLOv8n 32 63 120 167 YOLOv8s 25 26 69 112 YOLOv8m 11 11 33 56 YOLOv8l 6 6 20 38 Model PyTorch FP32 FP16 INT8 YOLOv8n 56 115 204 256 YOLOv8s 53 67 128 196 YOLOv8m 26 31 63 93 YOLOv8l 16 20 42 69 Model PyTorch FP32 FP16 INT8 YOLOv8n 77 192 323 385 YOLOv8s 67 119 213 303 YOLOv8m 40 56 105 145 YOLOv8l 27 38 73.5 114 FP32/FP16/INT8 with TensorRT (frames per second) Original post with the benchmarks are found here "},{"location":"tutorial_ultralytics.html#further-reading","title":"Further reading","text":"To learn more, visit our comprehensive guide on running Ultralytics YOLOv8 on NVIDIA Jetson including benchmarks!
Note
Ultralytics YOLOv8 models are offered under AGPL-3.0 License
which is an OSI-approved open-source license and is ideal for students and enthusiasts, promoting open collaboration and knowledge sharing. See the LICENSE file for more details.
"},{"location":"tutorial_ultralytics.html#one-click-run-ultralytics-yolo-on-jetson-orin-by-seeed-studio-jetson-examples","title":"One-Click Run Ultralytics YOLO on Jetson Orin - by Seeed Studio jetson-examples","text":""},{"location":"tutorial_ultralytics.html#quickstart","title":"Quickstart \u26a1","text":" Install the package:
pip install jetson-examples\n
Restart your reComputer:
sudo reboot\n
Run Ultralytics YOLO on Jetson with one command:
reComputer run ultralytics-yolo\n
Enter http://127.0.0.1:5001
or http://device_ip:5001
in your browser to access the Web UI.
For more details, please read: Jetson-Example: Run Ultralytics YOLO Platform Service on NVIDIA Jetson Orin .
"},{"location":"tutorial_ultralytics.html#follow-ultralytics-to-stay-updated","title":"Follow Ultralytics to stay updated!","text":""},{"location":"tutorial_voicecraft.html","title":"Tutorial - VoiceCraft","text":"Let's run VoiceCraft, a Zero-Shot Speech Editing and Text-to-Speech in the Wild!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
15.6 GB
for voicecraft
container image Space for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_voicecraft.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag voicecraft)\n
The container has a default run command (CMD
) that will automatically start the Gradio app.
Open your browser and access http://<IP_ADDRESS>:7860
.
"},{"location":"tutorial_voicecraft.html#gradio-app","title":"Gradio app","text":"VoiceCraft repo comes with Gradio demo app.
Select which models you want to use, I recommend using 330M_TTSEnhanced on 32GB AGX Orin Click load, if you run it for the first time, models are downloaded from huggingface, otherwise are loaded from /data
folder, where are saved to from previous runs Upload audio file of your choice (MP3/wav) Click transcribe, it will use whisper to get transcription along with start/end time of each word spoken Now you can edit the sentence, or use TTS. Click Run to generate output.
Warning
For TTS it's okay to use only first few seconds of audio as prompt, since it consumes a lot of memory. On AGX 32GB Orin the maximal TTS length of generated audio is around ~16 seconds in headless mode.
"},{"location":"tutorial_voicecraft.html#resources","title":"Resources","text":"If you want to know how it works under the hood, you can read following papers:
VOICECRAFT: Zero-Shot Speech Editing and Text-to-Speech in the Wild High Fidelity Neural Audio Compression Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers "},{"location":"tutorial_whisper.html","title":"Tutorial - Whisper","text":"Let's run OpenAI's Whisper, pre-trained model for automatic speech recognition on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.1 GB
for whisper
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"tutorial_whisper.html#how-to-start","title":"How to start","text":"Use run.sh
and autotag
script to automatically pull or build a compatible container image.
jetson-containers run $(autotag whisper)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server, with SSL enabled.
Open your browser and access https://<IP_ADDRESS>:8888
.
Attention
Note it is https
(not http
).
HTTPS (SSL) connection is needed to allow ipywebrtc
widget to have access to your microphone (for record-and-transcribe.ipynb
).
You will see a warning message like this.
Press \"Advanced\" button and then click on \"Proceed to (unsafe)\" link to proceed to the Jupyter Lab web interface.
The default password for Jupyter Lab is nvidia
.
"},{"location":"tutorial_whisper.html#run-jupyter-notebooks","title":"Run Jupyter notebooks","text":"Whisper repo comes with demo Jupyter notebooks, which you can find under /notebooks/
directory.
jetson-containers
also adds one convenient notebook (record-and-transcribe.ipynb
) to record your audio sample on Jupyter notebook in order to run transcribe on your recorded audio.
"},{"location":"tutorial_whisper.html#record-and-transcribeipynb","title":"record-and-transcribe.ipynb
","text":"This notebook is to let you record your own audio sample using your PC's microphone and apply Whisper's medium
model to transcribe the audio sample.
It uses Jupyter notebook/lab's ipywebrtc
extension to record an audio sample on your web browser.
Attention
When you click the \u23fa botton, your web browser may show a pop-up to ask you to allow it to use your microphone. Be sure to allow the access.
Final check Once done, if you click on the \"\u26a0 Not secure\" part in the URL bar, you should see something like this.
"},{"location":"tutorial_whisper.html#result","title":"Result","text":"Once you go through all the steps, you should see the transcribe result in text like this.
"},{"location":"vit/index.html","title":"Vision Transformers & Computer Vision","text":"You can quickly run these realtime Vision Transformers (ViT) and computer vision models onboard your Jetson:
"},{"location":"vit/index.html#efficient-vit","title":"Efficient ViT","text":""},{"location":"vit/index.html#nanosam","title":"NanoSAM","text":""},{"location":"vit/index.html#nanoowl","title":"NanoOWL","text":""},{"location":"vit/index.html#sam","title":"SAM","text":""},{"location":"vit/index.html#tam","title":"TAM","text":""},{"location":"vit/index.html#ultralytics-yolov8","title":"Ultralytics YOLOv8","text":""},{"location":"vit/tutorial_efficientvit.html","title":"Tutorial - EfficientViT","text":"Let's run MIT Han Lab's EfficientViT on Jetson!
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
10.9 GB
for efficientvit
container image Space for checkpoints Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_efficientvit.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag efficientvit)\n
"},{"location":"vit/tutorial_efficientvit.html#usage-of-efficientvit","title":"Usage of EfficientViT","text":"The official EfficientViT repo shows the complete usage information: https://github.com/mit-han-lab/efficientvit#usage
"},{"location":"vit/tutorial_efficientvit.html#run-examplebenchmark","title":"Run example/benchmark","text":"Inside the container, a small benchmark script benchmark.py
is added under /opt/efficientvit
directory by the jetson-container build process.
It is to test EfficientViT-L2-SAM in bounding box mode, so we can use this as an example and verify the output.
"},{"location":"vit/tutorial_efficientvit.html#download-l2pt-model","title":"Download l2.pt
model","text":"mkdir -p /data/models/efficientvit/sam/\ncd /data/models/efficientvit/sam/\nwget https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt\n
The downloaded checkpoint file is stored on the /data/
directory that is mounted from the Docker host.
"},{"location":"vit/tutorial_efficientvit.html#run-benchmark-script","title":"Run benchmark script","text":"cd /opt/efficientvit\npython3 ./benchmark.py\n
At the end you should see a summary like the following.
AVERAGE of 2 runs:\n encoder --- 0.062 sec\n latency --- 0.083 sec\nMemory consumption : 3419.68 MB\n
"},{"location":"vit/tutorial_efficientvit.html#check-the-outputresult","title":"Check the output/result","text":"The output image file (of the last inference result) is stored as /data/benchmarks/efficientvit_sam_demo.png
.
It is stored under /data/
directory that is mounted from the Docker host. So you can go back to your host machine, and check jetson-containers/data/benchmark/
directory.
You should find the output like this.
"},{"location":"vit/tutorial_nanoowl.html","title":"Tutorial - NanoOWL","text":"Let's run NanoOWL, OWL-ViT optimized to run real-time on Jetson with NVIDIA TensorRT.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
7.2 GB
for container image Spaces for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_nanoowl.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run --workdir /opt/nanoowl $(autotag nanoowl)\n
"},{"location":"vit/tutorial_nanoowl.html#how-to-run-the-tree-prediction-live-camera-example","title":"How to run the tree prediction (live camera) example","text":" Ensure you have a camera device connected
ls /dev/video*\n
If no video device is found, exit from the container and check if you can see a video device on the host side.
Launch the demo
cd examples/tree_demo\npython3 tree_demo.py ../../data/owl_image_encoder_patch32.engine\n
Info
If it fails to find or load the TensorRT engine file, build the TensorRT engine for the OWL-ViT vision encoder on your Jetson device.
python3 -m nanoowl.build_image_encoder_engine \\\n data/owl_image_encoder_patch32.engine\n
Second, open your browser to http://<ip address>:7860
Type whatever prompt you like to see what works!
Here are some examples
Example: [a face [a nose, an eye, a mouth]]
Example: [a face (interested, yawning / bored)]
Example: (indoors, outdoors)
"},{"location":"vit/tutorial_nanoowl.html#result","title":"Result","text":""},{"location":"vit/tutorial_nanosam.html","title":"Tutorial - NanoSAM","text":"Let's run NVIDIA's NanoSAM to check out the performance gain by distillation.
What you need
One of the following Jetson:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.3GB
for container image Spaces for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_nanosam.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag nanosam)\n
"},{"location":"vit/tutorial_nanosam.html#run-examples","title":"Run examples","text":"Inside the container, you can move to /opt/nanosam
directory to go through all the examples demonstrated in the repo.
cd /opt/nanosam\n
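To see what is available, you can list the example scripts first (basic_usage.py used below is one of them):
ls examples/\n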
To run the \"Example 1 - Segment with bounding box\":
python3 examples/basic_usage.py \\\n --image_encoder=\"data/resnet18_image_encoder.engine\" \\\n --mask_decoder=\"data/mobile_sam_mask_decoder.engine\"\n
The result is saved under /opt/nanosam/data/basic_usage_out.jpg
.
To check it on your host machine, you can copy the file into the container's /data
directory, which is mounted from the host.
cp data/basic_usage_out.jpg /data/\n
Then you can go to your host system and find the file at jetson-containers/data/basic_usage_out.jpg
"},{"location":"vit/tutorial_nanosam.html#results","title":"Results","text":""},{"location":"vit/tutorial_sam.html","title":"Tutorial - SAM (Segment Anything)","text":"Let's run Meta's SAM
on NVIDIA Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB) Jetson Orin NX (16GB) Jetson Orin Nano (8GB)\u26a0\ufe0f1
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x) JetPack 6 (L4T r36.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image Space for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_sam.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag sam)\n
The container has a default run command (CMD
) that will automatically start the Jupyter Lab server.
Open your browser and access http://<IP_ADDRESS>:8888
.
The default password for Jupyter Lab is nvidia
.
"},{"location":"vit/tutorial_sam.html#run-jupyter-notebook","title":"Run Jupyter notebook","text":"In Jupyter Lab, navigate to notebooks
and open the automatic_mask_generator_example.ipynb
notebook.
Create a new cell at the top, insert the model download command below and run the cell.
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\n
Then go through and execute all the cells below Set-up.
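If the default vit_h checkpoint is too heavy for your device (see the note under Results), the smaller official checkpoints can be downloaded the same way; the filenames below are from the segment-anything release (double-check against the upstream README if the links have moved), and you would also need to update the checkpoint path and model type in the notebook's set-up cell:
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth\n!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth\n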
"},{"location":"vit/tutorial_sam.html#results","title":"Results","text":" The biggest vit_h
(2.4GB) model may not run due to OOM, but vit_l
(1.1GB) runs on Jetson Orin Nano.\u00a0\u21a9
"},{"location":"vit/tutorial_tam.html","title":"Tutorial - SAM (Segment Anything)","text":"Let's run TAM
to perform Segment Anything on videos on NVIDIA Jetson.
What you need
One of the following Jetson devices:
Jetson AGX Orin (64GB) Jetson AGX Orin (32GB)
Running one of the following versions of JetPack:
JetPack 5 (L4T r35.x)
Sufficient storage space (preferably with NVMe SSD).
6.8GB
for container image Space for models Clone and setup jetson-containers
:
git clone https://github.com/dusty-nv/jetson-containers\nbash jetson-containers/install.sh\n
"},{"location":"vit/tutorial_tam.html#how-to-start","title":"How to start","text":"Use the jetson-containers run
and autotag
commands to automatically pull or build a compatible container image.
jetson-containers run $(autotag tam)\n
The container has a default run command (CMD
) that will automatically start TAM's web server.
Open your browser and access http://<IP_ADDRESS>:12212
.
"},{"location":"vit/tutorial_tam.html#tam-web-ui","title":"TAM web UI","text":"Check out the official tutorial to learn how to operate the web UI.
"},{"location":"vit/tutorial_tam.html#results","title":"Results","text":""},{"location":"vit/tutorial_tam.html#troubleshooting","title":"Troubleshooting","text":""},{"location":"vit/tutorial_tam.html#filenotfounderror-errno-2-no-such-file-or-directory-checkpointse2fgvi-hq-cvpr22pth","title":"FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/E2FGVI-HQ-CVPR22.pth'
","text":"You may find the TAM app fails to download a checkpoint file E2FGVI-HQ-CVPR22.pth
.
Downloading checkpoints from Google Drive... tips: If you cannot see the progress bar, please try to download it manuall and put it in the checkpointes directory. E2FGVI-HQ-CVPR22.pth: https://github.com/MCG-NKU/E2FGVI(E2FGVI-HQ model)\nAccess denied with the following error:\n\n Cannot retrieve the public link of the file. You may need to change\n the permission to 'Anyone with the link', or have had many accesses. \n\nYou may still be able to access the file from the browser:\n\n https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \n
You can manually download the checkpoint file on your Docker host machine.
cd jetson-containers/\npip install gdown\nsource ~/.profile\ngdown https://drive.google.com/uc?id=10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3 \nmv E2FGVI-HQ-CVPR22.pth ./data/models/tam/\n
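Before retrying, it is worth confirming that the checkpoint actually landed in the directory that jetson-containers mounts into the container as /data/models/tam/:
ls -lh data/models/tam/E2FGVI-HQ-CVPR22.pth\n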
Then you can try running the TAM container again.
jetson-containers run $(autotag tam)\n
"}]}
\ No newline at end of file
diff --git a/site/cosmos.html b/site/cosmos.html
new file mode 100644
index 0000000..887fc7e
--- /dev/null
+++ b/site/cosmos.html
@@ -0,0 +1,2284 @@
+Cosmos - World Foundation Models
+Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video
+processing pipeline to accelerate the development of Physical AI for robotics and AV labs.
+Cosmos is purpose-built for Physical AI. The Cosmos repository enables end users to run the Cosmos models, run
+inference scripts, and generate videos.
+
+
+Special thanks to Johnny Núñez Cano for porting Cosmos and Transformer
+Engine to Jetson!
+See the official Cosmos page by NVIDIA.
+See Transformer Engine by NVIDIA.
+
+
+
What you need
+
+One of the following Jetson devices:
+Jetson Thor (XGB)
+Jetson AGX Orin (64GB)
+Jetson AGX Orin (32GB)
+
+Running one of the following versions of JetPack:
+JetPack 6 (L4T r36.x)
+
+Sufficient storage space (preferably with NVMe SSD).
+12.26GB for cosmos container image
+Space for models and datasets (>50GB)
+
+Clone and setup jetson-containers:
+git clone https://github.com/dusty-nv/jetson-containers
+bash jetson-containers/install.sh
+
+
+
+
+
+
WARNING
+
+Transformer Engine:
+Cosmos is optimized for the NVIDIA Ada GPU architecture and later because it runs in FP8.
+Jetson AGX Orin is based on the Ampere architecture, which does not support FP8.
+Transformer Engine supports optimizations across FP16 and BF16 precisions on NVIDIA Ampere GPU architecture generations and later, so Cosmos runs in those precisions on Orin.
+
+Start Container
+Use this command to automatically run, build, or pull a compatible container image for cosmos:
+jetson-containers run $(autotag cosmos)
+
+To mount your own directories into the container, use the -v or --volume flags:
+jetson-containers run -v /path/on/host:/path/in/container $(autotag cosmos)
+
+Recommended (this downloads all the models outside the Docker container):
+git clone --recursive https://github.com/NVIDIA/Cosmos.git
+cd Cosmos
+jetson-containers run -it -v $(pwd):/workspace $(autotag cosmos)
+
+Follow the instructions from the Cosmos repository.
+Here are the summarized steps to run the Cosmos models:
+Generate a Hugging Face access token. Set the access token to 'Read'
+permission (the default is 'Fine-grained').
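+One way to get the token into the container is to log in with the Hugging Face CLI before downloading. The token value below is a placeholder, and the pip line is only needed if huggingface-cli is not already present in the image:
+pip install -U "huggingface_hub[cli]"
+huggingface-cli login --token hf_xxx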
+
+Download Models:
+PYTHONPATH=$(pwd) python3 cosmos1/scripts/download_diffusion.py --model_sizes 7B 14B --model_types Text2World Video2World
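+If storage is tight, the same script can be pointed at just the 7B Text2World model used in the demo below:
+PYTHONPATH=$(pwd) python3 cosmos1/scripts/download_diffusion.py --model_sizes 7B --model_types Text2World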
+
+Run Demo:
+PROMPT = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \
+The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \
+A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \
+suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \
+The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \
+field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
+
+PYTHONPATH = $( pwd ) python3 cosmos1/models/diffusion/inference/text2world.py \
+ --checkpoint_dir checkpoints \
+ --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
+ --prompt " $PROMPT " \
+ --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
+ --offload_tokenizer \
+ --offload_diffusion_transformer \
+ --offload_text_encoder_model \
+ --offload_prompt_upsampler \
+ --offload_guardrail_models
+
+It will generate a video file in the outputs directory.
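+Once it finishes, the result should appear there as an .mp4 named after --video_save_name (the exact layout of outputs/ may differ between Cosmos versions):
+ls -lh outputs/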
+
+
+
+
+Another example:
+PROMPT = "The video showcases a vibrant, magical garden where flowers bloom dynamically, opening and moving as though responding to a gentle rhythm in nature. \
+Colorful butterflies glide gracefully through the air, and a small, clear stream winds its way through the scene, reflecting the warm glow of sunlight. \
+A curious rabbit hops along a winding path, leading the viewer to a hidden alcove where a tree with golden, shimmering leaves stands, its branches moving slightly as if alive with energy. \
+The entire scene radiates tranquility and wonder, inviting viewers to immerse themselves in the beauty of nature and magic combined."
+
+PYTHONPATH = $( pwd ) python3 cosmos1/models/diffusion/inference/text2world.py \
+ --checkpoint_dir checkpoints \
+ --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
+ --prompt " $PROMPT " \
+ --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
+ --offload_tokenizer \
+ --offload_diffusion_transformer \
+ --offload_text_encoder_model \
+ --offload_prompt_upsampler \
+ --offload_guardrail_models
+
\ No newline at end of file
diff --git a/site/images/Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient.mp4 b/site/images/Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient.mp4
new file mode 100644
index 0000000..97fbaab
Binary files /dev/null and b/site/images/Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient.mp4 differ
diff --git a/site/images/cosmos_jetson.jpg b/site/images/cosmos_jetson.jpg
new file mode 100644
index 0000000..32184fd
Binary files /dev/null and b/site/images/cosmos_jetson.jpg differ
diff --git a/site/images/text2world_example.mp4 b/site/images/text2world_example.mp4
new file mode 100644
index 0000000..bb0b15c
Binary files /dev/null and b/site/images/text2world_example.mp4 differ
diff --git a/tensorrt_llm.html b/tensorrt_llm.html
index 6218b09..c8edace 100644
--- a/tensorrt_llm.html
+++ b/tensorrt_llm.html
@@ -490,6 +490,13 @@
Image Generation