# Agent Studio

Rapidly design and experiment with your own automation agents, personal assistants, and edge AI systems in an interactive sandbox that connects multimodal LLMs, speech and vision transformers, vector databases, prompt templating, and function calling to live sensors and I/O. Agent Studio is optimized for deployment onboard Jetson, with on-device compute, low-latency streaming, and unified memory.
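
To give a feel for the function-calling piece mentioned above, here is a minimal sketch of a bot-callable tool using NanoLLM's `bot_function` decorator. Treat the import path and decorator usage as assumptions based on NanoLLM's function-calling documentation rather than a verbatim Agent Studio example.

```python
# Minimal sketch of a bot-callable function
# (import path and decorator assumed from NanoLLM's function-calling API)
from datetime import datetime
from nano_llm import bot_function

@bot_function
def TIME():
    """ Returns the current time, so the model can call it to answer time-of-day questions. """
    return datetime.now().strftime("%I:%M %p")
```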

• Edge LLM inference with quantization and KV caching (NanoLLM)
• Realtime vision/language models (à la Live Llava and Video VILA)
• Speech recognition and synthesis (Whisper ASR, Piper TTS, Riva)
• Multimodal vector database from NanoDB
• Audio and video streaming (WebRTC, RTP, RTSP, V4L2)
• Performance monitoring and profiling
• Native bot-callable functions and agent tools
• Extensible plugins with auto-generated UI controls (see the sketch after this list)
• Save, load, and export pipeline presets
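
As a flavor of the extensible-plugin point above, here is a rough sketch of a custom Agent Studio plugin modeled on NanoLLM's `Plugin` base class; the `inputs`/`outputs` constructor arguments, the `add_parameters()` helper, and the `process()` hook shown here are assumptions about that interface and may differ in detail from the shipped API.

```python
# Rough sketch of a custom plugin (constructor args, add_parameters(), and
# process() are assumed from NanoLLM's Plugin interface)
from nano_llm import Plugin

class TextFilter(Plugin):
    def __init__(self, search: str = 'bad word', replace: str = 'censored', **kwargs):
        """
        Replace occurrences of `search` with `replace` in incoming text.
        Docstrings and type hints like these are what drive the auto-generated UI controls.
        """
        super().__init__(inputs=['text'], outputs=['text'], **kwargs)
        self.add_parameters(search=search, replace=replace)

    def process(self, input, **kwargs):
        # The return value is emitted on the plugin's 'text' output channel.
        return input.replace(self.search, self.replace)
```

Assuming the UI generation works as the feature list describes, a plugin like this would appear in the node editor with `search` and `replace` exposed as editable fields.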

## Running Agent Studio

What you need

  1. One of the following Jetson devices:

    • Jetson AGX Orin (64GB)
    • Jetson AGX Orin (32GB)
    • Jetson Orin NX (16GB)
    • Jetson Orin Nano (8GB)

  2. Running one of the following versions of JetPack:

    • JetPack 6 (L4T r36.x)

  3. Sufficient storage space (preferably with NVMe SSD).

    • 22GB for nano_llm container image
    • Space for models (>5GB)
  4. Clone and set up jetson-containers:

    git clone https://github.com/dusty-nv/jetson-containers
    bash jetson-containers/install.sh
Then launch Agent Studio with the nano_llm container, substituting your own HuggingFace token for hf_xyz123abc456 (the autotag utility selects a container image compatible with your version of JetPack):

```bash
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \
  $(autotag nano_llm) \
    python3 -m nano_llm.studio
```

This starts the Agent Studio server on your device. You can then point your browser to https://IP_ADDRESS:8050, replacing IP_ADDRESS with the address of your Jetson.