diff --git a/.nojekyll b/.nojekyll
new file mode 100644
index 00000000..e69de29b
diff --git a/404.html b/404.html
new file mode 100644
index 00000000..f5c587c5
--- /dev/null
+++ b/404.html
@@ -0,0 +1,1828 @@
+Rapidly design and experiment with creating your own automation agents, personal assistants, and edge AI systems in an interactive sandbox for connecting multimodal LLMs, speech and vision transformers, vector databases, prompt templating, and function calling to live sensors and I/O. Optimized for deployment onboard Jetson with on-device compute, low-latency streaming, and unified memory.
+• Edge LLM inference with quantization and KV caching (NanoLLM) | • Realtime vision/language models (à la Live Llava and Video VILA) |
+• Speech recognition and synthesis (Whisper ASR, Piper TTS, Riva) | • Multimodal vector database from NanoDB |
+• Audio and video streaming (WebRTC, RTP, RTSP, V4L2) | • Performance monitoring and profiling |
+• Native bot-callable functions and agent tools | • Extensible plugins with auto-generated UI controls |
+• Save, load, and export pipeline presets |
What you need
+One of the following Jetson devices:
+Jetson AGX Orin (64GB)
+Jetson AGX Orin (32GB)
+Jetson Orin NX (16GB)
+Jetson Orin Nano (8GB)
+Running one of the following versions of JetPack:
+JetPack 6 (L4T r36.x)
+Sufficient storage space (preferably with NVMe SSD):
+22GB for the nano_llm container image
+>5GB of additional space
Clone and set up jetson-containers:
git clone https://github.com/dusty-nv/jetson-containers
+bash jetson-containers/install.sh
+
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xyz123abc456 \
+ $(autotag nano_llm) \
+ python3 -m nano_llm.studio
+
This will start the server running on your device. You can then navigate your browser to https://IP_ADDRESS:8050
+
To make code changes without rebuilding the container, clone the NanoLLM sources and then mount them over /opt/NanoLLM:
git clone https://github.com/dusty-nv/NanoLLM
+jetson-containers run \
+ -v ${PWD}/NanoLLM:/opt/NanoLLM \
+ $(autotag nano_llm)
+
You can then edit the source from outside the container. In the terminal that starts, you can also install other packages from apt, pip, etc.
+The models, I/O, and logic blocks share a lightweight plugin interface, exposing a process()
function and any runtime properties:
from nano_llm import Plugin
+
+class TextFilter(Plugin):
+    def __init__(self, search: str='bad word', replace: str='censored', **kwargs):
+        """
+        A simple plugin performing text replacement. Write docs because they're extracted for the UI,
+        and for the tool descriptions that go to the bot for function calling (and use the type hints)
+
+        Args:
+          search: The string to look for and replace in incoming text.
+          replace: What to replace occurrences of the search string with.
+        """
+        super().__init__(inputs=['text'], outputs=['text'])
+
+        # set these attributes and register them as configurable
+        self.add_parameters(search=search, replace=replace)
+
+    def process(self, input, **kwargs):
+        """ Substitute the text, and the return value will be sent to the plugins connected to this. """
+        return input.replace(self.search, self.replace)
+
Typically each plugin runs asynchronously in its own thread with its own queue of data, although some lightweight routines are executed inline. Plugins usually run in the same process, but they could make requests to microservices. Websockets are used for clients and the web UI.
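As a rough illustration of the thread-and-queue model described above, here is a minimal sketch using only the Python standard library. `MiniPlugin`, `connect`, `Upper`, and `run_demo` are hypothetical names invented for this example and are not part of NanoLLM; the real Plugin class may be organized differently.

```python
import queue
import threading


class MiniPlugin(threading.Thread):
    """Hypothetical stand-in for a plugin: one thread, one input queue."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.listeners = []  # downstream plugins or plain callables

    def connect(self, target):
        self.listeners.append(target)
        return target

    def __call__(self, data):
        # Callers only enqueue, so they never block on process()
        self.inbox.put(data)

    def process(self, data):
        raise NotImplementedError

    def run(self):
        while True:
            data = self.inbox.get()
            if data is None:  # sentinel that stops the thread
                break
            result = self.process(data)
            if result is not None:
                for listener in self.listeners:
                    listener(result)


class Upper(MiniPlugin):
    def process(self, data):
        return data.upper()


def run_demo(text):
    """Push one item through a single plugin and return what comes out."""
    results = queue.Queue()
    plugin = Upper()
    plugin.connect(results.put)
    plugin.start()
    plugin(text)
    out = results.get(timeout=5)
    plugin(None)
    plugin.join(timeout=5)
    return out
```

Because callers only place data on the queue, a slow `process()` in one stage does not stall the stage feeding it.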
+Below are descriptions of commonly-used components. Help text for these is extracted from their Python docs and shown in the UI, along with settings that you can control. In the node editor, click on a plugin to open its grid widget or configuration window.
+nano_llm.plugins.NanoLLM
(source)
Load quantized LLM/VLM with MLC (speed), AWQ (quality), or HF Transformers (compatibility)
+
+ - Inputs
+ * str, list[str], np.ndarray, torch.Tensor, cudaImage, PIL.Image
+
+ - Commands
+ * /reset /refresh /generate (these inputs can control the chat)
+
+ - Outputs
+ * delta The latest detokenized text to be generated
+ * partial All of the text generated in the response so far
+ * final The complete response sent at the end of the request
+ * words Similar to 'delta', but with at least one word at a time
+ * history A list of the messages in the chat history (as dictionaries)
+ * tools Enable the bot to call functions in plugins connected on this channel
+
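One way to read the relationship between the text output channels listed above is sketched below. This is an interpretation of the descriptions, not NanoLLM's code: `stream_outputs` is a hypothetical function, and the rule that a word is complete once a space follows it is an assumption.

```python
def stream_outputs(pieces):
    """Yield (channel, text) pairs for a list of detokenized text pieces."""
    partial = ""  # everything generated so far
    pending = ""  # text not yet emitted on the 'words' channel
    for piece in pieces:
        partial += piece
        pending += piece
        yield ("delta", piece)
        yield ("partial", partial)
        # Emit on 'words' only once a space closes at least one word
        cut = pending.rfind(" ")
        if cut >= 0:
            yield ("words", pending[:cut + 1])
            pending = pending[cut + 1:]
    if pending:
        yield ("words", pending)
    yield ("final", partial)
```

For the pieces `["Hel", "lo ", "wor", "ld"]`, the `words` channel carries `"Hello "` and then `"world"`, while `final` carries `"Hello world"` once at the end.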
nano_llm.plugins.AutoPrompt
(source)
Apply a template each time new data comes in, for example '<image> Describe the image'
+ would tag each incoming image along with that text prompt (with images, usually for VLM)
+
+ AutoPrompt can be used to compose structured messages that reference prior inputs,
+ like `Image 1: <image> Image 2: <image>` or using <text> for the last text received:
+
+ The three most recent replies were:
+ * <text>
+ * <text>
+ * <text>
+
+These most recent inputs are used in newest to oldest order from a LIFO queue.
+
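The newest-to-oldest filling of `<text>` tags can be sketched as follows. `TemplateFiller` is a hypothetical helper written for illustration. It assumes a sliding window over the most recent inputs and that nothing is emitted until every tag can be filled; the actual AutoPrompt plugin may behave differently on both points.

```python
from collections import deque


class TemplateFiller:
    """Hypothetical helper: fill <text> tags with the most recent inputs."""

    def __init__(self, template, tag="<text>"):
        self.parts = template.split(tag)  # literal text between the tags
        self.needed = len(self.parts) - 1  # number of tags in the template
        self.recent = deque(maxlen=self.needed)

    def push(self, item):
        """Add an input; return the filled prompt once enough have arrived."""
        self.recent.append(item)
        if len(self.recent) < self.needed:
            return None
        # Newest first, matching the newest-to-oldest order described above
        values = list(reversed(self.recent))
        prompt = self.parts[0]
        for value, part in zip(values, self.parts[1:]):
            prompt += value + part
        return prompt
```

With the template `"A: <text> B: <text>"`, pushing `"one"` returns `None`, pushing `"two"` returns `"A: two B: one"`, and pushing `"three"` returns `"A: three B: two"`.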
nano_llm.plugins.UserPrompt
(source)
This plugin is for inputting text on the keyboard, either from the terminal or text box in the UI.
+
+ It can also load prompts from a text or json file, and those files can reference other files to load.
+
nano_llm.plugins.TextStream
(source)
Simple plugin for viewing any stream of text from the system in a text box in the UI.
+
+ For sources emanating from ASR or LLM, can apply color highlighting to partial/final responses.
+
nano_llm.plugins.WhisperASR
(source)
Whisper streaming speech-to-text with TensorRT (github.com/NVIDIA-AI-IOT/whisper_trt)
+
+ Supports these Whisper models: 'tiny' (39M), 'base' (74M), 'small' (244M)
+
+ - Inputs
+ * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)
+
+ - Outputs
+ * final The text transcript of the entire segment spoken in the input
+ * partial Rolling partial transcript of the words spoken so far in the audio segment
+
The Whisper plugin is designed to be used in a pipeline with the VAD Filter to reduce erroneous output from background noise.
+ +nano_llm.plugins.PiperTTS
(source)
Piper text-to-speech using CUDA and onnxruntime (github.com/rhasspy/piper)
+
+ Populates a list of the available Piper models that it can download, along with the speaker voices
+ for multi-speaker models. The default model is 'en_US-libritts-high'. Has controls for voice speed.
+
+ - Inputs
+ * str either individual words, sentences, or an entire passage to synthesize
+
+ - Outputs
+ * audio np.ndarray (int16) at 16kHz sample rate (for low-quality models) or 22.05kHz (for medium/high)
+
A PiperTTS preset is available that uses RateLimit to stream the output at 1x realtime factor (so it can be muted later).
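Streaming at 1x realtime means that each chunk of audio is followed by a pause equal to the chunk's own duration. The sketch below shows that arithmetic. `stream_realtime` is a hypothetical function and not the RateLimit plugin itself, and the injectable `sleep` argument is an assumption made so the pacing can be checked without waiting.

```python
import time


def stream_realtime(samples, sample_rate, chunk_size, sleep=time.sleep):
    """Yield fixed-size chunks, pausing so output advances at 1x realtime."""
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        yield chunk
        # Pause for the duration of the audio that was just sent
        sleep(len(chunk) / sample_rate)
```

For example, 4 samples at a 5 Hz sample rate represent 0.8 seconds of audio, so the generator pauses 0.8 seconds before producing the next chunk.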
+ +nano_llm.plugins.VADFilter
(source)
Voice activity detection model using Silero. Drops incoming audio unless it exceeds the VAD threshold.
+
+ Use this on incoming audio before ASR plugins to reduce spurious transcripts produced from background noise.
+
+ - Inputs
+ * audio bytes, list[int], np.ndarray, torch.Tensor (int16, float32)
+
+ - Outputs
+ * audio np.ndarray or torch.Tensor (only when VAD confidence > threshold)
+
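The drop-unless-above-threshold behavior can be sketched as a simple gate. The scoring function here is a toy amplitude measure standing in for the Silero model, and both `vad_gate` and `mean_abs_level` are hypothetical names that do not come from NanoLLM.

```python
def vad_gate(chunks, confidence, threshold=0.5):
    """Yield only the audio chunks whose confidence exceeds the threshold."""
    for chunk in chunks:
        if confidence(chunk) > threshold:
            yield chunk


def mean_abs_level(chunk):
    """Toy confidence score: mean absolute amplitude of float samples."""
    return sum(abs(s) for s in chunk) / max(len(chunk), 1)
```

Placing a gate like this ahead of an ASR stage means quiet chunks never reach the recognizer, which is how spurious transcripts from background noise are avoided.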
Riva ASR / TTS services - these run in another container over gRPC. See these resources for setup:
+
AudioInputDevice | USB/I2S soundcard input via PulseAudio/PyAudio (for microphone directly attached to system) |
AudioOutputDevice | USB/I2S soundcard output via PulseAudio/PyAudio (for speakers directly attached to system) |
AudioRecorder | Save an audio stream to WAV file on the server |
WebAudioIn | Receive audio samples streamed from the client over websockets |
WebAudioOut | Transmit audio samples to the client over websockets |
VideoSource | Capture images from camera device (V4L2/CSI), network stream (RTP, RTSP), video file (MP4, MKV, AVI, FLV) |
VideoOutput | Output H264/H265-encoded video to network stream (RTP, RTSP, WebRTC), display, or file (MP4, MKV, AVI, FLV) |
VideoOverlay | Draw text on top of video streams for HUD or OSD-style display |
RateLimit | Throttle transmission to a specified rate (can also be used with audio) |
NanoDB | Optimized in-memory multimodal vectorDB for txt2img/img2img similarity search and image tagging (see tutorial_nanodb.md) |
Deduplicate | Filter incoming text against previous entries with embedding model and cosine similarity over a timeout period |
EventFilter | Look for keywords or search strings in streams of text, keeping track of the begin/end times that these occur. |
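The Deduplicate row above describes filtering text by embedding similarity over a timeout window. A minimal sketch of that idea follows. It uses a toy bag-of-words embedding in place of a real embedding model, and `embed`, `cosine`, and `Dedup` are hypothetical names; the plugin's actual parameters and thresholds are not stated in this document.

```python
import math
import time
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (a real setup would use a model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Dedup:
    def __init__(self, threshold=0.8, timeout=10.0, clock=time.monotonic):
        self.threshold = threshold
        self.timeout = timeout
        self.clock = clock
        self.history = []  # (timestamp, embedding) pairs

    def process(self, text):
        """Return text if it is new, or None if it repeats a recent entry."""
        now = self.clock()
        # Forget entries older than the timeout
        self.history = [(t, e) for t, e in self.history if now - t <= self.timeout]
        vec = embed(text)
        if any(cosine(vec, e) >= self.threshold for _, e in self.history):
            return None
        self.history.append((now, vec))
        return text
```

Returning `None` for a repeat mirrors the plugin convention shown earlier, where the return value of `process()` is what gets forwarded downstream.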
Many of the previous demos (like Llamaspeak and Live Llava) can quickly be recreated in the dynamic environment of Agent Studio without needing to touch any code. Here are some practical tips as you go about trying different combinations of model pipelines:
+• Presets directory: jetson-containers/data/nano_llm/presets
+• Load a preset with the --load flag at startup.
+• Clear Cache button in the Agent menu.
+• Use the RateLimiter plugin to throttle data sources and balance resources.
+• The drop_inputs setting that some plugins expose will always keep them up with the latest request.
+• Template variables: ${DATE} and ${TIME}
+• ${HOMEASSISTANT.DEVICES}
+• Structured messages: Image 1: <image> Image 2: <image>
or:
The three most recent replies were:
+ * <text>
+ * <text>
+ * <text>
+
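Placeholders such as `${DATE}`, `${TIME}`, and `${HOMEASSISTANT.DEVICES}` suggest a variable-substitution step applied to prompts. The sketch below shows one way such expansion could work. `expand` is a hypothetical function, and the choices to resolve dotted names through nested dictionaries and to leave unknown names untouched are assumptions, not documented behavior.

```python
import re


def expand(template, variables):
    """Replace ${NAME} or ${NAME.FIELD}; unknown names are left untouched."""

    def lookup(match):
        value = variables
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # keep the placeholder as written
            value = value[key]
        return str(value)

    return re.sub(r"\$\{([A-Za-z_][\w.]*)\}", lookup, template)
```

A caller would build the `variables` dictionary itself, for instance filling `DATE` and `TIME` from the system clock and nesting plugin outputs under the plugin's name.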
🤖 Have fun bot building! If you need help, reach out on the Jetson Forums or GitHub Issues.