NanoLLM updates
dusty-nv committed Apr 18, 2024
1 parent 4758f5a commit 779fedd
Showing 6 changed files with 50 additions and 6 deletions.
Binary file added docs/images/nano_llm_docs.jpg
Binary file added docs/images/nano_llm_docs_chat.jpg
4 changes: 2 additions & 2 deletions docs/tutorial_api-examples.md
@@ -76,9 +76,9 @@ The [`huggingface-benchmark.py`](https://github.com/dusty-nv/jetson-containers/b

## NanoLLM

- The [`NanoLLM`](https://dusty-nv.github.io/) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:
+ The [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:

<a href="benchmarks.html"><img width="600px" src="overrides/images/graph_llm-text-generation.svg"/></a>
<a href="benchmarks.html"><iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vTJ9lFqOIZSfrdnS_0sa2WahzLbpbAbBCTlS049jpOchMCum1hIk-wE_lcNAmLkrZd0OQrI9IkKBfGp/pubchart?oid=2126319913&amp;format=interactive"></iframe></a>

```python
from nano_llm import NanoLLM, ChatHistory, ChatTemplates
43 changes: 43 additions & 0 deletions docs/tutorial_nano-llm.md
@@ -0,0 +1,43 @@
# NanoLLM - Optimized LLM Inference

[`NanoLLM`](https://dusty-nv.github.io/NanoLLM) is a lightweight, high-performance library that provides optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It's used to build many of the responsive, low-latency agents featured on this site.

<a href="https://dusty-nv.github.io/NanoLLM" target="_blank"><img src="./images/nano_llm_docs.jpg" style="max-width: 50%; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.4);"></img></a>

It provides <a href="tutorial_api-examples.html#nanollm" target="_blank">similar APIs</a> to HuggingFace, backed by highly-optimized inference libraries and quantization tools:

```python
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # HuggingFace repo/model name, or path to HF model checkpoint
    api='mlc',                    # supported APIs are: mlc, awq, hf
    api_token='hf_abc123def',     # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)
    quantization='q4f16_ft'       # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights
)

response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
print(token, end='', flush=True)
```
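
For multi-turn chat, NanoLLM pairs with `ChatHistory` (imported alongside `NanoLLM` and `ChatTemplates` in the API examples). Below is a minimal chat-loop sketch assuming the interface from the <a href="tutorial_api-examples.html#nanollm" target="_blank">API Examples</a> page — `append()` to add a turn, `embed_chat()` to build the input embedding, and the chat template's stop tokens; treat the exact signatures as assumptions and check the NanoLLM docs.

```python
from nano_llm import NanoLLM, ChatHistory

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # hypothetical example model
    api='mlc',
    quantization='q4f16_ft'
)

# tracks the conversation and caches embeddings/KV state between turns
chat_history = ChatHistory(model, system_prompt="You are a helpful and friendly AI assistant.")

while True:
    # add the user's message to the conversation
    chat_history.append(role='user', msg=input('>> '))

    # generate embeddings for the chat so far
    embedding, position = chat_history.embed_chat()

    # stream the reply, reusing the chat's KV cache across turns
    reply = model.generate(
        embedding,
        streaming=True,
        kv_cache=chat_history.kv_cache,
        stop_tokens=chat_history.template.stop,
        max_new_tokens=256
    )

    bot_reply = ''
    for token in reply:
        bot_reply += token
        print(token, end='', flush=True)
    print('')

    # record the assistant's reply so the next turn has full context
    chat_history.append(role='bot', text=bot_reply)
```

Keeping the KV cache attached to the chat history avoids re-processing the entire conversation on every request, which keeps per-turn latency low.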

## Resources

Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:

| Tutorial | Description |
| :---------- | :----------------------------------- |
| **[Benchmarks](./benchmarks.md){:target="_blank"}** | Benchmarking results for LLM, SLM, VLM using MLC/TVM backend |
| **[API Examples](./tutorial_api-examples.md#nanollm){:target="_blank"}** | Python code examples for completion and multi-turn chat |
| **[Llamaspeak](./tutorial_llamaspeak.md){:target="_blank"}** | Talk verbally with LLMs using low-latency ASR/TTS speech models |
| **[Small LLM (SLM)](./tutorial_slm.md){:target="_blank"}** | Focus on language models with reduced footprint (7B params and below) |
| **[Live LLaVA](./tutorial_live-llava.md){:target="_blank"}** | Realtime live-streaming vision/language models on recurring prompts |
| **[Nano VLM](./tutorial_nano-vlm.md){:target="_blank"}** | Efficient multimodal pipeline with one-shot RAG support |


<div><iframe width="500" height="280" src="https://www.youtube.com/embed/UOjqF3YCGkY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>


6 changes: 3 additions & 3 deletions docs/tutorial_nano-vlm.md
@@ -47,7 +47,7 @@ The optimized [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses MLC/TVM

``` bash
jetson-containers run $(autotag nano_llm) \
-  python3 -m nano_llm --api=mlc \
+  python3 -m nano_llm.chat --api=mlc \
--model liuhaotian/llava-v1.6-vicuna-7b \
--max-context-len 768 \
--max-new-tokens 128
@@ -63,7 +63,7 @@ During testing, you can specify prompts on the command-line that will run sequentially

```
jetson-containers run $(autotag nano_llm) \
-  python3 -m nano_llm --api=mlc \
+  python3 -m nano_llm.chat --api=mlc \
--model liuhaotian/llava-v1.6-vicuna-7b \
--max-context-len 768 \
--max-new-tokens 128 \
@@ -91,7 +91,7 @@ When prompted, these models can also output in constrained JSON formats (which t

```
jetson-containers run $(autotag nano_llm) \
-  python3 -m nano_llm --api=mlc \
+  python3 -m nano_llm.chat --api=mlc \
--model liuhaotian/llava-v1.5-13b \
--prompt '/data/images/hoover.jpg' \
--prompt 'extract any text from the image as json'
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -1,4 +1,4 @@
- site_name: NVIDIA Jetson Generative AI Lab
+ site_name: NVIDIA Jetson AI Lab
site_url:
site_description: Showcasing generative AI projects that run on Jetson
copyright: <ul class="global-footer__links"><li><a href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" target="_blank">Privacy Policy</a></li><li> <a href="https://www.nvidia.com/en-us/privacy-center/" target="_blank">Manage My Privacy</a> </li> <li> <a href="https://www.nvidia.com/en-us/preferences/email-preferences/" target="_blank">Do Not Sell or Share My Data</a> </li> <li> <a href="https://www.nvidia.com/en-us/about-nvidia/legal-info/" target="_blank">Legal</a> </li> <li> <a href="https://www.nvidia.com/en-us/about-nvidia/accessibility/" target="_blank">Accessibility</a> </li> <li> <a href="https://www.nvidia.com/en-us/about-nvidia/company-policies/" target="_self">Corporate Policies</a> </li> <li> <a href="https://www.nvidia.com/en-us/product-security/" target="_blank">Product Security</a> </li> <li> <a href="https://www.nvidia.com/en-us/contact/" target="_blank">Contact</a> </li> </ul><div class="global-footer__copyright">Copyright &copy; 2024 NVIDIA Corporation</div>
@@ -82,6 +82,7 @@ nav:
- Text (LLM):
- text-generation-webui: tutorial_text-generation.md
- llamaspeak: tutorial_llamaspeak.md
+ - NanoLLM: tutorial_nano-llm.md
- Small LLM (SLM): tutorial_slm.md
- API Examples: tutorial_api-examples.md
- Text + Vision (VLM):
