NanoLLM updates
dusty-nv committed Apr 18, 2024
1 parent 4758f5a commit 779fedd
Showing 6 changed files with 50 additions and 6 deletions.
Binary file added docs/images/nano_llm_docs.jpg
Binary file added docs/images/nano_llm_docs_chat.jpg
4 changes: 2 additions & 2 deletions docs/tutorial_api-examples.md
@@ -76,9 +76,9 @@ The [`huggingface-benchmark.py`](https://github.com/dusty-nv/jetson-containers/b

## NanoLLM

- The [`NanoLLM`](https://dusty-nv.github.io/) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:
+ The [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:

<a href="benchmarks.html"><img width="600px" src="overrides/images/graph_llm-text-generation.svg"/></a>
<a href="benchmarks.html"><iframe width="600" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vTJ9lFqOIZSfrdnS_0sa2WahzLbpbAbBCTlS049jpOchMCum1hIk-wE_lcNAmLkrZd0OQrI9IkKBfGp/pubchart?oid=2126319913&amp;format=interactive"></iframe></a>

```python
from nano_llm import NanoLLM, ChatHistory, ChatTemplates
43 changes: 43 additions & 0 deletions docs/tutorial_nano-llm.md
@@ -0,0 +1,43 @@
# NanoLLM - Optimized LLM Inference

[`NanoLLM`](https://dusty-nv.github.io/NanoLLM) is a lightweight, high-performance library that provides optimized inferencing APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It's used to build many of the responsive, low-latency agents featured on this site.

<a href="https://dusty-nv.github.io/NanoLLM" target="_blank"><img src="./images/nano_llm_docs.jpg" style="max-width: 50%; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.4);"></img></a>

It provides <a href="tutorial_api-examples.html#nanollm" target="_blank">similar APIs</a> to HuggingFace, backed by highly-optimized inference libraries and quantization tools:

```python
from nano_llm import NanoLLM

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # HuggingFace repo/model name, or path to HF model checkpoint
    api='mlc',                    # supported APIs are: mlc, awq, hf
    api_token='hf_abc123def',     # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)
    quantization='q4f16_ft'       # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights
)

response = model.generate("Once upon a time,", max_new_tokens=128)

for token in response:
print(token, end='', flush=True)
```
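
For multi-turn chat, NanoLLM pairs with `ChatHistory` (imported alongside `NanoLLM` and `ChatTemplates` in the API examples). Below is a minimal chat-loop sketch assuming the interface from the <a href="tutorial_api-examples.html#nanollm" target="_blank">API Examples</a> page — `append()` to add a turn, `embed_chat()` to build the input embedding, and the chat template's stop tokens; treat the exact signatures as assumptions and check the NanoLLM docs.

```python
from nano_llm import NanoLLM, ChatHistory

model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # hypothetical example model
    api='mlc',
    quantization='q4f16_ft'
)

# tracks the conversation and caches embeddings/KV state between turns
chat_history = ChatHistory(model, system_prompt="You are a helpful and friendly AI assistant.")

while True:
    # add the user's message to the conversation
    chat_history.append(role='user', msg=input('>> '))

    # generate embeddings for the chat so far
    embedding, position = chat_history.embed_chat()

    # stream the reply, reusing the chat's KV cache across turns
    reply = model.generate(
        embedding,
        streaming=True,
        kv_cache=chat_history.kv_cache,
        stop_tokens=chat_history.template.stop,
        max_new_tokens=256
    )

    bot_reply = ''
    for token in reply:
        bot_reply += token
        print(token, end='', flush=True)
    print('')

    # record the assistant's reply so the next turn has full context
    chat_history.append(role='bot', text=bot_reply)
```

Keeping the KV cache attached to the chat history avoids re-processing the entire conversation on every request, which keeps per-turn latency low.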

## Resources

Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:

| Tutorial | Description |
| :---------- | :----------------------------------- |
| **[Benchmarks](./benchmarks.md){:target="_blank"}** | Benchmarking results for LLM, SLM, VLM using MLC/TVM backend |
| **[API Examples](./tutorial_api-examples.md#nanollm){:target="_blank"}** | Python code examples for completion and multi-turn chat |
| **[Llamaspeak](./tutorial_llamaspeak.md){:target="_blank"}** | Talk verbally with LLMs using low-latency ASR/TTS speech models |
| **[Small LLM (SLM)](./tutorial_slm.md){:target="_blank"}** | Focus on language models with reduced footprint (7B params and below) |
| **[Live LLaVA](./tutorial_live-llava.md){:target="_blank"}** | Realtime live-streaming vision/language models on recurring prompts |
| **[Nano VLM](./tutorial_nano-vlm.md){:target="_blank"}** | Efficient multimodal pipeline with one-shot RAG support |


<div><iframe width="500" height="280" src="https://www.youtube.com/embed/UOjqF3YCGkY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<iframe width="500" height="280" src="https://www.youtube.com/embed/8Eu6zG0eEGY" style="display: inline-block;" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>


6 changes: 3 additions & 3 deletions docs/tutorial_nano-vlm.md
@@ -47,7 +47,7 @@ The optimized [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses MLC/TVM

``` bash
jetson-containers run $(autotag nano_llm) \
-  python3 -m nano_llm --api=mlc \
+  python3 -m nano_llm.chat --api=mlc \
--model liuhaotian/llava-v1.6-vicuna-7b \
--max-context-len 768 \
--max-new-tokens 128
@@ -63,7 +63,7 @@ During testing, you can specify prompts on the command-line that will run sequentially

```
jetson-containers run $(autotag nano_llm) \
-  python3 -m nano_llm --api=mlc \
+  python3 -m nano_llm.chat --api=mlc \
--model liuhaotian/llava-v1.6-vicuna-7b \
--max-context-len 768 \
--max-new-tokens 128 \
@@ -91,7 +91,7 @@ When prompted, these models can also output in constrained JSON formats (which t

```
jetson-containers run $(autotag nano_llm) \
-  python3 -m nano_llm --api=mlc \
+  python3 -m nano_llm.chat --api=mlc \
--model liuhaotian/llava-v1.5-13b \
--prompt '/data/images/hoover.jpg' \
--prompt 'extract any text from the image as json'
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -1,4 +1,4 @@
- site_name: NVIDIA Jetson Generative AI Lab
+ site_name: NVIDIA Jetson AI Lab
site_url:
site_description: Showcasing generative AI projects that run on Jetson
copyright: <ul class="global-footer__links"><li><a href="https://www.nvidia.com/en-us/about-nvidia/privacy-policy/" target="_blank">Privacy Policy</a></li><li> <a href="https://www.nvidia.com/en-us/privacy-center/" target="_blank">Manage My Privacy</a> </li> <li> <a href="https://www.nvidia.com/en-us/preferences/email-preferences/" target="_blank">Do Not Sell or Share My Data</a> </li> <li> <a href="https://www.nvidia.com/en-us/about-nvidia/legal-info/" target="_blank">Legal</a> </li> <li> <a href="https://www.nvidia.com/en-us/about-nvidia/accessibility/" target="_blank">Accessibility</a> </li> <li> <a href="https://www.nvidia.com/en-us/about-nvidia/company-policies/" target="_self">Corporate Policies</a> </li> <li> <a href="https://www.nvidia.com/en-us/product-security/" target="_blank">Product Security</a> </li> <li> <a href="https://www.nvidia.com/en-us/contact/" target="_blank">Contact</a> </li> </ul><div class="global-footer__copyright">Copyright &copy; 2024 NVIDIA Corporation</div>
@@ -82,6 +82,7 @@ nav:
- Text (LLM):
- text-generation-webui: tutorial_text-generation.md
- llamaspeak: tutorial_llamaspeak.md
+ - NanoLLM: tutorial_nano-llm.md
- Small LLM (SLM): tutorial_slm.md
- API Examples: tutorial_api-examples.md
- Text + Vision (VLM):
