diff --git a/docs/images/nano_llm_docs.jpg b/docs/images/nano_llm_docs.jpg
new file mode 100644
index 00000000..d024381b
Binary files /dev/null and b/docs/images/nano_llm_docs.jpg differ
diff --git a/docs/images/nano_llm_docs_chat.jpg b/docs/images/nano_llm_docs_chat.jpg
new file mode 100644
index 00000000..634e3618
Binary files /dev/null and b/docs/images/nano_llm_docs_chat.jpg differ
diff --git a/docs/tutorial_api-examples.md b/docs/tutorial_api-examples.md
index 1ae89634..eff789f1 100644
--- a/docs/tutorial_api-examples.md
+++ b/docs/tutorial_api-examples.md
@@ -76,9 +76,9 @@ The [`huggingface-benchmark.py`](https://github.com/dusty-nv/jetson-containers/b
## NanoLLM
-The [`NanoLLM`](https://dusty-nv.github.io/) library uses the optimized MLC/TVM library for inference, like on the [Benchmarks](benchmarks.md) page:
+The [`NanoLLM`](https://dusty-nv.github.io/NanoLLM) library uses the optimized MLC/TVM backend for inference, as on the [Benchmarks](benchmarks.md) page:
-
+
```python
from nano_llm import NanoLLM, ChatHistory, ChatTemplates
diff --git a/docs/tutorial_nano-llm.md b/docs/tutorial_nano-llm.md
new file mode 100644
index 00000000..9fffa5ea
--- /dev/null
+++ b/docs/tutorial_nano-llm.md
@@ -0,0 +1,43 @@
+# NanoLLM - Optimized LLM Inference
+
+[`NanoLLM`](https://dusty-nv.github.io/NanoLLM) is a lightweight, high-performance library that provides optimized inference APIs for quantized LLMs, multimodality, speech services, vector databases with RAG, and web frontends. It's used to build many of the responsive, low-latency agents featured on this site.
+
+
+
+It provides APIs similar to HuggingFace Transformers, backed by highly optimized inference libraries and quantization tools:
+
+```python
+from nano_llm import NanoLLM
+
+model = NanoLLM.from_pretrained(
+ "meta-llama/Llama-2-7b-hf", # HuggingFace repo/model name, or path to HF model checkpoint
+ api='mlc', # supported APIs are: mlc, awq, hf
+ api_token='hf_abc123def', # HuggingFace API key for authenticated models ($HUGGINGFACE_TOKEN)
+ quantization='q4f16_ft' # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights
+)
+
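+# generate() streams the output, so tokens can be printed as they are produced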
+response = model.generate("Once upon a time,", max_new_tokens=128)
+
+for token in response:
+ print(token, end='', flush=True)
+```
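+
+For multi-turn chat, NanoLLM also provides the `ChatHistory` class used in the [API Examples](tutorial_api-examples.md#nanollm). The loop below is an abridged, illustrative sketch assuming that interface (embedding the accumulated chat each turn and reusing its KV cache between requests) - refer to the [NanoLLM documentation](https://dusty-nv.github.io/NanoLLM) for the exact signatures:
+
+```python
+from nano_llm import NanoLLM, ChatHistory
+
+# load the model as above (the model name and quantization here are placeholders)
+model = NanoLLM.from_pretrained(
+    "meta-llama/Llama-2-7b-chat-hf",
+    api='mlc',
+    quantization='q4f16_ft'
+)
+
+# ChatHistory tracks the conversation and applies the model's chat template
+chat_history = ChatHistory(model, system_prompt="You are a helpful and friendly AI assistant.")
+
+while True:
+    # add the user's prompt and embed the updated chat history
+    chat_history.append(role='user', msg=input('>> '))
+    embedding, position = chat_history.embed_chat()
+
+    # stream the reply, reusing the chat's KV cache between turns
+    reply = model.generate(
+        embedding,
+        streaming=True,
+        kv_cache=chat_history.kv_cache,
+        stop_tokens=chat_history.template.stop,
+        max_new_tokens=256,
+    )
+
+    bot_text = ''
+
+    for token in reply:
+        bot_text += token
+        print(token, end='', flush=True)
+
+    # save the reply and carry the KV cache forward to the next turn
+    chat_history.append(role='bot', text=bot_text)
+    chat_history.kv_cache = reply.kv_cache
+    print('')
+```
+
+Reusing the KV cache between turns avoids re-processing the entire conversation on every request, which helps keep per-turn latency low for the interactive agents on this site.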
+
+## Resources
+
+Here's an index of the various tutorials & examples using NanoLLM on Jetson AI Lab:
+
+| | |
+| :---------- | :----------------------------------- |
+| **[Benchmarks](./benchmarks.md){:target="_blank"}** | Benchmarking results for LLM, SLM, VLM using MLC/TVM backend |
+| **[API Examples](./tutorial_api-examples.md#nanollm){:target="_blank"}** | Python code examples for completion and multi-turn chat |
+| **[Llamaspeak](./tutorial_llamaspeak.md){:target="_blank"}** | Talk verbally with LLMs using low-latency ASR/TTS speech models |
+| **[Small LLM (SLM)](./tutorial_slm.md){:target="_blank"}** | Focus on language models with reduced footprint (7B params and below) |
+| **[Live LLaVA](./tutorial_live-llava.md){:target="_blank"}** | Realtime live-streaming vision/language models on recurring prompts |
+| **[Nano VLM](./tutorial_nano-vlm.md){:target="_blank"}** | Efficient multimodal pipeline with one-shot RAG support |
+
+
+