This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Update colab links
mgoin authored Mar 7, 2024
1 parent 8d617e5 commit 3ae527f
Showing 3 changed files with 3 additions and 3 deletions.
@@ -1,6 +1,6 @@
# Deploy Compressed LLMs from Hugging Face with nm-vllm

-[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuralmagic/nm-vllm/blob/main/examples-neuralmagic/deploy_compressed_huggingface_models/Deploy_Compressed_LLMs_from_Hugging_Face_with_nm_vllm.ipynb)
+[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://bit.ly/4a3K5Iw)


This notebook walks through how to deploy compressed models with nm-vllm's latest memory and performance optimizations.
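
A minimal deployment sketch, assuming `nm-vllm` is installed and exposes the standard vLLM Python API; the checkpoint name and the `sparsity` argument are illustrative assumptions, not values taken from the notebook:

```python
# Illustrative sketch only: the checkpoint name and the `sparsity` argument
# are assumptions, not values from the notebook.
from vllm import LLM, SamplingParams

# Load a compressed checkpoint from the Hugging Face Hub (hypothetical ID).
llm = LLM(
    model="neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50",
    sparsity="sparse_w16a16",  # assumed nm-vllm flag for sparse inference
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What are the benefits of compressed LLMs?"], params)
print(outputs[0].outputs[0].text)
```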
@@ -1,6 +1,6 @@
# Performantly Quantize LLMs to 4-bits with Marlin and nm-vllm

-[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuralmagic/nm-vllm/blob/main/examples-neuralmagic/marlin_quantization_and_deploy/Performantly_Quantize_LLMs_to_4_bits_with_Marlin_and_nm_vllm.ipynb)
+[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://bit.ly/3uY6NTx)

This notebook walks through how to compress a pretrained LLM and deploy it with `nm-vllm`. To create a new 4-bit quantized model, we can leverage AutoGPTQ. Quantizing reduces the model's weight precision from FP16 to INT4, which cuts the file size by ~70%. The main benefits are lower latency and memory usage.

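As a rough sketch of that flow, the snippet below uses AutoGPTQ with settings generally required by the Marlin kernel (4 bits, symmetric, group size 128, no activation reordering); the base model ID and the tiny calibration set are placeholder assumptions:

```python
# Sketch of 4-bit GPTQ quantization with Marlin-compatible settings.
# Model ID and calibration text are placeholder assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize_config = BaseQuantizeConfig(
    bits=4,          # INT4 weights
    group_size=128,  # group size Marlin expects
    desc_act=False,  # Marlin does not support activation reordering
    sym=True,        # symmetric quantization, required by Marlin
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# A real run would use a few hundred calibration samples; one is shown here.
examples = [tokenizer("nm-vllm serves compressed LLMs efficiently.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("mistral-7b-instruct-gptq-4bit-marlin")
```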
@@ -1,6 +1,6 @@
# Apply SparseGPT to LLMs and deploy with nm-vllm

-[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuralmagic/nm-vllm/blob/main/examples-neuralmagic/sparsegpt_compress_and_deploy/Apply_SparseGPT_to_LLMs_and_deploy_with_nm_vllm.ipynb)
+[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://bit.ly/4c5jT1S)


This notebook walks through how to sparsify a pretrained LLM and deploy it with `nm-vllm`. To create a pruned model, you can leverage SparseGPT, which removes a large share of the model's weights in a single one-shot pass. The main benefits are lower latency and memory usage.
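
SparseGPT is a one-shot pruning method that uses second-order information. As a simplified stand-in that only illustrates what unstructured weight sparsity looks like (it is not SparseGPT), here is magnitude pruning of a model's linear layers with `torch.nn.utils.prune`; the model ID and the 50% sparsity level are assumptions:

```python
# Simplified illustration of unstructured sparsity via magnitude pruning.
# This is NOT SparseGPT; the model ID and 50% sparsity level are assumptions.
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # small assumed model

for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)  # zero the smallest 50%
        prune.remove(module, "weight")  # bake the pruning mask into the weights

zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"Overall zero fraction: {zeros / total:.2%}")
```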