diff --git a/docs/blog/posts/amd-on-runpod.md b/docs/blog/posts/amd-on-runpod.md
index f96aaf12c..87ba0518b 100644
--- a/docs/blog/posts/amd-on-runpod.md
+++ b/docs/blog/posts/amd-on-runpod.md
@@ -50,7 +50,7 @@ you can now specify an AMD GPU under `resources`. Below are a few examples.
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - TRUST_REMOTE_CODE=true
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
@@ -83,7 +83,7 @@ you can now specify an AMD GPU under `resources`. Below are a few examples.
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
 ide: vscode
diff --git a/docs/blog/posts/tpu-on-gcp.md b/docs/blog/posts/tpu-on-gcp.md
index 7973f2494..2bfe35b91 100644
--- a/docs/blog/posts/tpu-on-gcp.md
+++ b/docs/blog/posts/tpu-on-gcp.md
@@ -58,7 +58,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 image: dstackai/optimum-tpu:llama31
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
@@ -89,7 +89,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 env:
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - DATE=20240828
   - TORCH_VERSION=2.5.0
   - VLLM_TARGET_DEVICE=tpu
@@ -167,7 +167,7 @@ name: optimum-tpu-llama-train
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
   - mkdir -p optimum-tpu/examples/custom/
diff --git a/docs/blog/posts/volumes-on-runpod.md b/docs/blog/posts/volumes-on-runpod.md
index 116a121bb..58b02d4a9 100644
--- a/docs/blog/posts/volumes-on-runpod.md
+++ b/docs/blog/posts/volumes-on-runpod.md
@@ -33,7 +33,7 @@ scaling:
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
@@ -110,7 +110,7 @@ volumes:
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
diff --git a/docs/docs/guides/protips.md b/docs/docs/guides/protips.md
index e433e8ab2..0749be141 100644
--- a/docs/docs/guides/protips.md
+++ b/docs/docs/guides/protips.md
@@ -181,7 +181,7 @@ name: vscode
 python: "3.10"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 ```
@@ -190,20 +190,20 @@ ide: vscode
 Then, you can pass the environment variable either via the shell:
 ```shell
-HUGGING_FACE_HUB_TOKEN=... dstack apply -f .dstack.yml
+HF_TOKEN=... dstack apply -f .dstack.yml
 ```
 Or via the `-e` option of the `dstack apply` command:
 ```shell
-dstack apply -f .dstack.yml -e HUGGING_FACE_HUB_TOKEN=...
+dstack apply -f .dstack.yml -e HF_TOKEN=...
 ```
 ??? info ".env"
     A better way to configure environment variables not hardcoded in YAML is by specifying them in a `.env` file:
     ```
-    HUGGING_FACE_HUB_TOKEN=...
+    HF_TOKEN=...
     ```
     If you install [`direnv` :material-arrow-top-right-thin:{ .external }](https://direnv.net/){:target="_blank"},
diff --git a/docs/docs/reference/dstack.yml/dev-environment.md b/docs/docs/reference/dstack.yml/dev-environment.md
index 3e1bd0b5b..ba19fe966 100644
--- a/docs/docs/reference/dstack.yml/dev-environment.md
+++ b/docs/docs/reference/dstack.yml/dev-environment.md
@@ -151,7 +151,7 @@ name: vscode
 # Environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - HF_HUB_ENABLE_HF_TRANSFER=1
 ide: vscode
@@ -159,7 +159,7 @@ ide: vscode
-> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+> If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 `dstack` will require the value to be passed via the CLI or set in the current process.
 For instance, you can define environment variables in a `.envrc` file and utilize tools like `direnv`.
diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md
index 5d638ec41..4aa768583 100644
--- a/docs/docs/reference/dstack.yml/service.md
+++ b/docs/docs/reference/dstack.yml/service.md
@@ -312,7 +312,7 @@ python: "3.10"
 # Environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL=NousResearch/Llama-2-7b-chat-hf
 # Commands of the service
 commands:
@@ -328,7 +328,7 @@ resources:
-If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 `dstack` will require the value to be passed via the CLI or set in the current process.
 For instance, you can define environment variables in a `.envrc` file and utilize tools like `direnv`.
diff --git a/docs/docs/reference/dstack.yml/task.md b/docs/docs/reference/dstack.yml/task.md
index e2e052968..4e069d68f 100644
--- a/docs/docs/reference/dstack.yml/task.md
+++ b/docs/docs/reference/dstack.yml/task.md
@@ -201,7 +201,7 @@ python: "3.10"
 # Environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - HF_HUB_ENABLE_HF_TRANSFER=1
 # Commands of the task
 commands:
@@ -212,7 +212,7 @@
-> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+> If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 `dstack` will require the value to be passed via the CLI or set in the current process.
 For instance, you can define environment variables in a `.envrc` file and utilize tools like `direnv`.
diff --git a/docs/docs/services.md b/docs/docs/services.md
index c916580d1..bdb84aa38 100644
--- a/docs/docs/services.md
+++ b/docs/docs/services.md
@@ -30,7 +30,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - pip install vllm
   - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
@@ -72,7 +72,7 @@ To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-a
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f service.dstack.yml
diff --git a/docs/docs/tasks.md b/docs/docs/tasks.md
index da4f9df1e..684387d83 100644
--- a/docs/docs/tasks.md
+++ b/docs/docs/tasks.md
@@ -25,7 +25,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 # Commands of the task
 commands:
@@ -60,7 +60,7 @@ To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-a
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/.dstack.yml
diff --git a/examples/.dstack.yml b/examples/.dstack.yml
index 568c14139..0a1d4480c 100644
--- a/examples/.dstack.yml
+++ b/examples/.dstack.yml
@@ -11,8 +11,6 @@ ide: vscode
 # Use either spot or on-demand instances
 spot_policy: auto
 
+# Required resources
 resources:
-  memory: 16MB..
-  shm_size: 8MB
-#  gpu: A10
-#  disk: 100GB..
+  gpu: 24GB
diff --git a/examples/accelerators/amd/README.md b/examples/accelerators/amd/README.md
index 874852e72..482c38828 100644
--- a/examples/accelerators/amd/README.md
+++ b/examples/accelerators/amd/README.md
@@ -21,7 +21,7 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - TRUST_REMOTE_CODE=true
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
@@ -61,7 +61,7 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - MAX_MODEL_LEN=126192
 # Commands of the task
@@ -135,7 +135,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -177,7 +177,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -224,7 +224,7 @@ cloud resources and run the configuration.
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f examples/deployment/vllm/amd/service.dstack.yml
 ```
diff --git a/examples/accelerators/tpu/README.md b/examples/accelerators/tpu/README.md
index 471481cae..77ce85685 100644
--- a/examples/accelerators/tpu/README.md
+++ b/examples/accelerators/tpu/README.md
@@ -25,7 +25,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 image: dstackai/optimum-tpu:llama31
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
@@ -61,7 +61,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 env:
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - DATE=20240828
   - TORCH_VERSION=2.5.0
   - VLLM_TARGET_DEVICE=tpu
@@ -135,7 +135,7 @@ name: optimum-tpu-llama-train
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
   - mkdir -p optimum-tpu/examples/custom/
diff --git a/examples/deployment/lorax/serve-task.dstack.yml b/examples/deployment/lorax/serve-task.dstack.yml
index 36ac0e950..13adea218 100644
--- a/examples/deployment/lorax/serve-task.dstack.yml
+++ b/examples/deployment/lorax/serve-task.dstack.yml
@@ -3,7 +3,7 @@ type: task
 image: ghcr.io/predibase/lorax:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
 commands:
diff --git a/examples/deployment/lorax/serve.dstack.yml b/examples/deployment/lorax/serve.dstack.yml
index a48513cfd..4513c0640 100644
--- a/examples/deployment/lorax/serve.dstack.yml
+++ b/examples/deployment/lorax/serve.dstack.yml
@@ -3,7 +3,7 @@ type: service
 image: ghcr.io/predibase/lorax:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
 commands:
diff --git a/examples/deployment/optimum-tpu/.dstack.yml b/examples/deployment/optimum-tpu/.dstack.yml
index f34d3e9bb..3f2a187fc 100644
--- a/examples/deployment/optimum-tpu/.dstack.yml
+++ b/examples/deployment/optimum-tpu/.dstack.yml
@@ -7,7 +7,7 @@ name: vscode-optimum-tpu
 image: dstackai/optimum-tpu:llama31
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 resources:
diff --git a/examples/deployment/optimum-tpu/service.dstack.yml b/examples/deployment/optimum-tpu/service.dstack.yml
index 1b9ad8db3..663257f60 100644
--- a/examples/deployment/optimum-tpu/service.dstack.yml
+++ b/examples/deployment/optimum-tpu/service.dstack.yml
@@ -7,7 +7,7 @@ name: llama31-service-optimum-tpu
 image: dstackai/optimum-tpu:llama31
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
diff --git a/examples/deployment/optimum-tpu/task.dstack.yml b/examples/deployment/optimum-tpu/task.dstack.yml
index 8a581e14b..e183c1b94 100644
--- a/examples/deployment/optimum-tpu/task.dstack.yml
+++ b/examples/deployment/optimum-tpu/task.dstack.yml
@@ -7,7 +7,7 @@ name: llama31-task-optimum-tpu
 image: dstackai/optimum-tpu:llama31
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
diff --git a/examples/deployment/tgi/amd/.dstack.yml b/examples/deployment/tgi/amd/.dstack.yml
index 345b135bb..2443c20a7 100644
--- a/examples/deployment/tgi/amd/.dstack.yml
+++ b/examples/deployment/tgi/amd/.dstack.yml
@@ -4,7 +4,7 @@ name: dev-tgi-amd
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
 ide: vscode
diff --git a/examples/deployment/tgi/amd/service.dstack.yml b/examples/deployment/tgi/amd/service.dstack.yml
index 686e78a31..f3bedcd6c 100644
--- a/examples/deployment/tgi/amd/service.dstack.yml
+++ b/examples/deployment/tgi/amd/service.dstack.yml
@@ -3,7 +3,7 @@ name: service-tgi-amd
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
   - TRUST_REMOTE_CODE=true
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
diff --git a/examples/deployment/tgi/serve-task.dstack.yml b/examples/deployment/tgi/serve-task.dstack.yml
index 5376e635b..d35b7b2d1 100644
--- a/examples/deployment/tgi/serve-task.dstack.yml
+++ b/examples/deployment/tgi/serve-task.dstack.yml
@@ -3,7 +3,7 @@ type: task
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2
 commands:
   - text-generation-launcher --port 8000 --trust-remote-code
diff --git a/examples/deployment/tgi/serve.dstack.yml b/examples/deployment/tgi/serve.dstack.yml
index 81a27e4f0..c1af47a7a 100644
--- a/examples/deployment/tgi/serve.dstack.yml
+++ b/examples/deployment/tgi/serve.dstack.yml
@@ -3,7 +3,7 @@ type: service
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2
 commands:
   - text-generation-launcher --port 8000 --trust-remote-code
diff --git a/examples/deployment/vllm/amd/.dstack.yml b/examples/deployment/vllm/amd/.dstack.yml
index 6aaed21a0..053bb390f 100644
--- a/examples/deployment/vllm/amd/.dstack.yml
+++ b/examples/deployment/vllm/amd/.dstack.yml
@@ -4,7 +4,7 @@ name: dev-vLLM-amd
 image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
diff --git a/examples/deployment/vllm/amd/build.vllm-rocm.yaml b/examples/deployment/vllm/amd/build.vllm-rocm.yaml
index 00112df96..a4648ac49 100644
--- a/examples/deployment/vllm/amd/build.vllm-rocm.yaml
+++ b/examples/deployment/vllm/amd/build.vllm-rocm.yaml
@@ -4,7 +4,7 @@ name: build-vllm-rocm
 image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - AWS_ACCESS_KEY_ID
   - AWS_SECRET_ACCESS_KEY
   - AWS_REGION
diff --git a/examples/deployment/vllm/amd/service.dstack.yml b/examples/deployment/vllm/amd/service.dstack.yml
index e91858f28..aabe2daac 100644
--- a/examples/deployment/vllm/amd/service.dstack.yml
+++ b/examples/deployment/vllm/amd/service.dstack.yml
@@ -4,7 +4,7 @@ name: llama31-service-vllm-amd
 image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - MAX_MODEL_LEN=126192
diff --git a/examples/deployment/vllm/service-tpu.dstack.yml b/examples/deployment/vllm/service-tpu.dstack.yml
index 230a1c539..6c87082f6 100644
--- a/examples/deployment/vllm/service-tpu.dstack.yml
+++ b/examples/deployment/vllm/service-tpu.dstack.yml
@@ -3,7 +3,7 @@ type: service
 name: llama31-service-vllm-tpu
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - DATE=20240828
   - TORCH_VERSION=2.5.0
diff --git a/examples/fine-tuning/alignment-handbook/.dstack.yml b/examples/fine-tuning/alignment-handbook/.dstack.yml
index fc97d6b96..21d73d70a 100644
--- a/examples/fine-tuning/alignment-handbook/.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
diff --git a/examples/fine-tuning/alignment-handbook/README.md b/examples/fine-tuning/alignment-handbook/README.md
index 799fdc972..c0e0209a3 100644
--- a/examples/fine-tuning/alignment-handbook/README.md
+++ b/examples/fine-tuning/alignment-handbook/README.md
@@ -44,7 +44,7 @@ nvcc: true
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
@@ -79,7 +79,7 @@ To run the task, use `dstack apply`:
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/fine-tuning/alignment-handbook/train.dstack.yml
@@ -109,7 +109,7 @@ nodes: 2
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task (dstack runs it on each node)
diff --git a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
index b33902a5d..47eb56b50 100644
--- a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task (dstack runs it on each node)
diff --git a/examples/fine-tuning/alignment-handbook/train.dstack.yml b/examples/fine-tuning/alignment-handbook/train.dstack.yml
index a52a3b08f..6aeedafdf 100644
--- a/examples/fine-tuning/alignment-handbook/train.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/train.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
diff --git a/examples/fine-tuning/axolotl/.dstack.yml b/examples/fine-tuning/axolotl/.dstack.yml
index 4b9096cfa..de7161ef1 100644
--- a/examples/fine-tuning/axolotl/.dstack.yml
+++ b/examples/fine-tuning/axolotl/.dstack.yml
@@ -7,7 +7,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 ide: vscode
diff --git a/examples/fine-tuning/axolotl/README.md b/examples/fine-tuning/axolotl/README.md
index 2946594ca..4265fecaa 100644
--- a/examples/fine-tuning/axolotl/README.md
+++ b/examples/fine-tuning/axolotl/README.md
@@ -41,7 +41,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 # Commands of the task
 commands:
@@ -73,7 +73,7 @@ cloud resources and run the configuration.
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/fine-tuning/axolotl/train.dstack.yml
 ```
@@ -116,7 +116,7 @@ If you'd like to play with the example using a dev environment, run
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/fine-tuning/axolotl/.dstack.yaml
 ```
diff --git a/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml b/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml
index 1468bf8dc..c60e993a3 100644
--- a/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml
+++ b/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml
@@ -6,7 +6,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - GPU_ARCHS="gfx90a;gfx942"
   - AWS_ACCESS_KEY_ID
   - AWS_SECRET_ACCESS_KEY
diff --git a/examples/fine-tuning/axolotl/amd/build.xformers.yaml b/examples/fine-tuning/axolotl/amd/build.xformers.yaml
index a3733ec50..49cbc1e7a 100644
--- a/examples/fine-tuning/axolotl/amd/build.xformers.yaml
+++ b/examples/fine-tuning/axolotl/amd/build.xformers.yaml
@@ -6,7 +6,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - GPU_ARCHS="gfx90a;gfx942"
   - AWS_ACCESS_KEY_ID
   - AWS_SECRET_ACCESS_KEY
diff --git a/examples/fine-tuning/axolotl/amd/train.dstack.yaml b/examples/fine-tuning/axolotl/amd/train.dstack.yaml
index 5de02b353..80921fdb4 100644
--- a/examples/fine-tuning/axolotl/amd/train.dstack.yaml
+++ b/examples/fine-tuning/axolotl/amd/train.dstack.yaml
@@ -6,7 +6,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
diff --git a/examples/fine-tuning/axolotl/train.dstack.yaml b/examples/fine-tuning/axolotl/train.dstack.yaml
index 3dd8b8ddb..38d543110 100644
--- a/examples/fine-tuning/axolotl/train.dstack.yaml
+++ b/examples/fine-tuning/axolotl/train.dstack.yaml
@@ -7,7 +7,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 # Commands of the task
 commands:
diff --git a/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml b/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml
index 8dc522e0e..2577f2f9a 100644
--- a/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml
+++ b/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.11"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Refer to Note section in examples/gpus/tpu/README.md for more information about the optimum-tpu repository.
 # Uncomment if you want the environment to be pre-installed
diff --git a/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml b/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml
index 04fdfb744..4a4234177 100644
--- a/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml
+++ b/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml
@@ -6,7 +6,7 @@ python: "3.11"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
diff --git a/examples/fine-tuning/qlora/train.dstack.yml b/examples/fine-tuning/qlora/train.dstack.yml
index a51bb1ff0..5f8785e4a 100644
--- a/examples/fine-tuning/qlora/train.dstack.yml
+++ b/examples/fine-tuning/qlora/train.dstack.yml
@@ -3,7 +3,7 @@ type: task
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - HF_HUB_ENABLE_HF_TRANSFER=1
 commands:
diff --git a/examples/fine-tuning/trl/.dstack.yml b/examples/fine-tuning/trl/.dstack.yml
index 13685d624..b9720c326 100644
--- a/examples/fine-tuning/trl/.dstack.yml
+++ b/examples/fine-tuning/trl/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Uncomment if you want the environment to be pre-installed
diff --git a/examples/fine-tuning/trl/README.md b/examples/fine-tuning/trl/README.md
index 5cffec021..03b33ff4e 100644
--- a/examples/fine-tuning/trl/README.md
+++ b/examples/fine-tuning/trl/README.md
@@ -32,7 +32,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
@@ -108,7 +108,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
diff --git a/examples/fine-tuning/trl/amd/train.dstack.yaml b/examples/fine-tuning/trl/amd/train.dstack.yaml
index 69b8744c3..3a41dc3fe 100644
--- a/examples/fine-tuning/trl/amd/train.dstack.yaml
+++ b/examples/fine-tuning/trl/amd/train.dstack.yaml
@@ -7,7 +7,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
diff --git a/examples/fine-tuning/trl/train-distrib.dstack.yml b/examples/fine-tuning/trl/train-distrib.dstack.yml
index d8af736cf..18987f80e 100644
--- a/examples/fine-tuning/trl/train-distrib.dstack.yml
+++ b/examples/fine-tuning/trl/train-distrib.dstack.yml
@@ -9,7 +9,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
diff --git a/examples/fine-tuning/trl/train.dstack.yml b/examples/fine-tuning/trl/train.dstack.yml
index c91783b7a..a0f3f674c 100644
--- a/examples/fine-tuning/trl/train.dstack.yml
+++ b/examples/fine-tuning/trl/train.dstack.yml
@@ -6,7 +6,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
diff --git a/examples/llms/llama31/.dstack.yml b/examples/llms/llama31/.dstack.yml
index b9782c82a..e19289978 100644
--- a/examples/llms/llama31/.dstack.yml
+++ b/examples/llms/llama31/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 # Use either spot or on-demand instances
diff --git a/examples/llms/llama31/README.md b/examples/llms/llama31/README.md
index 605b51cfd..6230db264 100644
--- a/examples/llms/llama31/README.md
+++ b/examples/llms/llama31/README.md
@@ -34,7 +34,7 @@ Below is the configuration file for the task.
 python: "3.10"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_MODEL_LEN=4096
 commands:
@@ -67,7 +67,7 @@ Below is the configuration file for the task.
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
@@ -161,7 +161,7 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f examples/llms/llama31/vllm/task.dstack.yml
@@ -226,7 +226,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
@@ -312,7 +312,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
diff --git a/examples/llms/llama31/tgi/.dstack.yml b/examples/llms/llama31/tgi/.dstack.yml
index c1ddb4ab0..e2ce95819 100644
--- a/examples/llms/llama31/tgi/.dstack.yml
+++ b/examples/llms/llama31/tgi/.dstack.yml
@@ -7,7 +7,7 @@ image: ghcr.io/huggingface/text-generation-inference:latest
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 # Use either spot or on-demand instances
diff --git a/examples/llms/llama31/tgi/task.dstack.yml b/examples/llms/llama31/tgi/task.dstack.yml
index 42ebf5c77..219b24b87 100644
--- a/examples/llms/llama31/tgi/task.dstack.yml
+++ b/examples/llms/llama31/tgi/task.dstack.yml
@@ -7,7 +7,7 @@ image: ghcr.io/huggingface/text-generation-inference:latest
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
diff --git a/examples/llms/llama31/vllm/task.dstack.yml b/examples/llms/llama31/vllm/task.dstack.yml
index 67606f2a7..427bef351 100644
--- a/examples/llms/llama31/vllm/task.dstack.yml
+++ b/examples/llms/llama31/vllm/task.dstack.yml
@@ -6,7 +6,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_MODEL_LEN=4096
 commands:
diff --git a/examples/llms/llama32/.dstack.yml b/examples/llms/llama32/.dstack.yml
index 6915cd3fc..84be302d3 100644
--- a/examples/llms/llama32/.dstack.yml
+++ b/examples/llms/llama32/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 # Use either spot or on-demand instances
diff --git a/examples/llms/llama32/README.md b/examples/llms/llama32/README.md
index 7d30ea1cc..f1e392f46 100644
--- a/examples/llms/llama32/README.md
+++ b/examples/llms/llama32/README.md
@@ -31,7 +31,7 @@ name: llama32-task-vllm
 python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Llama-3.2-11B-Vision-Instruct
   - MAX_MODEL_LEN=13488
   - MAX_NUM_SEQS=40
@@ -85,7 +85,7 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f examples/llms/llama32/vllm/task.dstack.yml
diff --git a/examples/llms/llama32/vllm/task.dstack.yml b/examples/llms/llama32/vllm/task.dstack.yml
index 0dffb1169..e537e0a43 100644
--- a/examples/llms/llama32/vllm/task.dstack.yml
+++ b/examples/llms/llama32/vllm/task.dstack.yml
@@ -6,7 +6,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Llama-3.2-11B-Vision-Instruct
   - MAX_MODEL_LEN=13488
   - MAX_NUM_SEQS=40
diff --git a/examples/llms/mixtral/tgi.dstack.yml b/examples/llms/mixtral/tgi.dstack.yml
index 31868de97..db90043a8 100644
--- a/examples/llms/mixtral/tgi.dstack.yml
+++ b/examples/llms/mixtral/tgi.dstack.yml
@@ -3,7 +3,7 @@ type: service
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mixtral-8x7B-Instruct-v0.1
 commands:
   - text-generation-launcher
diff --git a/examples/llms/mixtral/vllm.dstack.yml b/examples/llms/mixtral/vllm.dstack.yml
index 59d0376a3..31bc11908 100644
--- a/examples/llms/mixtral/vllm.dstack.yml
+++ b/examples/llms/mixtral/vllm.dstack.yml
@@ -3,7 +3,7 @@ type: service
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - pip install vllm
   - python -m vllm.entrypoints.openai.api_server
diff --git a/src/dstack/_internal/server/services/gateways/options.py b/src/dstack/_internal/server/services/gateways/options.py
index c46ad8662..6c75acb2c 100644
--- a/src/dstack/_internal/server/services/gateways/options.py
+++ b/src/dstack/_internal/server/services/gateways/options.py
@@ -10,7 +10,7 @@ def complete_service_model(model_info: AnyModel, env: Dict[str, str]):
     if model_info.type == "chat" and model_info.format == "tgi":
         if model_info.chat_template is None or model_info.eos_token is None:
-            hf_token = env.get("HUGGING_FACE_HUB_TOKEN", None)
+            hf_token = env.get("HF_TOKEN", env.get("HUGGING_FACE_HUB_TOKEN"))
             tokenizer_config = get_tokenizer_config(model_info.name, hf_token=hf_token)
             if model_info.chat_template is None:
                 model_info.chat_template = tokenizer_config[
@@ -35,9 +35,9 @@ def get_tokenizer_config(model_id: str, hf_token: Optional[str] = None) -> dict:
         if resp.status_code == 403:
             raise ServerClientError("Private HF models are not supported")
         if resp.status_code == 401:
-            message = "Failed to access gated model. Specify HUGGING_FACE_HUB_TOKEN env."
+            message = "Failed to access gated model. Specify HF_TOKEN env."
             if hf_token is not None:
-                message = "Failed to access gated model. Invalid HUGGING_FACE_HUB_TOKEN env."
+                message = "Failed to access gated model. Invalid HF_TOKEN env."
             raise ServerClientError(message)
         resp.raise_for_status()
     except requests.RequestException as e: