diff --git a/docs/blog/posts/amd-on-runpod.md b/docs/blog/posts/amd-on-runpod.md
index f96aaf12c..87ba0518b 100644
--- a/docs/blog/posts/amd-on-runpod.md
+++ b/docs/blog/posts/amd-on-runpod.md
@@ -50,7 +50,7 @@ you can now specify an AMD GPU under `resources`. Below are a few examples.
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - TRUST_REMOTE_CODE=true
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
@@ -83,7 +83,7 @@ you can now specify an AMD GPU under `resources`. Below are a few examples.
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
 ide: vscode
diff --git a/docs/blog/posts/tpu-on-gcp.md b/docs/blog/posts/tpu-on-gcp.md
index 7973f2494..2bfe35b91 100644
--- a/docs/blog/posts/tpu-on-gcp.md
+++ b/docs/blog/posts/tpu-on-gcp.md
@@ -58,7 +58,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 image: dstackai/optimum-tpu:llama31
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
@@ -89,7 +89,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 env:
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - DATE=20240828
   - TORCH_VERSION=2.5.0
   - VLLM_TARGET_DEVICE=tpu
@@ -167,7 +167,7 @@ name: optimum-tpu-llama-train
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
   - mkdir -p optimum-tpu/examples/custom/
diff --git a/docs/blog/posts/volumes-on-runpod.md b/docs/blog/posts/volumes-on-runpod.md
index 116a121bb..58b02d4a9 100644
--- a/docs/blog/posts/volumes-on-runpod.md
+++ b/docs/blog/posts/volumes-on-runpod.md
@@ -33,7 +33,7 @@ scaling:
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
@@ -110,7 +110,7 @@ volumes:
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
diff --git a/docs/docs/guides/protips.md b/docs/docs/guides/protips.md
index e433e8ab2..0749be141 100644
--- a/docs/docs/guides/protips.md
+++ b/docs/docs/guides/protips.md
@@ -181,7 +181,7 @@ name: vscode
 python: "3.10"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 ```
@@ -190,20 +190,20 @@ ide: vscode
 Then, you can pass the environment variable either via the shell:
 ```shell
-HUGGING_FACE_HUB_TOKEN=... dstack apply -f .dstack.yml
+HF_TOKEN=... dstack apply -f .dstack.yml
 ```
 Or via the `-e` option of the `dstack apply` command:
 ```shell
-dstack apply -f .dstack.yml -e HUGGING_FACE_HUB_TOKEN=...
+dstack apply -f .dstack.yml -e HF_TOKEN=...
 ```
 ??? info ".env"
     A better way to configure environment variables not hardcoded in YAML is by specifying them in a `.env` file:
     ```
-    HUGGING_FACE_HUB_TOKEN=...
+    HF_TOKEN=...
     ```
     If you install [`direnv` :material-arrow-top-right-thin:{ .external }](https://direnv.net/){:target="_blank"},
diff --git a/docs/docs/reference/dstack.yml/dev-environment.md b/docs/docs/reference/dstack.yml/dev-environment.md
index 3e1bd0b5b..ba19fe966 100644
--- a/docs/docs/reference/dstack.yml/dev-environment.md
+++ b/docs/docs/reference/dstack.yml/dev-environment.md
@@ -151,7 +151,7 @@ name: vscode
 # Environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - HF_HUB_ENABLE_HF_TRANSFER=1
 ide: vscode
@@ -159,7 +159,7 @@ ide: vscode
-> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+> If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 `dstack` will require the value to be passed via the CLI or set in the current process.
 For instance, you can define environment variables in a `.envrc` file and utilize tools like `direnv`.
diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md
index 5d638ec41..4aa768583 100644
--- a/docs/docs/reference/dstack.yml/service.md
+++ b/docs/docs/reference/dstack.yml/service.md
@@ -312,7 +312,7 @@ python: "3.10"
 # Environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL=NousResearch/Llama-2-7b-chat-hf
 # Commands of the service
 commands:
@@ -328,7 +328,7 @@ resources:
-If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 `dstack` will require the value to be passed via the CLI or set in the current process.
 For instance, you can define environment variables in a `.envrc` file and utilize tools like `direnv`.
diff --git a/docs/docs/reference/dstack.yml/task.md b/docs/docs/reference/dstack.yml/task.md
index e2e052968..4e069d68f 100644
--- a/docs/docs/reference/dstack.yml/task.md
+++ b/docs/docs/reference/dstack.yml/task.md
@@ -201,7 +201,7 @@ python: "3.10"
 # Environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - HF_HUB_ENABLE_HF_TRANSFER=1
 # Commands of the task
 commands:
@@ -212,7 +212,7 @@
-> If you don't assign a value to an environment variable (see `HUGGING_FACE_HUB_TOKEN` above),
+> If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 `dstack` will require the value to be passed via the CLI or set in the current process.
 For instance, you can define environment variables in a `.envrc` file and utilize tools like `direnv`.
diff --git a/docs/docs/services.md b/docs/docs/services.md
index c916580d1..bdb84aa38 100644
--- a/docs/docs/services.md
+++ b/docs/docs/services.md
@@ -30,7 +30,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - pip install vllm
   - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
@@ -72,7 +72,7 @@ To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-a
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f service.dstack.yml
diff --git a/docs/docs/tasks.md b/docs/docs/tasks.md
index da4f9df1e..684387d83 100644
--- a/docs/docs/tasks.md
+++ b/docs/docs/tasks.md
@@ -25,7 +25,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 # Commands of the task
 commands:
@@ -60,7 +60,7 @@ To run a configuration, use the [`dstack apply`](reference/cli/index.md#dstack-a
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/.dstack.yml
diff --git a/examples/.dstack.yml b/examples/.dstack.yml
index 568c14139..0a1d4480c 100644
--- a/examples/.dstack.yml
+++ b/examples/.dstack.yml
@@ -11,8 +11,6 @@ ide: vscode
 # Use either spot or on-demand instances
 spot_policy: auto
 
+# Required resources
 resources:
-  memory: 16MB..
-  shm_size: 8MB
-#  gpu: A10
-#  disk: 100GB..
+  gpu: 24GB
diff --git a/examples/accelerators/amd/README.md b/examples/accelerators/amd/README.md
index 874852e72..482c38828 100644
--- a/examples/accelerators/amd/README.md
+++ b/examples/accelerators/amd/README.md
@@ -21,7 +21,7 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - TRUST_REMOTE_CODE=true
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
@@ -61,7 +61,7 @@ Llama 3.1 70B in FP16 using [TGI :material-arrow-top-right-thin:{ .external }](h
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - MAX_MODEL_LEN=126192
 # Commands of the task
@@ -135,7 +135,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -177,7 +177,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
 image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
@@ -224,7 +224,7 @@ cloud resources and run the configuration.
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f examples/deployment/vllm/amd/service.dstack.yml
 ```
diff --git a/examples/accelerators/tpu/README.md b/examples/accelerators/tpu/README.md
index 471481cae..77ce85685 100644
--- a/examples/accelerators/tpu/README.md
+++ b/examples/accelerators/tpu/README.md
@@ -25,7 +25,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 image: dstackai/optimum-tpu:llama31
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
@@ -61,7 +61,7 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
 env:
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - DATE=20240828
   - TORCH_VERSION=2.5.0
   - VLLM_TARGET_DEVICE=tpu
@@ -135,7 +135,7 @@ name: optimum-tpu-llama-train
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - git clone -b add_llama_31_support https://github.com/dstackai/optimum-tpu.git
   - mkdir -p optimum-tpu/examples/custom/
diff --git a/examples/deployment/lorax/serve-task.dstack.yml b/examples/deployment/lorax/serve-task.dstack.yml
index 36ac0e950..13adea218 100644
--- a/examples/deployment/lorax/serve-task.dstack.yml
+++ b/examples/deployment/lorax/serve-task.dstack.yml
@@ -3,7 +3,7 @@ type: task
 image: ghcr.io/predibase/lorax:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
 commands:
diff --git a/examples/deployment/lorax/serve.dstack.yml b/examples/deployment/lorax/serve.dstack.yml
index a48513cfd..4513c0640 100644
--- a/examples/deployment/lorax/serve.dstack.yml
+++ b/examples/deployment/lorax/serve.dstack.yml
@@ -3,7 +3,7 @@ type: service
 image: ghcr.io/predibase/lorax:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.1
 commands:
diff --git a/examples/deployment/optimum-tpu/.dstack.yml b/examples/deployment/optimum-tpu/.dstack.yml
index f34d3e9bb..3f2a187fc 100644
--- a/examples/deployment/optimum-tpu/.dstack.yml
+++ b/examples/deployment/optimum-tpu/.dstack.yml
@@ -7,7 +7,7 @@ name: vscode-optimum-tpu
 image: dstackai/optimum-tpu:llama31
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 resources:
diff --git a/examples/deployment/optimum-tpu/service.dstack.yml b/examples/deployment/optimum-tpu/service.dstack.yml
index 1b9ad8db3..663257f60 100644
--- a/examples/deployment/optimum-tpu/service.dstack.yml
+++ b/examples/deployment/optimum-tpu/service.dstack.yml
@@ -7,7 +7,7 @@ name: llama31-service-optimum-tpu
 image: dstackai/optimum-tpu:llama31
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
diff --git a/examples/deployment/optimum-tpu/task.dstack.yml b/examples/deployment/optimum-tpu/task.dstack.yml
index 8a581e14b..e183c1b94 100644
--- a/examples/deployment/optimum-tpu/task.dstack.yml
+++ b/examples/deployment/optimum-tpu/task.dstack.yml
@@ -7,7 +7,7 @@ name: llama31-task-optimum-tpu
 image: dstackai/optimum-tpu:llama31
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_TOTAL_TOKENS=4096
   - MAX_BATCH_PREFILL_TOKENS=4095
diff --git a/examples/deployment/tgi/amd/.dstack.yml b/examples/deployment/tgi/amd/.dstack.yml
index 345b135bb..2443c20a7 100644
--- a/examples/deployment/tgi/amd/.dstack.yml
+++ b/examples/deployment/tgi/amd/.dstack.yml
@@ -4,7 +4,7 @@ name: dev-tgi-amd
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
 ide: vscode
diff --git a/examples/deployment/tgi/amd/service.dstack.yml b/examples/deployment/tgi/amd/service.dstack.yml
index 686e78a31..f3bedcd6c 100644
--- a/examples/deployment/tgi/amd/service.dstack.yml
+++ b/examples/deployment/tgi/amd/service.dstack.yml
@@ -3,7 +3,7 @@ name: service-tgi-amd
 image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ROCM_USE_FLASH_ATTN_V2_TRITON=true
   - TRUST_REMOTE_CODE=true
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
diff --git a/examples/deployment/tgi/serve-task.dstack.yml b/examples/deployment/tgi/serve-task.dstack.yml
index 5376e635b..d35b7b2d1 100644
--- a/examples/deployment/tgi/serve-task.dstack.yml
+++ b/examples/deployment/tgi/serve-task.dstack.yml
@@ -3,7 +3,7 @@ type: task
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2
 commands:
   - text-generation-launcher --port 8000 --trust-remote-code
diff --git a/examples/deployment/tgi/serve.dstack.yml b/examples/deployment/tgi/serve.dstack.yml
index 81a27e4f0..c1af47a7a 100644
--- a/examples/deployment/tgi/serve.dstack.yml
+++ b/examples/deployment/tgi/serve.dstack.yml
@@ -3,7 +3,7 @@ type: service
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2
 commands:
   - text-generation-launcher --port 8000 --trust-remote-code
diff --git a/examples/deployment/vllm/amd/.dstack.yml b/examples/deployment/vllm/amd/.dstack.yml
index 6aaed21a0..053bb390f 100644
--- a/examples/deployment/vllm/amd/.dstack.yml
+++ b/examples/deployment/vllm/amd/.dstack.yml
@@ -4,7 +4,7 @@ name: dev-vLLM-amd
 image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
diff --git a/examples/deployment/vllm/amd/build.vllm-rocm.yaml b/examples/deployment/vllm/amd/build.vllm-rocm.yaml
index 00112df96..a4648ac49 100644
--- a/examples/deployment/vllm/amd/build.vllm-rocm.yaml
+++ b/examples/deployment/vllm/amd/build.vllm-rocm.yaml
@@ -4,7 +4,7 @@ name: build-vllm-rocm
 image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - AWS_ACCESS_KEY_ID
   - AWS_SECRET_ACCESS_KEY
   - AWS_REGION
diff --git a/examples/deployment/vllm/amd/service.dstack.yml b/examples/deployment/vllm/amd/service.dstack.yml
index e91858f28..aabe2daac 100644
--- a/examples/deployment/vllm/amd/service.dstack.yml
+++ b/examples/deployment/vllm/amd/service.dstack.yml
@@ -4,7 +4,7 @@ name: llama31-service-vllm-amd
 image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
   - MAX_MODEL_LEN=126192
diff --git a/examples/deployment/vllm/service-tpu.dstack.yml b/examples/deployment/vllm/service-tpu.dstack.yml
index 230a1c539..6c87082f6 100644
--- a/examples/deployment/vllm/service-tpu.dstack.yml
+++ b/examples/deployment/vllm/service-tpu.dstack.yml
@@ -3,7 +3,7 @@ type: service
 name: llama31-service-vllm-tpu
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - DATE=20240828
   - TORCH_VERSION=2.5.0
diff --git a/examples/fine-tuning/alignment-handbook/.dstack.yml b/examples/fine-tuning/alignment-handbook/.dstack.yml
index fc97d6b96..21d73d70a 100644
--- a/examples/fine-tuning/alignment-handbook/.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
diff --git a/examples/fine-tuning/alignment-handbook/README.md b/examples/fine-tuning/alignment-handbook/README.md
index 799fdc972..c0e0209a3 100644
--- a/examples/fine-tuning/alignment-handbook/README.md
+++ b/examples/fine-tuning/alignment-handbook/README.md
@@ -44,7 +44,7 @@ nvcc: true
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
@@ -79,7 +79,7 @@ To run the task, use `dstack apply`:
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/fine-tuning/alignment-handbook/train.dstack.yml
@@ -109,7 +109,7 @@ nodes: 2
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task (dstack runs it on each node)
diff --git a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
index b33902a5d..47eb56b50 100644
--- a/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/train-distrib.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task (dstack runs it on each node)
diff --git a/examples/fine-tuning/alignment-handbook/train.dstack.yml b/examples/fine-tuning/alignment-handbook/train.dstack.yml
index a52a3b08f..6aeedafdf 100644
--- a/examples/fine-tuning/alignment-handbook/train.dstack.yml
+++ b/examples/fine-tuning/alignment-handbook/train.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
diff --git a/examples/fine-tuning/axolotl/.dstack.yml b/examples/fine-tuning/axolotl/.dstack.yml
index 4b9096cfa..de7161ef1 100644
--- a/examples/fine-tuning/axolotl/.dstack.yml
+++ b/examples/fine-tuning/axolotl/.dstack.yml
@@ -7,7 +7,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 ide: vscode
diff --git a/examples/fine-tuning/axolotl/README.md b/examples/fine-tuning/axolotl/README.md
index 2946594ca..4265fecaa 100644
--- a/examples/fine-tuning/axolotl/README.md
+++ b/examples/fine-tuning/axolotl/README.md
@@ -41,7 +41,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 # Commands of the task
 commands:
@@ -73,7 +73,7 @@ cloud resources and run the configuration.
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/fine-tuning/axolotl/train.dstack.yml
 ```
@@ -116,7 +116,7 @@ If you'd like to play with the example using a dev environment, run
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ WANDB_API_KEY=...
 $ dstack apply -f examples/fine-tuning/axolotl/.dstack.yaml
 ```
diff --git a/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml b/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml
index 1468bf8dc..c60e993a3 100644
--- a/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml
+++ b/examples/fine-tuning/axolotl/amd/build.flash-attention.yaml
@@ -6,7 +6,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - GPU_ARCHS="gfx90a;gfx942"
   - AWS_ACCESS_KEY_ID
   - AWS_SECRET_ACCESS_KEY
diff --git a/examples/fine-tuning/axolotl/amd/build.xformers.yaml b/examples/fine-tuning/axolotl/amd/build.xformers.yaml
index a3733ec50..49cbc1e7a 100644
--- a/examples/fine-tuning/axolotl/amd/build.xformers.yaml
+++ b/examples/fine-tuning/axolotl/amd/build.xformers.yaml
@@ -6,7 +6,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - GPU_ARCHS="gfx90a;gfx942"
   - AWS_ACCESS_KEY_ID
   - AWS_SECRET_ACCESS_KEY
diff --git a/examples/fine-tuning/axolotl/amd/train.dstack.yaml b/examples/fine-tuning/axolotl/amd/train.dstack.yaml
index 5de02b353..80921fdb4 100644
--- a/examples/fine-tuning/axolotl/amd/train.dstack.yaml
+++ b/examples/fine-tuning/axolotl/amd/train.dstack.yaml
@@ -6,7 +6,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
diff --git a/examples/fine-tuning/axolotl/train.dstack.yaml b/examples/fine-tuning/axolotl/train.dstack.yaml
index 3dd8b8ddb..38d543110 100644
--- a/examples/fine-tuning/axolotl/train.dstack.yaml
+++ b/examples/fine-tuning/axolotl/train.dstack.yaml
@@ -7,7 +7,7 @@ image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.1
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 # Commands of the task
 commands:
diff --git a/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml b/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml
index 8dc522e0e..2577f2f9a 100644
--- a/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml
+++ b/examples/fine-tuning/optimum-tpu/llama31/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.11"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Refer to Note section in examples/gpus/tpu/README.md for more information about the optimum-tpu repository.
 # Uncomment if you want the environment to be pre-installed
diff --git a/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml b/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml
index 04fdfb744..4a4234177 100644
--- a/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml
+++ b/examples/fine-tuning/optimum-tpu/llama31/train.dstack.yml
@@ -6,7 +6,7 @@ python: "3.11"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 # Commands of the task
 commands:
diff --git a/examples/fine-tuning/qlora/train.dstack.yml b/examples/fine-tuning/qlora/train.dstack.yml
index a51bb1ff0..5f8785e4a 100644
--- a/examples/fine-tuning/qlora/train.dstack.yml
+++ b/examples/fine-tuning/qlora/train.dstack.yml
@@ -3,7 +3,7 @@ type: task
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - HF_HUB_ENABLE_HF_TRANSFER=1
 commands:
diff --git a/examples/fine-tuning/trl/.dstack.yml b/examples/fine-tuning/trl/.dstack.yml
index 13685d624..b9720c326 100644
--- a/examples/fine-tuning/trl/.dstack.yml
+++ b/examples/fine-tuning/trl/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Uncomment if you want the environment to be pre-installed
diff --git a/examples/fine-tuning/trl/README.md b/examples/fine-tuning/trl/README.md
index 5cffec021..03b33ff4e 100644
--- a/examples/fine-tuning/trl/README.md
+++ b/examples/fine-tuning/trl/README.md
@@ -32,7 +32,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
@@ -108,7 +108,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
diff --git a/examples/fine-tuning/trl/amd/train.dstack.yaml b/examples/fine-tuning/trl/amd/train.dstack.yaml
index 69b8744c3..3a41dc3fe 100644
--- a/examples/fine-tuning/trl/amd/train.dstack.yaml
+++ b/examples/fine-tuning/trl/amd/train.dstack.yaml
@@ -7,7 +7,7 @@ image: runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - export PATH=/opt/conda/envs/py_3.10/bin:$PATH
diff --git a/examples/fine-tuning/trl/train-distrib.dstack.yml b/examples/fine-tuning/trl/train-distrib.dstack.yml
index d8af736cf..18987f80e 100644
--- a/examples/fine-tuning/trl/train-distrib.dstack.yml
+++ b/examples/fine-tuning/trl/train-distrib.dstack.yml
@@ -9,7 +9,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
diff --git a/examples/fine-tuning/trl/train.dstack.yml b/examples/fine-tuning/trl/train.dstack.yml
index c91783b7a..a0f3f674c 100644
--- a/examples/fine-tuning/trl/train.dstack.yml
+++ b/examples/fine-tuning/trl/train.dstack.yml
@@ -6,7 +6,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - ACCELERATE_LOG_LEVEL=info
   - WANDB_API_KEY
 # Commands of the task
diff --git a/examples/llms/llama31/.dstack.yml b/examples/llms/llama31/.dstack.yml
index b9782c82a..e19289978 100644
--- a/examples/llms/llama31/.dstack.yml
+++ b/examples/llms/llama31/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 # Use either spot or on-demand instances
diff --git a/examples/llms/llama31/README.md b/examples/llms/llama31/README.md
index 605b51cfd..6230db264 100644
--- a/examples/llms/llama31/README.md
+++ b/examples/llms/llama31/README.md
@@ -34,7 +34,7 @@ Below is the configuration file for the task.
 python: "3.10"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_MODEL_LEN=4096
 commands:
@@ -67,7 +67,7 @@ Below is the configuration file for the task.
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
@@ -161,7 +161,7 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f examples/llms/llama31/vllm/task.dstack.yml
@@ -226,7 +226,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
@@ -312,7 +312,7 @@ python: "3.10"
 nvcc: true
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - WANDB_API_KEY
 commands:
   - pip install "transformers>=4.43.2"
diff --git a/examples/llms/llama31/tgi/.dstack.yml b/examples/llms/llama31/tgi/.dstack.yml
index c1ddb4ab0..e2ce95819 100644
--- a/examples/llms/llama31/tgi/.dstack.yml
+++ b/examples/llms/llama31/tgi/.dstack.yml
@@ -7,7 +7,7 @@ image: ghcr.io/huggingface/text-generation-inference:latest
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 # Use either spot or on-demand instances
diff --git a/examples/llms/llama31/tgi/task.dstack.yml b/examples/llms/llama31/tgi/task.dstack.yml
index 42ebf5c77..219b24b87 100644
--- a/examples/llms/llama31/tgi/task.dstack.yml
+++ b/examples/llms/llama31/tgi/task.dstack.yml
@@ -7,7 +7,7 @@ image: ghcr.io/huggingface/text-generation-inference:latest
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_INPUT_LENGTH=4000
   - MAX_TOTAL_TOKENS=4096
diff --git a/examples/llms/llama31/vllm/task.dstack.yml b/examples/llms/llama31/vllm/task.dstack.yml
index 67606f2a7..427bef351 100644
--- a/examples/llms/llama31/vllm/task.dstack.yml
+++ b/examples/llms/llama31/vllm/task.dstack.yml
@@ -6,7 +6,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_MODEL_LEN=4096
 commands:
diff --git a/examples/llms/llama32/.dstack.yml b/examples/llms/llama32/.dstack.yml
index 6915cd3fc..84be302d3 100644
--- a/examples/llms/llama32/.dstack.yml
+++ b/examples/llms/llama32/.dstack.yml
@@ -7,7 +7,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 ide: vscode
 # Use either spot or on-demand instances
diff --git a/examples/llms/llama32/README.md b/examples/llms/llama32/README.md
index 7d30ea1cc..f1e392f46 100644
--- a/examples/llms/llama32/README.md
+++ b/examples/llms/llama32/README.md
@@ -31,7 +31,7 @@ name: llama32-task-vllm
 python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Llama-3.2-11B-Vision-Instruct
   - MAX_MODEL_LEN=13488
   - MAX_NUM_SEQS=40
@@ -85,7 +85,7 @@ To run a configuration, use the [`dstack apply`](https://dstack.ai/docs/referenc
 ```shell
-$ HUGGING_FACE_HUB_TOKEN=...
+$ HF_TOKEN=...
 $ dstack apply -f examples/llms/llama32/vllm/task.dstack.yml
diff --git a/examples/llms/llama32/vllm/task.dstack.yml b/examples/llms/llama32/vllm/task.dstack.yml
index 0dffb1169..e537e0a43 100644
--- a/examples/llms/llama32/vllm/task.dstack.yml
+++ b/examples/llms/llama32/vllm/task.dstack.yml
@@ -6,7 +6,7 @@ python: "3.10"
 # Required environment variables
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=meta-llama/Llama-3.2-11B-Vision-Instruct
   - MAX_MODEL_LEN=13488
   - MAX_NUM_SEQS=40
diff --git a/examples/llms/mixtral/tgi.dstack.yml b/examples/llms/mixtral/tgi.dstack.yml
index 31868de97..db90043a8 100644
--- a/examples/llms/mixtral/tgi.dstack.yml
+++ b/examples/llms/mixtral/tgi.dstack.yml
@@ -3,7 +3,7 @@ type: service
 image: ghcr.io/huggingface/text-generation-inference:latest
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
   - MODEL_ID=mistralai/Mixtral-8x7B-Instruct-v0.1
 commands:
   - text-generation-launcher
diff --git a/examples/llms/mixtral/vllm.dstack.yml b/examples/llms/mixtral/vllm.dstack.yml
index 59d0376a3..31bc11908 100644
--- a/examples/llms/mixtral/vllm.dstack.yml
+++ b/examples/llms/mixtral/vllm.dstack.yml
@@ -3,7 +3,7 @@ type: service
 python: "3.11"
 env:
-  - HUGGING_FACE_HUB_TOKEN
+  - HF_TOKEN
 commands:
   - pip install vllm
   - python -m vllm.entrypoints.openai.api_server
diff --git a/src/dstack/_internal/server/services/gateways/options.py b/src/dstack/_internal/server/services/gateways/options.py
index c46ad8662..6c75acb2c 100644
--- a/src/dstack/_internal/server/services/gateways/options.py
+++ b/src/dstack/_internal/server/services/gateways/options.py
@@ -10,7 +10,7 @@ def complete_service_model(model_info: AnyModel, env: Dict[str, str]):
     if model_info.type == "chat" and model_info.format == "tgi":
         if model_info.chat_template is None or model_info.eos_token is None:
-            hf_token = env.get("HUGGING_FACE_HUB_TOKEN", None)
+            hf_token = env.get("HF_TOKEN", env.get("HUGGING_FACE_HUB_TOKEN"))
             tokenizer_config = get_tokenizer_config(model_info.name, hf_token=hf_token)
             if model_info.chat_template is None:
                 model_info.chat_template = tokenizer_config[
@@ -35,9 +35,9 @@ def get_tokenizer_config(model_id: str, hf_token: Optional[str] = None) -> dict:
         if resp.status_code == 403:
             raise ServerClientError("Private HF models are not supported")
         if resp.status_code == 401:
-            message = "Failed to access gated model. Specify HUGGING_FACE_HUB_TOKEN env."
+            message = "Failed to access gated model. Specify HF_TOKEN env."
             if hf_token is not None:
-                message = "Failed to access gated model. Invalid HUGGING_FACE_HUB_TOKEN env."
+                message = "Failed to access gated model. Invalid HF_TOKEN env."
             raise ServerClientError(message)
         resp.raise_for_status()
     except requests.RequestException as e: