
[Usage]: File Access Error When Using RunAI Model Streamer with S3 in VLLM #12311

Open · nskumz opened this issue Jan 22, 2025 · 4 comments · May be fixed by #12353
Labels: usage (How to use vllm)
nskumz commented Jan 22, 2025

Your current environment

I am encountering a persistent issue when attempting to serve a model from an S3 bucket using the vllm serve command with the --load-format runai_streamer option. Despite having proper access to the S3 bucket and all required files being present, the process fails with a "File access error." Below are the details of the issue:

Command Used:
vllm serve s3://hip-general/benchmark-model-loading/ --load-format runai_streamer

Error Message:
Exception: Could not send runai_request to libstreamer due to: b'File access error'

Environment Details:
vLLM version: 0.6.6
Python version: 3.12
RunAI Model Streamer version: 0.11.2
S3 Region: us-west-2


Files in S3 Bucket:
config.json
generation_config.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json
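
For reference, the bucket contents listed above can be sanity-checked with the AWS CLI (assuming it is configured with the same credentials the streamer will use):

aws s3 ls s3://hip-general/benchmark-model-loading/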

My deployment file is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: benchmark-model-8b
  namespace: workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: benchmark-model-8b
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: benchmark-model-8b
    spec:
      containers:
      - command:
        - sh
        - -c
        - exec tail -f /dev/null
        env:
        - name: HF_HOME
          value: /huggingface
        - name: HUGGINGFACE_HUB_CACHE
          value: /huggingface/hub
        - name: HF_HUB_ENABLE_HF_TRANSFER
          value: "False"
        - name: HUGGING_FACE_HUB_TOKEN
          value: ""
        image: vllm/vllm-openai:v0.6.6
        imagePullPolicy: IfNotPresent
        name: benchmark-model-8b
        ports:
        - containerPort: 8888
          name: http
          protocol: TCP
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            cpu: "5"
            memory: 128Gi
        securityContext:
          capabilities:
            add:
            - SYS_ADMIN
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /huggingface
          name: hf-volume
        - mountPath: /dev/shm
          name: dshm
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: hf-volume
        persistentVolumeClaim:
          claimName: benchmark-model-pvc
      - emptyDir:
          medium: Memory
          sizeLimit: 90Gi
        name: dshm

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

nskumz commented Jan 22, 2025

Could you please pick this one up, @omer-dayan?


noa-neria commented Jan 22, 2025

Hi,

Try passing the credentials as environment variables on the command line:
AWS_ACCESS_KEY_ID=my_key AWS_SECRET_ACCESS_KEY=my_secret vllm serve s3://hip-general/benchmark-model-loading/ --load-format runai_streamer

Our implementation uses the AWS S3 C++ SDK, which applies AWS's default credential provider chain and is aligned with the AWS CLI.
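
Since your setup runs inside a Kubernetes Deployment, the same variables could instead be set on the container itself; a minimal sketch, assuming a hypothetical Secret named s3-credentials in the same namespace:

env:
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: s3-credentials        # hypothetical Secret holding the S3 keys
      key: access-key-id
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: s3-credentials
      key: secret-access-key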

To diagnose the problem, you can check the AWS trace logs. Add the following environment variable to the command line:
RUNAI_STREAMER_S3_TRACE=1

Trace logs are written to a file in the location of the executable (where vllm is running).
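
For example, a trace-enabled run with explicit credentials might look like this (placeholder key values):

RUNAI_STREAMER_S3_TRACE=1 AWS_ACCESS_KEY_ID=my_key AWS_SECRET_ACCESS_KEY=my_secret vllm serve s3://hip-general/benchmark-model-loading/ --load-format runai_streamer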

There can be various reasons why the AWS CLI succeeds but the SDK does not, for example:

  • Credentials issues
    The SDK may not be resolving credentials the same way the AWS CLI does.

    • Ensure the environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN) are correctly set if used.
    • By default, the SDK uses the default profile unless another is specified. If using the shared credentials file, ensure the AWS_PROFILE environment variable is set correctly, or that the default profile is configured correctly.
    • If using an IAM role (e.g., on EC2), ensure the instance or container has the correct permissions attached.
    • If using a credentials file, the SDK might not be looking in the same location as the CLI. Pass the correct location, e.g. AWS_SHARED_CREDENTIALS_FILE=~/.aws/credentials
    • If using a credentials file, verify its format, since the SDK is stricter than the CLI. Avoid trailing spaces or malformed entries.
  • Region mismatch
    Check the trace logs for a line similar to Resolved region: us-east-1 and compare it to the CLI region (aws configure get region). See the sketch below for pinning the region explicitly.
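
If the two regions differ, one option is to pin the region on the command line; a sketch using the standard AWS variables (which of the two a given SDK version honors can vary):

AWS_REGION=us-west-2 AWS_DEFAULT_REGION=us-west-2 vllm serve s3://hip-general/benchmark-model-loading/ --load-format runai_streamer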


nskumz commented Jan 23, 2025

Thanks for the quick response @noa-neria. As you suggested, I configured it using this command:
AWS_ACCESS_KEY_ID="ASIAXYKJSZPEDFHGSUHDF2" AWS_SECRET_ACCESS_KEY="8CltEJHjedfjkjfhWDJHHuue/h" vllm serve s3://hip-general/benchmark-model-loading/ --load-format runai_streamer

With this, I get the error below:
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/tmp/tmpx3kfctt4'. Use repo_type argument if needed.

I have also passed the required arguments such as --model <>, but I still get this error.
Could you please advise how to resolve it?

omer-dayan (Contributor) commented

Hey @nskumz
Thanks for the report!

It is indeed a bug.
I opened a PR for fixing it:
#12353

For now, a workaround is to remove the trailing "/" at the end of the path:
s3://hip-general/benchmark-model-loading/ -> s3://hip-general/benchmark-model-loading
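
Applied to the original command, that becomes:

vllm serve s3://hip-general/benchmark-model-loading --load-format runai_streamer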

Sorry for the inconvenience.
