
vLLM backend Hugging Face feature branch model loading #7963

Open
knitzschke opened this issue Jan 23, 2025 · 0 comments

knitzschke commented Jan 23, 2025

Is your feature request related to a problem? Please describe.

Currently we can't load specific branches of Hugging Face model repos directly from the Hub.

Current setup of multi_lora.json:

{
    "lora_1": "company/lora_1_model_repo",
    "lora_2": "company/lora_2_model_repo/tree/feature_branch"
}

Describe the solution you'd like

I have a main branch and also a feature branch in my HF model repo that contains a new version of a LoRA model. I want to load a specific branch directly, based on my multi_lora.json config file.
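For illustration only, here is a hypothetical extension of the multi_lora.json format that would make the requested branch explicit. The object form with "repo" and "revision" keys does not exist in the backend today; it is just a sketch of what the feature could look like:

{
    "lora_1": "company/lora_1_model_repo",
    "lora_2": {
        "repo": "company/lora_2_model_repo",
        "revision": "feature_branch"
    }
}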

Additional context

When I have the config above, the container builds correctly according to the system logs; however, when I run inference against the port 8000 endpoint I get this error:

{"error":"Error generating stream: Loading lora company/lora_2_model_repo/tree/feature_branch failed"}

For additional context, I am loosely basing my code on this example, except that I load everything from HF instead of saving it locally in the image: Triton Multi-lora example
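A possible workaround while branch selection is not supported natively, assuming the container can run a small download step at startup: pre-fetch the feature branch with huggingface_hub (snapshot_download accepts a branch name, tag, or commit hash via its revision argument) and point multi_lora.json at the resulting local path. The repo names and paths below are placeholders.

from huggingface_hub import snapshot_download

# Download the feature branch of the LoRA repo into a local directory.
# "revision" may be a branch name, tag, or commit hash.
lora_2_path = snapshot_download(
    repo_id="company/lora_2_model_repo",
    revision="feature_branch",
    local_dir="/models/loras/lora_2_feature_branch",
)

# multi_lora.json would then reference the local path instead of the repo, e.g.:
# { "lora_2": "/models/loras/lora_2_feature_branch" }
print(lora_2_path)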
