
vLLM backend Hugging Face feature branch model loading #7963

Open
knitzschke opened this issue Jan 23, 2025 · 0 comments

knitzschke commented Jan 23, 2025

Is your feature request related to a problem? Please describe.

Currently we can't load specific branches of Hugging Face model repos directly from the Hub.

Current setup of multi_lora.json:

{
    "lora_1": "company/lora_1_model_repo",
    "lora_2": "company/lora_2_model_repo/tree/feature_branch"
}

Describe the solution you'd like

I have a main branch and also a feature branch in my HF model repo that contains a new version of a LoRA model. I want to load a specific branch directly, based on my multi_lora.json config file.
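For illustration only, here is a hypothetical extension of the multi_lora.json format that would make the requested branch explicit. The object form with "repo" and "revision" keys does not exist in the backend today; it is just a sketch of what the feature could look like:

{
    "lora_1": "company/lora_1_model_repo",
    "lora_2": {
        "repo": "company/lora_2_model_repo",
        "revision": "feature_branch"
    }
}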

Additional context

When I have the config above, the container builds correctly according to the system logs; however, when I run inference against the port 8000 endpoint I get this error:

{"error":"Error generating stream: Loading lora company/lora_2_model_repo/tree/feature_branch failed"}

For additional context, I am loosely basing my code on this example, except that I load everything from HF instead of saving it locally in the image: Triton Multi-lora example
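A possible workaround while branch selection is not supported natively, assuming the container can run a small download step at startup: pre-fetch the feature branch with huggingface_hub (snapshot_download accepts a branch name, tag, or commit hash via its revision argument) and point multi_lora.json at the resulting local path. The repo names and paths below are placeholders.

from huggingface_hub import snapshot_download

# Download the feature branch of the LoRA repo into a local directory.
# "revision" may be a branch name, tag, or commit hash.
lora_2_path = snapshot_download(
    repo_id="company/lora_2_model_repo",
    revision="feature_branch",
    local_dir="/models/loras/lora_2_feature_branch",
)

# multi_lora.json would then reference the local path instead of the repo, e.g.:
# { "lora_2": "/models/loras/lora_2_feature_branch" }
print(lora_2_path)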
