
Add vllm custom model support for OpenAI compatibility #224

Open
Navanit-git opened this issue Dec 12, 2024 · 4 comments

Comments

@Navanit-git
Contributor

Navanit-git commented Dec 12, 2024

Hi,
Is there a way to add support for vLLM's OpenAI compatibility?
vLLM OpenAI Support

That way anyone could use any LLM served through it.

@samuelcolvin
Member

This should work the same as Ollama; see the code here: #112 (comment).

Happy to consider adding vllm as another custom model, but it would need more people to want it before we do the work.
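
For reference, a minimal sketch of that Ollama-style approach pointed at a vLLM OpenAI-compatible server; the model name, port, and api_key below are assumptions (vLLM's server defaults to port 8000):

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# Assumes a vLLM OpenAI-compatible server (e.g. `vllm serve <model>`) is
# already running locally on the default port 8000.
model = OpenAIModel(
    'Qwen/Qwen2.5-7B-Instruct',           # must match the model the server was started with
    base_url='http://localhost:8000/v1',  # vLLM's OpenAI-compatible endpoint
    api_key='unused',                     # a local vLLM server ignores the key unless one is configured
)

agent = Agent(model=model, system_prompt='Be concise, reply with one sentence.')
result = agent.run_sync('Where does "hello world" come from?')
print(result.data)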

@samuelcolvin
Member

See #239, that would mean we could add VLLMModel.
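
Purely as a hypothetical sketch: such a VLLMModel could presumably be a thin wrapper over OpenAIModel that fills in vLLM-friendly defaults. The class and defaults below are illustrative only and not part of pydantic-ai.

from pydantic_ai.models.openai import OpenAIModel


class VLLMModel(OpenAIModel):
    """Hypothetical convenience wrapper; not part of pydantic-ai."""

    def __init__(
        self,
        model_name: str,
        *,
        base_url: str = 'http://localhost:8000/v1',  # assumed vLLM default endpoint
        api_key: str = 'vllm-placeholder',           # local servers usually ignore the key
    ):
        super().__init__(model_name, base_url=base_url, api_key=api_key)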

@daavoo

daavoo commented Dec 18, 2024

See #239, that would mean we could add VLLMModel.

Hello! I was testing pydantic.ai alongside vLLM and the llama.cpp server, both of which I think fulfill the rules for adding a new model.

I have looked at the existing Ollama code and I am not sure I understand the value of adding a new VLLMModel.
It looks like there is no custom logic (beyond providing a default api_key value), and it provides a somewhat arbitrary hardcoded list of model names (which doesn't cover all available models).

I got this same snippet working out of the box with both vLLM and the llama.cpp server:

# For example, start a llama.cpp server:
docker run -v ./models:/models -p 8080:8080 \
    ghcr.io/ggerganov/llama.cpp:server -m /models/smollm2-360m-instruct-q8_0.gguf

# Then point OpenAIModel at the local OpenAI-compatible endpoint:
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

model = OpenAIModel(
    "mymodel",
    base_url="http://localhost:8080/v1",
    api_key="foo",  # local servers just need a non-empty key
)

agent = Agent(
    model=model,
    system_prompt='Be concise, reply with one sentence.',
)

result = agent.run_sync('Where does "hello world" come from?')
print(result.data)

So, is it better to just send a small documentation patch?

P.S. I don't have a problem contributing a new VLLM/LLAMACPP model myself; I'm just wondering if it makes sense to keep adding those.

@sadransh

@daavoo This wouldn't work with tool calling and a non-str result type. You can re-use the example from #398 to double-check.
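
For illustration, a minimal sketch of the kind of agent that exercises tool calling and a structured (non-str) result type; whether it works depends on the backend's function-calling support. The base_url and model name are assumptions for a local OpenAI-compatible server.

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel


class CityInfo(BaseModel):
    city: str
    country: str


model = OpenAIModel('mymodel', base_url='http://localhost:8000/v1', api_key='foo')
# A structured result type relies on tool calling under the hood, which is
# the part that may fail against a generic OpenAI-compatible backend.
agent = Agent(model, result_type=CityInfo)


@agent.tool_plain
def country_of(city: str) -> str:
    """Toy lookup tool; a real agent would call an API here."""
    return 'France' if city.lower() == 'paris' else 'unknown'


result = agent.run_sync('Which country is Paris in?')
print(result.data)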
