Use ollama's structured outputs feature #242

Closed
daniel-j-h opened this issue Dec 13, 2024 · 18 comments
Labels
enhancement New feature or request

Comments

@daniel-j-h

I'm looking into pydantic-ai with small and locally running ollama models as backbones.

I'm noticing that sometimes even for simple models it's possible to run into unexpected ValidationErrors.

Here's what I mean: With a pydantic model as simple as

class Answer(BaseModel):
    value: str = ""

I can see pydantic-ai sometimes retrying and then failing validation.

Having experience with llama.cpp's grammars, this was unexpected to me. I was under the assumption that pydantic-ai would transform the pydantic model into a grammar or JSON schema to hard-restrict the LLM's output accordingly. Then validation could never fail by design, since the LLM's output is constrained to that grammar.
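
For reference, here is roughly the JSON schema Pydantic produces for that model (a sketch assuming Pydantic v2's model_json_schema(); this is the shape I would expect to be used for constraining the output):

from pydantic import BaseModel

class Answer(BaseModel):
    value: str = ""

print(Answer.model_json_schema())
# {'properties': {'value': {'default': '', 'title': 'Value', 'type': 'string'}},
#  'title': 'Answer', 'type': 'object'}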

Instead, when I debug the request pydantic-ai sends to the locally running ollama with

nc -l -p 11434

I can see pydantic-ai turning the pydantic model into a tool use invocation.

With ollama v0.5.0, structured output via JSON schema is now supported:

https://github.com/ollama/ollama/releases/tag/v0.5.0

I was wondering if that would solve the issue of small locally running models sometimes running into validation errors, since we hard-restrict the output to the shape of our pydantic model.
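
For illustration, here is a rough sketch of what using that feature could look like with the ollama Python client, assuming its chat() call accepts a JSON schema via the format argument as described in the release notes (model name and prompt are just examples):

from ollama import chat
from pydantic import BaseModel

class Answer(BaseModel):
    value: str = ""

# Constrain generation to the model's JSON schema (ollama v0.5.0 structured outputs).
response = chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    format=Answer.model_json_schema(),
)

# The content should then always be valid JSON for that schema.
answer = Answer.model_validate_json(response["message"]["content"])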

Any thoughts on this, or ideas why validation can fail with tool usage as implemented right now? Any pointers on which model providers validation might fail for, and why? Thanks!

@daniel-j-h
Author

I have looked into this a bit more, and here is what is happening:

  1. The user specifies a return type for a pydantic-ai agent in the form of a pydantic model
  2. The pydantic model's JSON schema gets passed to the LLM in the form of a tool and its arguments
  3. Because tool usage is optional, the smaller ollama models often never use the tool
  4. pydantic-ai then fails validation and retries, and more often than not the second try doesn't use the tool either

The underlying issue here is that tool usage is optional. This makes the pydantic-ai approach of validating and parsing types unreliable and non-deterministic.

There exists a tool_choice=required parameter in the OpenAI API but it's not supported in ollama as of today. From the ollama tool blog post:

> Future improvements [..] Tool choice: force a model to use a tool
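
For reference, here is a rough sketch of what that parameter looks like on the OpenAI side; the tool name, schema, and prompt are made up for illustration and are not pydantic-ai's actual internals:

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "final_result",  # hypothetical tool name
                "description": "Return the structured answer",
                "parameters": {
                    "type": "object",
                    "properties": {"value": {"type": "string"}},
                    "required": ["value"],
                },
            },
        }
    ],
    tool_choice="required",  # force the model to call a tool; ollama does not support this today
)

# The arguments are a JSON string matching the tool's parameter schema.
print(completion.choices[0].message.tool_calls[0].function.arguments)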

I see the following ways forward

  1. Change the documentation to make it clear that validation and parsing may or may not happen, and that especially with smaller models it often fails due to lack of tool usage. This undermines the main selling point of pydantic-ai
  2. Look into and implement the structured output approach which, I believe, should make validation and parsing deterministic

Note: I have only looked at the ollama model as implemented in pydantic-ai. I do not know if other models are affected by this, too, or if their output is deterministically constrained.

@samuelcolvin
Member

> Look into and implement the structured output approach which, I believe, should make validation and parsing deterministic

Is this part of the OpenAI API, or would we need a dedicated model for it?

Happy to consider both, just trying to understand.

@samuelcolvin
Member

Looks like we should switch from the OpenAI compatibility API to using the Ollama python library.

@renkehohl

As I was also facing several validation errors when working with Ollama, I made my own implementation of OllamaModel utilizing Ollama's new structured outputs feature. If you are interested in it, I can contribute it here.

@gt732

gt732 commented Dec 18, 2024

@renkehohl do you mind sharing? I'm running into the same issue with local models; it sometimes works, but it's random. I'm mostly testing with qwen models.

@renkehohl

@gt732 here is the gist: https://gist.github.com/renkehohl/407cdcdd3bfc8d0baee3783782b31e3d

Usage is the same as PydanticAI's OllamaModel:

from pydantic_ai import Agent
from typing import List
from ollama_model import OllamaModel

agent = Agent(
    model=OllamaModel(model_name="llama3.1"),
    result_type=List[str]
)

Please keep in mind that I haven't implemented streamed responses yet.

@christopherfowers

I don't see Message, ModelAnyResponse, ModelStructuredResponse, ToolCall, or ModelTextResponse on pydantic_ai.messages at all. Nor do I see any of these types noted in the messages documentation. What am I missing to be able to replicate your gist @renkehohl?

@renkehohl

@christopherfowers the message format was changed in a commit from December 15th. I was using PydanticAI at version 0.0.12, so my implementation needs to be updated for later versions.

@gabrielgrant

gabrielgrant commented Dec 19, 2024

@samuelcolvin to answer your question from earlier about whether this is something OpenAI also supports: yes, they have structured outputs (as of the end of Aug 2024, iirc): https://platform.openai.com/docs/guides/structured-outputs

This avoids the whole roundabout method of having to ask for a function call when you know that you just want a response to always conform to a specific schema. Would recommend looking into switching to this for OpenAI calls too (this is their official recommendation).

Their docs on it are a bit lacking. That page shows using the client.beta.chat.completions.parse() call, which accepts a pydantic BaseModel directly in the response_format arg, and returns a populated model instance (iirc from looking at the code a while ago it does a bit more work behind the scenes).

Not documented on that page, but you can also just use openai.chat.completions.create() and pass a JSON schema into response_format directly:

response_format: {
        // See /docs/guides/structured-outputs
        type: "json_schema",
        json_schema: {
            name: "email_schema",
            schema: {
                type: "object",
                properties: {
                    email: {
                        description: "The email address that appears in the input",
                        type: "string"
                    }
                },
                additionalProperties: false
            }
        }
    }
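
That snippet is in JavaScript-style notation; a rough Python equivalent with the openai client could look like the following (the prompt is illustrative, and the strict flag plus the required list are additions taken from OpenAI's structured outputs docs to get guaranteed schema adherence):

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "You can reach me at alice@example.com"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "email_schema",
            "strict": True,  # enforce the schema exactly
            "schema": {
                "type": "object",
                "properties": {
                    "email": {
                        "description": "The email address that appears in the input",
                        "type": "string",
                    }
                },
                "required": ["email"],
                "additionalProperties": False,
            },
        },
    },
)

# The message content is a JSON string conforming to the schema.
print(completion.choices[0].message.content)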

@samuelcolvin
Member

> Looks like we should switch from the OpenAI compatibility API to using the Ollama python library.

This is blocked on ollama/ollama-python#380.

@andrewdmalone
Contributor

andrewdmalone commented Jan 9, 2025

FWIW, to add to the discussion: the OpenAI library's structured response feature works just fine with Ollama. Here's a modification of the example from the OpenAI Python docs. The client.beta.chat.completions.parse form's BaseModel integration is very nice compared to the "JSON mode" provided via client.chat.completions.create.

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(api_key="ollama", base_url='http://localhost:11434/v1/')

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

Value of "event" using Ollama 3.2:

CalendarEvent(name='Event Name', date='Friday', participants=['Alice', 'Bob'])

Seems fine with nested models too:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(api_key="ollama", base_url='http://localhost:11434/v1/')

class Country(BaseModel):
    name: str
    abbreviation: str

class State(BaseModel):
    name: str
    abbreviation: str

class City(BaseModel):
    name: str
    nickname: str

class Location(BaseModel):
    city: City
    state: State
    country: Country

completion = client.beta.chat.completions.parse(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "Extract the location information."},
        {"role": "user", "content": "I am in the windy city - Chicago, Illinois - in the United States of America"},
    ],
    response_format=Location,
)

event = completion.choices[0].message.parsed

Value of "event" using Ollama 3.2:

Location(city=City(name='Chicago', nickname='The Windy City'), state=State(name='Illinois', abbreviation='IL'), country=Country(name='United States of America', abbreviation='USA'))

@Finndersen

BTW, both Anthropic and Gemini also support structured outputs; it seems like the industry-standard approach that should be used instead of tool calling?

@gabrielgrant

gabrielgrant commented Jan 13, 2025

@Finndersen I don't think Anthropic currently supports forcing structured outputs in the same way as OpenAI and Gemini -- the page you've linked gives some examples of ways to encourage use of a given format (system prompt and response prefilling being the most applicable), but afaict the only way to truly ensure adherence to a specific JSON schema with Claude is to force use of a tool. This is their recommended approach:

> use tools anytime you want the model to return JSON output that follows a provided schema

Expanded example here: https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-mode
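
For context, here is a rough sketch of that forced-tool approach with the Anthropic Python SDK (the tool name, schema, and model string are illustrative):

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=1024,
    tools=[
        {
            "name": "record_answer",  # hypothetical tool name
            "description": "Record the structured answer",
            "input_schema": {
                "type": "object",
                "properties": {"value": {"type": "string"}},
                "required": ["value"],
            },
        }
    ],
    # Forcing this specific tool means the reply always carries schema-shaped JSON.
    tool_choice={"type": "tool", "name": "record_answer"},
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

tool_use = next(block for block in message.content if block.type == "tool_use")
print(tool_use.input)  # e.g. {'value': 'Paris'}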

@Finndersen

Finndersen commented Jan 14, 2025

@gabrielgrant right, apologies, I didn't read it properly. However, I'd imagine that OpenAI and Gemini's implementations are probably just an extra system prompt anyway (like the example in the Anthropic docs), so it could probably be achieved that way with Claude instead of tool calling?

Because from that page you linked:

> When using tools in this way:
>
>   • You usually want to provide a single tool

I guess it seems like it works fine with multiple tools, but maybe it's not ideal?

@YanSte
Contributor

YanSte commented Jan 16, 2025

Will my bug be solved by your new feature?

#667

@YanSte
Contributor

YanSte commented Jan 17, 2025

Hi,
I encountered a couple of issues with Ollama.

It seems like Ollama is not very reliable, so I created this pull request to switch to LMStudio: Pull Request #705.

For those who want a local LLM.

@sydney-runkle
Member

Going to close this in favor of #582, which covers the broader request 👍

@daniel-j-h
Author

daniel-j-h commented Jan 25, 2025 via email
