Skip to content

Latest commit

 

History

History
365 lines (295 loc) · 22.3 KB

openai.md

File metadata and controls

365 lines (295 loc) · 22.3 KB

OpenAI compatibility

Note: OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama Python library, JavaScript library and REST API.

Ollama provides experimental compatibility with parts of the OpenAI API to help connect existing applications to Ollama.

Usage

OpenAI Python library

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',

    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "",
                },
            ],
        }
    ],
    max_tokens=300,
)

completion = client.completions.create(
    model="llama3.2",
    prompt="Say this is a test",
)

list_completion = client.models.list()

model = client.models.retrieve("llama3.2")

embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)

Structured outputs

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Define the schema for the response
class FriendInfo(BaseModel):
    name: str
    age: int 
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

try:
    completion = client.beta.chat.completions.parse(
        temperature=0,
        model="llama3.1:8b",
        messages=[
            {"role": "user", "content": "I have two friends. The first is Ollama 22 years old busy saving the world, and the second is Alonso 23 years old and wants to hang out. Return a list of friends in JSON format"}
        ],
        response_format=FriendList,
    )

    friends_response = completion.choices[0].message
    if friends_response.parsed:
        print(friends_response.parsed)
    elif friends_response.refusal:
        print(friends_response.refusal)
except Exception as e:
    print(f"Error: {e}")

OpenAI JavaScript library

import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1/',

  // required but ignored
  apiKey: 'ollama',
})

const chatCompletion = await openai.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: 'llama3.2',
})

const response = await openai.chat.completions.create({
    model: "llava",
    messages: [
        {
        role: "user",
        content: [
            { type: "text", text: "What's in this image?" },
            {
            type: "image_url",
            image_url: "",
            },
        ],
        },
    ],
})

const completion = await openai.completions.create({
    model: "llama3.2",
    prompt: "Say this is a test.",
})

const listCompletion = await openai.models.list()

const model = await openai.models.retrieve("llama3.2")

const embedding = await openai.embeddings.create({
  model: "all-minilm",
  input: ["why is the sky blue?", "why is the grass green?"],
})

curl

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
               "url": ""
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

curl http://localhost:11434/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "prompt": "Say this is a test"
    }'

curl http://localhost:11434/v1/models

curl http://localhost:11434/v1/models/llama3.2

curl http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "all-minilm",
        "input": ["why is the sky blue?", "why is the grass green?"]
    }'

Endpoints

/v1/chat/completions

Supported features

  • Chat completions
  • Streaming
  • JSON mode
  • Reproducible outputs
  • Vision
  • Tools
  • Logprobs

Supported request fields

  • model
  • messages
    • Text content
    • Image content
      • Base64 encoded image
      • Image URL
    • Array of content parts
  • frequency_penalty
  • presence_penalty
  • response_format
  • seed
  • stop
  • stream
  • stream_options
    • include_usage
  • temperature
  • top_p
  • max_tokens
  • tools
  • tool_choice
  • logit_bias
  • user
  • n

/v1/completions

Supported features

  • Completions
  • Streaming
  • JSON mode
  • Reproducible outputs
  • Logprobs

Supported request fields

  • model
  • prompt
  • frequency_penalty
  • presence_penalty
  • seed
  • stop
  • stream
  • stream_options
    • include_usage
  • temperature
  • top_p
  • max_tokens
  • suffix
  • best_of
  • echo
  • logit_bias
  • user
  • n

Notes

  • prompt currently only accepts a string

/v1/models

Notes

  • created corresponds to when the model was last modified
  • owned_by corresponds to the ollama username, defaulting to "library"

/v1/models/{model}

Notes

  • created corresponds to when the model was last modified
  • owned_by corresponds to the ollama username, defaulting to "library"

/v1/embeddings

Supported request fields

  • model
  • input
    • string
    • array of strings
    • array of tokens
    • array of token arrays
  • encoding format
  • dimensions
  • user

Models

Before using a model, pull it locally ollama pull:

ollama pull llama3.2

Default model names

For tooling that relies on default OpenAI model names such as gpt-3.5-turbo, use ollama cp to copy an existing model name to a temporary name:

ollama cp llama3.2 gpt-3.5-turbo

Afterwards, this new model name can be specified the model field:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

Setting the context size

The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a Modelfile which looks like:

FROM <some model>
PARAMETER num_ctx <context size>

Use the ollama create mymodel command to create a new model with the updated context size. Call the API with the updated model name:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mymodel",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'