@krrishdholakia Thanks for resolving #8085 (comment); I was able to get it working. But I want to run something by you to see what you think, as the above PR only goes part of the way towards solving a problem that large corporations have.
Problem: Large companies are using multiple clouds and AI providers because they don't want to put all their eggs in one basket (i.e. OpenAI), so they need to be able to call models from different providers.
Large companies will (at least at the beginning) want this centralised through a single service, which all client use cases would go through. This brings many benefits such as cost tracking, rate limiting, auth, content filtering, logging, etc.
The way they do this is to stand up some sort of API gateway in front of the back-end services. This intelligent gateway routes each request to whichever service the client asked for, whether that's Bedrock, Azure OpenAI, Vertex, or even self-hosted open source.
```
                          |--> Bedrock
Client --> API Gateway -->|--> Azure OpenAI
                          |--> Vertex
                          |--> Open Source Internal
```
Ideally, data scientists and use-case client apps would all interact with this REST endpoint in the same way, and it looks to me like LiteLLM is almost there as the unified Python experience.
For example, let's assume our API gateway, which authenticates clients with a bearer token, has the following endpoint:
https://internal-company-endpoint/
This endpoint takes any client request and sends it to the correct back-end service. In your latest release you included something that could be used for Bedrock Claude.
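Roughly, the shape of the call I am making (the Claude model ID and the gateway route below are illustrative placeholders rather than my exact values):

```python
from litellm import completion

bearer_token = "internal-gateway-token"  # placeholder; issued by the gateway

completion_response = completion(
    model="bedrock/converse/anthropic.claude-3-sonnet-20240229-v1:0",  # converse-style model path (illustrative)
    api_key=bearer_token,
    api_base="https://internal-company-endpoint/bedrock",  # assumed gateway route to Bedrock
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
print(completion_response)
```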
This now works great, as LiteLLM will make the REST call and understand the converse response (as an aside, note that all the Nova models are missing).

Let's say another client wants to call Azure OpenAI. This also works; the format is a bit different, however.
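A sketch of that Azure-style call (the deployment name here is a placeholder; the api_base follows the same pattern as the embedding example below):

```python
from litellm import completion

completion_response = completion(
    model="azure/gpt-4o-deployment",  # deployment name is a placeholder
    api_key=bearer_token,
    api_version="2024-02-01",
    api_base="https://internal-company-endpoint/openai/openai/deployments/gpt-4o-deployment",  # gateway route to Azure OpenAI
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
print(completion_response)
```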
Let's try embeddings:
```python
from litellm import embedding

embedding_response = embedding(
    model="azure/text-embedding-ada-002-v2",
    api_key=bearer_token,
    api_version="2024-02-01",
    api_base="https://internal-company-endpoint/openai/openai/deployments/text-embedding-ada-002-v2",  # set API base to your custom OpenAI endpoint
    headers={"user": "some_unique_user_id_required_field"},
    input=["good morning from litellm"],
)
print(embedding_response)
```
This also works, again because the URL is structured correctly. Perhaps LiteLLM should allow providing a fully qualified URL for the model path rather than appending the model itself?

Let's try Vertex:
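For reference, the shape of the call I have been attempting (the provider prefix, model name, and gateway route here are illustrative guesses, not exact values):

```python
from litellm import completion

completion_response = completion(
    model="vertex_ai/gemini-1.5-pro",  # provider prefix and model name are assumptions
    api_key=bearer_token,
    api_base="https://internal-company-endpoint/vertex/gemini-1.5-pro",  # assumed gateway route
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
print(completion_response)
```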
This doesn't work at all, and I don't think there is a way to make it work here (I have tried azure, gemini, etc. in the model path name).
Let's try a rerank model, which is open source and hosted internally but still sits behind the same API endpoint:
query = "What is the capital of the United States?"
documents = [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. is the capital of the United States.",
"Capital punishment has existed in the United States since before it was a country.",
]
reranker_response = rerank(
model="cohere/bge-m3-reranker",
query=query,
cohere_key=bearer_token,
documents=documents,
top_n=3,
api_base="https://internal-company-endpoint/opensource/bge-m3-reranker",
headers={"user": "some_unique_user_id_required_field"}
)
print(reranker_response)
This doesn't work at all either, I am guessing because it's forced to use the Cohere provider.
It would be ideal if there were a way to unify all these different approaches and make it more generic, such that a client could:

- specify a fully qualified endpoint (or at least api_base and api_path)
- specify the model type or schema type, e.g. converse, openai, etc.
- specify the auth type, e.g. bearer token, or it could be a header
- provide custom headers
So LiteLLM should be able to take any API endpoint and call it, as long as the format/expected response type is provided.
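For concreteness, the kind of interface I have in mind might look something like this. To be clear, this is purely hypothetical pseudocode, not an existing LiteLLM API; every parameter name below is illustrative:

```python
# Purely hypothetical sketch of a generic interface -- not an existing LiteLLM API.
response = completion(
    model="internal-llm",                                        # plain model name, no provider prefix required
    api_base="https://internal-company-endpoint/opensource/internal-llm",  # fully qualified, used as-is
    schema="openai",                                             # or "converse", "vertex", "cohere-rerank", ...
    auth_type="bearer",                                          # or "header"
    api_key=bearer_token,
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
```

Hope this makes sense!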