@krrishdholakia Thanks for resolving #8085 (comment); I was able to get it working. But I want to run something by you to see what you think, as the above PR only goes part of the way towards solving a problem that large corporations have.
Problem: Large companies are using multiple clouds and AI providers because they don't want to put all their eggs in one basket (i.e. OpenAI), so they need to be able to call models from different providers.
Large companies will (at least at the beginning) want this centralised through a single service, which all client use cases would go through. This brings many benefits such as cost tracking, rate limiting, auth, content filtering, logging, etc.
The way they do this is to stand up some sort of API gateway in front of the back-end services. This intelligent gateway routes each request to whichever service the client asked for, whether that's Bedrock, Azure OpenAI, Vertex, or even self-hosted open source.
```
                          |--> Bedrock
Client --> API Gateway -->|--> Azure OpenAI
                          |--> Vertex
                          |--> Open Source Internal
```
Ideally, data scientists and use-case client apps would all interact with this REST endpoint in the same way, and it looks to me like LiteLLM is almost there as the unified Python experience.
For example, let's assume our API gateway, which authenticates clients with a bearer token, has the following endpoint:
https://internal-company-endpoint/
This endpoint takes any client request and sends it to the correct back-end service. In your latest release you included something that could be used for Bedrock Claude.
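Roughly, the shape of the call I am making (the Claude model ID and the gateway route below are illustrative placeholders rather than my exact values):

```python
from litellm import completion

bearer_token = "internal-gateway-token"  # placeholder; issued by the gateway

completion_response = completion(
    model="bedrock/converse/anthropic.claude-3-sonnet-20240229-v1:0",  # converse-style model path (illustrative)
    api_key=bearer_token,
    api_base="https://internal-company-endpoint/bedrock",  # assumed gateway route to Bedrock
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
print(completion_response)
```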
This now works great, as LiteLLM will make the REST call and understand the converse response (as an aside, note that all the Nova models are missing).

Let's say another client wants to call Azure OpenAI. This also works; the format is a bit different, however.
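A sketch of that Azure-style call (the deployment name here is a placeholder; the api_base follows the same pattern as the embedding example below):

```python
from litellm import completion

completion_response = completion(
    model="azure/gpt-4o-deployment",  # deployment name is a placeholder
    api_key=bearer_token,
    api_version="2024-02-01",
    api_base="https://internal-company-endpoint/openai/openai/deployments/gpt-4o-deployment",  # gateway route to Azure OpenAI
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
print(completion_response)
```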
Let's try embeddings:
```python
from litellm import embedding

embedding_response = embedding(
    model="azure/text-embedding-ada-002-v2",
    api_key=bearer_token,
    api_version="2024-02-01",
    api_base="https://internal-company-endpoint/openai/openai/deployments/text-embedding-ada-002-v2",  # set API base to your custom OpenAI endpoint
    headers={"user": "some_unique_user_id_required_field"},
    input=["good morning from litellm"],
)
print(embedding_response)
```
This also works, again because the URL is structured correctly. Perhaps LiteLLM should allow providing a fully qualified URL for the model path rather than appending the model itself?

Let's try Vertex:
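For reference, the shape of the call I have been attempting (the provider prefix, model name, and gateway route here are illustrative guesses, not exact values):

```python
from litellm import completion

completion_response = completion(
    model="vertex_ai/gemini-1.5-pro",  # provider prefix and model name are assumptions
    api_key=bearer_token,
    api_base="https://internal-company-endpoint/vertex/gemini-1.5-pro",  # assumed gateway route
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
print(completion_response)
```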
This doesn't work at all, and I don't think there is a way to make it work here (I have tried azure, gemini, etc. in the model path name).
Let's try a rerank model, which is open source and hosted internally but still sits behind the same API endpoint:
query = "What is the capital of the United States?"
documents = [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. is the capital of the United States.",
"Capital punishment has existed in the United States since before it was a country.",
]
reranker_response = rerank(
model="cohere/bge-m3-reranker",
query=query,
cohere_key=bearer_token,
documents=documents,
top_n=3,
api_base="https://internal-company-endpoint/opensource/bge-m3-reranker",
headers={"user": "some_unique_user_id_required_field"}
)
print(reranker_response)
This doesn't work at all either, I am guessing because it's forced to use the Cohere provider.
It would be ideal if there were a way to unify all these different approaches and make it more generic, such that a client could:

- specify a fully qualified endpoint (or at least api_base and api_path)
- specify the model type or schema type, e.g. converse, openai, etc.
- specify the auth type, e.g. bearer token, or it could be a header
- provide custom headers
So LiteLLM should be able to take any API endpoint and call it, as long as the format/expected response type is provided.
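For concreteness, the kind of interface I have in mind might look something like this. To be clear, this is purely hypothetical pseudocode, not an existing LiteLLM API; every parameter name below is illustrative:

```python
# Purely hypothetical sketch of a generic interface -- not an existing LiteLLM API.
response = completion(
    model="internal-llm",                                        # plain model name, no provider prefix required
    api_base="https://internal-company-endpoint/opensource/internal-llm",  # fully qualified, used as-is
    schema="openai",                                             # or "converse", "vertex", "cohere-rerank", ...
    auth_type="bearer",                                          # or "header"
    api_key=bearer_token,
    headers={"user": "some_unique_user_id_required_field"},
    messages=[{"role": "user", "content": "good morning from litellm"}],
)
```

Hope this makes sense!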