🎅 I WISH LITELLM HAD... #361
Comments
[LiteLLM Client] Add new models via UI. Thinking aloud, it seems intuitive that you'd be able to add new models / remap completion calls to different models via the UI. Unsure on the real problem, though. |
User / API Access Management: Different users have access to different models. It'd be helpful if there was a way to maybe leverage the BudgetManager to gate access. E.g. GPT-4 is expensive; I don't want to expose that to my free users, but I do want my paid users to be able to use it. |
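A minimal sketch of the gating idea in the request above, in plain Python. It does not use the BudgetManager or any LiteLLM API; the tier names, the `ALLOWED_MODELS` table, and `resolve_model` are all hypothetical and only illustrate checking a user's tier before a completion call is made.

```python
from typing import Dict, Set

# Hypothetical mapping from user tier to the models that tier may call.
ALLOWED_MODELS: Dict[str, Set[str]] = {
    "free": {"gpt-3.5-turbo"},
    "paid": {"gpt-3.5-turbo", "gpt-4"},
}


def resolve_model(user_tier: str, requested_model: str) -> str:
    """Return the requested model if the tier may use it, else raise."""
    allowed = ALLOWED_MODELS.get(user_tier, set())
    if requested_model not in allowed:
        raise PermissionError(
            f"tier {user_tier!r} may not call model {requested_model!r}"
        )
    return requested_model
```

The returned name could then be passed as the `model` argument of a completion call; an unknown tier is treated as having no access.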
cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here. |
[Spend Dashboard] View analytics for spend per llm and per user
|
Auto-select the best LLM for a given task. If it's a simple task like responding to "hello", litellm should auto-select a cheaper but faster LLM like j2-light |
Integration with NLP Cloud |
That's awesome @Pipboyguy - dm'ing on linkedin to learn more! |
@ishaan-jaff check out this truncate param in the cohere api. This looks super interesting. Similar to your token trimmer. If the prompt exceeds the context window, trim in a particular manner. I would maybe only run trimming on user/assistant messages and not touch the system prompt (works for RAG scenarios as well). |
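A rough sketch of the trimming behaviour suggested above: system messages are always kept, and the oldest user/assistant messages are dropped until the conversation fits a budget. This is not Cohere's `truncate` behaviour or LiteLLM's token trimmer; `trim_keep_system` and `count_tokens_approx` are hypothetical names, and the word-count "tokenizer" is a crude assumption standing in for a real one.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def count_tokens_approx(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_keep_system(
    messages: List[Message],
    max_tokens: int,
    count: Callable[[str], int] = count_tokens_approx,
) -> List[Message]:
    """Keep every system message, then the newest other messages that fit."""
    # System prompts are never trimmed; they are charged against the budget first.
    budget = max_tokens - sum(
        count(m["content"]) for m in messages if m["role"] == "system"
    )
    keep = {i for i, m in enumerate(messages) if m["role"] == "system"}

    # Walk from newest to oldest and stop at the first message that does not fit.
    for i in range(len(messages) - 1, -1, -1):
        message = messages[i]
        if message["role"] == "system":
            continue
        cost = count(message["content"])
        if cost > budget:
            break
        keep.add(i)
        budget -= cost

    # Preserve the original ordering of whatever survived.
    return [m for i, m in enumerate(messages) if i in keep]
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a reply without the turn that preceded it.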
Option to use Inference API so we can use any model from Hugging Face 🤗 |
@haseeb-heaven you can already do this -
from litellm import completion
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) |
Wow, great, thanks, it's working. Nice feature. |
Support for inferencing using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private. |
@smig23 what are you trying to use petals for ? We found it to be quite unstable and it would not consistently pass our tests |
finetuning wrapper for openai, huggingface etc. |
@shauryr i created an issue to track this - feel free to add any missing details here |
Specifically for my aims, I'm running a private swarm as an experiment, with a view to implementing it within a private organization that has idle GPU resources, but distributed ones. The initial target would be inferencing, and if litellm was able to be the abstraction layer, it would allow flexibility to go in another direction with hosting in the future. |
I wish litellm had direct support for finetuning models. Based on the below blog post, I understand that in order to fine-tune, one needs a specific understanding of the LLM provider and then has to follow their instructions or library for fine-tuning the model. Why not have LiteLLM do all the abstraction and handle the fine-tuning aspects as well? https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt |
I wish LiteLLM had support for open-source embeddings like sentence-transformers, hkunlp/instructor-large etc. Sorry, based on the below documentation, it seems there's only support for the OpenAI embedding. |
I wish LiteLLM had an integration with the Cerebrium platform. Please check the below link for the prebuilt models. |
@ranjancse26 what models on cerebrium do you want to use with LiteLLM ? |
@ishaan-jaff Cerebrium has a lot of pre-built models. The focus should be on consuming the open-source models first, e.g. Llama 2, GPT4All, Falcon, FlanT5 etc. I am mentioning this as a first step. However, it's a good idea to have LiteLLM take care of the internal communication with custom-built models too, based in turn on the API which Cerebrium exposes. |
@smig23 We've added support for petals to LiteLLM https://docs.litellm.ai/docs/providers/petals |
I wish LiteLLM had built-in support for the majority of provider operations rather than targeting text generation alone. Consider the example of Cohere: the below one allows users to have conversations with a Large Language Model (LLM) from Cohere. |
I wish LiteLLM had a ton of support and examples for users developing apps with the RAG pattern. It's kind of mandatory to follow the standard best practices, and we all wish to have the same support. |
I wish LiteLLM had use-case-driven examples for beginners. Keeping day-to-day use cases in mind, it's a good idea to come up with a great sample which covers the following aspects.
|
I wish LiteLLM supported various well-known or popular vector DBs. Here are a couple of them to begin with.
|
I wish LiteLLM had built-in support for web scraping or for getting real-time data using a known provider like serpapi. It would help users build custom AI models or integrate with LLMs for retrieval-augmented generation. https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser |
I wish there was vision support for LLM providers that provide vision support through their official documentation. Case in point- Groq. Reference: https://console.groq.com/docs/vision |
@githubuser16384 litellm already supports vision on all models - https://docs.litellm.ai/docs/completion/vision Created a ticket to add an example on groq docs for this. |
ui chat should render the output in markdown |
Admin UI chat should show the model used either directly before or after the "Assistant" label, so that it's clear which model provided the given assistant output. |
@FireballDWF - can you leave your additional feedback here #7440 ? |
|
Also, the litellm ollama docs say you recommend |
Most companies disable API-key-based access as they deem it not that secure. Instead, role-based access control (RBAC) is enabled. |
@vishnu-dev this is already supported - https://docs.litellm.ai/docs/providers/azure#authentication |
@krrishdholakia does litellm support making chat completion calls to finetuned mistral ai codestral models ? |
@anmolbhatia05 is it any different to this? https://docs.litellm.ai/docs/providers/codestral if it's on vertex then it's here - https://docs.litellm.ai/docs/providers/vertex#mistral-api |
Support for Amazon titan image generator |
I would love for LiteLLM to offer an MCP proxy add-on. As someone working on AI for a large enterprise, with multiple UI experiences for LLM workflows, I have begun coalescing toward an architectural model of LiteLLM > MCP (or a home-built method for hosting tool servers independent of the LLM or a particular frontend) > Continue.dev, custom chat UI, narrow-focus systems, LLM integrations to legacy enterprise systems etc… I think LiteLLM could capitalize on the middle space given your lead in the LLM abstraction space. There are solutions such as this: https://github.com/acehoss/mcp-gateway but I would love to be able to leverage the key management and rate management capabilities that LiteLLM already has. I would be open to wrenching on a PR if this is something that y'all feel would fit in the product vision of LiteLLM. |
Hi @Jflick58 a PR here with testing is welcome. i don't understand MCP well enough - but open to exploring how we can help! |
I wish LiteLLM could act as an MCP proxy so that one could call MCP-enhanced LLMs with the normal OpenAI API protocol. |
I wish LiteLLM could support custom provider for |
It actually does. Just define your embeddings model as any other completion model, and set |
We use litellm with langfuse as our success callback. We proxy through litellm both chat completion models and embedding models, and currently everything is being logged through langfuse. I wish there was a config.yaml flag where we can disable certain models from triggering the success callback. Perhaps it's a flag in the individual model configurations themselves, or perhaps it's defined as an additional success_callback model white/black list. In our case, we love being able to observe chat completion observations but have absolutely no need for embedding observations to be passed through to langfuse. |
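One way to picture the request above is a wrapper that forwards an event to a success callback only for selected call types. The sketch below is plain Python and makes no claim about LiteLLM's callback interface or its config.yaml options; `filter_by_call_type`, the `(call_type, payload)` signature, and the call-type strings are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical callback shape: receives a call type and an event payload.
Callback = Callable[[str, Dict[str, Any]], None]


def filter_by_call_type(
    callback: Callback, allowed_call_types: Iterable[str]
) -> Callable[[str, Dict[str, Any]], bool]:
    """Wrap a callback so it only fires for the allowed call types."""
    allowed = frozenset(allowed_call_types)

    def wrapper(call_type: str, payload: Dict[str, Any]) -> bool:
        if call_type not in allowed:
            return False  # e.g. embeddings are silently skipped
        callback(call_type, payload)
        return True

    return wrapper


# Usage: record chat completions only, ignore embedding calls.
logged: List[Tuple[str, str]] = []
log_success = filter_by_call_type(
    lambda call_type, payload: logged.append((call_type, payload["model"])),
    ["completion"],
)
log_success("completion", {"model": "gpt-4"})
log_success("embedding", {"model": "text-embedding-ada-002"})
```

An allow-list is used here rather than a deny-list so that newly added call types are not logged by accident.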
SQLite support. It's a tremendously useful tool to use locally, but having to run Postgres together with it on a laptop is a pain. |
I wish I could stream R1's reasoning tokens with OpenRouter, i.e. see what it's thinking before it sends the output. |
Usable docs |
Hey @boosh any specific improvements we can make on docs?
Hey @V4G4X replied on ticket 👍
is this solved if we just add a supported call types for langfuse (similar to caching) - |
It's not clear what's enterprise and what's not, tbh. Some pages suggest the router and proxy are; others don't. |
Hey @boosh we have the parent list here - https://docs.litellm.ai/docs/proxy/enterprise Can we do a 20min feedback call to see how we can make things clearer? Here's my calendly, if that's helpful - https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat |
@krrishdholakia having supported call types would work amazingly! |
This is a ticket to track a wishlist of items you wish LiteLLM had.
COMMENT BELOW 👇
With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs
Respond with ❤️ to any request you would also like to see
P.S.: Come say hi 👋 on the Discord