EPIC: Implement Model Directory #623

Open
4 tasks
barronstone opened this issue Jun 13, 2024 · 0 comments
Labels: enhancement, EPIC ⚔️

barronstone (Collaborator) commented Jun 13, 2024

Overview

LeapfrogAI should implement a model directory to store and serve models. This would significantly reduce the size of LeapfrogAI Zarf packages, provide users with more model options, and enable the backends to dynamically swap out models for different tasks.

Background

Currently, model parameters are directly “baked in” to backend Zarf packages. This makes LeapfrogAI packages very large, often exceeding 8 GB, which causes automated deployment issues that require manual intervention to work around. Additionally, each backend, such as vllm, is packaged with only a single model. If a user wants access to multiple LLMs for different tasks (e.g., one model for chat and another for coding assistance), they need two vllm packages, one for each model. The current approach makes it cumbersome and impractical for users to switch between models.

Externally, we are hearing demand from organizations like Platform One for users to be able to select from a collection of models. This is a common and expected feature in most modern AI chat interfaces. Internally, the LeapfrogAI team needs a way to swap out models quickly in order to evaluate a variety of models efficiently. Incorporating a model directory into LeapfrogAI could solve both challenges by decoupling model parameters from Zarf packages and enabling users to dynamically select which models to use from those available in the model directory.

Goals

  • Models are no longer directly “baked in” to the backend Zarf packages
  • Multiple models can be included in the UDS bundle for initial air-gapped deployment
  • The model directory can store and serve one or more models for each of the following types:
    • Text-to-text (LLM)
    • Text-to-vector (Embeddings)
    • Speech-to-text (Whisper)
  • System administrators can add new models to the model directory of an existing LeapfrogAI deployment in an air-gapped manner (no Internet connection required)
  • Users can select from the available LLMs for an Assistant to use via the GUI and the API
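
To make the goals above more concrete, here is a minimal sketch of a directory that tracks models by type and lets a caller list the ones available for selection. This is illustrative only and is not the LeapfrogAI design: the names `ModelType`, `ModelEntry`, `ModelDirectory`, and the example paths are all hypothetical, and a real implementation would also need to handle storage, serving, and air-gapped uploads.

```python
from dataclasses import dataclass, field
from enum import Enum


class ModelType(Enum):
    """The three model types named in the goals (hypothetical enum)."""
    LLM = "text-to-text"
    EMBEDDINGS = "text-to-vector"
    WHISPER = "speech-to-text"


@dataclass(frozen=True)
class ModelEntry:
    name: str
    model_type: ModelType
    path: str  # assumed location of the model files; not a real LeapfrogAI path


@dataclass
class ModelDirectory:
    """In-memory index of models, keyed by name (illustrative only)."""
    _entries: dict = field(default_factory=dict)

    def add(self, entry: ModelEntry) -> None:
        # Reject duplicates so an existing model is never silently replaced.
        if entry.name in self._entries:
            raise ValueError(f"model already registered: {entry.name}")
        self._entries[entry.name] = entry

    def list_models(self, model_type: ModelType) -> list:
        # Names of all models of the given type, sorted for stable output.
        return sorted(
            e.name for e in self._entries.values() if e.model_type is model_type
        )

    def get(self, name: str) -> ModelEntry:
        # Raises KeyError if the model is not in the directory.
        return self._entries[name]
```

With a structure like this, a GUI or API endpoint could call `list_models(ModelType.LLM)` to populate the model choices for an Assistant, and an administrator workflow could call `add(...)` after new model files have been copied into the deployment.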

User Stories

As a delivery engineer deploying LeapfrogAI
I want the models to be separate from the backend Zarf packages
So that I can easily choose which models to deploy without having to rebuild packages, and
So that the packages are smaller and therefore do not require manual steps to push them

As a LeapfrogAI end user
I want to be able to select from multiple LLMs
So that I can choose the best model to use for a specific task

Acceptance Criteria - TODO

Given [a state]
When [an action is taken]
Then [something happens]

Additional context

In-work technical design doc in LeapfrogAI Coda: https://coda.io/d/_dGmk3eNjmm8/Model-Directory_suuoJYJF

Tasks

  1. ADR 🧐
    YrrepNoj
  2. python tech-debt wontfix
    jamestexas
  3. blocked 🛑 enhancement ui
    andrewrisse

3 participants