Multiple models support for LLM TGI #835
base: main
Conversation
…l field for ChatQnAGateway and LLMParams respectively
…els. Uses load_model_configs method from utils
…or different models
for more information, see https://pre-commit.ci
logger.error(f"Input model {input.model} not present in model_configs") | ||
raise ConfigError(f"Input model {input.model} not present in model_configs") | ||
|
||
llm = AsyncInferenceClient(model=llm_endpoint, timeout=600) |
Thanks for the review. Fixed it.
Signed-off-by: sgurunat <[email protected]>
I am confused by this PR. Why do we want the user to pass a model_config to support different models? Each OPEA microservice instance supports only one model during deployment, and the model_id will not change. The endpoint is not configurable either; it is predefined in the OPEA API spec, which is OpenAI API compatible. I don't think switching models per inference request is the right requirement.
Signed-off-by: sgurunat <[email protected]>
Description
To support multiple LLM models for ChatQnA, the changes are incorporated into the llms TGI text-generation microservice. Multiple models can be provided in a model_configs.json file, which is loaded into the MODEL_CONFIGS environment variable.
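For illustration only, here is one way such a config could look and how it would reach the service; the model entries, endpoints, and token limits below are placeholders rather than values taken from this PR:

```python
# Illustrative sketch: field names follow the required keys described in this PR
# ('model_name', 'displayName', 'endpoint', 'minToken', 'maxToken'); the concrete
# display names, endpoints, and limits are placeholders.
import json
import os

example_model_configs = [
    {
        "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "displayName": "llama-3.1-8b",            # placeholder display name
        "endpoint": "http://tgi-llama-8b:8080",    # placeholder TGI endpoint
        "minToken": 1,
        "maxToken": 1024,
    },
    {
        "model_name": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "displayName": "llama-3.1-70b",
        "endpoint": "http://tgi-llama-70b:8080",
        "minToken": 1,
        "maxToken": 2048,
    },
]

# The contents of model_configs.json are exposed to the microservice through
# the MODEL_CONFIGS environment variable.
os.environ["MODEL_CONFIGS"] = json.dumps(example_model_configs)
```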
Type of change
New feature (non-breaking change which adds new functionality)
## Changes
To support this, a model parameter has been added to ChatQnAGateway and LLMParams in gateway.py and docarray.py respectively.
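The sketch below is not the actual OPEA definition (LLMParams is defined with docarray in docarray.py); it only illustrates the idea of threading an optional model field through the request parameters so the gateway can forward the chosen model to the LLM microservice:

```python
# Minimal sketch of the new field, not the real class; only a subset of the
# existing generation parameters is shown for context.
from typing import Optional

from pydantic import BaseModel


class LLMParams(BaseModel):
    # Existing generation parameters (illustrative subset).
    max_new_tokens: int = 1024
    temperature: float = 0.01
    streaming: bool = True
    # New in this PR: the requested model; None falls back to the default model.
    model: Optional[str] = None
```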
Added a load_model_configs method in utils.py to validate all the required fields ('model_name', 'displayName', 'endpoint', 'minToken', 'maxToken') and then load the configurations. It is placed in utils so that it can be reused.
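A minimal sketch of that loader, assuming MODEL_CONFIGS holds a JSON list of model entries; the real helper in utils.py may differ in signature and error handling:

```python
# Sketch of validating and loading MODEL_CONFIGS into a dict keyed by model_name.
import json
import os

REQUIRED_FIELDS = ["model_name", "displayName", "endpoint", "minToken", "maxToken"]


class ConfigError(Exception):
    """Raised when MODEL_CONFIGS is missing or malformed."""


def load_model_configs() -> dict:
    """Parse MODEL_CONFIGS and return a dict keyed by model_name."""
    raw = os.environ.get("MODEL_CONFIGS")
    if not raw:
        raise ConfigError("MODEL_CONFIGS environment variable is not set")
    try:
        configs = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ConfigError(f"MODEL_CONFIGS is not valid JSON: {e}") from e

    validated = {}
    for entry in configs:
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ConfigError(f"Model config {entry} is missing fields: {missing}")
        validated[entry["model_name"]] = entry
    return validated
```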
Updated llm.py in llms text-generation tgi to support multiple models and route each request to the right endpoint.
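A hedged sketch of the routing step, assuming model_configs comes from load_model_configs and reusing the error handling visible in the review snippet above; the get_llm_endpoint helper and the default-endpoint fallback are illustrative, not the exact code in llm.py:

```python
# Sketch of per-request endpoint selection; AsyncInferenceClient is from huggingface_hub.
import logging
from typing import Optional

from huggingface_hub import AsyncInferenceClient

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Same exception as in the loader sketch above."""


def get_llm_endpoint(input_model: Optional[str], model_configs: dict, default_endpoint: str) -> str:
    """Pick the TGI endpoint for the requested model, or fall back to the default."""
    if not input_model or not model_configs:
        return default_endpoint
    if input_model not in model_configs:
        logger.error(f"Input model {input_model} not present in model_configs")
        raise ConfigError(f"Input model {input_model} not present in model_configs")
    return model_configs[input_model]["endpoint"]


# As in the diff snippet above, the client is then created against the chosen endpoint:
# llm_endpoint = get_llm_endpoint(input.model, model_configs, default_endpoint)
# llm = AsyncInferenceClient(model=llm_endpoint, timeout=600)
```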
Updated template.py in llms text-generation tgi with a new template for the models "meta-llama/Meta-Llama-3.1-70B-Instruct" and "meta-llama/Meta-Llama-3.1-8B-Instruct".
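As a rough illustration (not the exact template text added in template.py), the model-specific switch could look like this, using the standard Llama 3.1 chat header tokens versus a generic fallback:

```python
# Sketch of choosing a prompt template by model name; template strings are illustrative.
from typing import Optional

LLAMA_3_1_MODELS = {
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
}

LLAMA_3_1_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

GENERIC_TEMPLATE = "### Question: {question}\n### Answer:"


def build_prompt(question: str, model: Optional[str] = None) -> str:
    """Format the question with the template that matches the requested model."""
    template = LLAMA_3_1_TEMPLATE if model in LLAMA_3_1_MODELS else GENERIC_TEMPLATE
    return template.format(question=question)
```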