Multiple models support for LLM TGI #835
base: main
Conversation
…l field for ChatQnAGateway and LLMParams respectively
…els. Uses load_model_configs method from utils
…or different models
for more information, see https://pre-commit.ci
logger.error(f"Input model {input.model} not present in model_configs") | ||
raise ConfigError(f"Input model {input.model} not present in model_configs") | ||
|
||
llm = AsyncInferenceClient(model=llm_endpoint, timeout=600) |
Thanks for the review. Fixed it.
Signed-off-by: sgurunat <[email protected]>
I am confused by this PR. Why do we want the user to pass a model_config to support different models? Each OPEA microservice instance supports only one model during deployment, and the model_id will not change. The endpoint is not configurable either; it is predefined in the OPEA API spec, which is OpenAI API compatible. I don't think switching models per inference request is the right requirement.
Signed-off-by: sgurunat <[email protected]>
Description
To support multiple LLM models for ChatQnA, the changes are incorporated into the llms TGI text-generation microservice. Multiple models can be provided in a model_configs.json file, which is loaded into the MODEL_CONFIGS environment variable.
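For illustration only, here is one way such a config could look and how it would reach the service; the model entries, endpoints, and token limits below are placeholders rather than values taken from this PR:

```python
# Illustrative sketch: field names follow the required keys described in this PR
# ('model_name', 'displayName', 'endpoint', 'minToken', 'maxToken'); the concrete
# display names, endpoints, and limits are placeholders.
import json
import os

example_model_configs = [
    {
        "model_name": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "displayName": "llama-3.1-8b",            # placeholder display name
        "endpoint": "http://tgi-llama-8b:8080",    # placeholder TGI endpoint
        "minToken": 1,
        "maxToken": 1024,
    },
    {
        "model_name": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "displayName": "llama-3.1-70b",
        "endpoint": "http://tgi-llama-70b:8080",
        "minToken": 1,
        "maxToken": 2048,
    },
]

# The contents of model_configs.json are exposed to the microservice through
# the MODEL_CONFIGS environment variable.
os.environ["MODEL_CONFIGS"] = json.dumps(example_model_configs)
```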
Type of change
New feature (non-breaking change which adds new functionality)
## Changes
To support this, a model parameter has been added to ChatQnAGateway and LLMParams in gateway.py and docarray.py respectively.
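The sketch below is not the actual OPEA definition (LLMParams is defined with docarray in docarray.py); it only illustrates the idea of threading an optional model field through the request parameters so the gateway can forward the chosen model to the LLM microservice:

```python
# Minimal sketch of the new field, not the real class; only a subset of the
# existing generation parameters is shown for context.
from typing import Optional

from pydantic import BaseModel


class LLMParams(BaseModel):
    # Existing generation parameters (illustrative subset).
    max_new_tokens: int = 1024
    temperature: float = 0.01
    streaming: bool = True
    # New in this PR: the requested model; None falls back to the default model.
    model: Optional[str] = None
```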
Added a load_model_configs method in utils.py to validate all the required fields ('model_name', 'displayName', 'endpoint', 'minToken', 'maxToken') and then load the configurations. It is placed in utils so that it can be reused.
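A minimal sketch of that loader, assuming MODEL_CONFIGS holds a JSON list of model entries; the real helper in utils.py may differ in signature and error handling:

```python
# Sketch of validating and loading MODEL_CONFIGS into a dict keyed by model_name.
import json
import os

REQUIRED_FIELDS = ["model_name", "displayName", "endpoint", "minToken", "maxToken"]


class ConfigError(Exception):
    """Raised when MODEL_CONFIGS is missing or malformed."""


def load_model_configs() -> dict:
    """Parse MODEL_CONFIGS and return a dict keyed by model_name."""
    raw = os.environ.get("MODEL_CONFIGS")
    if not raw:
        raise ConfigError("MODEL_CONFIGS environment variable is not set")
    try:
        configs = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ConfigError(f"MODEL_CONFIGS is not valid JSON: {e}") from e

    validated = {}
    for entry in configs:
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ConfigError(f"Model config {entry} is missing fields: {missing}")
        validated[entry["model_name"]] = entry
    return validated
```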
Updated llm.py in llms text-generation tgi to support multiple models and route each request to the right endpoint.
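A hedged sketch of the routing step, assuming model_configs comes from load_model_configs and reusing the error handling visible in the review snippet above; the get_llm_endpoint helper and the default-endpoint fallback are illustrative, not the exact code in llm.py:

```python
# Sketch of per-request endpoint selection; AsyncInferenceClient is from huggingface_hub.
import logging
from typing import Optional

from huggingface_hub import AsyncInferenceClient

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Same exception as in the loader sketch above."""


def get_llm_endpoint(input_model: Optional[str], model_configs: dict, default_endpoint: str) -> str:
    """Pick the TGI endpoint for the requested model, or fall back to the default."""
    if not input_model or not model_configs:
        return default_endpoint
    if input_model not in model_configs:
        logger.error(f"Input model {input_model} not present in model_configs")
        raise ConfigError(f"Input model {input_model} not present in model_configs")
    return model_configs[input_model]["endpoint"]


# As in the diff snippet above, the client is then created against the chosen endpoint:
# llm_endpoint = get_llm_endpoint(input.model, model_configs, default_endpoint)
# llm = AsyncInferenceClient(model=llm_endpoint, timeout=600)
```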
Updated template.py in llms text-generation tgi with a new template for the models "meta-llama/Meta-Llama-3.1-70B-Instruct" and "meta-llama/Meta-Llama-3.1-8B-Instruct".
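As a rough illustration (not the exact template text added in template.py), the model-specific switch could look like this, using the standard Llama 3.1 chat header tokens versus a generic fallback:

```python
# Sketch of choosing a prompt template by model name; template strings are illustrative.
from typing import Optional

LLAMA_3_1_MODELS = {
    "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
}

LLAMA_3_1_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

GENERIC_TEMPLATE = "### Question: {question}\n### Answer:"


def build_prompt(question: str, model: Optional[str] = None) -> str:
    """Format the question with the template that matches the requested model."""
    template = LLAMA_3_1_TEMPLATE if model in LLAMA_3_1_MODELS else GENERIC_TEMPLATE
    return template.format(question=question)
```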