Update chatqna.py to support vLLM embeddings #1237

Open · wants to merge 3 commits into main
Conversation

@raravena80 (Author)

Description

Changes needed to support ARM (without rerank) with the specific embeddings endpoint input/output format of vLLM.

Optionally, set:

```
export EMBEDDINGS_USE_VLLM=true
```
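For context, a minimal sketch of what this toggle could control; the function name and payload shapes below are assumptions for illustration, not necessarily the exact code in this PR:

```python
import os

# Illustrative sketch only; names here are assumptions, not the PR's exact code.
EMBEDDINGS_MODEL_ID = os.getenv("EMBEDDINGS_MODEL_ID", "BAAI/bge-base-en-v1.5")

def embedding_request(text: str, use_vllm: bool) -> dict:
    """Build the request body for the configured embedding backend."""
    if use_vllm:
        # vLLM's OpenAI-compatible server serves /v1/embeddings and expects
        # {"model": ..., "input": ...}; it returns {"data": [{"embedding": [...]}]}.
        return {"model": EMBEDDINGS_MODEL_ID, "input": text}
    # A TEI-style /embed endpoint expects {"inputs": ...} instead.
    return {"inputs": text}
```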

Re-opening #1232 due to a failing DCO check.

Issues

n/a

Type of change

List the type of change as below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Tested locally against the API successfully:

```
curl http://localhost:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```

The streamed response:

```
data: b''
data: b' As'
data: b' of'
data: b' now'
data: b','
data: b' the'
data: b' exact'
data: b' revenue'
data: b' of'
data: b' N'
data: b'ike'
data: b' for'
data: b' '
data: b'2'
data: b'0'
data: b'2'
data: b'3'
data: b' is'
data: b' not'
data: b' available'
data: b'.'
data: b' However'
data: b','
data: b' you'
data: b' can'
data: b' find'
data: b' the'
data: b' latest'
data: b' revenue'
data: b' information'
data: b' by'
data: b' searching'
data: b' for'
data: b' their'
data: b' financial'
data: b' reports'
data: b' or'
data: b' visiting'
data: b' their'
data: b' official'
data: b' website'
data: b'.'
data: b''
data: b''
data: [DONE]
```
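For completeness, the same streaming test can be driven from Python; a minimal sketch using `requests`, with the endpoint and payload mirroring the curl call above:

```python
import requests

# Mirror the curl test above and print the streamed SSE chunks as they arrive.
resp = requests.post(
    "http://localhost:8888/v1/chatqna",
    json={"messages": "What is the revenue of Nike in 2023?"},
    stream=True,
)
for line in resp.iter_lines():
    if line:  # skip the blank keep-alive lines between SSE events
        print(line.decode("utf-8"))  # e.g. data: b' As' ... data: [DONE]
```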

@raravena80 mentioned this pull request on Dec 7, 2024
@raravena80 (Author):

@lvliang-intel @mkbhanda need a reviewer for this PR

@eero-t (Contributor) left a comment:

I think that in addition to vLLM embedding support:
https://docs.vllm.ai/en/latest/models/supported_models.html#text-embedding-task-embed

this should also include reranking support:
https://docs.vllm.ai/en/latest/models/supported_models.html#sentence-pair-scoring-task-score

Otherwise the required changes cannot be properly reviewed.

(E.g., why are the potential graph nodes not declared just once, with routing between them decided based on the args, instead of everything being copy-pasted 4 times, presumably with one more copy-paste for the reranker...)
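To illustrate the declare-once suggestion, a sketch only, not the PR's code; the node names are placeholders:

```python
# Sketch of the reviewer's suggestion: declare each pipeline node once and
# route between them based on the args, instead of copy-pasting the whole
# graph for every embedding/rerank combination.
def build_flow(use_rerank: bool) -> list:
    flow = ["embedding", "retriever", "rerank", "llm"]
    if not use_rerank:
        flow.remove("rerank")
    return flow

print(build_flow(use_rerank=True))   # ['embedding', 'retriever', 'rerank', 'llm']
print(build_flow(use_rerank=False))  # ['embedding', 'retriever', 'llm']
```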

Dependency Review

✅ No vulnerabilities or license issues found.

```
@@ -58,11 +58,17 @@ def generate_rag_prompt(question, documents):
LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 80))
LLM_MODEL = os.getenv("LLM_MODEL", "Intel/neural-chat-7b-v3-3")
EMBEDDINGS_MODEL_ID = os.getenv("EMBEDDINGS_MODEL_ID", "BAAI/bge-base-en-v1.5")
EMBEDDINGS_USE_VLLM = os.getenv("EMBEDDINGS_USE_VLLM", "false")
```
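One detail worth noting in the hunk above: `os.getenv` returns a string, and any non-empty string (including `"false"`) is truthy in Python, so the flag should be compared explicitly wherever it is used as a boolean. A possible parse, assuming the flag is meant as a boolean:

```python
import os

# "false" is a non-empty string and therefore truthy; compare explicitly.
EMBEDDINGS_USE_VLLM = os.getenv("EMBEDDINGS_USE_VLLM", "false").lower() == "true"
```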
Collaborator:
We are refactoring the GenAIComps embedding code to ensure full compatibility with the OpenAI format. With this update, there is no need to differentiate vLLM in this context. All embedding requests should use the endpoint /v1/embeddings and include the model parameter.
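For reference, the unified OpenAI-format request described here would look roughly like this; a sketch in which the host, port, and model are placeholders:

```python
import requests

# Sketch of an OpenAI-format embedding request; host/port/model are placeholders.
resp = requests.post(
    "http://localhost:8090/v1/embeddings",
    json={"model": "BAAI/bge-base-en-v1.5", "input": "What is Deep Learning?"},
)
print(resp.json()["data"][0]["embedding"][:4])  # first few dimensions
```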

@raravena80 (Author):

thanks

Contributor:

> We are refactoring the GenAIComps embedding code to ensure full compatibility with the OpenAI format. With this update, there is no need to differentiate vLLM in this context. All embedding requests should use the endpoint /v1/embeddings and include the model parameter.

I found the following 3 PRs:

Are there (going to be) more?
