Update chatqna.py to support vLLM embeddings #1237
base: main
Conversation
Signed-off-by: Ricardo Aravena <[email protected]>
@lvliang-intel @mkbhanda need a reviewer for this PR
I think that in addition to vLLM embedding support:
https://docs.vllm.ai/en/latest/models/supported_models.html#text-embedding-task-embed
this should also include reranking support:
https://docs.vllm.ai/en/latest/models/supported_models.html#sentence-pair-scoring-task-score
Otherwise the required changes cannot be properly reviewed.
(E.g. why are the potential graph nodes not just declared once, with routing between them decided based on the args, instead of everything being copy-pasted 4 times, and presumably one more copy-paste once the reranker is added... see the sketch below.)
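For illustration, a minimal sketch of the "declare once, route by args" idea from this comment. It assumes the `comps.MicroService`/`ServiceType` pattern used elsewhere in chatqna.py; `build_embedding_node`, `args.use_vllm_embeddings`, and the `/embed` default are hypothetical names for illustration, not part of this PR:

```python
import os

# Assumed import, matching the pattern chatqna.py already uses.
from comps import MicroService, ServiceType

EMBEDDING_SERVER_HOST_IP = os.getenv("EMBEDDING_SERVER_HOST_IP", "0.0.0.0")
EMBEDDING_SERVER_PORT = int(os.getenv("EMBEDDING_SERVER_PORT", 80))


def build_embedding_node(args):
    # Only the endpoint differs per backend, so the node is declared once
    # instead of duplicating the whole service graph per backend combination.
    endpoint = "/v1/embeddings" if args.use_vllm_embeddings else "/embed"
    return MicroService(
        name="embedding",
        host=EMBEDDING_SERVER_HOST_IP,
        port=EMBEDDING_SERVER_PORT,
        endpoint=endpoint,
        use_remote_service=True,
        service_type=ServiceType.EMBEDDING,
    )
```

Adding a reranker would then be one more routed node rather than another copy of the graph.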
Dependency Review: ✅ No vulnerabilities or license issues found.
@@ -58,11 +58,17 @@ def generate_rag_prompt(question, documents):
 LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
 LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 80))
 LLM_MODEL = os.getenv("LLM_MODEL", "Intel/neural-chat-7b-v3-3")
+EMBEDDINGS_MODEL_ID = os.getenv("EMBEDDINGS_MODEL_ID", "BAAI/bge-base-en-v1.5")
+EMBEDDINGS_USE_VLLM = os.getenv("EMBEDDINGS_USE_VLLM", "false")
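One thing worth noting about this hunk: `EMBEDDINGS_USE_VLLM` is read as a raw string, so a bare `if EMBEDDINGS_USE_VLLM:` check would be truthy even for `"false"`. A minimal sketch of explicit normalization (the `/embed` fallback endpoint is an assumption for illustration, not taken from this diff):

```python
import os

# Normalize the string flag explicitly; "false" is a non-empty string
# and would otherwise evaluate as True.
EMBEDDINGS_USE_VLLM = os.getenv("EMBEDDINGS_USE_VLLM", "false").lower() == "true"

if EMBEDDINGS_USE_VLLM:
    embedding_endpoint = "/v1/embeddings"  # vLLM's OpenAI-compatible route
else:
    embedding_endpoint = "/embed"          # assumed non-vLLM default
```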
We are refactoring the GenAIComps embedding code to ensure full compatibility with the OpenAI format. With this update, there is no need to differentiate vLLM in this context. All embedding requests should use the endpoint /v1/embeddings and include the model parameter.
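A minimal sketch of such an OpenAI-format embeddings call, for reference (the host, port, and model name are placeholders, not values mandated by this PR):

```python
import requests

# POST to the OpenAI-compatible /v1/embeddings route with an explicit
# "model" field, as the comment above describes.
resp = requests.post(
    "http://localhost:8090/v1/embeddings",  # placeholder host/port
    json={
        "model": "BAAI/bge-base-en-v1.5",  # placeholder model name
        "input": "What is OPEA?",
    },
    timeout=30,
)
resp.raise_for_status()

# OpenAI-format responses carry the vectors under data[i].embedding.
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))  # vector dimensionality
```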
thanks
> We are refactoring the GenAIComps embedding code to ensure full compatibility with the OpenAI format. With this update, there is no need to differentiate vLLM in this context. All embedding requests should use the endpoint /v1/embeddings and include the model parameter.
I found the following 3 PRs:
- Define embedding/ranking/llm request/response format GenAIComps#289
- compatible with openai/tgi/vllm request format GenAIComps#275
- update tei embedding format. GenAIComps#1035
Are there (going to be) more?
Description
Changes needed to support ARM without rerank, using the specific embeddings endpoint input/output format of vLLM.
Need to optionally set the EMBEDDINGS_USE_VLLM and EMBEDDINGS_MODEL_ID environment variables (see the diff above).
Re-opening #1232 due to the DCO check failing.
Issues
n/a
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
List any newly introduced 3rd-party dependencies, if they exist.
Tests
Tested locally with the API successfully.