Update chatqna.py to support vLLM embeddings #1237

Open · wants to merge 3 commits into main
Conversation

@raravena80 (Author)

Description

Changes needed to support ARM (without rerank) with the specific embeddings endpoint input/output format of vLLM.

Optionally, set:

```
export EMBEDDINGS_USE_VLLM=true
```
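For context, a minimal sketch of what this toggle could control; the function name and payload shapes below are assumptions for illustration, not necessarily the exact code in this PR:

```python
import os

# Illustrative sketch only; names here are assumptions, not the PR's exact code.
EMBEDDINGS_MODEL_ID = os.getenv("EMBEDDINGS_MODEL_ID", "BAAI/bge-base-en-v1.5")

def embedding_request(text: str, use_vllm: bool) -> dict:
    """Build the request body for the configured embedding backend."""
    if use_vllm:
        # vLLM's OpenAI-compatible server serves /v1/embeddings and expects
        # {"model": ..., "input": ...}; it returns {"data": [{"embedding": [...]}]}.
        return {"model": EMBEDDINGS_MODEL_ID, "input": text}
    # A TEI-style /embed endpoint expects {"inputs": ...} instead.
    return {"inputs": text}
```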

Re-opening #1232 due to a failing DCO check.

Issues

n/a

Type of change

List the type of change as below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Tested locally against the API successfully:

```
curl http://localhost:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'
```

The streamed response:

```
data: b''
data: b' As'
data: b' of'
data: b' now'
data: b','
data: b' the'
data: b' exact'
data: b' revenue'
data: b' of'
data: b' N'
data: b'ike'
data: b' for'
data: b' '
data: b'2'
data: b'0'
data: b'2'
data: b'3'
data: b' is'
data: b' not'
data: b' available'
data: b'.'
data: b' However'
data: b','
data: b' you'
data: b' can'
data: b' find'
data: b' the'
data: b' latest'
data: b' revenue'
data: b' information'
data: b' by'
data: b' searching'
data: b' for'
data: b' their'
data: b' financial'
data: b' reports'
data: b' or'
data: b' visiting'
data: b' their'
data: b' official'
data: b' website'
data: b'.'
data: b''
data: b''
data: [DONE]
```
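For completeness, the same streaming test can be driven from Python; a minimal sketch using `requests`, with the endpoint and payload mirroring the curl call above:

```python
import requests

# Mirror the curl test above and print the streamed SSE chunks as they arrive.
resp = requests.post(
    "http://localhost:8888/v1/chatqna",
    json={"messages": "What is the revenue of Nike in 2023?"},
    stream=True,
)
for line in resp.iter_lines():
    if line:  # skip the blank keep-alive lines between SSE events
        print(line.decode("utf-8"))  # e.g. data: b' As' ... data: [DONE]
```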

@raravena80 mentioned this pull request on Dec 7, 2024
@raravena80 (Author):

@lvliang-intel @mkbhanda need a reviewer for this PR

@eero-t (Contributor) left a comment:

I think that in addition to vLLM embedding support:
https://docs.vllm.ai/en/latest/models/supported_models.html#text-embedding-task-embed

this should also include reranking support:
https://docs.vllm.ai/en/latest/models/supported_models.html#sentence-pair-scoring-task-score

Otherwise the required changes cannot be properly reviewed.

(E.g., why are the potential graph nodes not declared just once, with routing between them decided based on the args, instead of everything being copy-pasted 4 times, presumably with one more copy-paste for the reranker...)
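To illustrate the declare-once suggestion, a sketch only, not the PR's code; the node names are placeholders:

```python
# Sketch of the reviewer's suggestion: declare each pipeline node once and
# route between them based on the args, instead of copy-pasting the whole
# graph for every embedding/rerank combination.
def build_flow(use_rerank: bool) -> list:
    flow = ["embedding", "retriever", "rerank", "llm"]
    if not use_rerank:
        flow.remove("rerank")
    return flow

print(build_flow(use_rerank=True))   # ['embedding', 'retriever', 'rerank', 'llm']
print(build_flow(use_rerank=False))  # ['embedding', 'retriever', 'llm']
```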

Dependency Review

✅ No vulnerabilities or license issues found.

```
@@ -58,11 +58,17 @@ def generate_rag_prompt(question, documents):
LLM_SERVER_HOST_IP = os.getenv("LLM_SERVER_HOST_IP", "0.0.0.0")
LLM_SERVER_PORT = int(os.getenv("LLM_SERVER_PORT", 80))
LLM_MODEL = os.getenv("LLM_MODEL", "Intel/neural-chat-7b-v3-3")
EMBEDDINGS_MODEL_ID = os.getenv("EMBEDDINGS_MODEL_ID", "BAAI/bge-base-en-v1.5")
EMBEDDINGS_USE_VLLM = os.getenv("EMBEDDINGS_USE_VLLM", "false")
```
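One detail worth noting in the hunk above: `os.getenv` returns a string, and any non-empty string (including `"false"`) is truthy in Python, so the flag should be compared explicitly wherever it is used as a boolean. A possible parse, assuming the flag is meant as a boolean:

```python
import os

# "false" is a non-empty string and therefore truthy; compare explicitly.
EMBEDDINGS_USE_VLLM = os.getenv("EMBEDDINGS_USE_VLLM", "false").lower() == "true"
```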
Collaborator:
We are refactoring the GenAIComps embedding code to ensure full compatibility with the OpenAI format. With this update, there is no need to differentiate vLLM in this context. All embedding requests should use the endpoint /v1/embeddings and include the model parameter.
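For reference, the unified OpenAI-format request described here would look roughly like this; a sketch in which the host, port, and model are placeholders:

```python
import requests

# Sketch of an OpenAI-format embedding request; host/port/model are placeholders.
resp = requests.post(
    "http://localhost:8090/v1/embeddings",
    json={"model": "BAAI/bge-base-en-v1.5", "input": "What is Deep Learning?"},
)
print(resp.json()["data"][0]["embedding"][:4])  # first few dimensions
```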

@raravena80 (Author):

thanks

Contributor:

> We are refactoring the GenAIComps embedding code to ensure full compatibility with the OpenAI format. With this update, there is no need to differentiate vLLM in this context. All embedding requests should use the endpoint /v1/embeddings and include the model parameter.

I found the following 3 PRs:

Are there (going to be) more?
