Initial enablement for text-embedding #758

Open

wants to merge 24 commits into habana_main from dev/enable_embedding_ace
Conversation

@libinta commented Jan 30, 2025

This is the initial PR enabling the text-embedding task with --dtype bfloat16. The following models were tested on 1x and 2x cards with MPI/Ray:

- BAAI/bge-base-en-v1.5
- BAAI/bge-multilingual-gemma2
- intfloat/e5-mistral-7b-instruct
- ssmits/Qwen2-7B-Instruct-embed-base
- Alibaba-NLP/gte-Qwen2-7B-instruct

There is no LoRA support yet.
This PR depends on HabanaAI/vllm-hpu-extension#87.

Offline test: use the script from https://docs.vllm.ai/en/latest/getting_started/examples/embedding.html, adding dtype="bfloat16" to the LLM call. Sample commands (a sketch of such a script follows the commands):

python testscript.py
VLLM_SKIP_WARMUP=true python testscript.py
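
For reference, a minimal sketch of such an offline script, along the lines of the linked example with dtype="bfloat16" added to the LLM call. The model choice and prompts are illustrative, and the exact pooling API may differ between vLLM versions:

```python
# testscript.py -- minimal offline embedding sketch (illustrative, not the exact PR test script)
from vllm import LLM

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# task="embed" selects the embedding/pooling path; dtype="bfloat16" as exercised by this PR.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed", dtype="bfloat16")

# encode() returns one output per prompt; each carries the pooled embedding vector.
outputs = llm.encode(prompts)
for prompt, output in zip(prompts, outputs):
    print(f"{prompt!r} -> embedding with {len(output.outputs.embedding)} dimensions")
```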

Online test: use the client script from https://docs.vllm.ai/en/latest/getting_started/examples/openai_embedding_client.html. Sample server commands (a client sketch follows the commands):

vllm serve Alibaba-NLP/gte-Qwen2-7B-instruct --task embed --dtype bfloat16
VLLM_SKIP_WARMUP=true VLLM_PROMPT_USE_FUSEDSDPA=true vllm serve Alibaba-NLP/gte-Qwen2-7B-instruct --task embed --dtype bfloat16
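
For reference, a minimal client sketch following the linked OpenAI embedding client example. It assumes the server started above is listening on the default port 8000; the API key is a placeholder since the vLLM server does not require one by default:

```python
# Minimal OpenAI-compatible embedding client sketch (illustrative).
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",                      # placeholder; the vLLM server ignores it by default
    base_url="http://localhost:8000/v1",  # default vllm serve address
)

response = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-7B-instruct",
    input=[
        "Hello, my name is",
        "The best thing about vLLM is that it supports many different models",
    ],
)

for item in response.data:
    print(f"embedding with {len(item.embedding)} dimensions")
```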

@libinta force-pushed the dev/enable_embedding_ace branch from c0da1e0 to d65340a on January 31, 2025 20:19
@libinta force-pushed the dev/enable_embedding_ace branch from 30f43b5 to 1185c2e on February 7, 2025 07:40