Initial enablement for text-embedding #758

libinta · 2025-01-30T08:53:21Z

This is the initial PR to enable the text-embedding task with --dtype bfloat16. The following models were tested with 1x and 2x using mpi/ray:

model
BAAI/bge-base-en-v1.5
BAAI/bge-multilingual-gemma2
intfloat/e5-mistral-7b-instruct
ssmits/Qwen2-7B-Instruct-embed-base
Alibaba-NLP/gte-Qwen2-7B-instruct

There is no LoRA support yet.
This PR depends on HabanaAI/vllm-hpu-extension#87.

The offline test uses the script from https://docs.vllm.ai/en/latest/getting_started/examples/embedding.html, with dtype="bfloat16" added to the LLM call.
Sample commands:
python testscript.py
VLLM_SKIP_WARMUP=true python testscript.py

The online test uses the client script from https://docs.vllm.ai/en/latest/getting_started/examples/openai_embedding_client.html.
Sample server commands:
vllm serve Alibaba-NLP/gte-Qwen2-7B-instruct --task embed --dtype bfloat16
VLLM_SKIP_WARMUP=true VLLM_PROMPT_USE_FUSEDSDPA=true vllm serve Alibaba-NLP/gte-Qwen2-7B-instruct --task embed --dtype bfloat16
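
For readers without the linked client script at hand, a minimal sketch of the same kind of request using only the Python standard library follows. It assumes the server started by the commands above listens on http://localhost:8000 (an assumption, not stated in this PR) and exposes the OpenAI-compatible /v1/embeddings route. The helper names build_embedding_request and fetch_embeddings are hypothetical and are not part of vLLM or of the linked example.

```python
import json
import urllib.request


def build_embedding_request(base_url, model, texts):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(request):
    """Send the request and return one embedding vector per input text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]


# Assumed host and port; adjust to wherever `vllm serve` is running.
request = build_embedding_request(
    "http://localhost:8000",
    "Alibaba-NLP/gte-Qwen2-7B-instruct",
    ["Hello, my name is", "The capital of France is"],
)
# With the server up, the vectors can be fetched with:
#     vectors = fetch_embeddings(request)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.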


libinta added 12 commits January 23, 2025 23:50

Initial draft to enable embedding task.

56b42c3

remove ENCODER_ONLY

b62d611

Added support for embedding model with self attention without causal …

a647baa

…such as bert,roberta, bart.

Change set_attn_bias padding element from -math.inf to -3e38 as -math…

46e1aad

….inf caused nan for softmax calculation.

rewrite is_causal and add dbg msg

2f74e6b

update maskoff value

99947c8

fix wrong base mask

094294c

cleanup code

1c7416f

cleanup code

c6cdae1

cleanup code

8ac281b

Add pooler support for padded batch inputs for hpu with CLSPoll, Last…

e72c2f0

…pool

add meanpool for padded input

7c1c74b
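
Commit 46e1aad above replaces -math.inf with -3e38 as the padding value in the attention bias because -math.inf produced NaN in the softmax. The sketch below illustrates that failure mode in plain Python. It is not the HPU attention code from this PR; the softmax helper is a hypothetical stand-in, and the point is only that a row in which every position is masked with -inf yields NaN, while a large finite negative value keeps the result finite.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the row max before exponentiating."""
    row_max = max(scores)
    exps = [math.exp(s - row_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A row where every position is padding, masked in two different ways.
row_inf = [-math.inf] * 4
row_finite = [-3e38] * 4

# -inf - (-inf) is nan, so every probability in the row becomes nan.
probs_inf = softmax(row_inf)

# The differences are 0.0, so the row stays finite (uniform weights).
probs_finite = softmax(row_finite)

# A row with two real tokens and two padded ones: the padded positions
# underflow to exactly 0.0 and the real tokens share all of the weight.
probs_mixed = softmax([1.0, 2.0, -3e38, -3e38])
```

The value -3e38 is just inside the finite range of both float32 and bfloat16, so it behaves like "minus infinity" for real tokens without triggering the inf minus inf case.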

libinta requested review from kzawora-intel, madamczykhabana, michalkuligowski, mgawarkiewicz, vivekgoe and afierka-intel as code owners January 30, 2025 08:53

libinta added 3 commits January 30, 2025 00:57

revert bert change

5c49ca1

modify meanpool for padded input

ae6fbe0

write is_pooler function

d65340a
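
The pooling commits above (e72c2f0, 7c1c74b, ae6fbe0) deal with padded batch inputs. As a rough illustration of why padding matters for pooling, here is a plain-Python sketch of CLS, last-token, and mean pooling over a right-padded batch. The function names, the nested-list layout, and the assumption of right padding are all illustrative choices, not the pooler implementation in this PR.

```python
def cls_pool(hidden, seq_lens):
    """Take the first token of each sequence; padding never affects it."""
    # seq_lens is unused here and kept only for a uniform signature.
    return [seq[0] for seq in hidden]


def last_pool(hidden, seq_lens):
    """Take the last real token, not the last (possibly padded) position."""
    return [seq[n - 1] for seq, n in zip(hidden, seq_lens)]


def mean_pool(hidden, seq_lens):
    """Average only the first n real tokens of each sequence."""
    pooled = []
    for seq, n in zip(hidden, seq_lens):
        dim = len(seq[0])
        pooled.append([sum(tok[d] for tok in seq[:n]) / n for d in range(dim)])
    return pooled


# Two sequences padded to length 3, hidden size 2 (assumed right padding).
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # 3 real tokens
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],  # 2 real tokens, last row is padding
]
seq_lens = [3, 2]

cls_out = cls_pool(hidden, seq_lens)
last_out = last_pool(hidden, seq_lens)
mean_out = mean_pool(hidden, seq_lens)
```

Without the sequence lengths, last-token and mean pooling on the second sequence would pick up the padded row [9.0, 9.0] and return a wrong embedding.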

libinta force-pushed the dev/enable_embedding_ace branch from c0da1e0 to d65340a Compare January 31, 2025 20:19

libinta added 8 commits January 31, 2025 21:11

fix is_causal logic

0c28519

Set is_causal based on attn_type

1fe398f

Set is_causal based on attn_type

c3a92f3

fix with warmup issue

55ae676

fix cpu test issue and format

787700b

fix code format

6f02b86

Merge branch 'habana_main' into dev/enable_embedding_ace

b97f7c6

fix hpu attn coding issue

593ded0
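
Several commits above set is_causal from the attention type. The sketch below shows the general idea under stated assumptions: decoder-style models attend causally, while encoder-only models such as BERT attend to every position. The AttnType enum and both helper functions are hypothetical stand-ins written for this illustration; they are not vLLM's own types or the logic merged in this PR.

```python
from enum import Enum


class AttnType(Enum):
    """Hypothetical stand-in for an attention-type flag (not vLLM's enum)."""
    DECODER = "decoder"
    ENCODER_ONLY = "encoder_only"


def is_causal(attn_type):
    """Only decoder-style attention restricts a token to earlier positions."""
    return attn_type is AttnType.DECODER


def allowed_positions(seq_len, attn_type):
    """Return a seq_len x seq_len grid; True where query i may attend to key j."""
    causal = is_causal(attn_type)
    return [
        [(j <= i) or not causal for j in range(seq_len)]
        for i in range(seq_len)
    ]


decoder_grid = allowed_positions(3, AttnType.DECODER)
encoder_grid = allowed_positions(3, AttnType.ENCODER_ONLY)
```

The decoder grid is lower triangular, and the encoder-only grid allows every pair of positions.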

This was referenced Feb 5, 2025

Enable roberta embedding #785

Closed

Enable roberta embedding #786

Open

add support for batch padding

1185c2e

libinta force-pushed the dev/enable_embedding_ace branch from 30f43b5 to 1185c2e Compare February 7, 2025 07:40
