add speculative decoding to opea #617
base: main
Conversation
Force-pushed from a5bbe3f to 66348ea
@@ -0,0 +1,226 @@
# Copyright (C) 2024 Intel Corporation
This needs to be refined to follow the new code structure; there should be no docker folder. So this file should be named Dockerfile.nvidia_gpu.
@@ -0,0 +1,53 @@
# This vLLM Dockerfile is used to construct an image that can build and run vLLM on the x86 CPU platform.
Dockerfile.cpu -> Dockerfile
Hi @ClarkChin08, thanks for contributing.
Thank you :)
Signed-off-by: chensuyue <[email protected]>
Force-pushed from 20fefea to 13d8f85
Signed-off-by: Chen Xi <[email protected]>
for more information, see https://pre-commit.ci
Codecov Report
Attention: Patch coverage is
Signed-off-by: Chen Xi <[email protected]>
for more information, see https://pre-commit.ci
Description
Add draft code for a speculative decoding microservice.
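For illustration, a hypothetical client call to such a microservice. The port (9000), endpoint path (`/v1/chat/completions`), and payload field names follow the common OPEA text-generation microservice pattern and are assumptions here, not details confirmed by this PR.

```python
# Hypothetical client call; port, path, and payload fields are assumptions
# based on the common OPEA text-generation microservice pattern.
import requests

response = requests.post(
    "http://localhost:9000/v1/chat/completions",  # assumed endpoint
    json={
        "query": "What is speculative decoding?",  # assumed field name
        "max_new_tokens": 128,
    },
    timeout=60,
)
print(response.json())
```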
Issues
Speculative decoding support on CPU and GPU.
Type of change
List the type of change as in the options below. Please delete options that are not relevant.
Dependencies
Uses a forked vLLM from https://github.com/jiqing-feng/vllm.git
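For background, a minimal sketch of enabling speculative decoding through vLLM's offline API. The parameter names (`speculative_model`, `num_speculative_tokens`) and the example models follow upstream vLLM documentation of this period and are assumptions here; the forked vLLM used by this PR may expose a different interface.

```python
# Minimal speculative-decoding sketch using vLLM's offline API.
# Parameter names follow upstream vLLM docs and are assumptions here;
# the forked vLLM in this PR may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",              # target model
    speculative_model="facebook/opt-125m",  # small draft model proposes tokens
    num_speculative_tokens=5,               # tokens proposed per step
)
outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

The draft model proposes several tokens per step and the target model verifies them in a single forward pass, accepting the longest matching prefix; that is where the speedup of speculative decoding comes from.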
Tests
Tested with test/test_spec_decode_text-generation_vllm.sh.
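The script can presumably be run directly from the repository root, e.g. `bash test/test_spec_decode_text-generation_vllm.sh`, though its exact prerequisites (model downloads, a running container) are not spelled out in this PR.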