add speculative decoding to opea #617
base: main
Conversation
Force-pushed from a5bbe3f to 66348ea
@@ -0,0 +1,226 @@
# Copyright (C) 2024 Intel Corporation
This needs to be refined to follow the new code structure; there should be no docker folder. So this file should be named Dockerfile.nvidia_gpu.
@@ -0,0 +1,53 @@
# This vLLM Dockerfile is used to construct an image that can build and run vLLM on the x86 CPU platform.
Dockerfile.cpu -> Dockerfile
Hi @ClarkChin08, thanks for contributing.
Thank you :)
Signed-off-by: chensuyue <[email protected]>
Force-pushed from 20fefea to 13d8f85
Signed-off-by: Chen Xi <[email protected]>
for more information, see https://pre-commit.ci
Codecov Report
Attention: Patch coverage is
Signed-off-by: Chen Xi <[email protected]>
for more information, see https://pre-commit.ci
Description
Add draft code for a speculative decoding microservice.
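For illustration, a hypothetical client call to such a microservice. The port (9000), endpoint path (`/v1/chat/completions`), and payload field names follow the common OPEA text-generation microservice pattern and are assumptions here, not details confirmed by this PR.

```python
# Hypothetical client call; port, path, and payload fields are assumptions
# based on the common OPEA text-generation microservice pattern.
import requests

response = requests.post(
    "http://localhost:9000/v1/chat/completions",  # assumed endpoint
    json={
        "query": "What is speculative decoding?",  # assumed field name
        "max_new_tokens": 128,
    },
    timeout=60,
)
print(response.json())
```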
Issues
Speculative decoding support on CPU and GPU.
Type of change
List the type of change as in the options below. Please delete options that are not relevant.
Dependencies
Uses a forked vLLM from https://github.com/jiqing-feng/vllm.git
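For background, a minimal sketch of enabling speculative decoding through vLLM's offline API. The parameter names (`speculative_model`, `num_speculative_tokens`) and the example models follow upstream vLLM documentation of this period and are assumptions here; the forked vLLM used by this PR may expose a different interface.

```python
# Minimal speculative-decoding sketch using vLLM's offline API.
# Parameter names follow upstream vLLM docs and are assumptions here;
# the forked vLLM in this PR may differ.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",              # target model
    speculative_model="facebook/opt-125m",  # small draft model proposes tokens
    num_speculative_tokens=5,               # tokens proposed per step
)
outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

The draft model proposes several tokens per step and the target model verifies them in a single forward pass, accepting the longest matching prefix; that is where the speedup of speculative decoding comes from.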
Tests
Tested with test/test_spec_decode_text-generation_vllm.sh.
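The script can presumably be run directly from the repository root, e.g. `bash test/test_spec_decode_text-generation_vllm.sh`, though its exact prerequisites (model downloads, a running container) are not spelled out in this PR.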