[Feature] Enhance Gaudi Performance by Adopting vLLM as Default Serving framework #1213

Open
lvliang-intel opened this issue Nov 29, 2024 · 0 comments

lvliang-intel commented Nov 29, 2024

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Gaudi2

Running nodes

Single Node

Description

Feature Objective:
Set vLLM as the default serving framework on Gaudi to leverage its optimized performance characteristics, thereby improving throughput and reducing latency in inference tasks.

Feature Details:

- Replace TGI with vLLM as the default serving backend for inference on Gaudi devices.
- Update serving configurations to align with vLLM's inference architecture.
- Run performance benchmarks to confirm that vLLM improves TTFT (time to first token), TPOT (time per output token), and scalability on Gaudi hardware; a measurement sketch follows this list.
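
For the benchmarking item, a minimal sketch of how TTFT and TPOT could be measured against the serving endpoint is shown below. It assumes an OpenAI-compatible streaming completions API (which vLLM exposes); the endpoint URL, port, and model name are placeholders rather than the values this issue will settle on, and streamed chunks are treated as a proxy for individual tokens.

```python
# Minimal TTFT/TPOT measurement sketch against an OpenAI-compatible
# streaming endpoint. The URL, port, and model id are assumptions for
# illustration only.
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"   # assumed vLLM-on-Gaudi address
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"        # assumed model id

payload = {
    "model": MODEL,
    "prompt": "Explain the benefits of paged attention in one paragraph.",
    "max_tokens": 128,
    "stream": True,
}

chunk_times = []
start = time.perf_counter()

with requests.post(ENDPOINT, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Server-sent events: each payload line starts with "data:".
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"][0].get("text"):
            chunk_times.append(time.perf_counter())

if chunk_times:
    # TTFT: delay from request submission to the first streamed chunk.
    ttft = chunk_times[0] - start
    # TPOT: average gap between successive chunks after the first one
    # (a chunk may carry more than one token, so this is an approximation).
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    tpot = sum(gaps) / len(gaps) if gaps else float("nan")
    print(f"TTFT: {ttft*1000:.1f} ms, TPOT: {tpot*1000:.1f} ms "
          f"over {len(chunk_times)} chunks")
```

The same script can be pointed at the existing TGI endpoint to get a like-for-like comparison under identical prompts and generation lengths.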

Expected Outcome:
Adopting vLLM as the default serving framework should significantly lower latency while meeting or exceeding current TGI throughput on Gaudi, improving the end-user experience.

@lvliang-intel lvliang-intel added the feature New feature or request label Nov 29, 2024
@lvliang-intel lvliang-intel added this to the v1.2 milestone Nov 29, 2024
@joshuayao joshuayao moved this to In progress in OPEA Dec 1, 2024