[Feature] Enhance Gaudi Performance by Adopting vLLM as Default Serving framework #1213

Open
lvliang-intel opened this issue Nov 29, 2024 · 0 comments

lvliang-intel commented Nov 29, 2024

Priority

P1-Stopper

OS type

Ubuntu

Hardware type

Gaudi2

Running nodes

Single Node

Description

Feature Objective:
Set vLLM as the default serving framework on Gaudi to leverage its optimized performance characteristics, thereby improving throughput and reducing latency in inference tasks.

Feature Details:

- Replace TGI with vLLM as the default serving backend for inference on Gaudi devices.
- Update serving configurations to align with vLLM's inference architecture.
- Run performance benchmarks to confirm that vLLM improves TTFT (time to first token), TPOT (time per output token), and scalability on Gaudi hardware; a measurement sketch follows this list.
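
For the benchmarking item, a minimal sketch of how TTFT and TPOT could be measured against the serving endpoint is shown below. It assumes an OpenAI-compatible streaming completions API (which vLLM exposes); the endpoint URL, port, and model name are placeholders rather than the values this issue will settle on, and streamed chunks are treated as a proxy for individual tokens.

```python
# Minimal TTFT/TPOT measurement sketch against an OpenAI-compatible
# streaming endpoint. The URL, port, and model id are assumptions for
# illustration only.
import json
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"   # assumed vLLM-on-Gaudi address
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"        # assumed model id

payload = {
    "model": MODEL,
    "prompt": "Explain the benefits of paged attention in one paragraph.",
    "max_tokens": 128,
    "stream": True,
}

chunk_times = []
start = time.perf_counter()

with requests.post(ENDPOINT, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Server-sent events: each payload line starts with "data:".
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk["choices"][0].get("text"):
            chunk_times.append(time.perf_counter())

if chunk_times:
    # TTFT: delay from request submission to the first streamed chunk.
    ttft = chunk_times[0] - start
    # TPOT: average gap between successive chunks after the first one
    # (a chunk may carry more than one token, so this is an approximation).
    gaps = [b - a for a, b in zip(chunk_times, chunk_times[1:])]
    tpot = sum(gaps) / len(gaps) if gaps else float("nan")
    print(f"TTFT: {ttft*1000:.1f} ms, TPOT: {tpot*1000:.1f} ms "
          f"over {len(chunk_times)} chunks")
```

The same script can be pointed at the existing TGI endpoint to get a like-for-like comparison under identical prompts and generation lengths.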

Expected Outcome:
Adopting vLLM as the default serving framework should significantly lower latency while meeting or exceeding current TGI throughput on Gaudi, improving the end-user experience.

@lvliang-intel lvliang-intel added the feature New feature or request label Nov 29, 2024
@lvliang-intel lvliang-intel added this to the v1.2 milestone Nov 29, 2024
@joshuayao joshuayao moved this to In progress in OPEA Dec 1, 2024