Priority
P1-Stopper
OS type
Ubuntu
Hardware type
Gaudi2
Running nodes
Single Node
Description
Feature Objective:
Set vLLM as the default serving framework on Gaudi to leverage its optimized performance characteristics, thereby improving throughput and reducing latency in inference tasks.
Feature Details:
Replace TGI with vLLM as the default serving backend for inference on Gaudi devices.
Update serving configurations to align with vLLM's architecture for inference.
Benchmark vLLM against the current TGI backend to validate improvements in time to first token (TTFT), time per output token (TPOT), and scalability on Gaudi hardware.
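The TTFT/TPOT benchmarking mentioned above can be sketched with a small helper that derives both metrics from per-token arrival timestamps collected while consuming a streaming response. This is an illustrative sketch, not part of the issue or any existing benchmark suite; the function name `measure_stream_metrics` and the timestamp inputs are assumptions.

```python
def measure_stream_metrics(request_start, token_times):
    """Compute TTFT and TPOT from streaming-generation timestamps.

    request_start: monotonic timestamp when the request was sent.
    token_times: list of monotonic timestamps, one per generated token,
                 in arrival order (hypothetical instrumentation).
    Returns (ttft, tpot) in seconds:
      ttft - delay from request to the first token,
      tpot - mean inter-token gap over the remaining tokens.
    """
    if not token_times:
        raise ValueError("no tokens were generated")
    ttft = token_times[0] - request_start
    if len(token_times) > 1:
        # Average gap between consecutive tokens after the first one.
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot


# Synthetic example: request at t=0.0, four tokens arriving at these times.
ttft, tpot = measure_stream_metrics(0.0, [0.5, 0.6, 0.7, 0.8])
print(f"TTFT={ttft:.3f}s TPOT={tpot:.3f}s")
```

Running the same measurement against both the vLLM and TGI endpoints on identical prompts would give the apples-to-apples comparison this feature request calls for.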
Expected Outcome:
Adopting vLLM as the default framework improves the user experience by significantly lowering latency while meeting or exceeding current TGI throughput levels on Gaudi.