From 1ac843a37263f6a1611f3416cf545ace14e03b22 Mon Sep 17 00:00:00 2001
From: Michael Wyatt
Date: Fri, 19 Jan 2024 15:00:46 -0800
Subject: [PATCH] Update README.md

---
 blogs/deepspeed-fastgen/2024-01-19/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blogs/deepspeed-fastgen/2024-01-19/README.md b/blogs/deepspeed-fastgen/2024-01-19/README.md
index 98a9346441a4..9a5c8a83df46 100644
--- a/blogs/deepspeed-fastgen/2024-01-19/README.md
+++ b/blogs/deepspeed-fastgen/2024-01-19/README.md
@@ -29,7 +29,7 @@ Today, we are happy to share that we are improving DeepSpeed-FastGen along three
 - **Performance Optimizations**
 
-  We drastically reduced the scheduling overhead of Dynamic SplitFuse and increased the efficiency of token sampling. As a result, we see higher throughput and lower latency, particularly when handling concurrent requests from many clients. We demonstrate the performance optimizations with benchmarks and evaluation of DeepSpeed-FastGen against vLLM for the newly added model families. The benchmark results can be seen in [Performance Evaluation](#performance-evaluation) and the benchmark code is available at [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii).
+  We drastically reduced the scheduling overhead of Dynamic SplitFuse and increased the efficiency of token sampling. As a result, we see higher throughput and lower latency, particularly when handling concurrent requests from many clients. We demonstrate the performance optimizations with benchmarks and evaluation of DeepSpeed-FastGen against vLLM for the newly added model families. The benchmark results can be seen in [Performance Evaluation](#performance-optimizations) and the benchmark code is available at [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/benchmarks/inference/mii).
 
 - **Feature Enhancements**