From 601cfadabf9be73d87bf6b56e3065b19950369c0 Mon Sep 17 00:00:00 2001
From: Karol Blaszczak
Date: Fri, 18 Aug 2023 20:00:54 +0200
Subject: [PATCH] Update prerelease_information.md (#19282)

---
 docs/resources/prerelease_information.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/resources/prerelease_information.md b/docs/resources/prerelease_information.md
index 527b9e7f7375d2..c8a90cc5815062 100644
--- a/docs/resources/prerelease_information.md
+++ b/docs/resources/prerelease_information.md
@@ -44,9 +44,9 @@ Please file a github Issue on these with the label “pre-release” so we can g
 
 * CPU runtime:
   * Enabled weights decompression support for Large Language models (LLMs). The implementation
-    supports avx2 and avx512 HW targets for Intel® Core™ processors and gives up to 2x improvement
-    in the latency mode (FP32 VS FP32+INT8 weights comparison). For 4th Generation Intel® Xeon®
-    Scalable Processors (formerly Sapphire Rapids) this INT8 decompression feature gives 10-25%
+    supports avx2 and avx512 HW targets for Intel® Core™ processors for improved
+    latency mode (FP32 VS FP32+INT8 weights comparison). For 4th Generation Intel® Xeon®
+    Scalable Processors (formerly Sapphire Rapids) this INT8 decompression feature provides
     performance improvement, compared to pure BF16 inference.
   * Reduced memory consumption of compile model stage by moving constant folding of Transpose
     nodes to the CPU Runtime side.