Accelerate Inference of Sparse Transformer Models with OpenVINO™ and 4th Gen Intel® Xeon® Scalable Processors
This tutorial demonstrates how to improve the performance of sparse Transformer models with OpenVINO on 4th Gen Intel® Xeon® Scalable processors. It uses a pre-trained model from the Hugging Face Transformers library, shows how to convert it to the OpenVINO™ IR format, and runs inference on a CPU with a dedicated runtime option that enables sparsity optimizations. It also demonstrates how to get more performance by stacking sparsity with 8-bit quantization. To simplify the user experience, the Hugging Face Optimum library is used to convert the model to the OpenVINO™ IR format and to quantize it with the Neural Network Compression Framework (NNCF).
NOTE: This tutorial requires OpenVINO 2022.3 or newer and a 4th Gen Intel® Xeon® Scalable processor, which can be acquired on Amazon Web Services (AWS).
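The snippet below is a minimal sketch, not the notebook's exact code, of how the model export and the sparsity runtime option can be combined with Optimum Intel. The model id is a placeholder for a pruned (sparse) BERT checkpoint, and the 0.8 decompression rate is an illustrative value: it tells the CPU plugin to use the packed sparse weight representation for layers whose weight sparsity is at or above that rate.

```python
# Hedged sketch: exports a Hugging Face checkpoint to OpenVINO IR on the fly
# and enables sparse weight decompression in the CPU plugin.
from transformers import AutoTokenizer
from optimum.intel import OVModelForSequenceClassification

model_id = "<sparse-bert-checkpoint>"  # placeholder, replace with a real pruned checkpoint

# Layers whose weights are at least 80% sparse are executed with the
# packed sparse representation on 4th Gen Intel Xeon Scalable CPUs.
ov_config = {"CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE": "0.8"}

# export=True converts the PyTorch checkpoint to OpenVINO IR during loading.
model = OVModelForSequenceClassification.from_pretrained(
    model_id, export=True, ov_config=ov_config
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Sparsity can speed up Transformer inference.", return_tensors="pt")
print(model(**inputs).logits)
```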
The tutorial consists of the following steps:
- Download and quantize a sparse public BERT model, using the OpenVINO integration with Hugging Face Optimum (a quantization sketch follows this list).
- Compare sparse 8-bit vs. dense 8-bit inference performance (a benchmarking sketch follows this list).
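The following is a hedged sketch of 8-bit post-training quantization with Optimum Intel's OVQuantizer, which uses NNCF under the hood. The model id, calibration dataset, sample count, and sequence length are illustrative assumptions rather than the notebook's exact settings.

```python
# Hedged sketch: 8-bit post-training quantization of a (sparse) BERT model
# with Optimum Intel's OVQuantizer (NNCF under the hood).
from functools import partial

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel import OVQuantizer

model_id = "<sparse-bert-checkpoint>"  # placeholder, replace with a real pruned checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    # Tokenize the calibration samples; SST-2 sentences are assumed here.
    return tokenizer(examples["sentence"], padding="max_length", max_length=64, truncation=True)

quantizer = OVQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=300,
    dataset_split="train",
)

# Runs NNCF post-training quantization and saves the 8-bit OpenVINO IR.
quantizer.quantize(
    calibration_dataset=calibration_dataset,
    save_directory="quantized_sparse_bert",
)
tokenizer.save_pretrained("quantized_sparse_bert")
```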
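To compare sparse 8-bit against dense 8-bit execution, one option is a simple timing loop that loads the quantized IR twice: once with the default (dense) configuration and once with the sparse weight decompression option. This is a rough sketch under the assumption that the 8-bit model was saved to `quantized_sparse_bert` as above; the input shape and iteration count are arbitrary, and the notebook itself may use a different benchmarking method.

```python
# Rough benchmarking sketch: dense 8-bit vs. sparse 8-bit throughput.
import time
import numpy as np
from optimum.intel import OVModelForSequenceClassification

MODEL_DIR = "quantized_sparse_bert"  # produced by the quantization sketch above

def measure_throughput(ov_config, n_iters=100, seq_len=64):
    model = OVModelForSequenceClassification.from_pretrained(MODEL_DIR, ov_config=ov_config)
    ids = np.random.randint(0, 1000, size=(1, seq_len), dtype=np.int64)
    inputs = {
        "input_ids": ids,
        "attention_mask": np.ones_like(ids),
        "token_type_ids": np.zeros_like(ids),  # BERT-style inputs are assumed
    }
    model(**inputs)  # warm-up inference
    start = time.perf_counter()
    for _ in range(n_iters):
        model(**inputs)
    return n_iters / (time.perf_counter() - start)

dense_ips = measure_throughput(ov_config={})
sparse_ips = measure_throughput(ov_config={"CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE": "0.8"})
print(f"Dense 8-bit:  {dense_ips:.1f} inferences/s")
print(f"Sparse 8-bit: {sparse_ips:.1f} inferences/s")
```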
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.