This tutorial demonstrates how to use NNCF 8-bit quantization in post-training mode (without the fine-tuning pipeline) to optimize a PyTorch model for high-speed inference via OpenVINO Toolkit. For more advanced NNCF usage, refer to these examples.
To speed up download and validation, this tutorial uses a ResNet-50 model pre-trained on the Tiny ImageNet dataset.
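As a point of reference, a minimal sketch of preparing such a model is shown below. The checkpoint path and the use of `torchvision`'s ResNet-50 are assumptions for illustration; the tutorial downloads its own pre-trained weights.

```python
# Hypothetical model setup: a torchvision ResNet-50 adapted to Tiny ImageNet.
# The checkpoint file name is a placeholder, not the tutorial's actual artifact.
import torch
from torchvision.models import resnet50

NUM_CLASSES = 200  # Tiny ImageNet contains 200 classes

model = resnet50(num_classes=NUM_CLASSES)
state_dict = torch.load("resnet50_tiny_imagenet.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```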
The tutorial consists of the following steps:
- Evaluating the original model.
- Transforming the original `FP32` model to `INT8` (a sketch of this flow follows the list).
- Exporting optimized and original models to ONNX and then to OpenVINO IR.
- Comparing performance of the obtained `FP32` and `INT8` models.
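The sketch below outlines the quantization and export flow under some assumptions: `model` is the FP32 PyTorch model, `val_loader` is a DataLoader yielding `(image, label)` batches, and the file names are placeholders. Exact API calls may vary with the NNCF and OpenVINO versions installed.

```python
# Post-training 8-bit quantization with NNCF, then export to ONNX and OpenVINO IR.
# `model` and `val_loader` are assumed to exist; output file names are illustrative.
import torch
import nncf       # Neural Network Compression Framework
import openvino as ov

# NNCF calibration only needs model inputs, so drop the labels from each batch.
def transform_fn(data_item):
    images, _ = data_item
    return images

calibration_dataset = nncf.Dataset(val_loader, transform_fn)

# Quantize the FP32 model to INT8 in post-training mode (no fine-tuning).
int8_model = nncf.quantize(model, calibration_dataset)

# Export both models to ONNX; Tiny ImageNet images are 64x64 RGB.
dummy_input = torch.randn(1, 3, 64, 64)
torch.onnx.export(model, dummy_input, "resnet50_fp32.onnx")
torch.onnx.export(int8_model, dummy_input, "resnet50_int8.onnx")

# Convert the ONNX files to OpenVINO IR.
ov.save_model(ov.convert_model("resnet50_fp32.onnx"), "resnet50_fp32.xml")
ov.save_model(ov.convert_model("resnet50_int8.onnx"), "resnet50_int8.xml")
```

The performance of the resulting IR models can then be compared with OpenVINO's `benchmark_app`, for example `benchmark_app -m resnet50_int8.xml -d CPU`.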
If you have not installed all required dependencies, follow the Installation Guide.