diff --git a/docs/OV_Runtime_UG/hetero_execution.md b/docs/OV_Runtime_UG/hetero_execution.md index 0d90971a37d35e..ff21f9ae18a834 100644 --- a/docs/OV_Runtime_UG/hetero_execution.md +++ b/docs/OV_Runtime_UG/hetero_execution.md @@ -103,7 +103,7 @@ If you want different devices in Hetero execution to have different device-speci @endsphinxdirective -In the example above, `CPU` device is configured to enable profiling data, while only `GPU` device has configuration property to perform inference in `f16` precision, while CPU has default execution precision. +In the example above, the `GPU` device is configured to enable profiling data, while only the `CPU` device is configured to perform inference in `f32` precision; the GPU keeps its default execution precision. ### Handling Difficult Topologies diff --git a/docs/OV_Runtime_UG/integrate_with_your_application.md b/docs/OV_Runtime_UG/integrate_with_your_application.md index 6472e9ec8c5625..5ea552737aa881 100644 --- a/docs/OV_Runtime_UG/integrate_with_your_application.md +++ b/docs/OV_Runtime_UG/integrate_with_your_application.md @@ -132,9 +132,7 @@ To learn how to change the device configuration, read the [Query device properti ### Step 3. Create an Inference Request -`ov::InferRequest` class provides methods for model inference in the OpenVINO™ Runtime. -This section demonstrates a simple pipeline, to get more information about other use cases, read the [InferRequest documentation](./ov_infer_request.md) dedicated article. -Create an infer request using the following code: +The `ov::InferRequest` class provides methods for model inference in OpenVINO™ Runtime. Create an infer request using the following code (see the detailed [InferRequest documentation](./ov_infer_request.md) for more information): @sphinxdirective @@ -174,7 +172,7 @@ You can use external memory to create `ov::Tensor` and use the `ov::InferRequest ### Step 5. Start Inference -OpenVINO™ Runtime supports inference in asynchronous or synchronous mode. Async API usage can improve overall frame-rate of the application, because rather than wait for inference to complete, the app can continue doing things on the host, while the accelerator is busy. You can use `ov::InferRequest::start_async()` to start model inference in the asynchronous mode and call `ov::InferRequest::wait()` to wait for the inference results: +OpenVINO™ Runtime supports inference in either synchronous or asynchronous mode. Using the Async API can improve the application's overall frame-rate: rather than waiting for inference to complete, the app can keep working on the host while the accelerator is busy. You can use `ov::InferRequest::start_async` to start model inference in the asynchronous mode and call `ov::InferRequest::wait` to wait for the inference results: @sphinxdirective @@ -192,14 +190,7 @@ OpenVINO™ Runtime supports inference in asynchronous or synchronous mode. Asyn @endsphinxdirective -The asynchronous mode supports two methods to get the inference results: - * `ov::InferRequest::wait_for()` - Waits until the specified timeout (in milliseconds) has elapsed or the inference result becomes available, whichever comes first. - * `ov::InferRequest::wait()` - Waits until the inference result becomes available. - -Both requests are thread-safe, which means they can be called from different threads without exposing erroneous behavior or producing unpredictable results. 
- -While the request is ongoing, all its methods except `ov::InferRequest::cancel`, `ov::InferRequest::wait` or `ov::InferRequest::wait_for` throw -the `ov::Busy` exception indicating the request is busy with computations. +This section demonstrates a simple pipeline, to get more information about other ways to perform inference, read the dedicated ["Run inference" section](./ov_infer_request.md). ### Step 6. Process the Inference Results diff --git a/docs/OV_Runtime_UG/ov_infer_request.md b/docs/OV_Runtime_UG/ov_infer_request.md index c984a4e6a92286..6b93392661bdc8 100644 --- a/docs/OV_Runtime_UG/ov_infer_request.md +++ b/docs/OV_Runtime_UG/ov_infer_request.md @@ -30,7 +30,7 @@ This class allows to set and get data for model inputs, outputs and run inferenc ### Synchronous mode -You can use `ov::InferRequest::infer()`, which blocks the application execution, to infer model in synchronous mode: +You can use `ov::InferRequest::infer`, which blocks the application execution, to infer model in the synchronous mode: @sphinxdirective @@ -50,7 +50,7 @@ You can use `ov::InferRequest::infer()`, which blocks the application execution, ### Asynchronous mode -Asynchronous mode can improve overall frame-rate of the application, because rather than wait for inference to complete, the app can continue doing things on the host, while accelerator is busy. You can use `ov::InferRequest::start_async()` to infer model in asynchronous mode: +Asynchronous mode can improve application's overall frame-rate, because rather than wait for inference to complete, the app can keep working on the host, while the accelerator is busy. You can use `ov::InferRequest::start_async` to infer model in the asynchronous mode: @sphinxdirective @@ -68,8 +68,8 @@ Asynchronous mode can improve overall frame-rate of the application, because rat @endsphinxdirective -Asynchronous mode supports two ways to wait inference results: - * `ov::InferRequest::wait_for()` - specify maximum duration in milliseconds to block for. The method is blocked until the specified timeout has elapsed, or the result becomes available, whichever comes first. +Asynchronous mode supports two ways the application waits for inference results: + * `ov::InferRequest::wait_for` - specifies the maximum duration in milliseconds to block the method. The method is blocked until the specified time has passed, or the result becomes available, whichever comes first. @sphinxdirective .. tab:: C++ @@ -85,7 +85,7 @@ Asynchronous mode supports two ways to wait inference results: :fragment: [wait_for] @endsphinxdirective - * `ov::InferRequest::wait()` - waits until inference result becomes available + * `ov::InferRequest::wait` - waits until inference result becomes available @sphinxdirective .. tab:: C++ @@ -102,10 +102,9 @@ Asynchronous mode supports two ways to wait inference results: @endsphinxdirective -Both requests are thread-safe: can be called from different threads without fearing corruption and failures. +Both methods are thread-safe. -Also InferRequest provides an functionality which allows to avoid a call of `ov::InferRequest::wait()`, in order to do it, you can use `ov::InferRequest::set_callback()` method. This method allows to set callback which will be called after completing run of InferRequest, please use weak reference of infer_request (`ov::InferRequest*`, `ov::InferRequest&`, `std::weal_ptr` and etc) in the callback, it is needed to avoid cyclic references. 
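For illustration only, here is a minimal sketch of the synchronous-then-asynchronous flow discussed in this file, using the Python API (`openvino.runtime`); the model path, device name, and input shape are placeholders and are not part of the original snippet set:

```python
import numpy as np
import openvino.runtime as ov

core = ov.Core()
model = core.read_model("sample.xml")                  # model path is a placeholder
compiled_model = core.compile_model(model, "CPU")
infer_request = compiled_model.create_infer_request()

# Fill the single model input with dummy data (the shape is illustrative)
input_tensor = ov.Tensor(np.zeros((1, 3, 224, 224), dtype=np.float32))
infer_request.set_input_tensor(input_tensor)

# Start inference without blocking the host thread
infer_request.start_async()
# ... the application can do other useful work here ...
infer_request.wait()                                   # block until the result is ready

output = infer_request.get_output_tensor().data
```

The `ov::InferRequest::wait_for` and `ov::InferRequest::set_callback` variants described in this file follow the same overall pattern.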
-For more details please take a look too [Classification Sample Async](../../samples/cpp/classification_sample_async/README.md). +When you are running several inference requests in parallel, a device can process them simultaneously, with no guarantees on the completion order. This may complicate logic based on `ov::InferRequest::wait` (unless your code needs to wait for _all_ the requests). For multi-request scenarios, consider using the `ov::InferRequest::set_callback` method to set a callback which is called upon completion of the request: @sphinxdirective @@ -123,7 +122,10 @@ For more details please take a look too [Classification Sample Async](../../samp @endsphinxdirective -You can use `ov::InferRequest::cancel()` method in case if you want to cancel the current inference request: +> **NOTE**: Use a weak reference to the infer_request (`ov::InferRequest*`, `ov::InferRequest&`, `std::weak_ptr`, etc.) in the callback. This is necessary to avoid cyclic references. +For more details, check [Classification Sample Async](../../samples/cpp/classification_sample_async/README.md). + +You can use the `ov::InferRequest::cancel` method if you want to abort execution of the current inference request: @sphinxdirective @@ -145,7 +147,7 @@ You can use `ov::InferRequest::cancel()` method in case if you want to cancel th `ov::InferRequest` allows to get input/output tensors by tensor name, index, port and without any arguments in case if model has only one input or output. - * `ov::InferRequest::get_input_tensor()`, `ov::InferRequest::set_input_tensor()`, `ov::InferRequest::get_output_tensor()`, `ov::InferRequest::set_output_tensor()` methods without arguments can be used to get or set input/output tensor for model with only one input/output: + * `ov::InferRequest::get_input_tensor`, `ov::InferRequest::set_input_tensor`, `ov::InferRequest::get_output_tensor`, `ov::InferRequest::set_output_tensor` methods without arguments can be used to get or set the input/output tensor for a model with only one input/output: @sphinxdirective .. tab:: C++ @@ -162,7 +164,7 @@ You can use `ov::InferRequest::cancel()` method in case if you want to cancel th @endsphinxdirective - * `ov::InferRequest::get_input_tensor()`, `ov::InferRequest::set_input_tensor()`, `ov::InferRequest::get_output_tensor()`, `ov::InferRequest::set_output_tensor()` methods with argument can be used to get or set input/output tensor by input/output index: + * `ov::InferRequest::get_input_tensor`, `ov::InferRequest::set_input_tensor`, `ov::InferRequest::get_output_tensor`, `ov::InferRequest::set_output_tensor` methods with an argument can be used to get or set the input/output tensor by input/output index: @sphinxdirective .. tab:: C++ @@ -179,7 +181,7 @@ You can use `ov::InferRequest::cancel()` method in case if you want to cancel th @endsphinxdirective - * `ov::InferRequest::get_tensor()`, `ov::InferRequest::set_tensor()` methods can be used to get or set input/output tensor by tensor name: + * `ov::InferRequest::get_tensor`, `ov::InferRequest::set_tensor` methods can be used to get or set the input/output tensor by tensor name: @sphinxdirective .. tab:: C++ @@ -196,7 +198,7 @@ You can use `ov::InferRequest::cancel()` method in case if you want to cancel th @endsphinxdirective - * `ov::InferRequest::get_tensor()`, `ov::InferRequest::set_tensor()` methods can be used to get or set input/output tensor by port: + * `ov::InferRequest::get_tensor`, `ov::InferRequest::set_tensor` methods can be used to get or set the input/output tensor by port: @sphinxdirective .. 
tab:: C++ @@ -218,7 +220,7 @@ You can use `ov::InferRequest::cancel()` method in case if you want to cancel th ### Cascade of models `ov::InferRequest` can be used to organize cascade of models. You need to have infer requests for each model. -In this case you can get output tensor from the first request using `ov::InferRequest::get_tensor()` and set it as input for the second request using `ov::InferRequest::set_tensor()`. But be careful, shared tensors across compiled models can be rewritten by the first model if the first infer request is run once again, while the second model has not started yet. +In this case you can get the output tensor from the first request using `ov::InferRequest::get_tensor` and set it as input for the second request using `ov::InferRequest::set_tensor`. Be careful, though: tensors shared across compiled models can be overwritten by the first model if the first infer request is run again before the second model has started. @sphinxdirective @@ -238,7 +240,7 @@ In this case you can get output tensor from the first request using `ov::InferRe ### Using of ROI tensors -It is possible to re-use shared input by several models. You do not need to allocate separate input tensor for a model if it processes a ROI object located inside of already allocated input of a previous model. For instance, when first model detects objects on a video frame (stored as input tensor) and second model accepts detected bounding boxes (ROI inside of the frame) as input. In this case, it is allowed to re-use pre-allocated input tensor (used by first model) by second model and just crop ROI without allocation of new memory using `ov::Tensor()` with passing of `ov::Tensor` and `ov::Coordinate` as parameters. +It is possible to share one input among several models. You do not need to allocate a separate input tensor for a model if it processes a ROI object located inside an already allocated input of a previous model. For instance, the first model may detect objects in a video frame (stored as the input tensor) while the second model accepts the detected bounding boxes (ROIs inside the frame) as input. In this case, the second model can re-use the pre-allocated input tensor of the first model and just crop the ROI without allocating new memory, by creating an `ov::Tensor` with the existing `ov::Tensor` and `ov::Coordinate` objects as constructor parameters. @sphinxdirective diff --git a/docs/documentation.md b/docs/documentation.md index ea26b3f22ff93c..9d63bbc168bb95 100644 --- a/docs/documentation.md +++ b/docs/documentation.md @@ -26,11 +26,11 @@ :caption: Tuning for Performance :hidden: - openvino_docs_performance_benchmarks openvino_docs_optimization_guide_dldt_optimization_guide openvino_docs_MO_DG_Getting_Performance_Numbers - pot_README + openvino_docs_model_optimization_guide openvino_docs_tuning_utilities + openvino_docs_performance_benchmarks .. 
toctree:: diff --git a/docs/img/nncf_workflow.png b/docs/img/nncf_workflow.png new file mode 100644 index 00000000000000..53f3cc334e0334 --- /dev/null +++ b/docs/img/nncf_workflow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7a58f31b2043fe9d92892b1f40ed8a7c596c36ef9d1cd1c71adb981009161bf +size 45665 diff --git a/docs/optimization_guide/dldt_optimization_guide.md b/docs/optimization_guide/dldt_optimization_guide.md index 33b39bc1da88ac..85b899faeea867 100644 --- a/docs/optimization_guide/dldt_optimization_guide.md +++ b/docs/optimization_guide/dldt_optimization_guide.md @@ -9,32 +9,20 @@ Performance means how fast the model is in deployment. Two key metrics are used ![](../img/LATENCY_VS_THROUGHPUT.svg) -Latency measures inference time (ms) required to process a single input. When it comes to batch input need to measure throughput (images per second or frames per second, FPS). To calculate throughput, divide number of frames that were processed by the processing time. +Latency measures the inference time (ms) required to process a single input. For batch input, measure throughput instead (images per second or frames per second, FPS). To calculate throughput, divide the number of frames that were processed by the processing time. -> **NOTE**: To get performance numbers for OpenVINO, as well as tips how to measure it and compare with native framework, check [Getting performance numbers](../MO_DG/prepare_model/Getting_performance_numbers.md) page. +## How to measure performance +To get performance numbers for OpenVINO, as well as tips on how to measure them and compare with the native framework, go to the [Getting performance numbers](../MO_DG/prepare_model/Getting_performance_numbers.md) page. ## How to Improve Performance -> **NOTE**: Make sure that your model can be successfully inferred with OpenVINO Runtime. +> **NOTE**: Make sure that your model can be successfully inferred with OpenVINO Runtime before proceeding with optimization. -Inside OpenVINO there are two ways how to get better performance number: during developing and deployment your model. **It is possible to combine both developing and deployment optimizations**. +Inside OpenVINO there are two ways to get better performance numbers: optimize the model, which is called **model optimization**, or tune the parameters of execution, which is called **deployment optimization**. Note that it is possible to combine both types of optimizations. -- **Developing step** includes model modification. Inside developing optimization there are three ways to optimize your model: +- **Model optimization** includes model modification, such as quantization, pruning, optimization of preprocessing, etc. For more details, refer to this [document](./model_optimization_guide.md). - - **Post-training Optimization tool** (POT) is designed to optimize the inference of deep learning models by applying special methods without model retraining or fine-tuning, like post-training quantization. +- **Deployment optimization** includes tuning inference parameters and optimizing model execution. To read more, visit the [Deployment Optimization Guide](../optimization_guide/dldt_deployment_optimization_guide.md). - - **Neural Network Compression Framework (NNCF)** provides a suite of advanced algorithms for Neural Networks inference optimization with minimal accuracy drop, available quantization, pruning and sparsity optimization algorithms. 
- - **Model Optimizer** implement some optimization to a model, most of them added by default, but you can configure mean/scale values, batch size RGB vs BGR input channels and other parameters to speed-up preprocess of a model ([Additional Optimization Use Cases](../MO_DG/prepare_model/Additional_Optimizations.md)) - -- **Deployment step** includes tuning inference parameters and optimizing model execution, to read more visit [Deployment Optimization Guide](../optimization_guide/dldt_deployment_optimization_guide.md). - -More detailed workflow: - -![](../img/DEVELOPMENT_FLOW_V3_crunch.svg) - -To understand when to use each development optimization tool, follow this diagram: - -POT is the easiest way to get optimized models and it is also really fast and usually takes several minutes depending on the model size and used HW. NNCF can be considered as an alternative or an addition when the first does not give accurate results. - -![](../img/WHAT_TO_USE.svg) +## Performance benchmarks +To estimate performance and compare performance numbers measured on various supported devices for a wide range of public models, refer to the [Performance benchmarks](../benchmarks/performance_benchmarks.md) section. \ No newline at end of file diff --git a/docs/optimization_guide/model_optimization_guide.md b/docs/optimization_guide/model_optimization_guide.md new file mode 100644 index 00000000000000..3edcc62917a240 --- /dev/null +++ b/docs/optimization_guide/model_optimization_guide.md @@ -0,0 +1,34 @@ +# Model Optimization Guide {#openvino_docs_model_optimization_guide} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + pot_README + docs_nncf_introduction + +@endsphinxdirective + +Model optimization means applying transformations to the model and the relevant data flow to improve the inference performance. These transformations are generally performed offline and can require the availability of training and validation data. Model optimization includes such methods as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development: + +- **Post-training Optimization tool [(POT)](../../tools/pot/README.md)** is designed to optimize the inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, like post-training quantization. + +- **Neural Network Compression Framework [(NNCF)](./nncf_introduction.md)** provides a suite of advanced algorithms for Neural Networks inference optimization with minimal accuracy drop, for example, quantization and pruning algorithms. + +- **Model Optimizer** implements optimizations on a model, most of them applied by default, but you can configure mean/scale values, batch size, RGB vs. BGR input channels, and other parameters to speed up preprocessing of a model ([Additional Optimization Use Cases](../MO_DG/prepare_model/Additional_Optimizations.md)). + + +## Detailed workflow + +![](../img/DEVELOPMENT_FLOW_V3_crunch.svg) + +To understand which development optimization tool you need, refer to the diagram: + +POT is the easiest way to get optimized models; it usually takes several minutes, depending on the model size and the hardware used. NNCF can be considered as an alternative or an addition when POT does not give accurate enough results. 
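To make the NNCF fine-tuning flow mentioned above concrete, here is a minimal, hedged sketch of quantization-aware training for a PyTorch model; `model` (a `torch.nn.Module`) and `train_loader` are assumed to already exist, and the input shape and output file name are illustrative:

```python
from nncf import NNCFConfig
from nncf.torch import create_compressed_model, register_default_init_args

# Describe the model input and the compression algorithm to apply
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},   # illustrative input shape
    "compression": {"algorithm": "quantization"},
})

# Let NNCF initialize quantization ranges from the training data loader
nncf_config = register_default_init_args(nncf_config, train_loader)

# Wrap the original torch.nn.Module with compression hooks
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# ... run the regular PyTorch fine-tuning loop over compressed_model here ...

# Export the fine-tuned model to ONNX for further conversion
compression_ctrl.export_model("compressed_model.onnx")
```

The exported ONNX file can then be converted with Model Optimizer like any other model.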
+ +![](../img/WHAT_TO_USE.svg) + +## See also +- [Deployment optimization](./dldt_deployment_optimization_guide.md) \ No newline at end of file diff --git a/docs/optimization_guide/nncf_introduction.md b/docs/optimization_guide/nncf_introduction.md new file mode 100644 index 00000000000000..6ce2234771b7c2 --- /dev/null +++ b/docs/optimization_guide/nncf_introduction.md @@ -0,0 +1,63 @@ +# Neural Network Compression Framework {#docs_nncf_introduction} +This document describes the Neural Network Compression Framework (NNCF), which is developed as a separate project outside of OpenVINO™ but is highly aligned with OpenVINO™ in terms of the supported optimization features and models. It is open-sourced and available on [GitHub](https://github.com/openvinotoolkit/nncf). + +## Introduction +Neural Network Compression Framework (NNCF) is aimed at optimizing Deep Neural Networks (DNNs) by applying optimization methods, such as quantization and pruning, to the original framework model. It mostly provides in-training optimization capabilities, which means that the optimization methods require model fine-tuning during and after optimization. The diagram below shows the model optimization workflow using NNCF. +![](../img/nncf_workflow.png) + +### Features +- Support for optimization of PyTorch and TensorFlow 2.x models. +- Support for various optimization algorithms, applied during a model fine-tuning process to achieve a better performance-accuracy trade-off: + + |Compression algorithm|PyTorch|TensorFlow 2.x| + | :--- | :---: | :---: | + |[8-bit quantization](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md) | Supported | Supported | + |[Filter pruning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Pruning.md) | Supported | Supported | + |[Sparsity](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Sparsity.md) | Supported | Supported | + |[Mixed-precision quantization](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md#mixed_precision_quantization) | Supported | Not supported | + |[Binarization](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Binarization.md) | Supported | Not supported | + + + +- Stacking of optimization methods. For example: 8-bit quantization + filter pruning. +- Support for [Accuracy-Aware model training](https://github.com/openvinotoolkit/nncf/blob/develop/docs/Usage.md#accuracy-aware-model-training) pipelines via the [Adaptive Compression Level Training](https://github.com/openvinotoolkit/nncf/tree/develop/docs/accuracy_aware_model_training/AdaptiveCompressionLevelTraining.md) and [Early Exit Training](https://github.com/openvinotoolkit/nncf/tree/develop/docs/accuracy_aware_model_training/EarlyExitTrainig.md). +- Automatic, configurable model graph transformation to obtain the compressed model. + > **NOTE**: Limited support for TensorFlow models. Only the models created using the Sequential or Keras Functional API are supported. +- GPU-accelerated layers for faster compressed model fine-tuning. +- Distributed training support. +- Configuration file examples for each supported compression algorithm. +- Exporting PyTorch compressed models to ONNX\* checkpoints and TensorFlow compressed models to SavedModel or Frozen Graph format, ready to use with [OpenVINO™ toolkit](https://github.com/openvinotoolkit/). 
+- Git patches for prominent third-party repositories ([huggingface-transformers](https://github.com/huggingface/transformers)) demonstrating the process of integrating NNCF into custom training pipelines + +## Get started +### Installation +NNCF provides the packages available for installation through the PyPI repository. To install the latest version via pip manager run the following command: +``` +pip install nncf +``` + +### Usage examples +NNCF provides various examples and tutorials that demonstrate usage of optimization methods. + +### Tutorials +- [Quantization-aware training of PyTorch model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/302-pytorch-quantization-aware-training) +- [Quantization-aware training of TensorFlow model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/305-tensorflow-quantization-aware-training) +- (Experimental) [Post-training quantization of PyTorch model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/112-pytorch-post-training-quantization-nncf) + +### Samples +- PyTorch: + - [Image Classification sample](https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/classification/README.md) + - [Object Detection sample](https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/object_detection/README.md) + - [Semantic segmentation sample](https://github.com/openvinotoolkit/nncf/blob/develop/examples/torch/semantic_segmentation/README.md) + +- TensorFlow samples: + - [Image Classification sample](https://github.com/openvinotoolkit/nncf/blob/develop/examples/tensorflow/classification/README.md) + - [Object Detection sample](https://github.com/openvinotoolkit/nncf/blob/develop/examples/tensorflow/object_detection/README.md) + - [Instance Segmentation sample](https://github.com/openvinotoolkit/nncf/blob/develop/examples/tensorflow/segmentation/README.md) + + +## See also +- [Compressed Model Zoo](https://github.com/openvinotoolkit/nncf#nncf-compressed-model-zoo) +- [NNCF in HuggingFace Optimum](https://github.com/dkurt/optimum-openvino) +- [OpenVINO™ Post-training Optimization tool](../../tools/pot/README.md) + diff --git a/docs/snippets/ov_hetero.cpp b/docs/snippets/ov_hetero.cpp index c21a70be63926e..4bf4c344f5dcb4 100644 --- a/docs/snippets/ov_hetero.cpp +++ b/docs/snippets/ov_hetero.cpp @@ -45,10 +45,10 @@ auto compiled_model = core.compile_model(model, device); auto compiled_model = core.compile_model(model, "HETERO", // GPU with fallback to CPU ov::device::priorities("GPU", "CPU"), - // profiling is enabled only for CPU - ov::device::properties("CPU", ov::enable_profiling(true)), - // FP16 inference precision only for GPU - ov::device::properties("GPU", ov::hint::inference_precision(ov::element::f16)) + // profiling is enabled only for GPU + ov::device::properties("GPU", ov::enable_profiling(true)), + // FP32 inference precision only for CPU + ov::device::properties("CPU", ov::hint::inference_precision(ov::element::f32)) ); //! [configure_fallback_devices] } diff --git a/docs/snippets/ov_hetero.py b/docs/snippets/ov_hetero.py index 52874aea2bca57..96a2676e34316a 100644 --- a/docs/snippets/ov_hetero.py +++ b/docs/snippets/ov_hetero.py @@ -1,53 +1,41 @@ -#include - -int main() { -ov::Core core; -auto model = core.read_model("sample.xml"); -//! [set_manual_affinities] -for (auto && op : model->get_ops()) { - op->get_rt_info()["affinity"] = "CPU"; -} -//! [set_manual_affinities] - -//! 
[fix_automatic_affinities] -// This example demonstrates how to perform default affinity initialization and then -// correct affinity manually for some layers -const std::string device = "HETERO:GPU,CPU"; - -// query_model result contains mapping of supported operations to devices -auto supported_ops = core.query_model(model, device); - -// update default affinities manually for specific operations -supported_ops["operation_name"] = "CPU"; - -// set affinities to a model -for (auto&& node : model->get_ops()) { - auto& affinity = supported_ops[node->get_friendly_name()]; - // Store affinity mapping using op runtime information - node->get_rt_info()["affinity"] = affinity; -} - -// load model with manually set affinities -auto compiled_model = core.compile_model(model, device); -//! [fix_automatic_affinities] - -//! [compile_model] -{ - auto compiled_model = core.compile_model(model, "HETERO:GPU,CPU"); - // or with ov::device::priorities with multiple args - compiled_model = core.compile_model(model, "HETERO", ov::device::priorities("GPU", "CPU")); - // or with ov::device::priorities with a single argument - compiled_model = core.compile_model(model, "HETERO", ov::device::priorities("GPU,CPU")); -} -//! [compile_model] -{ -//! [configure_fallback_devices] - auto compiled_model = core.compile_model(model, "HETERO", - ov::device::priorities("GPU", "CPU"), // GPU with fallback to CPU - ov::device::properties("CPU", ov::enable_profiling(true)), // profiling is enabled only for CPU - ov::device::properties("GPU", ov::hint::inference_precision(ov::element::f16)) // FP16 inference precision only for GPU - ); -//! [configure_fallback_devices] -} -return 0; -} +import openvino.runtime as ov + +core = ov.Core() +model = core.read_model("sample.xml") + +#! [set_manual_affinities] +for op in model.get_ops(): + rt_info = op.get_rt_info() + rt_info["affinity"] = "CPU" +#! [set_manual_affinities] + +#! [fix_automatic_affinities] +# This example demonstrates how to perform default affinity initialization and then +# correct affinity manually for some layers +device = "HETERO:GPU,CPU" + +# query_model result contains mapping of supported operations to devices +supported_ops = core.query_model(model, device) + +# update default affinities manually for specific operations +supported_ops["operation_name"] = "CPU" + +# set affinities to a model +for node in model.get_ops(): + affinity = supported_ops[node.get_friendly_name()] + node.get_rt_info()["affinity"] = affinity + +# load model with manually set affinities +compiled_model = core.compile_model(model, device) +#! [fix_automatic_affinities] + +#! [compile_model] +compiled_model = core.compile_model(model, device_name="HETERO:GPU,CPU") +#! [compile_model] + +#! [configure_fallback_devices] +core.set_property("HETERO", {"MULTI_DEVICE_PRIORITIES": "GPU,CPU"}) +core.set_property("GPU", {"PERF_COUNT": "YES"}) +core.set_property("CPU", {"INFERENCE_PRECISION_HINT": "f32"}) +compiled_model = core.compile_model(model=model, device_name="HETERO") +#! 
[configure_fallback_devices] diff --git a/tools/mo/openvino/tools/mo/ops/activation_ops.py b/tools/mo/openvino/tools/mo/ops/activation_ops.py index bdf48463b9b20c..2d76a274c1e438 100644 --- a/tools/mo/openvino/tools/mo/ops/activation_ops.py +++ b/tools/mo/openvino/tools/mo/ops/activation_ops.py @@ -291,6 +291,6 @@ def infer(node: Node): assert beta.ndim == 0, 'The "beta" value for node {} must be a scalar'.format(node_name) beta = beta.item() - input_value = node.in_port(1).data.get_value() + input_value = node.in_port(0).data.get_value() if input_value is not None and beta is not None: node.out_port(0).data.set_value(input_value / (1.0 + np.exp(-input_value * beta))) diff --git a/tools/mo/unit_tests/mo/ops/activation_test.py b/tools/mo/unit_tests/mo/ops/activation_test.py index d1ea12e9726dcb..95310356b02bde 100644 --- a/tools/mo/unit_tests/mo/ops/activation_test.py +++ b/tools/mo/unit_tests/mo/ops/activation_test.py @@ -5,7 +5,7 @@ import numpy as np -from openvino.tools.mo.ops.activation_ops import Elu, SoftPlus, Mish +from openvino.tools.mo.ops.activation_ops import Elu, SoftPlus, Mish, Swish from openvino.tools.mo.graph.graph import Node from unit_tests.utils.graph import build_graph @@ -115,3 +115,32 @@ def test_activation_mish_infer(self): self.assertEqual(res_shape[i], value) for i, value in enumerate(exp_value): self.assertAlmostEqual(res_value[i], value) + + def test_activation_swish_infer(self): + graph = build_graph(self.nodes_attributes, + [ + ('node_1', 'activation_node'), + ('activation_node', 'node_3') + ], + { + 'node_1': { + 'value': np.array([-1.0, 0.0, 1.0, 20.0]) + }, + 'activation_node': { + 'op': 'Swish', + }, + 'node_3': { + 'value': None + } + }) + graph.graph['layout'] = 'NCHW' + activation_node = Node(graph, 'activation_node') + Swish.infer(activation_node) + exp_shape = np.array([4]) + res_shape = graph.node['node_3']['shape'] + res_value = graph.node['node_3']['value'] + exp_value = np.array([-0.26894142, 0.0, 0.73105858, 19.99999996]) + for i, value in enumerate(exp_shape): + self.assertEqual(res_shape[i], value) + for i, value in enumerate(exp_value): + self.assertAlmostEqual(res_value[i], value) diff --git a/tools/pot/openvino/tools/pot/algorithms/quantization/bias_correction/algorithm.py b/tools/pot/openvino/tools/pot/algorithms/quantization/bias_correction/algorithm.py index 4929d842224dd7..7627c401c87106 100644 --- a/tools/pot/openvino/tools/pot/algorithms/quantization/bias_correction/algorithm.py +++ b/tools/pot/openvino/tools/pot/algorithms/quantization/bias_correction/algorithm.py @@ -260,6 +260,7 @@ def _create_parameters_for_input_nodes(input_nodes): outputs_shapes = {nu.create_node_name(n): nu.get_output_shape(n, 0).copy() for n in input_nodes} inputs_data = [] param_type = 'Parameter' + nodes_data = [] for input_node in input_nodes: input_node_name = nu.create_node_name(input_node) c_input_shape = outputs_shapes[input_node_name] @@ -271,16 +272,19 @@ def _create_parameters_for_input_nodes(input_nodes): parameter_name = input_node_name + '/parameter' param_node = ge.create_node(input_node.graph, parameter_name, param_type, {'shape': c_input_shape, 'data_type': input_node_data_type}) - for _, port in input_node.out_ports().items(): - for in_port in port.get_destinations(): - in_port.disconnect() - in_port.connect(param_node.out_port(0)) + nodes_data.append((input_node, param_node)) inputs_data.append({ 'param_name': parameter_name, 'param_shape': tuple(c_input_shape), 'input_name': input_node_name }) + for input_node, param_node in nodes_data: + for 
_, port in input_node.out_ports().items(): + for in_port in port.get_destinations(): + in_port.disconnect() + in_port.connect(param_node.out_port(0)) + return inputs_data def _create_results_after_nodes(self, output_nodes): diff --git a/tools/pot/tests/test_sanity.py b/tools/pot/tests/test_sanity.py index 55569c7786049a..7eca8ad819dd93 100644 --- a/tools/pot/tests/test_sanity.py +++ b/tools/pot/tests/test_sanity.py @@ -54,7 +54,9 @@ ('mtcnn', 'caffe', 'DefaultQuantization', 'performance', 1, {'recall': 0.76, 'map': 0.6844}, {}, 'CPU'), ('mtcnn', 'caffe', 'DefaultQuantization', 'performance', 2, {'recall': 0.76, 'map': 0.6638}, - {'use_fast_bias': False}, 'CPU') + {'use_fast_bias': False}, 'CPU'), + ('octave-resnet-26-0.25', 'mxnet', 'DefaultQuantization', 'performance', 300, + {'accuracy@top1': 0.766, 'accuracy@top5': 0.927}, {'use_fast_bias': False}, 'CPU'), ] CASCADE_MAP = Dict({ 'mtcnn': {