Up-to-date link to PTQ chapter (instead of obsolete POT)
slyalin committed Sep 4, 2023
1 parent 08317aa commit 1d57e79
Showing 2 changed files with 6 additions and 6 deletions.
10 changes: 5 additions & 5 deletions docs/MO_DG/prepare_model/FP16_Compression.md
@@ -3,7 +3,7 @@
@sphinxdirective

By default, when IR is saved all relevant floating-point weights are compressed to ``FP16`` data type during model conversion.
It results in creating a "compressed ``FP16`` model", which occupies about half of
the original space in the file system. The compression may introduce a minor drop in accuracy,
but it is negligible for most models.
If the accuracy drop is significant, the user can disable compression explicitly.
@@ -29,20 +29,20 @@ To disable compression, use the ``compress_to_fp16=False`` option:
mo --input_model INPUT_MODEL --compress_to_fp16=False


For details on how plugins handle compressed ``FP16`` models, see
:doc:`Working with devices <openvino_docs_OV_UG_Working_with_devices>`.

.. note::

``FP16`` compression is sometimes used as the initial step for ``INT8`` quantization.
-Refer to the :doc:`Post-training optimization <pot_introduction>` guide for more
+Refer to the :doc:`Post-training optimization <ptq_introduction>` guide for more
information about that.


.. note::

Some large models (larger than a few GB) when compressed to ``FP16`` may consume an overly large amount of RAM on the loading
phase of the inference. If that is the case for your model, try to convert it without compression:
``convert_model(INPUT_MODEL, compress_to_fp16=False)`` or ``convert_model(INPUT_MODEL)``


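The hunks above describe how to disable ``FP16`` compression. As a rough illustration of that option in Python, here is a minimal sketch; it assumes the legacy conversion API (``openvino.tools.mo.convert_model`` and ``openvino.runtime.serialize``) and uses placeholder file names:

.. code-block:: py

   # Minimal sketch: assumes the legacy MO Python API; file names are placeholders.
   from openvino.tools.mo import convert_model
   from openvino.runtime import serialize

   # Keep the original floating-point precision instead of compressing weights to FP16,
   # which the note above suggests for very large models.
   ov_model = convert_model("model.onnx", compress_to_fp16=False)

   # Write the IR (model.xml and model.bin) to disk.
   serialize(ov_model, "model.xml")

The CLI form shown in the first hunk, ``mo --input_model INPUT_MODEL --compress_to_fp16=False``, achieves the same result.
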
2 changes: 1 addition & 1 deletion docs/OV_Converter_UG/prepare_model/FP16_Compression.md
@@ -36,7 +36,7 @@ For details on how plugins handle compressed ``FP16`` models, see
.. note::

``FP16`` compression is sometimes used as the initial step for ``INT8`` quantization.
-Refer to the :doc:`Post-training optimization <pot_introduction>` guide for more
+Refer to the :doc:`Post-training optimization <ptq_introduction>` guide for more
information about that.

@endsphinxdirective
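Both files now link to the post-training quantization (PTQ) chapter. For context, a short sketch of what a PTQ flow on top of a converted model can look like; it assumes the NNCF API (``nncf.Dataset``, ``nncf.quantize``) and placeholder model and calibration data:

.. code-block:: py

   # Minimal sketch: assumes NNCF is installed; model path and input shape are placeholders.
   import numpy as np
   import nncf
   import openvino.runtime as ov

   core = ov.Core()
   # A compressed FP16 IR can serve as the starting point for INT8 quantization.
   model = core.read_model("model.xml")

   # Placeholder calibration samples; in practice use a few hundred representative inputs.
   calibration_data = [np.random.rand(1, 3, 224, 224).astype(np.float32)]
   calibration_dataset = nncf.Dataset(calibration_data)

   # Produce an INT8-quantized model with default PTQ settings.
   quantized_model = nncf.quantize(model, calibration_dataset)
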
