Computation of compression parameters via OpenVINO models (openvinotoolkit#2727)

### Changes

- Implemented OpenVINO model graphs which are used for calculation of compressed and decompressed weights. Since these models are compiled, calculation becomes significantly faster, especially for larger models and int4 compression.
- This functionality is exposed by two methods in `weight_lowering.py`:
  - `do_int_quantization()` is used for computing a compressed weight. Possible signatures:
    - `weight` -> `compressed_weight`, `scale`, (`zero_point` for asymmetric compression)
    - `weight`, `scale`, (`zero_point`) -> `compressed_weight`, `scale`, (`zero_point`)
  - `calculate_quantized_dequantized_weight()` is used for computing a decompressed weight. Possible signatures:
    - `weight` -> `decompressed_weight`
    - `weight`, `scale`, (`zero_point`) -> `decompressed_weight`
    - `weight` -> `decompressed_weight`, `compressed_weight`, `scale`, (`zero_point`)
    - `weight`, `scale`, (`zero_point`) -> `decompressed_weight`, `compressed_weight`, `scale`, (`zero_point`)
  - The output `scale` and `zero_point` are the same as the ones given as input (if they were given at all).
  - Computation is done via OV models only if the openvino package is installed and the input tensors are not torch tensors.
- Introduced a new NNCF Tensor backend for storing instances of `openvino.Tensor`. The implementation of this backend is limited to only the required functionality; e.g., addition of OV Tensors is not supported because it is not needed.
  - The introduction of OV Tensors is required for seamless handling of tensors in `bf16`, `u4` and `i4` data types. For example, `bf16` constants are read from an OpenVINO LLM and given as inputs to a compressing OpenVINO model, and `u4`/`i4` compressed weights are seamlessly inserted into the resulting compressed OpenVINO model.
  - Added an `as_numpy_tensor()` method to convert an NNCF Tensor to the numpy backend. Currently only the OV -> NP conversion is required.
- All calculations are aligned with the reference numpy implementation. Some performance and memory sacrifices had to be made for such alignment.

Data-free asymmetric compression: [chart]
Data-free symmetric compression: [chart]
Data-aware compression: [chart]

### Reason for changes

Reducing model compression time. Only the OpenVINO model compression backend is affected.

### Related tickets

139047

### Tests

- `tests/openvino/native/quantization/test_ov_modeling_compression.py::test_quantization_alignment` -- checks alignment with the reference numpy implementation
- `tests/openvino/native/test_openvino_modeling.py` -- checks OV modeling framework hyperparameters
- `tests/openvino/native/test_tensor.py` -- NNCF OV Tensor backend tests

Validation jobs:
- `NNCF/job/manual/job/post_training_weight_compression/299/`
- `NNCF/job/nightly/job/test_examples/650`
- OVVP validation ✅
- optimum-intel test job: https://github.com/huggingface/optimum-intel/actions/runs/12912964434/job/36009036879?pr=734
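To make the tensor-level signatures above concrete, here is a minimal, self-contained numpy sketch of asymmetric unsigned 4-bit quantization and its inverse. It is an illustrative reference only, not the code added by this commit: every name in it is hypothetical, and the actual `do_int_quantization()` / `calculate_quantized_dequantized_weight()` take additional arguments (such as the compression configuration) that are not listed in this description.

```python
import numpy as np

def quantize_asym_u4(weight: np.ndarray, axis: int = -1):
    """Illustrative `weight -> compressed_weight, scale, zero_point` step (asymmetric uint4)."""
    w_min = weight.min(axis=axis, keepdims=True)
    w_max = weight.max(axis=axis, keepdims=True)
    levels = 2**4 - 1  # 15 quantization steps for unsigned 4-bit values
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, np.finfo(np.float32).eps, scale)  # guard against constant rows
    zero_point = np.round(-w_min / scale)
    compressed = np.clip(np.round(weight / scale + zero_point), 0, levels).astype(np.uint8)
    return compressed, scale, zero_point

def dequantize(compressed: np.ndarray, scale: np.ndarray, zero_point: np.ndarray) -> np.ndarray:
    """Illustrative `compressed_weight, scale, zero_point -> decompressed_weight` step."""
    return (compressed.astype(np.float32) - zero_point) * scale

weight = np.random.rand(8, 16).astype(np.float32) - 0.5  # toy weight matrix
compressed, scale, zero_point = quantize_asym_u4(weight)
decompressed = dequantize(compressed, scale, zero_point)
print(float(np.abs(weight - decompressed).max()))  # quantization error is on the order of the scale
```

The point of the commit is that this kind of computation is expressed as a compiled OpenVINO model graph rather than eager numpy/torch operations, which is what makes it significantly faster for large int4 weights.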
1 parent b6f2e75 · commit f3f232f

Showing 32 changed files with 2,195 additions and 293 deletions.
@@ -0,0 +1,103 @@
# Copyright (c) 2025 Intel Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#      http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import inspect
from contextlib import contextmanager
from functools import wraps
from typing import Any, Callable, Dict, Iterator, TypeVar, cast


class ResultsCache:
    """
    A container for results of functions decorated with the @cache_results decorator.
    """

    def __init__(self) -> None:
        self._enabled = True
        # Stores the results of the decorated function
        self._cache: Dict[Any, Any] = {}
        # Stores the number of times the cached result was accessed
        self._access_count: Dict[Any, int] = {}

    def enable(self) -> None:
        self._enabled = True

    def disable(self) -> None:
        self._enabled = False

    def enabled(self) -> bool:
        return self._enabled

    def access_count(self) -> Dict[Any, int]:
        return copy.deepcopy(self._access_count)

    def clear(self) -> None:
        self._cache.clear()
        self._access_count.clear()

    def __getitem__(self, key: Any) -> Any:
        self._access_count[key] += 1
        return self._cache[key]

    def __setitem__(self, key: Any, value: Any) -> None:
        self._access_count[key] = 0
        self._cache[key] = value

    def __contains__(self, key: Any) -> bool:
        return key in self._cache


TFunc = TypeVar("TFunc", bound=Callable[..., Any])


def cache_results(cache: ResultsCache) -> Callable[[TFunc], TFunc]:
    """
    Decorator to cache results of a function. When the decorated function is called with the same set of arguments,
    it will return the cached result instead of recomputing it. If it is the first call with such a set of arguments,
    the result is computed and stored in the cache. The cache is stored in the `cache` object. Function arguments
    must be hashable.

    :param cache: A cache container where results will be stored.
    """

    def decorator(func: TFunc) -> TFunc:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not cache.enabled():
                return func(*args, **kwargs)
            sig = inspect.signature(func)
            new_kwargs = {name: arg for name, arg in zip(sig.parameters, args)}
            new_kwargs.update(kwargs)
            cache_key = (func.__name__, frozenset(new_kwargs.items()))
            if cache_key in cache:
                return cache[cache_key]
            result = func(*args, **kwargs)
            cache[cache_key] = result
            return result

        return cast(TFunc, wrapper)

    return decorator


@contextmanager
def disable_results_caching(cache: ResultsCache) -> Iterator[None]:
    """
    Context manager to disable caching of results for a block of code.

    :param cache: A cache container where results are stored.
    """
    if cache.enabled():
        cache.disable()
        yield
        cache.enable()
    else:
        yield
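For reference, a minimal usage sketch of the caching utilities defined above. The cache instance and the decorated function below are hypothetical names, and since the file path of this new module is not shown in this rendering, the sketch assumes `ResultsCache`, `cache_results` and `disable_results_caching` are already in scope.

```python
_RESULTS_CACHE = ResultsCache()  # hypothetical module-level cache instance

@cache_results(_RESULTS_CACHE)
def expensive_sum(a: int, b: int) -> int:  # hypothetical cached function
    print("computing")  # executes only when the result is not already cached
    return a + b

expensive_sum(1, 2)                   # cache miss: computes and stores the result
expensive_sum(a=1, b=2)               # cache hit: same key, returns the stored result
print(_RESULTS_CACHE.access_count())  # the single cached entry was accessed once

with disable_results_caching(_RESULTS_CACHE):
    expensive_sum(1, 2)               # recomputed and not cached while caching is disabled

_RESULTS_CACHE.clear()                # drops cached results and access counters
```

Because the key is built from parameter names and values (`(func.__name__, frozenset(new_kwargs.items()))`), positional and keyword calls with the same values map to the same cache entry, which is also why all arguments must be hashable.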
@@ -0,0 +1,16 @@
# Copyright (c) 2025 Intel Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#      http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from nncf.openvino.optimized_functions.functions import astype as astype
from nncf.openvino.optimized_functions.functions import do_int_quantization as do_int_quantization
from nncf.openvino.optimized_functions.functions import quantize_dequantize_weight as quantize_dequantize_weight
from nncf.openvino.optimized_functions.models import OVModelParameters as OVModelParameters
from nncf.openvino.optimized_functions.models import clear_ov_model_cache as clear_ov_model_cache
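Assuming this file is the package's `__init__.py` (its explicit `import ... as ...` re-export pattern suggests so), callers elsewhere in NNCF would pick these symbols up from the package root rather than from the submodules, e.g.:

```python
# Import-path illustration only; the argument lists of these functions are
# defined elsewhere in the commit and are not shown in this file.
from nncf.openvino.optimized_functions import (
    OVModelParameters,
    astype,
    clear_ov_model_cache,
    do_int_quantization,
    quantize_dequantize_weight,
)
```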