
[Torch][WeightCompression] Add Scale Estimation data-aware support #3179

Open · wants to merge 50 commits into base: develop
Conversation

@kshpv kshpv commented Jan 8, 2025

Changes

- Added data-aware support for the Torch backend for WeightCompression with Scale Estimation.
- Introduced support for MeanVarianceReducer, MaxVarianceReducer, and MeanAbsMaxReducer.
- Incorporated a torch.inference_mode() context for WeightCompression.

Reason for changes

These changes enable data-aware Scale Estimation for the Torch backend and, in particular, allow it to run on CUDA devices for improved performance.
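For illustration, below is a minimal sketch of how the added capability could be exercised. It assumes the existing nncf.compress_weights entry point with the scale_estimation option and an nncf.Dataset for calibration; the model ID, calibration text, and transform function are placeholders and are not taken from this PR.

```python
# Hedged sketch: data-aware Scale Estimation on a Torch model running on CUDA.
# The model ID, calibration data, and transform function are illustrative placeholders.
import nncf
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

calibration_texts = ["An example calibration sentence."]

def transform_fn(text: str) -> dict:
    # Tokenize and move the inputs to the same device as the model.
    return {k: v.to("cuda") for k, v in tokenizer(text, return_tensors="pt").items()}

compressed_model = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    ratio=0.8,
    group_size=64,
    dataset=nncf.Dataset(calibration_texts, transform_fn),
    scale_estimation=True,  # data-aware Scale Estimation, now available on the Torch backend
)
```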

Related tickets

Ticket ID: 158974

Tests

- Added a template for WeightCompression tests for both Torch and OV backends, covering data-aware and Scale Estimation scenarios.
- Extended the test scope to include tinyllama_data_aware and tinyllama_scale_estimation_per_channel for Torch.
- Added a new test case, tinyllama_scale_estimation_group_size_64, for both Torch and OV backends.

Performance Metrics

Note: All CUDA results are obtained locally on a single RTX 3090.

| Model | Backend | Metric Name | Metric Value | Num int4 | Num int8 | Compression Time (from Performance Job) | RAM, MiB (from Performance Job) |
|---|---|---|---|---|---|---|---|
| tinyllama_data_aware | OV | Similarity | 0.8577 | 94 | 124 | 0:01:28 | 8545 |
| tinyllama_data_aware | TORCH | Similarity | 0.8577 | 94 | 124 | 0:02:15 | 1225 |
| tinyllama_data_aware | TORCH (CUDA) | Similarity | 0.8577 | 94 | 124 | 0:00:28 | - |
| tinyllama_scale_estimation_per_channel | OV | Similarity | 0.8139 | 188 | 124 | 0:02:57 | 8681 |
| tinyllama_scale_estimation_per_channel | TORCH | Similarity | 0.8139 | 188 | 124 | 0:03:25 | 5472 |
| tinyllama_scale_estimation_per_channel | TORCH (CUDA) | Similarity | 0.8139 | 188 | 124 | 0:00:35 | - |
| tinyllama_scale_estimation_group_size_64 | OV | Similarity | 0.8566 | 94 | 124 | 0:04:17 | 8681 |
| tinyllama_scale_estimation_group_size_64 | TORCH | Similarity | 0.8566 | 94 | 124 | 0:04:01 | 5575 |
| tinyllama_scale_estimation_group_size_64 | TORCH (CUDA) | Similarity | 0.8566 | 94 | 124 | 0:00:36 | - |

@kshpv kshpv requested a review from a team as a code owner January 8, 2025 09:53
@github-actions github-actions bot added the NNCF PT, NNCF Common, experimental, NNCF OpenVINO, and NNCF PTQ labels Jan 8, 2025
@kshpv kshpv changed the title Scale est torch → [Torch][WeightCompression] Add Scale Estimation support Jan 8, 2025
@kshpv kshpv marked this pull request as draft January 8, 2025 09:54
@MaximProshin MaximProshin changed the title [Torch][WeightCompression] Add Scale Estimation support → [Torch][WeightCompression] Add Scale Estimation data-aware support Jan 8, 2025
@kshpv (Collaborator, Author) commented Jan 8, 2025

weight compression build - 291

@kshpv (Collaborator, Author) commented Jan 8, 2025

The proposed example can be added in a follow-up PR; it will be excluded from this PR.

Comment on lines +467 to +468
def _reduce_out_of_place(self, x: List[Tensor]) -> List[Tensor]:
    x = x[0]
Contributor:
Not for this PR.
Should this function really take a list?
It seems like it always contains one element. Is RawReducer the only reducer that consumes the whole list, while the rest work with a single element?
Could we derive all these "one-element" classes from a class that defines a method taking a single element?
IMO, it would be clearer when one element is expected and when it is not.
@daniil-lyakhov (Collaborator) commented:
Reducers were designed to receive several inputs and produce several outputs as well. For now, there are no such reducers, but a possible use case is quantization error: (fp32 input, int8 input) -> diff.

We could create a class for that, but it would make the hierarchy tree more complicated; we would then have to introduce a method like _reduce_out_of_place_one_input, and I don't think that would make the code more readable.

@ljaljushkin (Contributor) commented Jan 24, 2025:
Since all reducers use one input, it should be relatively easy, shouldn't it? For example (a sketch follows after this list):

  1. Keep _reduce_out_of_place for many inputs in the base class.
  2. Subclass an intermediate class with _reduce_out_of_place_one_input.
  3. Change the one-input reducers to inherit from the intermediate class instead of the base class.
  4. Rename their _reduce_out_of_place to _reduce_out_of_place_one_input.
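Not part of the PR, but to make the suggestion concrete, here is a rough sketch of the proposed hierarchy. The class names and the reduction body are placeholders; only _reduce_out_of_place and _reduce_out_of_place_one_input come from the discussion above.

```python
from abc import ABC, abstractmethod
from typing import Any, List

Tensor = Any  # stand-in for the NNCF tensor type in this sketch


class TensorReducerBase(ABC):
    # Base class keeps the general contract: several inputs in, several outputs out.
    @abstractmethod
    def _reduce_out_of_place(self, x: List[Tensor]) -> List[Tensor]:
        ...


class OneInputReducerBase(TensorReducerBase):
    # Intermediate class that adapts the multi-input contract to the common one-input case.
    def _reduce_out_of_place(self, x: List[Tensor]) -> List[Tensor]:
        return self._reduce_out_of_place_one_input(x[0])

    @abstractmethod
    def _reduce_out_of_place_one_input(self, x: Tensor) -> List[Tensor]:
        ...


class MeanVarianceReducer(OneInputReducerBase):
    # Existing one-input reducers would only re-parent and rename their method.
    def _reduce_out_of_place_one_input(self, x: Tensor) -> List[Tensor]:
        return [x]  # placeholder for the actual mean-variance reduction
```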

@kshpv kshpv requested review from ljaljushkin and alexsu52 January 22, 2025 14:48
@ljaljushkin (Contributor) commented:
@kshpv Have you run the performance job after merging with develop? Could you share it?

@kshpv kshpv requested review from ljaljushkin January 24, 2025 10:03
@kshpv (Collaborator, Author) commented Jan 24, 2025

> @kshpv Have you run the performance job after merging with develop? Could you share it?

Sure, performance build 41

@ljaljushkin (Contributor) commented:
If you compare the 40 and 41 performance builds, you can notice that AWQ is sometimes slower.
Build 40 doesn't include #2727 [screenshot]
Build 41 includes #2727 [screenshot]

Although the times for mixed precision, applying compression, and the total time are better with build 41, I wonder why there is some slowness. Is it a measurement error or are these expected numbers? @nikita-savelyevv

@nikita-savelyevv (Collaborator) commented:
> If you compare the 40 and 41 performance builds, you can notice that AWQ is sometimes slower. Build 40 doesn't include #2727. Build 41 includes #2727.
>
> Although the times for mixed precision, applying compression, and the total time are better with build 41, I wonder why there is some slowness. Is it a measurement error or are these expected numbers? @nikita-savelyevv

It is possible that the AWQ part becomes a bit slower for a small model like tiny-llama, because the added compiled functions are most effective when compressing large tensors. When operating on small tensors, the compilation overhead can overshadow the speedup from the compiled computation.

That's why, for example, int8 tiny-llama compression is a bit slower after #2727 [screenshot]

My assumption is that the same effect makes AWQ a bit slower in this case. It should not happen for larger models, though. I will check this. Thanks for the observation.
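As a side note, the overhead effect described above can be reproduced with a generic micro-benchmark. The snippet below uses torch.compile purely as an analogy (it is not the mechanism introduced by #2727): on a tiny tensor the one-time compilation cost dominates, while on a large tensor the compiled kernel pays off over repeated calls.

```python
import time
import torch

def fake_quantize(x: torch.Tensor) -> torch.Tensor:
    # Toy int8-style quantize/dequantize round trip.
    scale = x.abs().amax() / 127.0
    return torch.round(x / scale).clamp(-128, 127) * scale

compiled_fake_quantize = torch.compile(fake_quantize)

for numel in (1_000, 10_000_000):
    x = torch.randn(numel)

    t0 = time.perf_counter()
    fake_quantize(x)
    eager_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    compiled_fake_quantize(x)  # the first call includes the compilation cost
    compiled_first_s = time.perf_counter() - t0

    print(f"numel={numel}: eager={eager_s:.4f}s, compiled first call={compiled_first_s:.4f}s")
```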

@ljaljushkin (Contributor) commented Jan 24, 2025:

> My assumption is that the same effect makes AWQ a bit slower in this case. It should not happen for larger models, though. I will check this. Thanks for the observation.

Thanks for the confirmation!
