[TorchFX] Torch FX/PyTorch 2 Export Quantization #2766

Open
1 task
alexsu52 opened this issue Jun 27, 2024 · 4 comments

alexsu52 commented Jun 27, 2024

🚀 Feature request

Quantization is a widely used technique to accelerate models, particularly when using torch.compile. For detailed tutorials and demonstrations of model quantization with PyTorch 2 Export Quantization, please refer to the following resources:

These guides show how to obtain a quantized model via the PyTorch 2 Export Quantization API and run it using torch.compile. Although OpenVINO provides a backend for torch.compile, NNCF does not support quantization of PyTorch 2 Export (torch.fx.GraphModule) models, so users have to use X86InductorQuantizer to quantize them. Comparisons between PyTorch 2 Export INT8 models quantized by X86InductorQuantizer and OpenVINO INT8 models quantized by NNCF show that NNCF produces more accurate and efficient INT8 models.

The feature request is to support torch.fx.GraphModule models in nncf.quantize to enable the creation of accurate and highly efficient models using torch.compile with the OpenVINO backend.

Feature Use Case

import torch
import nncf
from torch._export import capture_pre_autograd_graph

# initialize a floating point model (M is a placeholder for the user's torch.nn.Module)
float_model = M().eval()

# program capture; example_inputs is a tuple of positional arguments for the model
# NOTE: this API will be updated to torch.export API in the future, but the captured result should mostly stay the same
model = capture_pre_autograd_graph(float_model, example_inputs)

# quantization; calibration_dataset is expected to be an nncf.Dataset
quantized_model = nncf.quantize(model, calibration_dataset)

# compile quantized model with OpenVINO backend
compiled_model = torch.compile(quantized_model, backend='openvino')

Are you going to submit a PR?

  • Yes I'd like to help by submitting a PR!
@alexsu52 alexsu52 added the enhancement New feature or request label Jun 27, 2024
@alexsu52 alexsu52 changed the title Torch FX Quantization Torch FX/PyTorch 2 Export Quantization Jun 27, 2024
@alexsu52

@daniil-lyakhov, please analyze this feature request and open issues as its sub-tasks.

alexsu52 commented Jul 1, 2024

I suggest introducing the following API in NNCF to support third-party quantizers and to align better with the PyTorch 2 Export Quantization API:

from typing import List, Optional

import torch
from nncf import Dataset
from nncf.quantization.advanced_parameters import AdvancedBiasCorrectionParameters
from nncf.quantization.advanced_parameters import AdvancedSmoothQuantParameters
from torch.ao.quantization.quantizer import Quantizer
from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import OperatorConfig


class OpenVINOQuantizer(Quantizer):
    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # validate the annotated graph is supported by the backend
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # return the operators (with their configs) supported by the quantizer
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass


# apply quantization pipeline for torch.export.ExportedProgram
# (proposed signature only; the body is intentionally omitted)
def quantize_pt2e(
    model: torch.export.ExportedProgram,
    calibration_dataset: Dataset,
    quantizer: torch.ao.quantization.quantizer.Quantizer,
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None,
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
):
    ...

@MaximProshin MaximProshin changed the title Torch FX/PyTorch 2 Export Quantization [TorchFX] Torch FX/PyTorch 2 Export Quantization Jul 5, 2024
AlexanderDokuchaev pushed a commit that referenced this issue Aug 5, 2024
### Changes

Added a test in tests/torch/fx/test_models.py which compares the
quantized graph with a reference quantized graph.

### Reason for changes

To check if the graph was quantized correctly

### Ticket
#2766 

### Tests

test_quantized_model() was added in tests/torch/fx/test_models.py
KodiaqQ pushed a commit that referenced this issue Aug 5, 2024
…izers (#2854)

### Changes

Quantizer merge logic is updated to check that all output branches are
quantized before quantizers are merged and propagated up.

### Reason for changes

To prevent merging of quantizers in the case of the
ScaledDotProductAttention op, which should have quantizers on input
ports 0 and 1 and shouldn't have a quantizer on input port 3.
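
The rule described above can be illustrated with a small, self-contained sketch. This is not the NNCF solver code: the function names, the dictionary layout, and the `"int8_sym"` config string are all hypothetical, and the extra condition that only identical configurations are merged is an assumption of this toy model.

```python
from typing import Dict, Optional


def all_branches_quantized(branch_quantizers: Dict[str, Optional[str]]) -> bool:
    """Return True only if every output branch carries a quantizer."""
    return all(q is not None for q in branch_quantizers.values())


def try_merge(branch_quantizers: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the merged quantizer config, or None when merging must be skipped."""
    if not branch_quantizers or not all_branches_quantized(branch_quantizers):
        # At least one branch is intentionally left unquantized: keep the
        # quantizers where they are instead of merging and propagating them up.
        return None
    configs = set(branch_quantizers.values())
    # Assumption of this sketch: only identical configurations are merged.
    return configs.pop() if len(configs) == 1 else None


# Every branch is quantized, so the quantizers may be merged.
fully_quantized = {"branch_a": "int8_sym", "branch_b": "int8_sym"}

# The branch leading to SDPA input port 3 has no quantizer, so no merge happens.
partially_quantized = {
    "to_sdpa_port_0": "int8_sym",
    "to_sdpa_port_1": "int8_sym",
    "to_sdpa_port_3": None,
}

print(try_merge(fully_quantized))
print(try_merge(partially_quantized))
```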

### Related tickets

148211
#2766 

### Tests

* Common solver test for ScaledDotProductAttention branch merging and
quantization initialization
* Graph tests for torch/ov backends
AlexanderDokuchaev pushed a commit that referenced this issue Aug 7, 2024
### Changes

Conformance test for resnet18

### Reason for changes

To extend testing scope for the TorchFX backend

### Related tickets

#2766

### Tests

post_training_quantization/442 is successful
AlexanderDokuchaev pushed a commit that referenced this issue Aug 9, 2024
### Changes

Torch FX pre-hook insertion support

### Reason for changes

To enable vit_b_16 quantization

### Related tickets

#2766 

### Tests

test_quantized_models is updated with vit_b_16 and swin_v2_s
AlexanderDokuchaev pushed a commit that referenced this issue Aug 13, 2024
### Changes

Constant linear layers support

### Reason for changes

To support swin_v2_s FBC

### Related tickets

#2766 

### Tests
Build post_training_quantization/444/ is finished successfully
Unit test `test_model_transformer.test_model_extraction` is added
KodiaqQ pushed a commit that referenced this issue Aug 16, 2024
### Changes

TorchFX SmoothQuant backend implementation
* module_insertion_transformation_builder is introduced
* Transformation requires names for new modules and nodes
* vit_b_16 is introduced in the conformance tests

### Reason for changes

To improve metrics of quantized models: swin_v2_s and vit_b_16
* To insert SQ multiply nodes to the graph
* To make node names human-readable and consistent
* To check sq algorithm E2E

### Related tickets

#2766

### Tests

* Smooth quant test template is implemented for the TorchFX backend
* Conformance test: post_training_quantization/446/ is successful
* Test models check SQ multiplies for swin_v2_s and vit_b_16 models
alexsu52 pushed a commit that referenced this issue Oct 21, 2024
### Changes

Transformation for removing fake quantize nodes and saving all weights
to disk in int8 format after quantization. It works as follows:
1. Reshape the scale if qdq operation is per-channel.
2. Pattern match the quantize-dequantize nodes.
3. Filter the matches to only include quantize-dequantize ops with
constant input.
4. Replace with the multiplication of the scale and input.
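
The arithmetic behind these steps can be sketched in plain Python. This is only an illustration, not the NNCF transformation: it works on nested lists instead of FX graph nodes, the helper names are hypothetical, and it assumes symmetric per-channel quantization without a zero point, so decompression is a single multiplication by the scale.

```python
from typing import List, Tuple


def compress_per_channel(weight: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row (output channel) to int8 values with its own scale."""
    q_weight, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Guard against an all-zero channel to avoid division by zero.
        scale = max_abs / 127 if max_abs > 0 else 1.0
        q_weight.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_weight, scales


def decompress(q_weight: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Restore approximate float weights: element-wise multiplication by the scale."""
    return [[q * s for q in row] for row, s in zip(q_weight, scales)]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
q_weight, scales = compress_per_channel(weight)
restored = decompress(q_weight, scales)

print(q_weight)
print(restored)
```

Only the int8 values and one scale per channel need to be stored; the restored weights differ from the originals by at most half a quantization step.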

### Reason for changes

To compress the model after quantization

### Tests

Add `test_post_quantization_compression()` in
`tests/torch/fx/test_model_transformer.py` which checks the data type of
all weights in the model after applying quantization and also checks the
value after the decompression step (element-wise multiplication
operation).

### Tickets
#2766

---------

Co-authored-by: Daniil Lyakhov <[email protected]>
AlexanderDokuchaev pushed a commit that referenced this issue Oct 22, 2024
### Changes

* Resnet18 TorchFX example

### Reason for changes

* To showcase NNCF TorchFX quantization

### Related tickets

#2766 

### Tests

test_examples/544/ - Done

daniil-lyakhov commented Oct 23, 2024

I suggest introducing the following API in NNCF to support third-party quantizers and to align better with the PyTorch 2 Export Quantization API:

from typing import List, Optional

import torch
from nncf import Dataset
from nncf.quantization.advanced_parameters import AdvancedBiasCorrectionParameters
from nncf.quantization.advanced_parameters import AdvancedSmoothQuantParameters
from torch.ao.quantization.quantizer import Quantizer
from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import OperatorConfig


class OpenVINOQuantizer(Quantizer):
    # annotate nodes in the graph with observer or fake quant constructors
    # to convey the desired way of quantization
    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        pass

    # validate the annotated graph is supported by the backend
    def validate(self, model: torch.fx.GraphModule) -> None:
        pass

    # return the operators (with their configs) supported by the quantizer
    @classmethod
    def get_supported_operators(cls) -> List[OperatorConfig]:
        pass


# apply quantization pipeline for torch.export.ExportedProgram
# (proposed signature only; the body is intentionally omitted)
def quantize_pt2e(
    model: torch.export.ExportedProgram,
    calibration_dataset: Dataset,
    quantizer: torch.ao.quantization.quantizer.Quantizer,
    subset_size: int = 300,
    fast_bias_correction: Optional[bool] = True,
    smooth_quant: Optional[bool] = None,
    channel_alignment: Optional[bool] = None,
    bias_correction_params: Optional[AdvancedBiasCorrectionParameters] = None,
    smooth_quant_alphas: Optional[AdvancedSmoothQuantParameters] = None,
):
    ...

plus parameter range estimators

alexsu52 pushed a commit that referenced this issue Oct 30, 2024
### Changes

* ~~Constant folding is applied to all TorchFX models before the
quantization~~
* Some torchvision models (swin_v2_s, vit_b_16) are exported by
`torch.export.export` before OV conversion
* MOC transformations are applied to OpenVINO compressed models after
the compression

After #2984:
* Fixed `_compress_qdq_constant_transformation` for the per-tensor case

### Reason for changes

* To align TorchFX/OV quantized models

### Related tickets

#2766

### Tests

post_training_quantization/504/ is finished successfully
alexsu52 added a commit that referenced this issue Nov 7, 2024
### Changes

Constant folding is enabled by default in TorchFX backend

### Reason for changes

To align quantizer placement between OV and TorchFX

### Related tickets

#2766 

### Tests

* test_constant_folding
* test_constant_folding_with_constraints
* test_models.py references are updated
* post_training_quantization/535/ - finished successfully

---------

Co-authored-by: Alexander Suslov <[email protected]>
Co-authored-by: Aamir Nazir <[email protected]>
alexsu52 pushed a commit that referenced this issue Nov 14, 2024
### Changes

* TorchFX Unit tests are moved from
`torch._export.capture_pre_autograd_graph` to
`torch.export.export_for_training`
ALL REFERENCE GRAPHS WERE VALIDATED MANUALLY 
* BC types for `fuse_bn_node` are updated
* NNCFGraphBuilder is updated to support a batch-norm type with only one
output node (instead of three)
* Model extractor does not traverse down from constants to prevent
redundant nodes in the extracted model when the constant is shared
* `shared_constants_unification_transformation` is removed
* Tests which require `capture_pre_autograd_graph` are removed

### Reason for changes

* To migrate to the latest and recommended export method for the TorchFX
backend

### Related tickets

#2766 

### Tests

test_shared_constants_unification_not_connected_const
post_training_quantization/540/ is finished successfully
daniil-lyakhov added a commit to daniil-lyakhov/nncf that referenced this issue Nov 14, 2024
…it#3075)

KodiaqQ pushed a commit that referenced this issue Nov 14, 2024
PR #3075 to the release branch.
alexsu52 pushed a commit that referenced this issue Nov 15, 2024
### Changes

* Main README.md, Usage.md and post training quantization docs are
updated with info about the TorchFX

### Reason for changes

* To reflect new experimental features of TorchFX in the docs

### Related tickets

#2766
KodiaqQ pushed a commit that referenced this issue Nov 15, 2024
PR #2917 to the release branch

alexsu52 pushed a commit that referenced this issue Nov 25, 2024
### Changes

* Torch SDPA pattern is updated
* The concat node stores its input nodes in the format `args=([inp_1, ...,
inp_n], dim)`, so it has to be treated differently. Retrieving concat
inputs by input port id is now supported in each TorchFX transformation
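
A minimal sketch of why concat needs special handling when an input is looked up by port id. It uses plain tuples and strings instead of real `torch.fx` nodes, and the helper name `get_input_by_port` and the `is_concat` flag are hypothetical, not part of NNCF.

```python
from typing import Any, Tuple


def get_input_by_port(args: Tuple[Any, ...], port_id: int, is_concat: bool) -> Any:
    """Return the input connected to `port_id` for a node with the given args."""
    if is_concat:
        # Concat keeps all of its inputs in a list stored in args[0];
        # args[1] is the concatenation dimension, not an input.
        return args[0][port_id]
    # For a regular node, each positional argument is its own input port.
    return args[port_id]


# Regular node: args = (input, weight, bias)
conv_args = ("x", "weight", "bias")
# Concat node: args = ([inp_1, ..., inp_n], dim)
cat_args = (["a", "b", "c"], 1)

print(get_input_by_port(conv_args, 1, is_concat=False))
print(get_input_by_port(cat_args, 2, is_concat=True))
```

Indexing `cat_args` directly with port id 1 would return the dimension instead of the second input, which is the mistake the special case avoids.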

### Reason for changes

* To support quantization of ultralytics/yolo11n in TorchFX backend

### Related tickets

#2766 
157032

### Tests

* `tests/torch/fx/test_model_transformer.py` and
`tests/torch/fx/test_compress_weights.py` are updated to check all cases
with the concat node. All `.dot` / `.json` files were checked manually.
* `tests/torch/fx/test_models.py` is updated with `YOLO11N_SDPABlock`
synthetic model to check the correctness of SDPA pattern matching
alexsu52 pushed a commit that referenced this issue Nov 26, 2024
### Changes

All `capture_pre_autograd_graph` calls in the conformance test were
replaced by `torch.export.export_for_training`.

### Reason for changes

To remove deprecated `capture_pre_autograd_graph` from the conformance
test.

### Related tickets

#2766 

### Tests

post_training_quantization/555/ has finished successfully
daniil-lyakhov added a commit to daniil-lyakhov/nncf that referenced this issue Dec 2, 2024
…notoolkit#3078)

alexsu52 pushed a commit that referenced this issue Dec 4, 2024
### Changes

* Bias fusing is removed from default transformations
* `constant_folding` is updated to remove inplace operations without
users
* `extract_model` is updated to support original model output as a
subgraph output

### Reason for changes

To make it possible to apply quantization the same way it is done by
X86Quantizer

### Related tickets

#2766
110985

### Tests
* All int8 references are updated and checked manually
* `test_constant_folding` and `test_constant_folding_with_constraints`
are updated with a constant subgraph which contains an inplace op
(`relu_`)
* `test_model_extraction_with_original_output` is introduced
* conformance test post_training_quantization/557 has finished
successfully
alexsu52 pushed a commit that referenced this issue Dec 4, 2024
### Changes

Folded constants do not require gradient

### Reason for changes

* To unify all model constants/buffers
* To make compressed model deepcopy-able

### Related tickets

#2766 

### Tests

`test_constant_folding` is updated
nikita-savelyevv pushed a commit to nikita-savelyevv/nncf that referenced this issue Dec 11, 2024
…it#3075)

nikita-savelyevv pushed a commit to nikita-savelyevv/nncf that referenced this issue Dec 11, 2024
alexsu52 pushed a commit that referenced this issue Jan 21, 2025
)

### Changes

Introduction of `quantize_pt2e` method

### Reason for changes



### Related tickets

#2766 

### Tests
graph tests: `tests/torch/fx/test_quantizer.py`