
[Good First Issue][NNCF]: Add ONNX support of data-free Weight Compression Algorithm #3273

Open
kshpv opened this issue Feb 12, 2025 · 7 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@kshpv
Collaborator

kshpv commented Feb 12, 2025

Context

NNCF supports the OpenVINO, Torch, and TorchFX backends for the weight compression algorithm, nncf.compress_weights(). The goal of this issue is to extend that support to the ONNX backend. The NNCF code is structured so that this can be done quite straightforwardly, but it requires attention to detail.

Very important: the runtime target is the OpenVINOExecutionProvider for ONNX Runtime.

What needs to be done?

The task is to implement support for the data-free int8 and uint8 weight compression algorithm, which includes:

  1. Implement WeightCompressionAlgoBackend for ONNX:
    We already have implementations for OpenVINO, Torch, and TorchFX, so you can use those as references.
    Some methods, such as insert_adapters, _get_statistics_for_weights_compression, and dump_parameters, can be skipped.
    The goal is to be able to run nncf.compress_weights(onnx_model) and get an ONNX model with weights compressed to the int8 or uint8 format.

  2. Test the Compression:
    Ensure that running nncf.compress_weights(onnx_model) actually produces a compressed ONNX model.

  3. Add Initial Tests:
    This is super important to prove that the algorithm works correctly.
    There are two types of tests we need:
    Conformance Tests: Add a tinyllama_data_free case for ONNX, similar to the existing OpenVINO case. A good starting point is to read the conformance test readme.
    Unit Tests: Add unit tests analogous to the existing unit tests for OpenVINO, Torch, and TorchFX.
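For orientation, the core data-free transform such a backend applies to each weight tensor is plain affine quantization. Below is a minimal pure-Python sketch of per-channel symmetric int8 quantization (the standard scheme, not NNCF's exact implementation; the helper names are made up for illustration, and uint8 asymmetric mode would additionally carry a zero point):

```python
def quantize_int8_symmetric(rows):
    """Per-row (per-output-channel) symmetric int8 quantization.

    rows: list of lists of floats (a 2-D weight matrix).
    Returns (quantized integer rows, per-row scales).
    """
    q_rows, scales = [], []
    for row in rows:
        max_abs = max(abs(v) for v in row) or 1.0  # avoid div-by-zero on all-zero rows
        scale = max_abs / 127.0                    # symmetric range [-127, 127]
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Inverse transform: what a DequantizeLinear node computes at runtime."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

In the compressed model, the integer tensor and the scales would be stored as initializers, with the dequantization step expressed as a graph node rather than Python code.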

The work can be split into subtasks to make development and review faster; how to split it is up to you and can be discussed.
If you have any questions or need guidance, feel free to ask in the comments or reach out to the maintainers.

Example Pull Requests

Adding support of data free for Torch - #2333
Adding support of data free for TorchFX - #2891

Resources

Contact points

@kshpv

The description is not complete and will be updated.

@kshpv kshpv added the good first issue Good for newcomers label Feb 12, 2025
@github-project-automation github-project-automation bot moved this to Contributors Needed in Good first issues Feb 12, 2025
@XueSongTap

Hi @kshpv I'd like to work on this issue. I have experience with ONNX and can help implement the weight compression algorithm support for the ONNX backend.

@alexsu52
Contributor

@XueSongTap, thank you for your interest! Assigned to you.

@alexsu52 alexsu52 moved this from Contributors Needed to Assigned in Good first issues Feb 13, 2025
@kshpv
Collaborator Author

kshpv commented Feb 17, 2025

Very important requirements to keep in mind:

  1. The weight compression scheme should be as follows: DequantizeLinear(weight, ...) -> MatMul/Conv/....
  2. The block_size attribute is only supported for ONNX opset >= 21. This leads to the next requirement: weight compression will only support ONNX models starting from opset 21. For models with older opsets, the algorithm should not execute and should report an appropriate message.
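The opset guard described above can be sketched as a check over the model's opset_import entries (domain "" or "ai.onnx" is the default ONNX domain). To keep the sketch self-contained it operates on plain (domain, version) pairs mirroring onnx ModelProto.opset_import; the function name and error type are illustrative assumptions, not NNCF API:

```python
MIN_OPSET = 21  # block_size on DequantizeLinear requires opset >= 21

def check_opset(opset_imports):
    """opset_imports: iterable of (domain, version) pairs, mirroring
    onnx ModelProto.opset_import. Raises if the default-domain opset
    is too old for weight compression."""
    for domain, version in opset_imports:
        if domain in ("", "ai.onnx"):  # default ONNX operator domain
            if version < MIN_OPSET:
                raise RuntimeError(
                    f"Weight compression requires ONNX opset >= {MIN_OPSET}, "
                    f"but the model uses opset {version}."
                )
            return
    raise RuntimeError("Model has no default-domain opset_import entry.")
```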

@kshpv
Collaborator Author

kshpv commented Feb 17, 2025

I would like to assist with Step 2 - Test the Compression:

I have prepared a set of scripts that can be utilized with the following workflow:

  1. Export the ONNX model, such as BERT, using the optimum and transformers libraries.
  2. Measure the execution time of the original model over N iterations.
  3. Compress the ONNX model using NNCF (Neural Network Compression Framework).
  4. Measure the execution time of the compressed model over N iterations.
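The timing steps of this workflow can be sketched with a small stdlib helper. The model invocation is passed in as a callable so the same helper serves both the original and the compressed model; the onnxruntime session setup is assumed and not shown:

```python
import time

def measure_avg_latency(run_once, n_iters=100, warmup=10):
    """Average wall-clock latency of run_once() over n_iters calls,
    after `warmup` untimed calls. For an onnxruntime InferenceSession
    this would be e.g. run_once = lambda: session.run(None, inputs)."""
    for _ in range(warmup):
        run_once()                      # untimed warm-up iterations
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    return (time.perf_counter() - start) / n_iters
```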

Let me know when step 1 is ready.

@alexsu52
Contributor

@XueSongTap, how long do you estimate it will take you to complete this task?

@XueSongTap

Hi @alexsu52, based on the requirements and my current understanding, I estimate it will take around 2-3 weeks to complete this task:

Week 1:

  • Study existing implementations (OpenVINO, Torch backends)
  • Start implementing basic WeightCompressionAlgoBackend for ONNX

Week 2:

  • Complete core implementation
  • Add initial unit tests
  • Handle ONNX opset version checks and requirements

Week 3 (if needed):

  • Implement conformance tests
  • Performance testing and optimization
  • Documentation and code review fixes

I plan to provide regular updates and can adjust the timeline based on feedback. Please let me know if this timeline works or if you have different expectations.

@alexsu52
Contributor

@XueSongTap, sounds good! We look forward to updates from you.

If you have any questions, don't hesitate to ask them.
