Constant fold initializer for DQ node #23366

Draft
wants to merge 2 commits into
base: main

Conversation

chilo-ms
Contributor

Some NPUs require weights/initializers consumed by Q/DQ nodes to be in FP32, FP16, INT8, UINT8, or INT4.
In other words, ORT needs to dequantize initializers of a "specific data type" to FP32 for these NPUs.

This PR leverages the ORT ConstantFolding optimizer to dequantize the initializer of a DQ node if the initializer has a specific data type.
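For readers unfamiliar with what folding a DQ node over a constant initializer amounts to, here is a minimal sketch of the per-tensor dequantization arithmetic, `y = (x - zero_point) * scale`, as defined for the ONNX DequantizeLinear operator. The function name `DequantizeInt8` is hypothetical and is not part of ORT or of this PR; the actual change goes through the ConstantFolding optimizer rather than a standalone helper like this.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ORT API): per-tensor dequantization of an
// INT8 initializer to FP32, following y = (x - zero_point) * scale.
std::vector<float> DequantizeInt8(const std::vector<std::int8_t>& quantized,
                                  float scale, std::int8_t zero_point) {
  std::vector<float> dequantized;
  dequantized.reserve(quantized.size());
  for (std::int8_t q : quantized) {
    // Widen to int before subtracting so the difference cannot wrap in int8.
    const int centered = static_cast<int>(q) - static_cast<int>(zero_point);
    dequantized.push_back(static_cast<float>(centered) * scale);
  }
  return dequantized;
}
```

Once the initializer is replaced by the FP32 result, the DQ node has no remaining work to do, which is why this can be treated as a constant-folding step.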

Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

Comment on lines +304 to +306
// Dequantize initializer using ORT ConstantFolding optimizer for dq node if initializer has specific(? TBD) data type.
// This feature is required by some NPU's.
// "0": disable. ORT doesn't constant fold the DQ node. [DEFAULT]

Suggested change
// Dequantize initializer using ORT ConstantFolding optimizer for dq node if initializer has specific(? TBD) data type.
// This feature is required by some NPU's.
// "0": disable. ORT doesn't constant fold the DQ node. [DEFAULT]
// Dequantize initializer using ORT ConstantFolding optimizer for dq node if initializer has specific(? TBD) data type.
// This feature is required by some NPU's.
// "0": disable. ORT doesn't constant fold the DQ node. [DEFAULT]
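As an illustration of how a string-valued option with a `"0"` default like the one documented above might be read, here is a minimal sketch. The key name `kHypotheticalDqFoldKey`, the helper `ShouldDequantizeInitializer`, and the use of `"1"` as the enabling value are all assumptions made for this example; the excerpt shows only the `"0"` (disabled, default) case and does not show the real key.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical key name: the real session config key is not shown in the excerpt.
constexpr const char* kHypotheticalDqFoldKey =
    "example.dequantize_initializer_for_dequantize_linear";

// Hypothetical helper: returns true only when the option is explicitly "1".
// Assumption: "1" enables the feature; only "0" (disable) is documented above.
bool ShouldDequantizeInitializer(
    const std::unordered_map<std::string, std::string>& config) {
  const auto it = config.find(kHypotheticalDqFoldKey);
  // A missing key falls back to "0", matching the documented default.
  const std::string value = (it == config.end()) ? "0" : it->second;
  return value == "1";
}
```

Treating any value other than the enabling one as disabled keeps the default behavior (no constant folding of the DQ node) unless the user opts in.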

Comment on lines +736 to +738
false /*skip_dequantize_linear*/,
false /*dequantize_initializer_for_dequantize_linear*/,
empty_config_options),

Suggested change
false /*skip_dequantize_linear*/,
false /*dequantize_initializer_for_dequantize_linear*/,
empty_config_options),
false /*skip_dequantize_linear*/,
false /*dequantize_initializer_for_dequantize_linear*/,
empty_config_options),

Contributor Author

chilo-ms commented Jan 15, 2025

I'm working on a prototype which makes ORT capable of enabling further optimizations for EPs.

Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

Comment on lines 251 to 253
transformers.emplace_back(std::make_unique<QDQPropagationTransformer>());
transformers.emplace_back(std::make_unique<WeightBiasQuantization>());
//transformers.emplace_back(std::make_unique<WeightBiasQuantization>());


Suggested change
transformers.emplace_back(std::make_unique<QDQPropagationTransformer>());
transformers.emplace_back(std::make_unique<WeightBiasQuantization>());
//transformers.emplace_back(std::make_unique<WeightBiasQuantization>());
transformers.emplace_back(std::make_unique<QDQPropagationTransformer>());
// transformers.emplace_back(std::make_unique<WeightBiasQuantization>());
