Constant fold initializer for DQ node #23366
base: main
// Dequantize initializers consumed by DequantizeLinear (DQ) nodes with the ORT ConstantFolding optimizer when the initializer has a specific data type (TBD).
// This feature is required by some NPUs.
// "0": disabled. ORT doesn't constant fold the DQ node. [DEFAULT]
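The option comment above documents only the default value "0" (disabled). Below is a minimal sketch of how such a string-valued session option could be read. It assumes "1" enables the feature and that any other value leaves it off. The key name and helper are hypothetical, because this excerpt does not show the actual key the PR adds.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical key: the actual session-option key added by the PR is not shown here.
const std::string kDequantizeInitializerForDQKey =
    "session.hypothetical_dequantize_initializer_for_dq";

// Stand-in for a session's string-to-string config entries.
using ConfigEntries = std::unordered_map<std::string, std::string>;

// Reads the option, falling back to the documented default "0" (disabled).
// Assumption: "1" enables the feature; any other value leaves it disabled.
bool IsDequantizeInitializerEnabled(const ConfigEntries& entries) {
  const auto it = entries.find(kDequantizeInitializerForDQKey);
  const std::string value = (it == entries.end()) ? "0" : it->second;
  return value == "1";
}
```

In ORT's public C++ API, string config entries like this are set through `Ort::SessionOptions::AddConfigEntry(key, value)`. Defaulting to "0" keeps existing models unaffected unless a caller opts in.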
false /*skip_dequantize_linear*/,
false /*dequantize_initializer_for_dequantize_linear*/,
empty_config_options),
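The two boolean arguments passed to ConstantFolding above interact. Below is a minimal sketch of a per-node folding decision, under stated assumptions. The `NodeInfo` struct and both functions are hypothetical and are not ORT's implementation. The new flag is assumed to override `skip_dequantize_linear` for matching initializers. "Specific" data types are assumed to be those outside the NPU-supported list from the PR description (FP32, FP16, INT8, UINT8, INT4).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified view of a node inspected by a constant-folding pass.
struct NodeInfo {
  std::string op_type;          // e.g. "DequantizeLinear"
  bool all_inputs_constant;     // true if every input is an initializer
  std::string input_elem_type;  // element type of input 0, e.g. "int16"
};

// Assumption: element types an NPU accepts for Q/DQ-consumed initializers,
// taken from the PR description. Anything else is a "specific" type.
bool NpuAcceptsType(const std::string& elem_type) {
  static const std::set<std::string> kAccepted = {"float", "float16", "int8",
                                                  "uint8", "int4"};
  return kAccepted.count(elem_type) > 0;
}

// Decides whether a node should be folded into an FP32 initializer.
bool ShouldFoldDequantizeLinear(const NodeInfo& node,
                                bool skip_dequantize_linear,
                                bool dequantize_initializer_for_dequantize_linear) {
  if (node.op_type != "DequantizeLinear" || !node.all_inputs_constant) {
    return false;  // only DQ nodes whose inputs are all constant can be folded
  }
  // Assumed precedence: the new flag folds "specific"-type initializers
  // even when DQ folding is otherwise skipped.
  if (dequantize_initializer_for_dequantize_linear &&
      !NpuAcceptsType(node.input_elem_type)) {
    return true;
  }
  return !skip_dequantize_linear;
}
```

With both flags false, as in the call above, every constant DQ node remains foldable, matching the behavior before this option existed.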
I'm working on a prototype that enables ORT to apply further optimizations for EPs.
transformers.emplace_back(std::make_unique<QDQPropagationTransformer>());
// transformers.emplace_back(std::make_unique<WeightBiasQuantization>());
Some NPUs require weights/initializers consumed by Q/DQ nodes to be FP32, FP16, INT8, UINT8, or INT4.
In other words, for those NPUs ORT needs to dequantize initializers of a "specific data type" to FP32.
This PR uses the ORT ConstantFolding optimizer to dequantize an initializer feeding a DQ node when the initializer has such a specific data type.
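Constant folding a DQ node amounts to evaluating DequantizeLinear once at optimization time and replacing the node with an FP32 initializer. Below is a minimal sketch of the per-tensor arithmetic from the ONNX DequantizeLinear definition, y = (x - zero_point) * scale. The function name is hypothetical and this is not ORT's kernel. Per-axis scales and packed INT4 data are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-tensor DequantizeLinear as defined by ONNX: y[i] = (x[i] - zero_point) * scale.
// Folding a DQ node whose input is an initializer means running this once at
// optimization time and storing y as a new FP32 initializer.
template <typename T>
std::vector<float> DequantizePerTensor(const std::vector<T>& x, float scale, T zero_point) {
  std::vector<float> y;
  y.reserve(x.size());
  for (const T v : x) {
    // Subtract in 64-bit so narrow or 32-bit integer types cannot overflow.
    const int64_t centered = static_cast<int64_t>(v) - static_cast<int64_t>(zero_point);
    y.push_back(static_cast<float>(centered) * scale);
  }
  return y;
}
```

For example, an INT16 initializer {-2, 0, 4, 10} with scale 0.5 and zero point 2 folds to the FP32 values {-2, -1, 1, 4}. The DQ node then disappears and its consumers read the FP32 initializer directly.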