Fix: correct output scale compute #1077

Giuseppe5 · 2024-10-28T09:31:55Z

Reason for this PR

The output scale factor is computed incorrectly when the input scale is per-row and the weights are quantized per-channel.

Changes Made in this PR

Resolves the issue by skipping the scale factor computation in this case. This means that we cannot have a properly formed QuantTensor with this quantization strategy.
In theory, if I am not hallucinating, the math would result in a per-element scale factor, but we need to decide whether it is worth keeping track of it, since any operation on such a QuantTensor would result in a dequantize.
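
To illustrate the per-element claim, here is a minimal sketch in plain Python. It is not the Brevitas implementation: the helper names (`int_linear`, `per_element_scale`, `float_linear`) and the toy values are assumptions made up for this example. It assumes symmetric quantization with zero-point 0, an input with one scale per row, and weights with one scale per output channel. Under those assumptions, `y[i][j] = s_x[i] * s_w[j] * sum_k(q_x[i][k] * q_w[j][k])`, so the output scale is the outer product of the two scale vectors, one value per output element.

```python
# Hypothetical helpers for illustration only; none of these are Brevitas APIs.

def int_linear(q_x, q_w):
    """Integer accumulator: acc[i][j] = sum_k q_x[i][k] * q_w[j][k]."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in q_w]
            for x_row in q_x]

def per_element_scale(s_x, s_w):
    """Outer product of per-row input scales and per-channel weight scales."""
    return [[sx * sw for sw in s_w] for sx in s_x]

def float_linear(x, w):
    """Reference floating-point linear layer without bias: y = x @ w.T."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w]
            for x_row in x]

# Toy data: 2 input rows, 2 input features, 3 output channels.
q_x = [[1, 2], [3, 4]]          # integer input values
s_x = [0.5, 0.25]               # one scale per input row
q_w = [[1, 0], [0, 1], [1, 1]]  # integer weights, one row per output channel
s_w = [2.0, 4.0, 8.0]           # one scale per output channel

# Dequantize both operands and run the float reference.
x = [[s_x[i] * v for v in row] for i, row in enumerate(q_x)]
w = [[s_w[j] * v for v in row] for j, row in enumerate(q_w)]
y_ref = float_linear(x, w)

# Integer accumulator rescaled by a scale with the same shape as the output.
acc = int_linear(q_x, q_w)
scale = per_element_scale(s_x, s_w)
y_quant = [[acc[i][j] * scale[i][j] for j in range(len(s_w))]
           for i in range(len(s_x))]
```

With these toy values, `scale` has shape 2x3, the same as the output, and `y_quant` matches `y_ref`. No single per-row or per-channel vector can reproduce it, which is why skipping the computation and carrying a full per-element scale are the two options discussed above.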

Testing Summary

None

Risk Highlight

NA

Checklist

Code comments added to any hard-to-understand areas, if applicable.
Changes generate no new warnings.
Updated any relevant tests, if applicable.
No conflicts with destination dev branch.
I reviewed my own code changes.
Initial CI/CD passing.
1+ reviews given, and any review issues addressed and approved.
Post-review full CI/CD passing.

Giuseppe5 added 3 commits October 26, 2024 00:27

Fix: correct output scale compute

8b8877c

precommit

f0e9a85

Fix

dd05b52

Giuseppe5 requested review from nickfraser and removed request for nickfraser October 28, 2024 20:47

Update int_torch_handler.py

60367a6

Giuseppe5 requested review from nickfraser and removed request for nickfraser October 30, 2024 13:46

Giuseppe5 added the "next release" label (PRs which should be merged for the next release) Nov 7, 2024
