Porting TF fake_quant_with_min_max functions #20641
Conversation
* adds fake_quant_with_min_max functions from TF to Keras 3
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:
@@            Coverage Diff             @@
##           master   #20641      +/-   ##
==========================================
+ Coverage   81.96%   81.98%   +0.01%
==========================================
  Files         554      554
  Lines       51656    51704      +48
  Branches     7996     8000       +4
==========================================
+ Hits        42342    42389      +47
- Misses       7367     7368       +1
  Partials     1947     1947
Thanks for the PR!
Hi @doncarlos999, I have left some comments.
Additionally, I think we still need fake_quant_with_min_max_vars, as it is used in TFMOT:
https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/quantization/keras/quant_ops.py#L340
@@ -100,3 +100,759 @@ def test_quantize_and_dequantize(self):
        )
        # A loose assertion due to an expected quantization error
        self.assertAllClose(qdq_values, values, atol=5e-1)

    def _TestOp(
We can use @parameterized.named_parameters and named_product to organize similar tests like this one:
https://github.com/keras-team/keras/blob/master/keras/src/ops/nn_test.py#L2355-L2365
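(For illustration only, a minimal sketch of how such cases might be grouped with absl's parameterized helpers; the class name, parameter choices, and the call shown here are assumptions, not the PR's actual test code.)

# Illustrative sketch only (not the PR's tests): grouping similar cases
# with absl's parameterized helpers instead of one function per case.
import numpy as np
from absl.testing import parameterized

from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars


class FakeQuantOpsTest(parameterized.TestCase):
    @parameterized.named_parameters(
        ("8bits_wide_range", 8, False),
        ("8bits_narrow_range", 8, True),
        ("4bits_wide_range", 4, False),
    )
    def test_fake_quant_with_min_max_vars(self, num_bits, narrow_range):
        x = np.random.uniform(-10.0, 10.0, size=(2, 16)).astype("float32")
        out = fake_quant_with_min_max_vars(
            x, -6.0, 6.0, num_bits=num_bits, narrow_range=narrow_range
        )
        # The op is shape-preserving: one fake-quantized value per input.
        self.assertEqual(tuple(out.shape), x.shape)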
I have reduced the number of tests by merging the logic into a single function, but if you still think it would help to use named_parameters to organize the tests, I can add it.
I think we should follow the convention in the codebase by using @parameterized.named_parameters to organize the tests.
@james77777778 Thank you for the review. I'm working on revisions now. Regarding the …
We can test the gradients of …
You can refer to this test for an example: …
Using a different function, separate from the user-facing function, for testing purposes seems redundant and fragile to me. However, we should wait for the call from @fchollet.
I agree that having two separate functions is fragile; I simply kept the functions separate because that was how they were tested in the TensorFlow repo.
…nt_with_min_max_vars function
Refactor to use backend-specific gradient functions in tests and merge logic into a single function
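(As an aside: the commit above mentions backend-specific gradient functions in the tests. A minimal sketch of what such a check could look like for the TensorFlow backend is below; the helper name tf_gradients_of is hypothetical, not the PR's actual test helper, and the JAX and torch backends would use jax.grad and torch.autograd instead.)

# Hypothetical sketch of a backend-specific gradient check (TensorFlow
# backend only). `tf_gradients_of` is an illustrative name, not the PR's.
import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars


def tf_gradients_of(x, min_val, max_val, num_bits=8, narrow_range=False):
    """Returns (output, d_output/d_x) computed with tf.GradientTape."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        out = fake_quant_with_min_max_vars(
            x, min_val, max_val, num_bits=num_bits, narrow_range=narrow_range
        )
    return out, tape.gradient(out, x)


x = np.random.uniform(-10.0, 10.0, size=(4, 8)).astype("float32")
out, dx = tf_gradients_of(x, -6.0, 6.0)
# Straight-through estimator: gradient is 1 well inside the clamping range
# and 0 well outside it (the exact boundary sits at the nudged min/max).
np.testing.assert_allclose(dx.numpy()[np.abs(x) < 5.0], 1.0)
np.testing.assert_allclose(dx.numpy()[np.abs(x) > 7.0], 0.0)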
@james77777778 I have addressed your previous comments.
Thanks for the updates! @james77777778 should we merge?
Sorry for the late reply. I missed the updates. Let me take a look...
@doncarlos999
It’s great to merge three functions into one!
However, the implementation might be slightly different from TF’s because I had to set a higher tolerance to pass this simple test:
import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import fake_quant_with_min_max_args
from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars

# tf.quantization.fake_quant_with_min_max_args
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 64]).astype("float32")
inp = tf.convert_to_tensor(x)
with tf.GradientTape() as tape:
    tape.watch(inp)
    tf_out = tf.quantization.fake_quant_with_min_max_args(
        inp, min=-6.0, max=6.0, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, inp)

inp = tf.convert_to_tensor(x)
with tf.GradientTape() as tape:
    tape.watch(inp)
    keras_out = fake_quant_with_min_max_args(
        inp, min_vals=-6.0, max_vals=6.0, num_bits=8
    )
keras_gradients = tape.gradient(keras_out, inp)

np.testing.assert_allclose(tf_out.numpy(), keras_out.numpy(), atol=1e-5)
np.testing.assert_allclose(
    tf_gradients.numpy(), keras_gradients.numpy(), atol=1e-5
)

# tf.quantization.fake_quant_with_min_max_vars
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 64]).astype("float32")
min = np.array(-5.5, dtype="float32")
max = np.array(7.5, dtype="float32")

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars(tf_x, tf_min, tf_max, num_bits=8)
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())
print("All passed!")
Could you please verify the impl?
Thanks!
@james77777778 Thanks for the comments. I will work on revisions.
This PR addresses review feedback to fix the implementation and to move the tests to named_parameters rather than individual functions.
@james77777778 I have fixed the implementation so that the simple test code you provided works. I also updated the tests to use named_parameters. Let me know if any further revisions are required.
Hey @doncarlos999, thanks for the updates!
I think this PR is almost ready to be merged.
However, I failed to pass this test using tf.quantization.fake_quant_with_min_max_vars_per_channel:
import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import (
    fake_quant_with_min_max_vars_per_channel,
)

# tf.quantization.fake_quant_with_min_max_vars_per_channel
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 3]).astype("float32")
min = np.array([-5.5] * 3, dtype="float32")
max = np.array([7.5] * 3, dtype="float32")

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars_per_channel(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars_per_channel(
        tf_x, tf_min, tf_max, num_bits=8, narrow_range=False
    )
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())
print("All passed!")
Could you confirm the impl in fake_quant_with_min_max_vars_per_channel?
Thanks!
Adds axis param and fixes logic for per-channel function
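(To illustrate what the axis param in the commit above enables: per-channel quantization needs the (C,)-shaped min/max to broadcast against the input along the chosen axis. The reshape-based helper below is a generic sketch of that broadcasting, shown with a plain clip; it is not the PR's exact code.)

# Generic sketch of per-channel broadcasting (not the PR's exact code):
# reshape the (C,)-shaped min/max so they broadcast against `x` along the
# chosen axis, here demonstrated with a plain clip.
import numpy as np
from keras import ops


def per_channel_clip(x, min_vals, max_vals, axis):
    axis = axis % len(x.shape)  # canonicalize a possibly negative axis
    shape = [1] * len(x.shape)
    shape[axis] = -1  # place the channel dimension where `axis` points
    min_b = ops.reshape(min_vals, shape)
    max_b = ops.reshape(max_vals, shape)
    return ops.clip(x, min_b, max_b)


x = np.random.uniform(-10, 10, size=(2, 4, 3)).astype("float32")
clipped = per_channel_clip(
    x,
    np.array([-5.5, -6.0, -4.0], dtype="float32"),
    np.array([7.5, 6.0, 8.0], dtype="float32"),
    axis=-1,
)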
@james77777778 Thanks for the feedback.
@doncarlos999
On second thought, I think we only need to export one API since you’ve already combined the logic into a single impl.
Here’s my impl based on yours:
...
from keras.src.backend.common.backend_utils import canonicalize_axis
...


@keras_export("keras.quantizers.fake_quant_with_min_max_vars")
def fake_quant_with_min_max_vars(
    inputs,
    min_vals,
    max_vals,
    num_bits,
    narrow_range=False,
    axis=None,
):
    """Perform per-tensor or per-channel fake quantization.

    `[min_vals, max_vals]` define the clamping range for the `inputs`.

    The `inputs` are quantized into the quantization range:
    - `[0, 2^num_bits - 1]` when `narrow_range=False`
    - `[1, 2^num_bits - 1]` when `narrow_range=True`

    After quantization, the values are dequantized and output as floats within
    the `[min_vals, max_vals]` interval.

    This operation supports gradient computation, allowing `min_vals` and
    `max_vals` to be trained.

    Args:
        inputs: Input tensor of float dtype.
        min_vals: A global minimum scalar or a per-channel minimum tensor.
        max_vals: A global maximum scalar or a per-channel maximum tensor.
        num_bits: Quantization bit width (e.g., `8` for int8).
        narrow_range: Whether to use narrow quantization range.
        axis: Axis along which to perform per-channel quantization. If `None`,
            per-tensor quantization is performed. Defaults to `None`.

    Returns:
        Fake-quantized tensor.
    """
    inputs = ops.convert_to_tensor(inputs)
    min_vals = ops.convert_to_tensor(min_vals)
    max_vals = ops.convert_to_tensor(max_vals)

    if axis is not None:
        axis = canonicalize_axis(axis, inputs.ndim)

    @ops.custom_gradient
    def _fake_quant_with_min_max_vars_per_channel(x, min_val, max_val):
        # Calculate quantization parameters for all channels at once
        nudged_min, nudged_max, scale, inv_scale = adjust_and_nudge(
            min_val, max_val, num_bits, narrow_range
        )

        quant_zero = ops.floor(
            ops.add(ops.multiply(-nudged_min, inv_scale), 0.5)
        )
        x_clamped = ops.clip(x, nudged_min, nudged_max)
        x_clamped_shifted = ops.subtract(x_clamped, nudged_min)
        result = ops.multiply(
            ops.floor(
                ops.add(
                    ops.subtract(
                        ops.multiply(x_clamped_shifted, inv_scale), quant_zero
                    ),
                    0.5,
                )
            ),
            scale,
        )

        # Create gradient mask for all channels
        masks = ops.cast(
            (x >= nudged_min) & (x <= nudged_max),
            dtype="float32",
        )

        def grad(*args, upstream=None):
            if upstream is None:
                (upstream,) = args

            # Gradient for x
            dx = ops.multiply(upstream, masks)
            axes = [i for i in range(len(dx.shape)) if i != axis]

            # Gradient for min_val
            # When x is clipped to min, the gradient flows to min_val
            min_mask = ops.cast(x <= nudged_min, dtype="float32")
            grad_min = ops.multiply(upstream, min_mask)
            if axis is not None:
                grad_min = ops.sum(grad_min, axis=axes)
            else:
                grad_min = ops.sum(grad_min)

            # Gradient for max_val
            # When x is clipped to max, the gradient flows to max_val
            max_mask = ops.cast(x >= nudged_max, dtype="float32")
            grad_max = ops.multiply(upstream, max_mask)
            if axis is not None:
                grad_max = ops.sum(grad_max, axis=axes)
            else:
                grad_max = ops.sum(grad_max)

            return dx, grad_min, grad_max

        return result, grad

    return _fake_quant_with_min_max_vars_per_channel(inputs, min_vals, max_vals)
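(Side note: adjust_and_nudge is referenced above but not shown in this thread. Below is a sketch of the nudging step, modeled on TF's fake-quant kernels; it is an assumption about what the PR's helper computes, not the PR's code.)

# Sketch of the min/max nudging used by TF's fake-quant kernels; the
# `adjust_and_nudge` helper referenced above presumably computes something
# equivalent, but this is an assumption, not the PR's implementation.
from keras import ops


def adjust_and_nudge_sketch(min_val, max_val, num_bits, narrow_range):
    quant_min = 1.0 if narrow_range else 0.0
    quant_max = float(2**num_bits - 1)
    # Scale that maps [min_val, max_val] onto the integer quantization range.
    scale = (max_val - min_val) / (quant_max - quant_min)
    inv_scale = 1.0 / scale
    # Nudge the zero point to an integer so that 0.0 is exactly representable,
    # then shift min/max accordingly.
    zero_point_from_min = quant_min - min_val / scale
    nudged_zero_point = ops.clip(
        ops.round(zero_point_from_min), quant_min, quant_max
    )
    nudged_min = (quant_min - nudged_zero_point) * scale
    nudged_max = (quant_max - nudged_zero_point) * scale
    return nudged_min, nudged_max, scale, inv_scale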
The basic usage:
import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars

# tf.quantization.fake_quant_with_min_max_vars
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 64]).astype("float32")
min = np.array(-5.5, dtype="float32")
max = np.array(7.5, dtype="float32")

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars(tf_x, tf_min, tf_max, num_bits=8)
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())

# tf.quantization.fake_quant_with_min_max_vars_per_channel
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 3]).astype("float32")
min = np.array([-5.5] * 3, dtype="float32")
max = np.array([7.5] * 3, dtype="float32")

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars_per_channel(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars(
        tf_x, tf_min, tf_max, num_bits=8, narrow_range=False, axis=-1
    )
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())
print("All passed!")
WDYT?
@james77777778 That looks good to me. Do you want me to update the implementation, or will you push those changes?
Feel free to update your impl if you think it looks good!
@james77777778 Done.
Shouldn't … EDITED: …
Ah yes, my bad. Fixed.
LGTM!
Thanks to both of you! LGTM
Based on the discussion here: #20319, I started porting the fake_quant_with_min_max functions from TensorFlow to Keras 3. This PR contains those ported functions and the relevant tests from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tests/fake_quant_ops_test.py.
I didn't implement tf.quantization.fake_quant_with_min_max_vars as it looks the same as tf.quantization.fake_quant_with_min_max_args, but I can add this one too if required.
For the CLA, I am waiting on our CTO to add me to the Edge Impulse <-> Google CLA, but I figured that I can work on revisions to the PR in the meantime.
CC: @matpalm, @dansitu, @james77777778
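(For context on why the _vars variant was still requested even though _args looks similar: the forward math is the same for a fixed range, but _args bakes min/max into the op as constant attributes, while _vars takes tensors/variables so gradients can flow to the range, which is what TFMOT relies on. A short TF-only sketch:)

# Sketch of the practical difference between the two TF ops.
import tensorflow as tf

x = tf.random.uniform([4, 8], minval=-10.0, maxval=10.0)
min_var = tf.Variable(-6.0)
max_var = tf.Variable(6.0)

with tf.GradientTape() as tape:
    y_vars = tf.quantization.fake_quant_with_min_max_vars(
        x, min_var, max_var, num_bits=8
    )
# `..._vars` has defined gradients w.r.t. the range variables:
d_min, d_max = tape.gradient(y_vars, [min_var, max_var])

# `..._args` takes the range as constant op attributes; min/max are not
# trainable here.
y_args = tf.quantization.fake_quant_with_min_max_args(
    x, min=-6.0, max=6.0, num_bits=8
)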