
Porting TF fake_quant_with_min_max functions #20641

Merged · 16 commits · Jan 15, 2025

Conversation

@doncarlos999 (Contributor) commented Dec 13, 2024

Based on the discussion in #20319, I started porting the fake_quant_with_min_max functions from TensorFlow to Keras 3.
This PR contains those ported functions and the relevant tests from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tests/fake_quant_ops_test.py.

I didn't implement tf.quantization.fake_quant_with_min_max_vars, as it looks the same as tf.quantization.fake_quant_with_min_max_args, but I can add that one too if required.

For the CLA, I am waiting on our CTO to add me to the Edge Impulse <-> Google CLA, but I figured I can work on revisions to the PR in the meantime.

CC: @matpalm, @dansitu, @james77777778

* adds fake_quant_with_min_max functions from TF to keras3

google-cla bot commented Dec 13, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@codecov-commenter commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 97.91667% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.98%. Comparing base (e345cbd) to head (0da24e2).

Files with missing lines                          | Patch % | Lines
keras/api/_tf_keras/keras/quantizers/__init__.py  | 0.00%   | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20641      +/-   ##
==========================================
+ Coverage   81.96%   81.98%   +0.01%     
==========================================
  Files         554      554              
  Lines       51656    51704      +48     
  Branches     7996     8000       +4     
==========================================
+ Hits        42342    42389      +47     
- Misses       7367     7368       +1     
  Partials     1947     1947              
Flag             | Coverage Δ
keras            | 81.80% <95.83%> (+0.01%) ⬆️
keras-jax        | 64.05% <87.50%> (+0.02%) ⬆️
keras-numpy      | 58.92% <66.66%> (+<0.01%) ⬆️
keras-openvino   | 29.94% <12.50%> (-0.02%) ⬇️
keras-tensorflow | 64.76% <95.83%> (+0.02%) ⬆️
keras-torch      | 64.09% <85.41%> (+0.01%) ⬆️


@fchollet (Collaborator) left a comment

Thanks for the PR!

@james77777778 (Contributor) left a comment

Hi @doncarlos999
I have left some comments.

Additionally, I think we still need fake_quant_with_min_max_vars, as it is used in TFMOT:
https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/quantization/keras/quant_ops.py#L340

@@ -100,3 +100,759 @@ def test_quantize_and_dequantize(self):
)
# A loose assertion due to an expected quantization error
self.assertAllClose(qdq_values, values, atol=5e-1)

def _TestOp(
@james77777778 (Contributor) commented on this diff:

We can use @parameterized.named_parameters and named_product to organize similar tests like this one:
https://github.com/keras-team/keras/blob/master/keras/src/ops/nn_test.py#L2355-L2365
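For illustration, a minimal sketch of that convention (the test class name, case names, and parameter values are hypothetical, it assumes the consolidated fake_quant_with_min_max_vars API that this thread converges on, and it relies only on absl.testing.parameterized):

import numpy as np
from absl.testing import parameterized

from keras import ops
from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars


class FakeQuantOpsTest(parameterized.TestCase):
    @parameterized.named_parameters(
        # (case name, min_val, max_val, narrow_range)
        ("wide_range", -6.0, 6.0, False),
        ("narrow_range", -6.0, 6.0, True),
        ("asymmetric_range", -5.5, 7.5, False),
    )
    def test_fake_quant_with_min_max_vars(self, min_val, max_val, narrow_range):
        x = np.linspace(-10.0, 10.0, 64).astype("float32")
        out = ops.convert_to_numpy(
            fake_quant_with_min_max_vars(
                x, min_val, max_val, num_bits=8, narrow_range=narrow_range
            )
        )
        self.assertEqual(out.shape, x.shape)
        # Nudging can shift the clamp range by up to one quantization step,
        # so the bound check is deliberately loose.
        self.assertLessEqual(float(out.max()), max_val + 0.1)
        self.assertGreaterEqual(float(out.min()), min_val - 0.1)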

@doncarlos999 (Contributor, Author) replied:

I have reduced the number of tests by merging the logic into a single function, but if you still think it would help to use named_parameters to organize the tests, I can add it.

@james77777778 (Contributor) replied:

I think we should follow the convention in the codebase by using @parameterized.named_parameters to organize the tests.

@doncarlos999 (Contributor, Author)

@james77777778 thank you for the review. I'm working on revisions now.

Regarding the *_gradient functions, I added those as a way to test that the gradients from the main functions were being calculated correctly. Should we keep them just for testing purposes, without exposing them in the public-facing API? If not, I will remove them.

@james77777778 (Contributor)

> Regarding the *_gradient functions, I added those as a way to test that the gradients from the main functions were being calculated correctly. Should we keep them just for testing purposes, without exposing them in the public-facing API? If not, I will remove them.

We can test the gradients of fake_* functions using:

  • tensorflow: tf.GradientTape() + tape.gradient
  • torch: loss.backward() + variable.grad
  • jax: jax.grad

You can refer to this test for an example:
https://github.com/keras-team/keras/blob/master/keras/src/layers/core/dense_test.py#L584-L649
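As a minimal sketch of the jax.grad route above (an illustration only: it assumes the JAX backend is selected via KERAS_BACKEND=jax and uses the consolidated fake_quant_with_min_max_vars signature this thread ends up with; the tf.GradientTape variant appears in full later in the conversation):

import jax
import jax.numpy as jnp
import numpy as np

from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars


def quantize_and_sum(x, min_val, max_val):
    # jax.grad needs a scalar objective, so reduce the fake-quantized output.
    return jnp.sum(fake_quant_with_min_max_vars(x, min_val, max_val, num_bits=8))


x = jnp.asarray(np.random.uniform(-10, 10, size=(32,)).astype("float32"))
min_val = jnp.asarray(-6.0, dtype="float32")
max_val = jnp.asarray(6.0, dtype="float32")

# Gradients w.r.t. the inputs and the trainable min/max values.
dx, dmin, dmax = jax.grad(quantize_and_sum, argnums=(0, 1, 2))(x, min_val, max_val)
print(dx.shape, float(dmin), float(dmax))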

Using a different function, separate from the user-facing function, for testing purposes seems redundant and fragile to me. However, we should wait for a call from @fchollet.

@doncarlos999 (Contributor, Author)

I agree that having two separate functions is fragile; I simply kept them separate because that was how they were tested in the TensorFlow repo.
I will start adding tests based on your example in the meantime. Thank you.

@doncarlos999 (Contributor, Author)

@james77777778 I have addressed your previous comments.
I refactored the code into a single function to avoid duplication, since the per_channel logic on a length-1 array is the same as in the other separate functions. If we need a separate function for handling parameters passed in as floats/ints rather than length-1 arrays of floats/ints, I can add the other function back (I have a previous commit I can pull it from).
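For reference, a quick sketch of the equivalence being described (illustrative only: it uses the consolidated fake_quant_with_min_max_vars API and axis argument that this thread later settles on, and checks forward outputs only, not gradients):

import numpy as np

from keras import ops
from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars

x = np.random.uniform(-10, 10, size=(8, 1)).astype("float32")

# Per-tensor quantization: scalar min/max, no axis.
per_tensor = fake_quant_with_min_max_vars(x, -6.0, 6.0, num_bits=8)

# "Per-channel" quantization over a single channel: length-1 min/max arrays.
per_channel = fake_quant_with_min_max_vars(
    x,
    np.array([-6.0], dtype="float32"),
    np.array([6.0], dtype="float32"),
    num_bits=8,
    axis=-1,
)

# The forward results match, which is why a single implementation suffices.
np.testing.assert_allclose(
    ops.convert_to_numpy(per_tensor), ops.convert_to_numpy(per_channel)
)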

@fchollet (Collaborator)

Thanks for the updates!

@james77777778 should we merge?

@james77777778 (Contributor)

> Thanks for the updates!
>
> @james77777778 should we merge?

Sorry for the late reply. I missed the updates.

Let me take a look...

@james77777778 (Contributor) left a comment

@doncarlos999
It’s great to merge three functions into one!

However, the implementation might be slightly different from TF’s because I had to set a higher tolerance to pass this simple test:

import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import fake_quant_with_min_max_args
from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars

# tf.quantization.fake_quant_with_min_max_args
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 64]).astype("float32")
inp = tf.convert_to_tensor(x)
with tf.GradientTape() as tape:
    tape.watch(inp)
    tf_out = tf.quantization.fake_quant_with_min_max_args(
        inp, min=-6.0, max=6.0, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, inp)

inp = tf.convert_to_tensor(x)
with tf.GradientTape() as tape:
    tape.watch(inp)
    keras_out = fake_quant_with_min_max_args(
        inp, min_vals=-6.0, max_vals=6.0, num_bits=8
    )
keras_gradients = tape.gradient(keras_out, inp)

np.testing.assert_allclose(tf_out.numpy(), keras_out.numpy(), atol=1e-5)
np.testing.assert_allclose(
    tf_gradients.numpy(), keras_gradients.numpy(), atol=1e-5
)

# tf.quantization.fake_quant_with_min_max_vars
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 64]).astype("float32")
min = np.array(-5.5, dtype="float32")
max = np.array(7.5, dtype="float32")
tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars(tf_x, tf_min, tf_max, num_bits=8)
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())

print("All passed!")

Could you please verify the impl?
Thanks!

@@ -100,3 +100,759 @@ def test_quantize_and_dequantize(self):
)
# A loose assertion due to an expected quantization error
self.assertAllClose(qdq_values, values, atol=5e-1)

def _TestOp(
@james77777778 (Contributor) commented on this diff:

I think we should follow the convention in the codebase by using @parameterized.named_parameters to organize the tests.

@doncarlos999 (Contributor, Author)

@james77777778 Thanks for the comments. I will work on revisions.

This PR addresses review feedback to fix the implementation and to move the tests to named_parameters rather than individual functions.
@doncarlos999 (Contributor, Author) commented Jan 14, 2025

@james77777778 I have fixed the implementation so that the simple test code that you provided works. I also updated the test to use parameterized.named_parameters() rather than individual functions. However, I couldn't see a way to use named_product() due to the tests requiring the expected values to be provided.

Let me know if any further revisions are required.

@james77777778 (Contributor) left a comment

Hey @doncarlos999, thanks for the updates!

I think this PR is almost ready to be merged.

However, I failed to pass this test using tf.quantization.fake_quant_with_min_max_vars_per_channel:

import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import (
    fake_quant_with_min_max_vars_per_channel,
)


# tf.quantization.fake_quant_with_min_max_vars_per_channel
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 3]).astype("float32")
min = np.array([-5.5] * 3, dtype="float32")
max = np.array([7.5] * 3, dtype="float32")
tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars_per_channel(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars_per_channel(
        tf_x, tf_min, tf_max, num_bits=8, narrow_range=False
    )
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())

print("All passed!")

Could you confirm the impl in fake_quant_with_min_max_vars_per_channel?
Thanks!

Adds axis param and fixes logic for the per-channel function
@doncarlos999 (Contributor, Author)

@james77777778 Thanks for the feedback.
I went ahead and added the axis parameter in this PR as it helped fix the discrepancy with tf.quantization.fake_quant_with_min_max_vars_per_channel. I also addressed the other comments.

@james77777778 (Contributor) left a comment

@doncarlos999
On second thought, I think we only need to export one API since you’ve already combined the logic into a single impl.

Here’s my impl based on yours:

...

from keras.src.backend.common.backend_utils import canonicalize_axis

...

@keras_export("keras.quantizers.fake_quant_with_min_max_vars")
def fake_quant_with_min_max_vars(
    inputs,
    min_vals,
    max_vals,
    num_bits,
    narrow_range=False,
    axis=None,
):
    """Perform per-tensor or per-channel fake quantization.

    `[min_vals, max_vals]` define the clamping range for the `inputs`.

    The `inputs` are quantized into the quantization range:
    - `[0, 2^num_bits - 1]` when `narrow_range=False`
    - `[1, 2^num_bits - 1]` when `narrow_range=True`

    After quantization, the values are dequantized and output as floats within
    the `[min_vals, max_vals]` interval.

    This operation supports gradient computation, allowing `min_vals` and
    `max_vals` to be trained.

    Args:
        inputs: Input tensor of float dtype.
        min_vals: A global minimum scalar or a per-channel minimum tensor.
        max_vals: A global maximum scalar or a per-channel maximum tensor.
        num_bits: Quantization bit width (e.g., `8` for int8).
        narrow_range: Whether to use narrow quantization range.
        axis: Axis along which to perform per-channel quantization. If `None`,
              per-tensor quantization is performed. Defaults to `None`.

    Returns:
        Fake-quantized tensor
    """
    inputs = ops.convert_to_tensor(inputs)
    min_vals = ops.convert_to_tensor(min_vals)
    max_vals = ops.convert_to_tensor(max_vals)

    if axis is not None:
        axis = canonicalize_axis(axis, inputs.ndim)

    @ops.custom_gradient
    def _fake_quant_with_min_max_vars_per_channel(x, min_val, max_val):
        # Calculate quantization parameters for all channels at once
        nudged_min, nudged_max, scale, inv_scale = adjust_and_nudge(
            min_val, max_val, num_bits, narrow_range
        )

        quant_zero = ops.floor(
            ops.add(ops.multiply(-nudged_min, inv_scale), 0.5)
        )
        x_clamped = ops.clip(x, nudged_min, nudged_max)
        x_clamped_shifted = ops.subtract(x_clamped, nudged_min)
        result = ops.multiply(
            ops.floor(
                ops.add(
                    ops.subtract(
                        ops.multiply(x_clamped_shifted, inv_scale), quant_zero
                    ),
                    0.5,
                )
            ),
            scale,
        )

        # Create gradient mask for all channels
        masks = ops.cast(
            (x >= nudged_min) & (x <= nudged_max),
            dtype="float32",
        )

        def grad(*args, upstream=None):
            if upstream is None:
                (upstream,) = args

            # Gradient for x
            dx = ops.multiply(upstream, masks)
            axes = [i for i in range(len(dx.shape)) if i != axis]
            # Gradient for min_val
            # When x is clipped to min, the gradient flows to min_val
            min_mask = ops.cast(x <= nudged_min, dtype="float32")
            grad_min = ops.multiply(upstream, min_mask)
            if axis is not None:
                grad_min = ops.sum(grad_min, axis=axes)
            else:
                grad_min = ops.sum(grad_min)

            # Gradient for max_val
            # When x is clipped to max, the gradient flows to max_val
            max_mask = ops.cast(x >= nudged_max, dtype="float32")
            grad_max = ops.multiply(upstream, max_mask)
            if axis is not None:
                grad_max = ops.sum(grad_max, axis=axes)
            else:
                grad_max = ops.sum(grad_max)

            return dx, grad_min, grad_max

        return result, grad

    return _fake_quant_with_min_max_vars_per_channel(inputs, min_vals, max_vals)

The basic usage:

import numpy as np
import tensorflow as tf

from keras.src.quantizers.quantizers import fake_quant_with_min_max_vars

# tf.quantization.fake_quant_with_min_max_vars
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 64]).astype("float32")
min = np.array(-5.5, dtype="float32")
max = np.array(7.5, dtype="float32")
tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars(tf_x, tf_min, tf_max, num_bits=8)
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())

# tf.quantization.fake_quant_with_min_max_vars_per_channel
x = np.random.uniform(low=-10, high=10, size=[10, 32, 32, 3]).astype("float32")
min = np.array([-5.5] * 3, dtype="float32")
max = np.array([7.5] * 3, dtype="float32")
tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    tf_out = tf.quantization.fake_quant_with_min_max_vars_per_channel(
        tf_x, tf_min, tf_max, num_bits=8
    )
tf_gradients = tape.gradient(tf_out, [tf_x, tf_min, tf_max])

tf_x = tf.convert_to_tensor(x)
tf_min = tf.Variable(min)
tf_max = tf.Variable(max)
with tf.GradientTape() as tape:
    tape.watch(tf_x)
    tape.watch(tf_min)
    tape.watch(tf_max)
    keras_out = fake_quant_with_min_max_vars(
        tf_x, tf_min, tf_max, num_bits=8, narrow_range=False, axis=-1
    )
keras_gradients = tape.gradient(keras_out, [tf_x, tf_min, tf_max])

for tf_g, keras_g in zip(tf_gradients, keras_gradients):
    np.testing.assert_allclose(tf_g.numpy(), keras_g.numpy())

print("All passed!")

WDYT?

@doncarlos999 (Contributor, Author)

@james77777778 That looks good to me, do you want me to update the implementation? Or will you push those changes?

@james77777778 (Contributor)

> @james77777778 That looks good to me, do you want me to update the implementation? Or will you push those changes?

Feel free to update your impl if you think it looks good!

@doncarlos999 (Contributor, Author)

@james77777778 Done.

@james77777778 (Contributor) commented Jan 15, 2025

> @james77777778 Done.

Shouldn't fake_quant_with_min_max_args and fake_quant_with_min_max_vars_per_channel be removed?

EDITED:
Once the CI passes, this should be ready to merge.

@doncarlos999 (Contributor, Author)

> Shouldn't fake_quant_with_min_max_args and fake_quant_with_min_max_vars_per_channel be removed?

Ah yes, my bad. Fixed.

@james77777778 (Contributor)

LGTM!
Let's wait for @fchollet's decision.

@fchollet (Collaborator) left a comment

Thanks to both of you! LGTM

google-ml-butler bot added the kokoro:force-run and ready to pull (Ready to be merged into the codebase) labels on Jan 15, 2025
@fchollet merged commit 6557961 into keras-team:master on Jan 15, 2025
7 checks passed
google-ml-butler bot removed the awaiting review, ready to pull (Ready to be merged into the codebase), and kokoro:force-run labels on Jan 15, 2025