
QuantLayer to automatically expose underlying quant metadata from proxies #1052

Open
Giuseppe5 opened this issue Oct 14, 2024 · 6 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@Giuseppe5
Collaborator

Is your feature request related to a problem? Please describe.
In previous releases (0.10 and before), quant layers would directly expose certain quantization metadata of the underlying proxies:

quant_conv = qnn.QuantConv2d(..., weight_quant=Int8WeightPerTensorFloat)
scale = quant_conv.quant_weight_scale()

This has been removed in 0.11 because new QuantTensor classes with varying quant metadata fields had to be introduced.
All the information is still available, but it is only exposed at the proxy level.

Describe the solution you'd like
Given the set of quant metadata exposed by its proxies, a layer should be able to automagically expose the corresponding accessor methods.
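As a rough illustration of what "automagically expose" could mean, here is a sketch with stand-in classes (the names `FakeWeightProxy`, `FakeQuantLayer`, and `exposed_metadata` are hypothetical, not the actual Brevitas internals): the layer intercepts `quant_weight_<name>` lookups in `__getattr__` and forwards them to the weight proxy.

```python
# Sketch with stand-in classes; the real Brevitas proxies and layer
# base classes are more involved. Names here are illustrative only.

class FakeWeightProxy:
    """Stand-in for a weight quantization proxy."""

    # Hypothetical discovery attribute: metadata this proxy can report.
    exposed_metadata = ("scale", "zero_point", "bit_width")

    def scale(self):
        return 0.05

    def zero_point(self):
        return 0.0

    def bit_width(self):
        return 8


class FakeQuantLayer:
    """Stand-in QuantLayer that auto-exposes proxy metadata."""

    def __init__(self):
        self.weight_quant = FakeWeightProxy()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so any
        # hardcoded attributes of the layer still take precedence.
        prefix = "quant_weight_"
        if name.startswith(prefix):
            metadata = name[len(prefix):]
            proxy = self.__dict__["weight_quant"]
            if metadata in proxy.exposed_metadata:
                return getattr(proxy, metadata)
        raise AttributeError(name)


layer = FakeQuantLayer()
print(layer.quant_weight_scale())      # forwarded to FakeWeightProxy.scale
print(layer.quant_weight_bit_width())  # forwarded to FakeWeightProxy.bit_width
```

Because `__getattr__` is only invoked after normal lookup fails, this approach would not interfere with any attributes the layer already defines.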

Giuseppe5 added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Oct 14, 2024
@rk119

rk119 commented Oct 14, 2024

Hi,

I would like to work on this issue if it is possible :)

@Giuseppe5
Collaborator Author

All help is welcome!

I would recommend checking the old releases to see what the interface for quant metadata used to look like (but not the implementation).
E.g.:

new_scale = q_linear.quant_weight_scale()

The idea is that if you instantiate an MX Float weight quantizer, you should be able to do:

q_linear.quant_weight_exponent_bit_width()

Even though QuantLayer won't have any hardcoded quant_weight_exponent_bit_width.
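To illustrate that "nothing hardcoded" point with stand-in classes (all names here are hypothetical, not the Brevitas API): each proxy advertises which metadata it has, and the layer attaches matching `quant_weight_*` accessors at construction time, so a minifloat-style proxy automatically yields `quant_weight_exponent_bit_width`.

```python
# Stand-in proxies advertising different metadata sets; names are
# hypothetical and do not match the actual Brevitas API.

class IntWeightProxyStub:
    def metadata_names(self):
        return ("scale", "bit_width")

    def scale(self):
        return 0.1

    def bit_width(self):
        return 8


class MXFloatWeightProxyStub:
    def metadata_names(self):
        return ("scale", "exponent_bit_width", "mantissa_bit_width")

    def scale(self):
        return 1.0

    def exponent_bit_width(self):
        return 4

    def mantissa_bit_width(self):
        return 3


class LinearStub:
    def __init__(self, weight_quant):
        self.weight_quant = weight_quant
        # Attach quant_weight_<name> accessors based on what the proxy
        # says it can report; nothing is hardcoded on the layer.
        for name in weight_quant.metadata_names():
            setattr(self, f"quant_weight_{name}",
                    getattr(weight_quant, name))


q_linear = LinearStub(MXFloatWeightProxyStub())
print(q_linear.quant_weight_exponent_bit_width())  # -> 4
```

Unlike a `__getattr__`-based approach, attaching the accessors eagerly in `__init__` makes them visible to introspection (`dir()`, `hasattr`), at the cost of doing the work up front.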

Sorry for repeating myself, and please feel free to ask more questions if needed

@rk119

rk119 commented Oct 14, 2024

Alright, I'll start by getting familiar with the codebase and the past releases, and then I'll dive into it.

Thank you for your guidance and for providing a head start! I'll reach out with any questions if needed.

@rk119

rk119 commented Oct 17, 2024

Hi @Giuseppe5,

I apologize for the delay in completing this issue. As I am new to Brevitas and still learning how to contribute to open-source projects, I am taking some time to thoroughly understand the repo.

I have been exploring the differences between the previous and current versions of Brevitas. In the past version, specifically in parameter.py, the quant_weight method returned a QuantTensor (the weights reconstructed after quantization) along with its metadata, and there were method definitions to access this metadata directly. In the current version, in parameter_quant.py, the metadata is accessed directly from the proxy (as you mentioned).

I wanted to confirm that I am on the right track with this understanding. Please correct me if I am mistaken in any way.

Since there are new QuantTensor classes with different arguments compared to the earlier version, I am still exploring how to implement the solution. I plan to contact you soon with concrete questions and a proposed solution before submitting a pull request. I would greatly appreciate your guidance.

@Giuseppe5
Collaborator Author

Hello!
First of all, absolutely no rush from our side. We are grateful for your willingness to contribute, and we recognize that we need to expand our developer documentation so newcomers can familiarize themselves with the codebase faster.

You are on the correct track. The idea in my mind is that the proxy exposes a method that tells the layer which quant metadata is available, and the layer then generates at runtime the methods needed to access that metadata directly.

The main issue could be caused by the bias, since bias quantization might require an external parameter (an external scale).
Let's start with input/weights, which is also the most common use case, and then see if/how it can be extended to bias.
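To illustrate the bias complication with a stand-in class (the name `BiasProxyStub` is hypothetical): for bias quantizers whose scale is derived externally, such as Brevitas's Int32Bias, the bias scale comes from the input scale times the weight scale, so the proxy cannot report it without extra inputs.

```python
# Stand-in bias proxy; names are hypothetical, not the Brevitas API.

class BiasProxyStub:
    """Bias proxy whose scale depends on values it does not own."""

    def scale(self, input_scale, weight_scale):
        # For bias quantizers with an externally derived scale, the
        # scale is input_scale * weight_scale rather than a value the
        # proxy can report on its own.
        return input_scale * weight_scale


proxy = BiasProxyStub()
# A parameterless layer.quant_bias_scale() cannot be auto-generated
# the same way as quant_weight_scale(), because the proxy needs the
# input and weight scales as arguments:
print(proxy.scale(0.5, 0.1))
```

This is why a uniform, parameterless accessor-generation scheme fits weights and inputs naturally but needs extra thought for bias.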

@rk119

rk119 commented Oct 20, 2024

@Giuseppe5

> Hello! First of all, absolutely no rush from our side. We are grateful for your willingness to contribute, and we recognize that we need to expand our developer documentation so newcomers can familiarize themselves with the codebase faster.

I'd love to help out with that in the future if time permits, once I am well versed in the codebase.

> You are on the correct track. The idea in my mind is that the proxy exposes a method that tells the layer which quant metadata is available, and the layer then generates at runtime the methods needed to access that metadata directly.

Ah yes, metaprogramming. I will look into the Python docs to familiarize myself with it and with the best approaches.

> The main issue could be caused by the bias, since bias quantization might require an external parameter (an external scale). Let's start with input/weights, which is also the most common use case, and then see if/how it can be extended to bias.

Alright 👍
