
QuantLayer to automatically expose underlying quant metadata from proxies #1052

Open
Giuseppe5 opened this issue Oct 14, 2024 · 6 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@Giuseppe5
Collaborator

Is your feature request related to a problem? Please describe.
In previous releases (0.10 and before), quant layers would directly expose certain quantization metadata of the underlying proxies:

quant_conv = qnn.QuantConv2d(..., weight_quant=Int8WeightPerTensorFloat)
scale = quant_conv.quant_weight_scale()

This has been removed in 0.11 because new QuantTensor classes with varying quant metadata fields had to be introduced.
All the information is still available, but it is only exposed at the proxy level.

Describe the solution you'd like
Given the set of quant metadata exposed by its proxies, a layer should be able to automagically expose the corresponding accessor methods.
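As a rough illustration of what "automagically expose" could mean, here is a sketch with stand-in classes (the names `FakeWeightProxy`, `FakeQuantLayer`, and `exposed_metadata` are hypothetical, not the actual Brevitas internals): the layer intercepts `quant_weight_<name>` lookups in `__getattr__` and forwards them to the weight proxy.

```python
# Sketch with stand-in classes; the real Brevitas proxies and layer
# base classes are more involved. Names here are illustrative only.

class FakeWeightProxy:
    """Stand-in for a weight quantization proxy."""

    # Hypothetical discovery attribute: metadata this proxy can report.
    exposed_metadata = ("scale", "zero_point", "bit_width")

    def scale(self):
        return 0.05

    def zero_point(self):
        return 0.0

    def bit_width(self):
        return 8


class FakeQuantLayer:
    """Stand-in QuantLayer that auto-exposes proxy metadata."""

    def __init__(self):
        self.weight_quant = FakeWeightProxy()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so any
        # hardcoded attributes of the layer still take precedence.
        prefix = "quant_weight_"
        if name.startswith(prefix):
            metadata = name[len(prefix):]
            proxy = self.__dict__["weight_quant"]
            if metadata in proxy.exposed_metadata:
                return getattr(proxy, metadata)
        raise AttributeError(name)


layer = FakeQuantLayer()
print(layer.quant_weight_scale())      # forwarded to FakeWeightProxy.scale
print(layer.quant_weight_bit_width())  # forwarded to FakeWeightProxy.bit_width
```

Because `__getattr__` is only invoked after normal lookup fails, this approach would not interfere with any attributes the layer already defines.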

Giuseppe5 added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Oct 14, 2024
@rk119

rk119 commented Oct 14, 2024

Hi,

I would like to work on this issue if it is possible :)

@Giuseppe5
Collaborator Author

All help is welcome!

I would recommend checking the old releases to see what the interface for quant metadata used to look like (but not the implementation).
E.g.:

new_scale = q_linear.quant_weight_scale()

The idea is that if you instantiate an MX Float weight quantizer, you should be able to do:

q_linear.quant_weight_exponent_bit_width()

Even though QuantLayer won't have any hardcoded quant_weight_exponent_bit_width.
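To illustrate that "nothing hardcoded" point with stand-in classes (all names here are hypothetical, not the Brevitas API): each proxy advertises which metadata it has, and the layer attaches matching `quant_weight_*` accessors at construction time, so a minifloat-style proxy automatically yields `quant_weight_exponent_bit_width`.

```python
# Stand-in proxies advertising different metadata sets; names are
# hypothetical and do not match the actual Brevitas API.

class IntWeightProxyStub:
    def metadata_names(self):
        return ("scale", "bit_width")

    def scale(self):
        return 0.1

    def bit_width(self):
        return 8


class MXFloatWeightProxyStub:
    def metadata_names(self):
        return ("scale", "exponent_bit_width", "mantissa_bit_width")

    def scale(self):
        return 1.0

    def exponent_bit_width(self):
        return 4

    def mantissa_bit_width(self):
        return 3


class LinearStub:
    def __init__(self, weight_quant):
        self.weight_quant = weight_quant
        # Attach quant_weight_<name> accessors based on what the proxy
        # says it can report; nothing is hardcoded on the layer.
        for name in weight_quant.metadata_names():
            setattr(self, f"quant_weight_{name}",
                    getattr(weight_quant, name))


q_linear = LinearStub(MXFloatWeightProxyStub())
print(q_linear.quant_weight_exponent_bit_width())  # -> 4
```

Unlike a `__getattr__`-based approach, attaching the accessors eagerly in `__init__` makes them visible to introspection (`dir()`, `hasattr`), at the cost of doing the work up front.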

Sorry for repeating myself, and please feel free to ask more questions if needed

@rk119

rk119 commented Oct 14, 2024

Alright, I'll start by getting familiar with the codebase and the past releases, and then I'll dive into it.

Thank you for your guidance and for providing a head start! I'll reach out with any questions if needed.

@rk119

rk119 commented Oct 17, 2024

Hi @Giuseppe5,

I apologize for the delay in completing this issue. As I am new to Brevitas and still learning how to contribute to open-source projects, I am taking some time to thoroughly understand the repo.

I have been exploring the differences between the previous and current versions of Brevitas. In the past version, specifically in parameter.py, the quant_weight method returned a QuantTensor (the weights reconstructed after quantization) along with its metadata, and there were method definitions to access this metadata directly. In the current version, in parameter_quant.py, the metadata is accessed directly from the proxy (as you mentioned).

I wanted to confirm that I am on the right track with this understanding. Please correct me if I am mistaken in any way.

Since there are new QuantTensor classes with different arguments compared to the earlier version, I am still exploring how to implement the solution. I plan to contact you soon with concrete questions and a proposed solution before submitting a pull request. I would greatly appreciate your guidance.

@Giuseppe5
Collaborator Author

Hello!
First of all, absolutely no rush from our side. We are grateful for your willingness to contribute, and we recognize that we need to expand our developer documentation so newcomers can familiarize themselves with the codebase faster.

You are on the correct track. The idea in my mind is that the proxy exposes a method that tells the layer which quant metadata is available, and the layer then generates at runtime the methods needed to access that metadata directly.

The main issue could be caused by the bias, since bias quantization might require an external parameter (an external scale).
Let's start with input/weights, which is also the most common use case, and then see if/how it can be extended to bias.
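To illustrate the bias complication with a stand-in class (the name `BiasProxyStub` is hypothetical): for bias quantizers whose scale is derived externally, such as Brevitas's Int32Bias, the bias scale comes from the input scale times the weight scale, so the proxy cannot report it without extra inputs.

```python
# Stand-in bias proxy; names are hypothetical, not the Brevitas API.

class BiasProxyStub:
    """Bias proxy whose scale depends on values it does not own."""

    def scale(self, input_scale, weight_scale):
        # For bias quantizers with an externally derived scale, the
        # scale is input_scale * weight_scale rather than a value the
        # proxy can report on its own.
        return input_scale * weight_scale


proxy = BiasProxyStub()
# A parameterless layer.quant_bias_scale() cannot be auto-generated
# the same way as quant_weight_scale(), because the proxy needs the
# input and weight scales as arguments:
print(proxy.scale(0.5, 0.1))
```

This is why a uniform, parameterless accessor-generation scheme fits weights and inputs naturally but needs extra thought for bias.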

@rk119

rk119 commented Oct 20, 2024

@Giuseppe5

> Hello! First of all, absolutely no rush from our side. We are grateful for your willingness to contribute, and we recognize that we need to expand our developer documentation so newcomers can familiarize themselves with the codebase faster.

I'd love to help out with that in the future if time permits, once I am well versed in the codebase.

> You are on the correct track. The idea in my mind is that the proxy exposes a method that tells the layer which quant metadata is available, and the layer then generates at runtime the methods needed to access that metadata directly.

Ah yes, metaprogramming. I will look into the Python docs to familiarize myself with it and with the best approaches.

> The main issue could be caused by the bias, since bias quantization might require an external parameter (an external scale). Let's start with input/weights, which is also the most common use case, and then see if/how it can be extended to bias.

Alright 👍
