
Is it possible to quantize to FP8 W8A16 without calibration data #858

Closed
us58 opened this issue Oct 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

us58 commented Oct 21, 2024

I want to quantize a model to FP8 W8A16 (since I am on Ampere, which has no native FP8 compute). The quantization_w8a8_fp8 example says no calibration is needed for FP8 W8A8. Is this also possible for FP8 W8A16? I did not find any information on this.

Also, if possible, could you give me an example of how to do this (like the FP8 W8A8 example)? Thanks in advance.

us58 added the enhancement label on Oct 21, 2024

okwinds commented Oct 22, 2024

A single calibration sample should work.
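
For reference, a minimal sketch of what a one-sample calibration run could look like with llm-compressor's oneshot entrypoint, for a scheme that does require calibration (e.g., static activation scales). The model ID, dataset name, and scheme choice below are illustrative placeholders, not taken from this thread:

```python
# Sketch: static FP8 quantization calibrated on a single sample.
# MODEL_ID and the dataset name are placeholders, not from this thread.
from transformers import AutoModelForCausalLM
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# Static FP8 activation scales need calibration data; per the comment
# above, a single sample should be enough.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",  # assumed: any small dataset registered with llm-compressor
    recipe=recipe,
    num_calibration_samples=1,
    max_seq_length=2048,
)
```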

dsikka (Collaborator) commented Jan 31, 2025

Hi @us58:

If you specify an FP8 recipe targeting weights only for quantization, you should not need any calibration data.

Example:

```yaml
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: tensor
                        dynamic: false
                        symmetric: true
```
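
To apply it, a minimal sketch assuming the recipe above is saved as recipe.yaml (the model ID and save directory are placeholders). Because only weights are quantized, no dataset or calibration samples are passed:

```python
# Sketch: apply a weight-only FP8 recipe; no calibration data is needed.
# MODEL_ID, the recipe path, and SAVE_DIR are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Weight-only quantization: scales come from the weights themselves,
# so oneshot runs without a dataset argument.
oneshot(model=model, recipe="recipe.yaml")

SAVE_DIR = "Meta-Llama-3-8B-Instruct-FP8-W8A16"  # placeholder
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```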

dsikka closed this as completed on Jan 31, 2025