
Is it possible to quantize to FP8 W8A16 without calibration data #858

Closed
us58 opened this issue Oct 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

us58 commented Oct 21, 2024

I want to quantize a model to FP8 W8A16 (since I am on Ampere, which has no native FP8 compute). The quantization_w8a8_fp8 example says no calibration is needed for FP8 W8A8. Is this also possible for FP8 W8A16? I did not find any information on this.

Also, if possible, could you give me an example of how to do this (like the FP8 W8A8 example)? Thanks in advance.

us58 added the enhancement label on Oct 21, 2024

okwinds commented Oct 22, 2024

A single calibration sample should work.
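
For reference, a minimal sketch of what a one-sample calibration run could look like with llm-compressor's oneshot entrypoint, for a scheme that does require calibration (e.g., static activation scales). The model ID, dataset name, and scheme choice below are illustrative placeholders, not taken from this thread:

```python
# Sketch: static FP8 quantization calibrated on a single sample.
# MODEL_ID and the dataset name are placeholders, not from this thread.
from transformers import AutoModelForCausalLM
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# Static FP8 activation scales need calibration data; per the comment
# above, a single sample should be enough.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",  # assumed: any small dataset registered with llm-compressor
    recipe=recipe,
    num_calibration_samples=1,
    max_seq_length=2048,
)
```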

dsikka (Collaborator) commented Jan 31, 2025

Hi @us58:

If you specify an FP8 recipe targeting weights only for quantization, you should not need any calibration data.

Example:

```yaml
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: tensor
                        dynamic: false
                        symmetric: true
```
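
To apply it, a minimal sketch assuming the recipe above is saved as recipe.yaml (the model ID and save directory are placeholders). Because only weights are quantized, no dataset or calibration samples are passed:

```python
# Sketch: apply a weight-only FP8 recipe; no calibration data is needed.
# MODEL_ID, the recipe path, and SAVE_DIR are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Weight-only quantization: scales come from the weights themselves,
# so oneshot runs without a dataset argument.
oneshot(model=model, recipe="recipe.yaml")

SAVE_DIR = "Meta-Llama-3-8B-Instruct-FP8-W8A16"  # placeholder
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```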

dsikka closed this as completed on Jan 31, 2025