Add support for dbrx moe #45

charlifu · 2024-09-26T16:04:11Z

The dbrx-instruct model does not implement its MoE layer with nn.Linear, so AutoFP8 cannot quantize the MoE weights. This PR adds quantization support for the DBRX model's MoE layer.
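
To illustrate the gap, here is a minimal sketch that assumes the quantizer discovers weights by scanning for `nn.Linear` modules. `PackedExperts` and `ToyBlock` are hypothetical stand-ins, not the real DBRX classes, and the helper functions are not AutoFP8 APIs.

```python
import torch
from torch import nn


class PackedExperts(nn.Module):
    """Hypothetical stand-in for an expert module that stores raw parameters."""

    def __init__(self, num_experts=4, ffn=8, hidden=16):
        super().__init__()
        # All experts are packed into one parameter instead of nn.Linear layers.
        self.w1 = nn.Parameter(torch.empty(num_experts * ffn, hidden))


class ToyBlock(nn.Module):
    """Hypothetical block mixing a regular linear layer with packed experts."""

    def __init__(self):
        super().__init__()
        self.attn_proj = nn.Linear(16, 16)
        self.experts = PackedExperts()


def find_linear_layers(model):
    """Names of modules that an nn.Linear-based scan would pick up."""
    return [name for name, mod in model.named_modules() if isinstance(mod, nn.Linear)]


def find_uncovered_parameters(model):
    """Names of parameters that do not belong to any nn.Linear module."""
    covered = set()
    for name, mod in model.named_modules():
        if isinstance(mod, nn.Linear):
            for pname, _ in mod.named_parameters(recurse=False):
                covered.add(f"{name}.{pname}")
    return [n for n, _ in model.named_parameters() if n not in covered]


block = ToyBlock()
linear_names = find_linear_layers(block)
uncovered = find_uncovered_parameters(block)
```

In this toy setup the scan finds only the attention projection, while the packed expert parameter is left unquantized, which is the situation a dedicated replacement module addresses.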

Add an FP8DbrxExpertGLU module definition that replaces the original DbrxExpertGLU module and quantizes its weights and, if needed, its activations.
Add an example that quantizes the DBRX model.
Perplexity (PPL) test on wikitext2: 3.91 (fp16) vs. 3.99 (fp8).

charlifu added 4 commits September 26, 2024 15:38

add support for dbrx moe

e115fa6

add support dynamic quant

e16aa5c

format

8e34938

naming

19c2734
