Is your feature request related to a problem? Please describe.
We have grown to support quite a few PTQ techniques within our LLM entrypoint, with even more possible combinations of them.
Although some limited benchmarking has been done, it would be good to run systematic experiments to understand which techniques combine well, which combinations to avoid, etc.
Describe the solution you'd like
An exhaustive search is not feasible; a few suggestions (illustrative sketches for each follow the list):
- Weight-only 4b/8b, W8A8, W4A8, W4A4
- MXFP8/6/4 for weights/activations
- Combination of HQO for zero point + MSE for scale (might require writing custom quantizers)
- GPxQ (with/without HQO and with/without MSE), weight-only / weights + activations
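For the precision sweep itself, a minimal sketch of a grid driver is below. `run_ptq_eval` is a hypothetical placeholder for the existing LLM entrypoint plus an eval metric (e.g. WikiText-2 perplexity); the config axes and the feasibility rules are assumptions, only meant to show how infeasible combinations could be pruned before launching runs.

```python
from itertools import product

# Config axes mirroring the suggestions above (names are illustrative only).
WEIGHT_CFGS = ["int4", "int8", "mxfp8", "mxfp6", "mxfp4"]
ACT_CFGS = [None, "int8", "int4", "mxfp8", "mxfp6", "mxfp4"]  # None = weight-only
SCALE_OPT = ["stats", "mse"]
ZP_OPT = ["stats", "hqo"]
GPXQ = [None, "gptq", "gpfq"]


def feasible(w, a, scale, zp, gpxq):
    """Prune combinations that are redundant or not worth a first pass (assumed rules)."""
    # MX formats carry their own block scaling, so skip scale/zero-point search there.
    if w.startswith("mxfp") and (scale != "stats" or zp != "stats"):
        return False
    # Keep weights and activations in the same family (int vs. MX) for a first pass.
    if a is not None and a.startswith("mxfp") != w.startswith("mxfp"):
        return False
    return True


def run_ptq_eval(w, a, scale, zp, gpxq):
    """Placeholder: quantize via the LLM entrypoint with this config and return
    an eval metric (e.g. perplexity). Intentionally left unimplemented here."""
    raise NotImplementedError


if __name__ == "__main__":
    configs = [c for c in product(WEIGHT_CFGS, ACT_CFGS, SCALE_OPT, ZP_OPT, GPXQ)
               if feasible(*c)]
    print(f"{len(configs)} configurations to run")
```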
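For the MX formats, the sketch below fake-quantizes a tensor to a simplified MXFP4-style layout: blocks of 32 elements, a shared power-of-two scale, FP4 (E2M1) elements, round-to-nearest. It is only meant to pin down what a benchmark run would measure; it ignores corner cases such as NaN/Inf handling and assumes the tensor size is divisible by the block size.

```python
import torch

# Representable magnitudes of the FP4 (E2M1) element format.
FP4_E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def mxfp4_fake_quant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Simplified MXFP4-style fake quantization (assumes x.numel() % block == 0)."""
    orig_shape = x.shape
    x = x.reshape(-1, block)
    amax = x.abs().amax(dim=1, keepdim=True).clamp_min(1e-30)
    # Shared power-of-two scale chosen so the block max lands near the FP4 max (6.0).
    shared_exp = torch.floor(torch.log2(amax)) - 2  # 2 = exponent of the FP4 max
    scale = torch.pow(2.0, shared_exp)
    scaled = x / scale
    # Round each element to the nearest representable FP4 magnitude, keeping the sign.
    grid = FP4_E2M1_GRID.to(device=x.device, dtype=x.dtype)
    idx = (scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    q = torch.sign(scaled) * grid[idx]
    return (q * scale).reshape(orig_shape)
```

MXFP8/6 would differ only in the element grid and the shared-exponent offset.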
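For the HQO-zero-point + MSE-scale combination, a stand-alone sketch of the MSE half is below, just to make the interface concrete: a per-channel asymmetric quantizer that grid-searches the clipping range for the lowest reconstruction error. The zero point here is simply derived from the clipped minimum; that is the slot where an HQO-style optimizer would plug in. Names and grid parameters are assumptions, not the existing quantizer API.

```python
import torch


def mse_scale_search(w, bits=4, n_grid=80, shrink=0.8):
    """Per-channel asymmetric scale/zero-point found by minimizing reconstruction MSE
    over a grid of candidate clipping ranges. Returns (scale, zero_point), both [C, 1]."""
    qmin, qmax = 0, 2 ** bits - 1
    w = w.reshape(w.shape[0], -1)                        # [out_channels, rest]
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    best_err = torch.full((w.shape[0], 1), float("inf"), device=w.device)
    best_scale = torch.ones_like(best_err)
    best_zp = torch.zeros_like(best_err)
    for i in range(n_grid):
        frac = 1.0 - (1.0 - shrink) * i / n_grid         # candidate clipping fraction
        cmin, cmax = w_min * frac, w_max * frac
        scale = (cmax - cmin).clamp_min(1e-8) / (qmax - qmin)
        zp = torch.round(-cmin / scale).clamp(qmin, qmax)
        q = torch.clamp(torch.round(w / scale) + zp, qmin, qmax)
        deq = (q - zp) * scale
        err = ((deq - w) ** 2).mean(dim=1, keepdim=True)
        improved = err < best_err
        best_err = torch.where(improved, err, best_err)
        best_scale = torch.where(improved, scale, best_scale)
        best_zp = torch.where(improved, zp, best_zp)
    return best_scale, best_zp
```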
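For the GPxQ axis, a heavily simplified GPTQ-style sketch is below (single-column updates, no block/lazy batching, no activation ordering). It is not the existing implementation; the point is only that the error-feedback step is orthogonal to the choice of `quantize_fn`, so it can be paired with RTN, MSE-searched, or MX quantizers in the sweep.

```python
import torch


def gptq_quantize_layer(W, H, quantize_fn, percdamp=0.01):
    """Simplified GPTQ-style weight quantization for one linear layer.

    W: [out_features, in_features] weight; H: [in, in] Hessian proxy accumulated
    over calibration data; quantize_fn: any per-column weight quantizer.
    """
    W = W.clone()
    cols = W.shape[1]
    # Dampen the Hessian diagonal for numerical stability.
    damp = percdamp * torch.mean(torch.diag(H))
    H = H + damp * torch.eye(cols, device=H.device, dtype=H.dtype)
    # Upper-triangular Cholesky factor of the inverse Hessian.
    Hinv = torch.linalg.cholesky(
        torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True
    )
    Q = torch.zeros_like(W)
    for i in range(cols):
        w = W[:, i]
        q = quantize_fn(w)                       # e.g. 4-bit round-to-nearest
        Q[:, i] = q
        # Spread the quantization error onto the not-yet-quantized columns.
        err = (w - q) / Hinv[i, i]
        W[:, i + 1:] -= err.unsqueeze(1) * Hinv[i, i + 1:].unsqueeze(0)
    return Q
```

Whether this error-feedback loop still behaves well on top of MX block scales is exactly the kind of question the benchmark runs should answer.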
A few suggestions on the model side:
Additional context
Reach out for further clarifications.