Is your feature request related to a problem? Please describe.
We have grown to support quite a few PTQ techniques within our LLM entrypoint, with even more possible combinations of them.
Although some limited benchmarking has been done, it would be good to run systematic experiments to understand which techniques combine well, which combinations to avoid, etc.
Describe the solution you'd like
An exhaustive search is not feasible; a few suggestions (illustrative sketches for each follow the list):
- Weight-only 4b/8b, W8A8, W4A8, W4A4
- MXFP8/6/4 for weights/activations
- Combination of HQO for zero point + MSE for scale (might require writing custom quantizers)
- GPxQ (with/without HQO and with/without MSE), weight-only / weights + activations
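For the precision sweep itself, a minimal sketch of a grid driver is below. `run_ptq_eval` is a hypothetical placeholder for the existing LLM entrypoint plus an eval metric (e.g. WikiText-2 perplexity); the config axes and the feasibility rules are assumptions, only meant to show how infeasible combinations could be pruned before launching runs.

```python
from itertools import product

# Config axes mirroring the suggestions above (names are illustrative only).
WEIGHT_CFGS = ["int4", "int8", "mxfp8", "mxfp6", "mxfp4"]
ACT_CFGS = [None, "int8", "int4", "mxfp8", "mxfp6", "mxfp4"]  # None = weight-only
SCALE_OPT = ["stats", "mse"]
ZP_OPT = ["stats", "hqo"]
GPXQ = [None, "gptq", "gpfq"]


def feasible(w, a, scale, zp, gpxq):
    """Prune combinations that are redundant or not worth a first pass (assumed rules)."""
    # MX formats carry their own block scaling, so skip scale/zero-point search there.
    if w.startswith("mxfp") and (scale != "stats" or zp != "stats"):
        return False
    # Keep weights and activations in the same family (int vs. MX) for a first pass.
    if a is not None and a.startswith("mxfp") != w.startswith("mxfp"):
        return False
    return True


def run_ptq_eval(w, a, scale, zp, gpxq):
    """Placeholder: quantize via the LLM entrypoint with this config and return
    an eval metric (e.g. perplexity). Intentionally left unimplemented here."""
    raise NotImplementedError


if __name__ == "__main__":
    configs = [c for c in product(WEIGHT_CFGS, ACT_CFGS, SCALE_OPT, ZP_OPT, GPXQ)
               if feasible(*c)]
    print(f"{len(configs)} configurations to run")
```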
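For the MX formats, the sketch below fake-quantizes a tensor to a simplified MXFP4-style layout: blocks of 32 elements, a shared power-of-two scale, FP4 (E2M1) elements, round-to-nearest. It is only meant to pin down what a benchmark run would measure; it ignores corner cases such as NaN/Inf handling and assumes the tensor size is divisible by the block size.

```python
import torch

# Representable magnitudes of the FP4 (E2M1) element format.
FP4_E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def mxfp4_fake_quant(x: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Simplified MXFP4-style fake quantization (assumes x.numel() % block == 0)."""
    orig_shape = x.shape
    x = x.reshape(-1, block)
    amax = x.abs().amax(dim=1, keepdim=True).clamp_min(1e-30)
    # Shared power-of-two scale chosen so the block max lands near the FP4 max (6.0).
    shared_exp = torch.floor(torch.log2(amax)) - 2  # 2 = exponent of the FP4 max
    scale = torch.pow(2.0, shared_exp)
    scaled = x / scale
    # Round each element to the nearest representable FP4 magnitude, keeping the sign.
    grid = FP4_E2M1_GRID.to(device=x.device, dtype=x.dtype)
    idx = (scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
    q = torch.sign(scaled) * grid[idx]
    return (q * scale).reshape(orig_shape)
```

MXFP8/6 would differ only in the element grid and the shared-exponent offset.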
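For the HQO-zero-point + MSE-scale combination, a stand-alone sketch of the MSE half is below, just to make the interface concrete: a per-channel asymmetric quantizer that grid-searches the clipping range for the lowest reconstruction error. The zero point here is simply derived from the clipped minimum; that is the slot where an HQO-style optimizer would plug in. Names and grid parameters are assumptions, not the existing quantizer API.

```python
import torch


def mse_scale_search(w, bits=4, n_grid=80, shrink=0.8):
    """Per-channel asymmetric scale/zero-point found by minimizing reconstruction MSE
    over a grid of candidate clipping ranges. Returns (scale, zero_point), both [C, 1]."""
    qmin, qmax = 0, 2 ** bits - 1
    w = w.reshape(w.shape[0], -1)                        # [out_channels, rest]
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    best_err = torch.full((w.shape[0], 1), float("inf"), device=w.device)
    best_scale = torch.ones_like(best_err)
    best_zp = torch.zeros_like(best_err)
    for i in range(n_grid):
        frac = 1.0 - (1.0 - shrink) * i / n_grid         # candidate clipping fraction
        cmin, cmax = w_min * frac, w_max * frac
        scale = (cmax - cmin).clamp_min(1e-8) / (qmax - qmin)
        zp = torch.round(-cmin / scale).clamp(qmin, qmax)
        q = torch.clamp(torch.round(w / scale) + zp, qmin, qmax)
        deq = (q - zp) * scale
        err = ((deq - w) ** 2).mean(dim=1, keepdim=True)
        improved = err < best_err
        best_err = torch.where(improved, err, best_err)
        best_scale = torch.where(improved, scale, best_scale)
        best_zp = torch.where(improved, zp, best_zp)
    return best_scale, best_zp
```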
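For the GPxQ axis, a heavily simplified GPTQ-style sketch is below (single-column updates, no block/lazy batching, no activation ordering). It is not the existing implementation; the point is only that the error-feedback step is orthogonal to the choice of `quantize_fn`, so it can be paired with RTN, MSE-searched, or MX quantizers in the sweep.

```python
import torch


def gptq_quantize_layer(W, H, quantize_fn, percdamp=0.01):
    """Simplified GPTQ-style weight quantization for one linear layer.

    W: [out_features, in_features] weight; H: [in, in] Hessian proxy accumulated
    over calibration data; quantize_fn: any per-column weight quantizer.
    """
    W = W.clone()
    cols = W.shape[1]
    # Dampen the Hessian diagonal for numerical stability.
    damp = percdamp * torch.mean(torch.diag(H))
    H = H + damp * torch.eye(cols, device=H.device, dtype=H.dtype)
    # Upper-triangular Cholesky factor of the inverse Hessian.
    Hinv = torch.linalg.cholesky(
        torch.cholesky_inverse(torch.linalg.cholesky(H)), upper=True
    )
    Q = torch.zeros_like(W)
    for i in range(cols):
        w = W[:, i]
        q = quantize_fn(w)                       # e.g. 4-bit round-to-nearest
        Q[:, i] = q
        # Spread the quantization error onto the not-yet-quantized columns.
        err = (w - q) / Hinv[i, i]
        W[:, i + 1:] -= err.unsqueeze(1) * Hinv[i, i + 1:].unsqueeze(0)
    return Q
```

Whether this error-feedback loop still behaves well on top of MX block scales is exactly the kind of question the benchmark runs should answer.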
A few suggestions on the model side:
Additional context
Reach out for further clarifications.