release: 0.2.3

dacorvo released this 25 Jul 15:18

What's Changed

Use new int8 torch kernels by @dacorvo in #222
Rebuild extension when PyTorch is updated by @dacorvo in #223
Use tinygemm bfloat16 / int4 kernel whenever possible by @dacorvo in #234
Add HQQ optimizer by @dacorvo in #235
Add QuantizedModelForCausalLM by @dacorvo in #243
Integrate quanto commands to optimum-cli by @dacorvo in #244
Add pixart-sigma test to image example by @dacorvo in #247
Support diffusion models by @sayakpaul in #255

Bug fixes

Fix: align extension on max arch by @dacorvo in #227
Fix TinyGemmQBitsTensor move by @dacorvo in #246
Fix streamlining bug by @dacorvo in #249
Fix float/int8 matrix multiplication latency regression by @dacorvo in #250
Fix serialization issues by @dacorvo in #258

New Contributors

@sayakpaul made their first contribution in #255

Full Changelog: v0.2.2...v0.2.3

Contributors

dacorvo and sayakpaul
