release: 0.2.3
What's Changed
- Use new int8 torch kernels by @dacorvo in #222
- Rebuild extension when pytorch is updated by @dacorvo in #223
- Use tinygemm bfloat16 / int4 kernel whenever possible by @dacorvo in #234
- Add HQQ optimizer by @dacorvo in #235
- Add QuantizedModelForCausalLM by @dacorvo in #243
- Integrate quanto commands to optimum-cli by @dacorvo in #244
- Add pixart-sigma test to image example by @dacorvo in #247
- Support diffusion models by @sayakpaul in #255
Bug Fixes
- Fix: align extension on max arch by @dacorvo in #227
- Fix TinyGemmQBitsTensor move by @dacorvo in #246
- Fix stream-lining bug by @dacorvo in #249
- Fix float/int8 matrix multiplication latency regression by @dacorvo in #250
- Fix serialization issues by @dacorvo in #258
New Contributors
- @sayakpaul made their first contribution in #255
Full Changelog: v0.2.2...v0.2.3