Releases: huggingface/optimum-quanto
release: 0.0.11
release: 0.0.10
New features:
- calibration streamline option to remove spurious quantize/dequantize,
- calibration debug mode.
release: 0.0.9
New features:
- quantize weights and activations parameters,
- float8 activations.
release: 0.0.8
New features:
- weight-only quantization,
- integer matmul acceleration on CUDA.
Bug fixes:
- actually use float16 weights,
- avoid float16 overflows,
- correct device placement,
- robust serialization.
release: 0.0.7
New features:
- per-axis quantization.
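To illustrate what per-axis quantization means in general, here is a minimal pure-Python sketch. It is not the library's implementation: the function names (`quantize_per_axis`, `quantize_per_tensor`, `max_abs_error`) are hypothetical, and the symmetric int8 scheme with one scale per row is an assumption chosen for simplicity.

```python
# Illustrative sketch only: hypothetical helpers, not the optimum-quanto API.

def quantize_per_axis(weights, bits=8):
    """Symmetric quantization with one scale per row (axis 0)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scales, qrows = [], []
    for row in weights:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the symmetric range.
        qrows.append([max(-qmax, min(qmax, round(v / scale))) for v in row])
    return qrows, scales


def quantize_per_tensor(weights, bits=8):
    """Symmetric quantization with a single scale shared by all rows."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in weights for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    qrows = [[max(-qmax, min(qmax, round(v / scale))) for v in row]
             for row in weights]
    return qrows, [scale] * len(weights)


def dequantize(qrows, scales):
    """Map integers back to floats using the scale of each row."""
    return [[q * s for q in row] for row, s in zip(qrows, scales)]


def max_abs_error(original, restored):
    """Largest absolute reconstruction error over all elements."""
    return max(abs(a - b)
               for ra, rb in zip(original, restored)
               for a, b in zip(ra, rb))


# Two rows with very different magnitudes.
weights = [[0.01, -0.02, 0.005],
           [10.0, -5.0, 2.5]]

q_axis, s_axis = quantize_per_axis(weights)
q_tensor, s_tensor = quantize_per_tensor(weights)

# Compare the reconstruction error on the small-magnitude row only.
err_axis = max_abs_error(weights[:1], dequantize(q_axis, s_axis)[:1])
err_tensor = max_abs_error(weights[:1], dequantize(q_tensor, s_tensor)[:1])
print(f"row 0 error, per-axis:   {err_axis:.6f}")
print(f"row 0 error, per-tensor: {err_tensor:.6f}")
```

With a single shared scale, the small-magnitude row collapses to zeros, whereas a per-row scale preserves it. This is the usual motivation for per-axis schemes.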
release: 0.0.6
New features:
- support opt models,
- support gpt-neox models,
- support codegen models.
release: 0.0.5
New features:
- support MPS devices,
- support Transformer models.
release: 0.0.4
Fix release to add correct package metadata.
release: 0.0.1
Initial import of the sources.