release: 0.0.8

@dacorvo dacorvo released this 08 Dec 15:31
· 525 commits to main since this release

New features:

  • weight-only quantization,
  • integer matmul acceleration on CUDA.
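
For readers unfamiliar with the term, weight-only quantization can be sketched roughly as follows. This is an illustration in plain Python, not this library's API or implementation: the function names (`quantize_row`, `quantize_weights`, `linear_weight_only`) and the choice of symmetric per-row int8 scaling are assumptions made for the example. Weights are stored as small integers plus one floating-point scale per output row, while activations stay in floating point.

```python
def quantize_row(row, bits=8):
    """Symmetric quantization of one weight row (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in row)
    # One scale per row so that the largest |w| maps to qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return q, scale


def quantize_weights(weights, bits=8):
    """Quantize each output row of a 2-D weight matrix independently."""
    pairs = [quantize_row(row, bits) for row in weights]
    return [q for q, _ in pairs], [s for _, s in pairs]


def linear_weight_only(x, q_weights, scales):
    """Linear layer with float activations and integer weights.

    The integer dot product is rescaled once per output row.
    """
    return [
        s * sum(xi * qi for xi, qi in zip(x, q_row))
        for q_row, s in zip(q_weights, scales)
    ]


weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.4]]
x = [1.0, 2.0, 3.0]

q_weights, scales = quantize_weights(weights)
y_ref = [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]
y_q = linear_weight_only(x, q_weights, scales)
```

In this sketch each weight is off by at most half a scale step, so `y_q` stays close to `y_ref`; a real implementation would pack the integers into tensors and may use a different scaling scheme.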

Bug fixes:

  • actually use float16 weights,
  • avoid float16 overflows,
  • correct device placement,
  • robust serialization.