v0.3.0

@mobicham mobicham released this 28 Oct 16:05
  • New GEMV RevSplitK algorithm that outperforms GEMM Split-K and GEMV at batch size 1
  • Add support for channel-wise scaling (weights, activations, weights + activations)
  • Add support for FP8 x FP8 / FP8 x Wn
  • Add support for INT8 x Wn
  • Improved autotune speed
  • Improved base configs for RTX 4090, A100 and H100
  • Better control for autotune via set_autotune
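
For readers unfamiliar with channel-wise scaling, here is a minimal plain-Python sketch of the idea when both weights and activations are scaled. It is illustrative only and does not use or reflect this library's API or kernels: the helper names `quantize_rows` and `scaled_matmul`, the symmetric INT8 range, and the toy matrices are all assumptions made for the example.

```python
# Illustrative sketch of channel-wise scaling; not this library's code.
# Weights get one scale per output channel, activations one scale per row.

def quantize_rows(matrix, qmax=127):
    """Symmetric quantization with one scale per row (hypothetical helper)."""
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def scaled_matmul(x_q, x_scales, w_q, w_scales):
    """Integer matmul, then rescale each output by its activation and weight scale."""
    out = []
    for x_row, sx in zip(x_q, x_scales):
        out_row = []
        for w_row, sw in zip(w_q, w_scales):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # integer accumulation
            out_row.append(acc * sx * sw)  # undo both scales
        out.append(out_row)
    return out


# Toy data: W is [out_features][in_features], X is [batch][in_features].
W = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
X = [[1.0, 2.0, -1.0]]

w_q, w_scales = quantize_rows(W)  # one scale per output channel
x_q, x_scales = quantize_rows(X)  # one scale per activation row
y = scaled_matmul(x_q, x_scales, w_q, w_scales)

# Full-precision reference for comparison.
y_ref = [[sum(a * b for a, b in zip(xr, wr)) for wr in W] for xr in X]
```

Because each scale is constant along the reduction dimension, it can be factored out of the dot product and applied once per output element, which is why the rescaling can happen after the integer accumulation.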