v0.1.0

@mobicham released this 19 Sep 11:24
· 184 commits to master since this release
4690460

Triton Kernels

  • A16W8 (GEMV + GEMM) - with grouping
  • A16W4 (GEMV + GEMM) - with grouping
  • A16W2 (GEMV + GEMM) - with grouping
  • A16W1 (GEMV + GEMM) - with grouping
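
The names above appear to follow the common AxWy convention: x-bit activations multiplied by y-bit weights. "Grouping" is read here as group-wise quantization, where each run of `group_size` consecutive weights shares one scale and one zero-point. The sketch below illustrates that idea in plain Python for a single weight row. The helpers `quantize_groupwise` and `gemv_row` are hypothetical names chosen for illustration and are not part of this library's API; the actual kernels work on packed tensors on the GPU.

```python
def quantize_groupwise(row, nbits, group_size):
    """Quantize one weight row to unsigned nbits integers.

    Illustrative only: each group of `group_size` weights gets its own
    (scale, zero) pair, which is what "with grouping" is assumed to mean.
    """
    qmax = (1 << nbits) - 1
    q, scales, zeros = [], [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        scales.append(scale)
        zeros.append(lo)
        # Map each weight to the nearest integer level in [0, qmax].
        q.extend(min(qmax, max(0, round((w - lo) / scale))) for w in group)
    return q, scales, zeros


def gemv_row(x, q, scales, zeros, group_size):
    """Dot product of activations x with one dequantized weight row."""
    acc = 0.0
    for i, (xi, qi) in enumerate(zip(x, q)):
        g = i // group_size  # index of the group this weight belongs to
        acc += xi * (qi * scales[g] + zeros[g])
    return acc


# 2-bit weights (the "W2" case), two groups of four weights each.
row = [0.0, 1.0, 2.0, 3.0, -1.0, -0.5, 0.0, 0.5]
q, scales, zeros = quantize_groupwise(row, nbits=2, group_size=4)
y = gemv_row([1.0] * 8, q, scales, zeros, group_size=4)
```

In this sketch, setting `group_size` to the full row length gives a single scale and zero-point per row, which is one possible reading of "no grouping".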

CUDA Kernels

  • A16W8 (GEMV - batch-size=1) - no grouping
  • A16W4 (GEMV - batch-size=1) - no grouping
  • A16W2 (GEMV - batch-size=1) - no grouping
  • A8W8 (GEMV - batch-size=1) - no grouping
  • A8W4 (GEMV - batch-size=1) - no grouping
  • A8W2 (GEMV - batch-size=1) - no grouping
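
Low-bit weights such as W4 and W2 are usually stored several to a machine word, and a batch-size-1 GEMV unpacks them on the fly. The sketch below shows one simple way to do this in plain Python. The packing layout (lowest bits first, 32-bit words) and the helper names `pack`, `unpack`, and `gemv_packed_row` are assumptions made for illustration; the layout used by the actual CUDA kernels may differ.

```python
def pack(values, nbits):
    """Pack unsigned nbits integers into 32-bit words, lowest bits first."""
    per_word = 32 // nbits
    mask = (1 << nbits) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for j, v in enumerate(values[start:start + per_word]):
            word |= (v & mask) << (j * nbits)
        words.append(word)
    return words


def unpack(words, nbits, count):
    """Inverse of pack: recover the first `count` values."""
    per_word = 32 // nbits
    mask = (1 << nbits) - 1
    out = []
    for word in words:
        for j in range(per_word):
            out.append((word >> (j * nbits)) & mask)
    return out[:count]


def gemv_packed_row(x_int, words, nbits):
    """Integer dot product of 8-bit activations with one packed weight row.

    Illustrative of an A8Wn-style GEMV: accumulate in integers, and leave
    any scale or zero-point correction to the caller.
    """
    weights = unpack(words, nbits, len(x_int))
    return sum(xi * wi for xi, wi in zip(x_int, weights))


words = pack([1, 2, 3, 4, 5, 6, 7, 8], nbits=4)  # → [0x87654321]
acc = gemv_packed_row([1] * 8, words, nbits=4)   # → 36
```

With 4-bit weights, eight values fit in one 32-bit word; with 2-bit weights, sixteen do, so the same activation vector needs half as many weight loads.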