v0.1.0
Triton Kernels
- A16W8 (GEMV + GEMM) - with grouping
- A16W4 (GEMV + GEMM) - with grouping
- A16W2 (GEMV + GEMM) - with grouping
- A16W1 (GEMV + GEMM) - with grouping
CUDA Kernels
- A16W8 (GEMV - batch-size=1) - no grouping
- A16W4 (GEMV - batch-size=1) - no grouping
- A16W2 (GEMV - batch-size=1) - no grouping
- A8W8 (GEMV - batch-size=1) - no grouping
- A8W4 (GEMV - batch-size=1) - no grouping
- A8W2 (GEMV - batch-size=1) - no grouping
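The AxWy labels above can be read as x-bit activations with y-bit weights, and "grouping" as one scale/zero-point pair per group of weights along the input dimension. The NumPy sketch below shows only the arithmetic that an A16W4 GEMV with grouping would compute. It is not the Triton or CUDA kernel code: the function names, the group size of 64, and the asymmetric min/max quantization scheme are all assumptions chosen for illustration, and the 4-bit values are kept one per byte rather than bit-packed.

```python
import numpy as np

def quantize_groupwise(W, nbits=4, group_size=64):
    """Hypothetical helper: asymmetric min/max quantization with one
    (scale, zero) pair per group of `group_size` input columns."""
    out_features, in_features = W.shape
    assert in_features % group_size == 0, "in_features must be a multiple of group_size"
    # Split the input dimension into groups: (out, n_groups, group_size)
    Wg = W.astype(np.float32).reshape(out_features, in_features // group_size, group_size)
    w_min = Wg.min(axis=-1, keepdims=True)
    w_max = Wg.max(axis=-1, keepdims=True)
    qmax = 2**nbits - 1
    scale = np.maximum(w_max - w_min, 1e-8) / qmax  # guard against constant groups
    zero = -w_min / scale
    # Stored unpacked (one value per uint8) for readability; real kernels pack bits.
    Wq = np.clip(np.round(Wg / scale + zero), 0, qmax).astype(np.uint8)
    return Wq, scale, zero

def dequantize_groupwise(Wq, scale, zero):
    """Undo the quantization: W ~ (Wq - zero) * scale, reshaped back to (out, in)."""
    return ((Wq.astype(np.float32) - zero) * scale).reshape(Wq.shape[0], -1)

def gemv_a16w4_reference(x, Wq, scale, zero):
    """Reference GEMV (batch-size=1): fp16 activations, 4-bit grouped weights."""
    W_deq = dequantize_groupwise(Wq, scale, zero)
    # Accumulate in fp32, return fp16 like the activations.
    return (W_deq @ x.astype(np.float32)).astype(np.float16)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 128)).astype(np.float32)
    x = rng.standard_normal(128).astype(np.float16)
    Wq, scale, zero = quantize_groupwise(W, nbits=4, group_size=64)
    y = gemv_a16w4_reference(x, Wq, scale, zero)
    print(y.shape, y.dtype)
```

"No grouping" in the CUDA list would then correspond to a single scale/zero-point for a whole row or tensor instead of one per group; which of the two applies is not stated here.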