v2.4.10 SGEMM TF32 Stage 2/3

@DefTruth released this 15 Oct 02:04 · 87 commits to main since this release · 2906e78

What's Changed

  • [HGEMM] HGEMM WMMA Stage mma4x2+warp4x4 by @DefTruth in #76
  • [SGEMM] Add SGEMM WMMA TF32 Stage2/3 by @DefTruth in #77
  • [SGEMM] Add cuBLAS SGEMM F32/TF32 baseline by @DefTruth in #78
  • [SGEMM] Add Kernel cudaFuncSetAttribute hint by @DefTruth in #79
  • [RoPE] Add minimal RoPE f32/f32x4 pack impl by @bear-zd in #80

Full Changelog: v2.4.9...v2.4.10