[QST] What is the general process of developing and tuning GEMM kernels with CUTLASS? #2050

tiandi111 · 2025-01-21T10:22:07Z

Hi CUTLASS Community,

I'm new to CUTLASS. I'm trying to use it to develop multiple GEMM kernel templates and then tune them on different problem shapes to select the best one for each.

However, I'm confused by the APIs CUTLASS provides:

Which API version should I use? I noticed there are 2.x and 3.x APIs; are there any criteria for choosing between them?
Does the API depend on the hardware architecture? I have only seen Hopper examples that use the 3.x API; is it applicable only to Hopper?
The APIs at different layers, e.g. device and kernel, share many similarities, such as warp shape and instruction shape. What is the difference between these layers?

I look forward to any reply. Thank you very much!

tiandi111 added the "? - Needs Triage" and "question" labels on Jan 21, 2025
