[BUG] Precision issue with python cutlass gemm #2014

MinghaoYan · 2024-12-26T19:03:16Z

Describe the bug
Gemm called with the Python Cutlass wrapper returns different results from PyTorch. Manual computation shows PyTorch to be more precise.

The absolute difference can be on the order of 0.001, which is too large for such a small float32 Gemm.

Steps/Code to reproduce bug

import cutlass
import torch


def test_gemm():
    # Operands for D = A @ B + C; C is all zeros, D receives the CUTLASS result.
    A = torch.randn(2, 4, requires_grad=True, device="cuda")
    B = torch.randn(4, 2, requires_grad=True, device="cuda")
    C = torch.zeros(2, 2, requires_grad=True, device="cuda")
    D = torch.zeros(2, 2, requires_grad=True, device="cuda")

    # PyTorch reference, computed on detached copies of A and B.
    a_ref = A.detach().clone().requires_grad_(True)
    b_ref = B.detach().clone().requires_grad_(True)

    print(A, B, C)
    D_ref = a_ref @ b_ref
    print(D_ref[0].dtype)

    # Same product through the CUTLASS Python interface, float32 data and accumulator.
    plan = cutlass.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor, element_accumulator=torch.float32)
    plan.run(A, B, C, D, print_module=False)

    print(D_ref, D)


# Run the reproduction.
test_gemm()

Expected behavior
These two matrix multiplication results should be similar, but the gap is quite large. Sample output is below.

A: tensor([[-0.4193, 1.0308, -1.5871, 3.1340],
[ 0.6812, 2.0357, -2.0991, -0.1116]], device='cuda:0',
requires_grad=True)
B: tensor([[-0.0762, 0.5771],
[ 0.7318, 0.0826],
[ 0.7243, 2.0359],
[-1.6843, 0.6128]], device='cuda:0', requires_grad=True)
C: tensor([[0., 0.],
[0., 0.]], device='cuda:0', requires_grad=True)

dtype: torch.float32

D_ref (pytorch output): tensor([[-5.6419, -1.4676],
[ 0.1053, -3.7806]], device='cuda:0', grad_fn=)

D (Nvidia-cutlass output): tensor([[-5.6431, -1.4654],
[ 0.1054, -3.7801]], device='cuda:0', requires_grad=True)

For instance, manual computation gives -5.64184263 for the entry in the first row and first column, which is much closer to the PyTorch output (-5.6419) than to the Nvidia Cutlass output (-5.6431).
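The manual check can be reproduced for all four entries with a short pure-Python sketch. It uses only the values printed in the sample output, which are rounded to 4 decimals, so the reference itself is only accurate to roughly 1e-4. The helper names `matmul_reference` and `max_abs_error` are made up for this illustration.

import math

# Values copied from the printed sample output (rounded to 4 decimals).
A = [[-0.4193, 1.0308, -1.5871, 3.1340],
     [0.6812, 2.0357, -2.0991, -0.1116]]
B = [[-0.0762, 0.5771],
     [0.7318, 0.0826],
     [0.7243, 2.0359],
     [-1.6843, 0.6128]]
D_torch = [[-5.6419, -1.4676], [0.1053, -3.7806]]
D_cutlass = [[-5.6431, -1.4654], [0.1054, -3.7801]]


def matmul_reference(a, b):
    # Plain triple-loop matrix product in double precision (hypothetical helper).
    inner, cols = len(b), len(b[0])
    return [[math.fsum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def max_abs_error(result, reference):
    # Largest element-wise absolute difference between two matrices.
    return max(abs(x - y)
               for row_x, row_y in zip(result, reference)
               for x, y in zip(row_x, row_y))


D_manual = matmul_reference(A, B)
err_torch = max_abs_error(D_torch, D_manual)
err_cutlass = max_abs_error(D_cutlass, D_manual)

print("manual  :", D_manual)
print("max |PyTorch - manual|:", err_torch)
print("max |CUTLASS - manual|:", err_cutlass)

With these inputs the PyTorch output stays within the rounding noise of the printed values, while the CUTLASS output is off by more than 1e-3 in at least one entry.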

Environment details (please complete the following information):
EC2 P4D instance with A100 GPU
PyTorch version: 2.5.1+cu121
nvidia-cutlass version: 3.5.1.0


MinghaoYan · 2024-12-26T19:51:29Z

Apparently, this problem occurs only with float32, not with float16.
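One possible explanation, which is an assumption and is not confirmed anywhere in this thread, is that the float32 Gemm ran with reduced-precision TF32 inputs (10 explicit mantissa bits instead of 23). The sketch below is a pure-Python approximation of that effect, not the actual CUTLASS kernel: the helper `to_tf32` is hypothetical and uses a simplified round-to-nearest on the float32 bit pattern that is only meant for ordinary finite values. It recomputes the first-row, first-column entry from the printed inputs.

import struct


def to_tf32(x: float) -> float:
    # Hypothetical helper: round to float32, then keep only the top 10
    # mantissa bits (TF32-like). Simplified rounding; finite inputs only.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x1000) & 0xFFFFE000  # round, then clear the low 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# First row of A and first column of B, as printed (rounded to 4 decimals).
row = [-0.4193, 1.0308, -1.5871, 3.1340]
col = [-0.0762, 0.7318, 0.7243, -1.6843]

# Dot product with unmodified inputs, accumulated in double precision.
exact = sum(a * b for a, b in zip(row, col))

# Same dot product after rounding every input to the TF32-like format.
tf32_like = sum(to_tf32(a) * to_tf32(b) for a, b in zip(row, col))

print("full-precision inputs:", exact)
print("TF32-like inputs     :", tf32_like)
print("difference           :", tf32_like - exact)

Under these assumptions the simulated result moves away from the manual value by roughly 1e-3, which is the same magnitude as the reported gap.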

jackkosaian · 2024-12-30T15:33:57Z

Can you please check whether the related answer here helps?

MinghaoYan · 2024-12-30T15:52:51Z

This makes sense, thank you!

MinghaoYan added the "? - Needs Triage" and "bug" (Something isn't working) labels Dec 26, 2024

MinghaoYan changed the title ~~[BUG] Presion issue with python cutlass gemm~~ [BUG] Precision issue with python cutlass gemm Dec 26, 2024

MinghaoYan closed this as completed Dec 30, 2024
