[BUG] Weird precision problem in specific tiling description #1186

Closed
vincentccc opened this issue Nov 14, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@vincentccc
Contributor

vincentccc commented Nov 14, 2023

Describe the bug
I get nondeterministic results from a conv2d kernel with the following tiling:
cutlass::arch::Sm80,
cutlass::gemm::GemmShape<128, 96, 64>,
cutlass::gemm::GemmShape<32, 96, 64>,
cutlass::gemm::GemmShape<16, 8, 32>,
and 3 stages. The workload is shown below:

Conv2d workload: input s8 (1, 256, 256, 128), weight s8 (128, 3, 3, 128), padding (1, 1, 1, 1), strides (1, 1)
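For orientation, the workload above can be mapped to its implicit GEMM extents. The sketch below assumes an NHWC input, a KRSC filter layout, dilation 1, and the usual forward-propagation mapping (M = N*P*Q, N = K, K = R*S*C). `ConvProblem`, `GemmExtent`, and the helper functions are hypothetical names for illustration, not CUTLASS types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; these are not CUTLASS types.
struct ConvProblem {
  int N, H, W, C;          // input tensor, assumed NHWC
  int K, R, S;             // filter tensor, assumed KRSC (C shared with input)
  int pad_h, pad_w;        // symmetric padding assumed
  int stride_h, stride_w;
};

struct GemmExtent {
  std::int64_t m, n, k;
};

// Output extent of one spatial dimension (dilation 1 assumed).
constexpr int output_extent(int in, int pad, int filter, int stride) {
  return (in + 2 * pad - filter) / stride + 1;
}

// Forward-propagation implicit GEMM mapping: M = N*P*Q, N = K, K = R*S*C.
constexpr GemmExtent implicit_gemm_extent(const ConvProblem& p) {
  return GemmExtent{
      std::int64_t(p.N) * output_extent(p.H, p.pad_h, p.R, p.stride_h) *
          output_extent(p.W, p.pad_w, p.S, p.stride_w),
      std::int64_t(p.K),
      std::int64_t(p.R) * p.S * p.C};
}

// Number of tiles needed to cover `extent` with tiles of size `tile`.
constexpr std::int64_t ceil_div(std::int64_t extent, std::int64_t tile) {
  return (extent + tile - 1) / tile;
}
```

Under those assumptions the workload gives a 65536 x 128 x 1152 GEMM, so a 128x96x64 threadblock tile needs ceil(128 / 96) = 2 tiles along N, the second only partly filled. This is arithmetic only and does not by itself identify the cause of the mismatch.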

Some facts that may help:

  1. I ran the kernel several times. Some of the results differ between runs, and the indices of the mismatched elements are random, but their channel index is always in the range 0-63.
  2. When I make the workload smaller, e.g. (1, 128, 128, 64) with weight (64, 3, 3, 64), the error does not show up.
  3. Other schedules work well, e.g. 128x64x64 64x64x64.

It is very hard for me to debug this because I cannot reproduce the error reliably.
Are there any hints on how to fix or debug it, or on roughly where in the code the bug might be?
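One way to narrow down an intermittent mismatch like this is to compare each run against a trusted result (for example a host reference, or the output of a tiling that is known to work) and count mismatches per channel. The following is a minimal sketch, assuming both buffers hold the same NHWC tensor so that the channel is the fastest-varying index; `mismatches_per_channel` is a hypothetical helper, not part of CUTLASS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts, for each channel, how many elements differ between two buffers.
// Assumes NHWC storage, i.e. element i belongs to channel (i % channels).
template <typename Element>
std::vector<std::size_t> mismatches_per_channel(
    const std::vector<Element>& reference,
    const std::vector<Element>& result,
    std::size_t channels) {
  assert(reference.size() == result.size());
  assert(channels > 0 && reference.size() % channels == 0);

  std::vector<std::size_t> histogram(channels, 0);
  for (std::size_t i = 0; i < reference.size(); ++i) {
    if (reference[i] != result[i]) {
      ++histogram[i % channels];
    }
  }
  return histogram;
}
```

Running this over several launches and summing the histograms would show whether the errors really stay confined to one channel range, which helps relate them to a particular tile along the output-channel dimension.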


vincentccc added the "? - Needs Triage" and "bug" labels Nov 14, 2023
@hwu36
Collaborator

hwu36 commented Nov 14, 2023

Sorry, I think the problem is simply that non-power-of-two tile sizes are not valid in CUTLASS 2.x conv.

@vincentccc
Contributor Author

> sorry. i think it is just that non power of two tile size is not valid in cutlass 2.x conv.

Thanks for your reply. Does that mean this tiling will work on CUTLASS 3.x?
BTW, I am still curious about why it does not work. Is there a hard-coded assumption or a strong prior somewhere?

@thakkarV
Collaborator

Yes, we do support non-power-of-two tile shapes in the 3.x API.

@hwu36
Collaborator

hwu36 commented Nov 15, 2023

> BTW, I am still curious about why it not works, is there any hard code or strong prior?

In 2.x, the shared memory layout is not friendly to non-power-of-two tile shapes.
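Given that restriction, a compile-time guard can catch such a tiling before it produces wrong results. The sketch below is an assumption-laden illustration, not something CUTLASS 2.x provides: `TileShape` is a hypothetical stand-in that mirrors the `kM`, `kN`, `kK` members of `cutlass::gemm::GemmShape`, and `all_power_of_two` is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical stand-in mirroring the kM/kN/kK members of
// cutlass::gemm::GemmShape; this is not the CUTLASS type itself.
template <int M, int N, int K>
struct TileShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
};

// True when x is a positive power of two.
constexpr bool is_power_of_two(int x) {
  return x > 0 && (x & (x - 1)) == 0;
}

// True when all three extents of Shape are powers of two.
template <typename Shape>
constexpr bool all_power_of_two() {
  return is_power_of_two(Shape::kM) && is_power_of_two(Shape::kN) &&
         is_power_of_two(Shape::kK);
}

// A tiling reported as working in this thread passes the check...
static_assert(all_power_of_two<TileShape<128, 64, 64>>(),
              "128x64x64 uses only power-of-two extents");
// ...while the threadblock tile from the bug report does not.
static_assert(!all_power_of_two<TileShape<128, 96, 64>>(),
              "96 is not a power of two");
```

Placing a `static_assert(all_power_of_two<...>(), ...)` next to the threadblock and warp shape definitions of a 2.x conv kernel would turn this class of silent mismatch into a build error.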
