The TV layouts in the MMAs are not "row-major" or "col-major". They describe the partitioning pattern of each instruction and can be applied to any Tensor with any Layout.
If you would like to use row-major data, then give the tensor a row-major layout:
Tensor gA = make_tensor(make_gmem_ptr(A), make_shape(inst_m, inst_k), make_stride(inst_k, Int<1>{}));
// or
Tensor gA = make_tensor(make_gmem_ptr(A), make_shape(inst_m, inst_k), GenRowMajor{});
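To make the point above concrete, here is a minimal plain C++ sketch (no CuTe headers) of what happens when one TV layout is applied to tensors with different strides. It assumes the A-operand TV layout of the m16n8k8 f16 atom has shape ((4,8),(2,2)) and stride ((32,1),(16,8)); those numbers are recalled from `mma_traits_sm80.hpp` and should be checked against the header. The helper names `tv_to_coord` and `thread_offsets` are hypothetical, not part of CuTe.

```cpp
#include <array>
#include <cassert>

// Tile extents of the A operand for an m16n8k8 MMA: A is (M, K) = (16, 8).
constexpr int kM = 16;
constexpr int kK = 8;

struct Coord { int m; int k; };

// ASSUMPTION: TV layout for A is shape ((4,8),(2,2)), stride ((32,1),(16,8)).
// The layout returns an index that encodes the (m, k) coordinate column-major
// (idx = m + kM * k), so it is decoded back into a coordinate here.
Coord tv_to_coord(int thr, int val) {
    assert(thr >= 0 && thr < 32 && val >= 0 && val < 4);
    const int idx = 32 * (thr % 4) + 1 * (thr / 4)   // thread mode
                  + 16 * (val % 2) + 8 * (val / 2);  // value mode
    return Coord{idx % kM, idx / kM};
}

// Memory offsets of the four A values owned by `thr`, for a tensor whose
// layout has the given strides. The data layout enters only at this step.
std::array<int, 4> thread_offsets(int thr, int stride_m, int stride_k) {
    std::array<int, 4> out{};
    for (int v = 0; v < 4; ++v) {
        const Coord c = tv_to_coord(thr, v);
        out[v] = c.m * stride_m + c.k * stride_k;
    }
    return out;
}
```

Under these assumptions, `thread_offsets(0, kK, 1)` (row-major strides) gives offsets 0, 1, 64, 65, while `thread_offsets(0, 1, kM)` (column-major strides) gives 0, 16, 8, 24. The coordinates owned by thread 0 are the same in both cases; only the tensor's strides change where they live in memory.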
CuTe uses a column-major encoding for the TV layout of the MMA's multiplicands. The doc explains: "Since CuTe layouts return indices rather than coordinates, we choose a column-major encoding of the (m,n) coordinates."
But it seems this leads to wrong results for a row-major matrix.
For example, take multiplicand A of this MMA instruction (cutlass/include/cute/atom/mma_traits_sm80.hpp, lines 66 to 79 in 4a1709e).
Then I use the following code to read values from global memory directly.
Since I fill the matrix with naturally increasing elements and A is a row-major matrix, I think thread 0 should read 0, 1, 64, 65 (per the PTX reference: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#matrix-fragments-for-mma-m16n8k8). But the output is:
Should we use a row-major TV layout to encode the row-major multiplicand? Or is there something wrong in my code?
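For reference, the expected values 0, 1, 64, 65 can be reproduced from the fragment description in the PTX documentation. The sketch below assumes the f16 A fragment of m16n8k8 uses row = groupID (plus 8 for a2 and a3) and col = 2 * threadID_in_group + (i & 1), and that A is a row-major 16x8 tile filled with its own offsets. The function name `expected_a_offsets` is hypothetical.

```cpp
#include <array>
#include <cassert>

// Row-major offsets of fragment elements a0..a3 held by one lane for the
// A operand (16x8) of mma.m16n8k8, per the PTX fragment description
// (ASSUMPTION: f16 variant; check the linked PTX section for other types).
std::array<int, 4> expected_a_offsets(int lane) {
    assert(lane >= 0 && lane < 32);
    const int group_id = lane >> 2;      // groupID = laneid / 4
    const int tid_in_group = lane % 4;   // threadID_in_group = laneid % 4
    const int row_stride = 8;            // row-major 16x8 tile: K = 8 elements per row

    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i) {
        const int row = group_id + (i >= 2 ? 8 : 0);  // a2, a3 sit 8 rows below a0, a1
        const int col = tid_in_group * 2 + (i & 1);
        out[i] = row * row_stride + col;
    }
    return out;
}
```

With these assumptions, `expected_a_offsets(0)` returns 0, 1, 64, 65, which matches the values expected for thread 0 above.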