Confused about cutlass layout #666
Hello there, I'm confused about the CUTLASS matrix layout. In my understanding, in CUTLASS, if a matrix is non-transposed it is stored as row-major, otherwise as column-major. The kernel I list below seems to show that I'm right:
// Gemm operator cutlass_simt_igemm_s8_128x128_32x2_nn_align1
using cutlass_simt_igemm_s8_128x128_32x2_nn_align1_base =
  typename cutlass::gemm::kernel::DefaultGemmUniversal<
    int8_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,    // transposed B operand
    int8_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,    // transposed A operand
    int32_t, cutlass::layout::RowMajor,
    int32_t,
    cutlass::arch::OpClassSimt,
    cutlass::arch::Sm61,
    cutlass::gemm::GemmShape<128, 128, 32>,
    cutlass::gemm::GemmShape<32, 64, 32>,
    cutlass::gemm::GemmShape<1, 1, 4>,
    cutlass::epilogue::thread::LinearCombination<
      int32_t,
      1,
      int32_t,
      int32_t
    >,
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<8>,
    2,
    cutlass::arch::OpMultiplyAdd
  >::GemmKernel;
Could you kindly point out where the problem is? Thanks.
Replies: 1 comment 4 replies
Hi, @LeiWang1999.

You are correct that T represents "transpose" and N represents "no transpose." However, N is used for column-major layouts and T for row-major layouts.

Is the kernel you list above one that is generated by CUTLASS?

The reason that I ask is that kernels generated by CUTLASS via CMake leverage CUTLASS's GemmUniversalAdapter, which expects a transposed problem (e.g., A becomes B, row major becomes column major). Thus, when emitting these GEMM declarations, CUTLASS's generation framework transposes the problem and operand layouts. This can be seen here and here. So the CUTLASS profiler is reporting the correct layout for the NN kernel in question.

If you find reasoning about the layouts using the GemmUniversalAdapter a bit tricky, you may find it useful to look through examples from the CUTLASS unit tests. Many of the CUTLASS unit tests do not use the GemmUniversalAdapter interface, and thus declare layouts as one would expect (e.g., this NN kernel uses column-major A and B).

I hope this helps!
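To make the "A becomes B, row major becomes column major" swap more concrete, here is a minimal sketch in plain C++. It is not CUTLASS code and uses no CUTLASS API. All function names are hypothetical and chosen only for illustration. It assumes the swap rests on the identity C = A * B (column-major) being the same computation as C^T = B^T * A^T (row-major) on the same memory buffers, since a column-major M x N buffer is element-for-element a row-major N x M buffer of the transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a column-major matrix with `rows` rows.
inline std::size_t col_major(std::size_t row, std::size_t col, std::size_t rows) {
    return row + col * rows;
}

// Offset of element (row, col) in a row-major matrix with `cols` columns.
inline std::size_t row_major(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// C (M x N) = A (M x K) * B (K x N), all operands column-major ("NN").
std::vector<int> gemm_col_major(const std::vector<int>& A, const std::vector<int>& B,
                                std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<int> C(M * N, 0);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t k = 0; k < K; ++k)
                C[col_major(m, n, M)] += A[col_major(m, k, M)] * B[col_major(k, n, K)];
    return C;
}

// C (M x N) = A (M x K) * B (K x N), all operands row-major.
std::vector<int> gemm_row_major(const std::vector<int>& A, const std::vector<int>& B,
                                std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<int> C(M * N, 0);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t k = 0; k < K; ++k)
                C[row_major(m, n, N)] += A[row_major(m, k, K)] * B[row_major(k, n, N)];
    return C;
}

// Hypothetical wrapper: solve the column-major NN problem with the row-major
// routine by swapping the operands and the M/N extents. The buffers are passed
// through untouched; only their interpretation changes.
std::vector<int> gemm_col_major_via_row_major(const std::vector<int>& A,
                                              const std::vector<int>& B,
                                              std::size_t M, std::size_t N, std::size_t K) {
    return gemm_row_major(B, A, N, M, K);
}
```

Calling `gemm_col_major(A, B, M, N, K)` and `gemm_col_major_via_row_major(A, B, M, N, K)` on the same buffers should give identical output buffers. Under that assumption, a declaration that lists row-major operands in swapped order can still describe a column-major NN problem.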