Confused about cutlass layout #666

LeiWang1999 · 2022-10-18T10:07:21Z


Hello there, I'm confused about cutlass matrix layout.

In my understanding, in cutlass:

the letter N represents non-transpose
the letter T represents transpose

and if a matrix is non-transposed, it is stored as row-major; otherwise, it is stored as column-major. The kernel I list below seems to confirm this:

// Gemm operator cutlass_simt_igemm_s8_128x128_32x2_nn_align1
using cutlass_simt_igemm_s8_128x128_32x2_nn_align1_base = 
  typename cutlass::gemm::kernel::DefaultGemmUniversal<
    int8_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,    // transposed B operand
    int8_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,    // transposed A operand
    int32_t, cutlass::layout::RowMajor,
    int32_t,
    cutlass::arch::OpClassSimt,
    cutlass::arch::Sm61,
    cutlass::gemm::GemmShape<128, 128, 32>,
    cutlass::gemm::GemmShape<32, 64, 32>,
    cutlass::gemm::GemmShape<1, 1, 4>,
    
    cutlass::epilogue::thread::LinearCombination<
      int32_t,
      1,
      int32_t,
      int32_t
    >,
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<8>,
    2,
    cutlass::arch::OpMultiplyAdd
>::GemmKernel;

For an NN-layout kernel, inputs A and B are stored in RowMajor layout, but when I used cutlass_profiler to profile the kernel, the log shows s8:column and s32:column, which appears to denote column-major as described in the docs. This contradicts my previous understanding and confuses me a lot.

Could you kindly point out where the problem is? Thanks.

jackkosaian · 2022-10-18T13:39:14Z

Hi, @LeiWang1999.

You are correct that T represents "transpose" and N represents "no transpose."

However, N is used for column-major layouts and T for row-major layouts.

Is the kernel you list above one that is generated by CUTLASS?

The reason that I ask is that kernels generated by CUTLASS via CMake leverage CUTLASS's GemmUniversalAdapter, which expects a transposed problem (e.g., A becomes B, row major becomes column major). Thus, when emitting these GEMM declarations, CUTLASS's generation framework transposes the problem and operand layouts. This can be seen here and here. So the CUTLASS profiler is reporting the correct layout for the NN kernel in question.

If you find reasoning about the layouts using the GemmUniversalAdapter a bit tricky, you may find it useful to look through examples from the CUTLASS unit tests. Many of the CUTLASS unit tests do not use the GemmUniversalAdapter interface, and thus declare layouts as one would expect (e.g., this NN kernel uses column-major A and B).
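The "transposed problem" idea can be checked without CUTLASS at all. The sketch below is plain C++, not CUTLASS code, and the function names gemm_col_major and gemm_row_major are hypothetical. It relies on one fact: a row-major M x K buffer is byte-for-byte the same as a column-major K x M buffer holding the transpose. So a row-major C = A * B can be computed by a column-major routine as C^T = B^T * A^T, which means swapping the operands and swapping M with N:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major GEMM: C (m x n) = A (m x k) * B (k x n).
// All buffers are column-major and tightly packed (lda = m, ldb = k, ldc = m).
void gemm_col_major(int m, int n, int k,
                    const std::vector<int>& A,
                    const std::vector<int>& B,
                    std::vector<int>& C) {
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      int acc = 0;
      for (int p = 0; p < k; ++p) {
        acc += A[i + p * m] * B[p + j * k];
      }
      C[i + j * m] = acc;
    }
  }
}

// Row-major GEMM built on the column-major routine.
// Reading row-major A (m x k) as column-major gives A^T (k x m), and likewise
// for B and C, so we compute C^T = B^T * A^T: B is passed first, and the
// roles of m and n are exchanged.
std::vector<int> gemm_row_major(int m, int n, int k,
                                const std::vector<int>& A,
                                const std::vector<int>& B) {
  std::vector<int> C(static_cast<std::size_t>(m) * n);
  gemm_col_major(n, m, k, B, A, C);
  return C;
}
```

Calling gemm_row_major(2, 2, 3, A, B) with row-major A = {1, 2, 3, 4, 5, 6} and B = {7, 8, 9, 10, 11, 12} fills C in row-major order, even though the inner routine only understands column-major data. This is the same operand and layout swap described above, shown on a toy scale.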

I hope this helps!

4 replies

LeiWang1999 Oct 18, 2022
Author

Thanks, this helps me a lot.

LeiWang1999 Oct 20, 2022
Author

@jackkosaian Hi, there is one more thing I want to confirm. Taking the CUTLASS kernel cutlass_tensorop_h1688gemm_64x256_32x2_tn_align4 as an example, does that mean matrix B is transposed and A is not transposed?

jackkosaian Oct 20, 2022

Hi, @LeiWang1999.

In CUTLASS kernels with names of the form cutlass_tensorop_h1688gemm_64x256_32x2_XY_align8, X refers to the layout of operand A and Y refers to the layout of operand B. Thus, for the example you've provided, the layout of operand A is t (transposed; row major), and that of operand B is n (non-transpose; column major).

The XY portion of the kernel names I described above is emitted here. You can see that the layout_name method populates X from operand A and Y from operand B. The codes for each value that can appear in X and Y can be found here.
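As a rough illustration of reading the XY token, here is a small standalone C++ sketch. It is not part of CUTLASS. The helper names layout_from_code and layouts_from_kernel_name are hypothetical, and it assumes the name format seen in this thread, where the two-letter token sits immediately before "_align". Only the n and t codes discussed here are handled; any other code is reported as unknown:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct GemmLayouts {
  std::string a;  // layout of operand A (the X position)
  std::string b;  // layout of operand B (the Y position)
};

// Maps a single layout code to its meaning as described in this thread:
// n = no transpose = column-major, t = transpose = row-major.
std::string layout_from_code(char code) {
  switch (code) {
    case 'n': return "column-major";
    case 't': return "row-major";
    default:  return "unknown";
  }
}

// Extracts the two-letter XY token that precedes "_align" in a kernel name.
// Assumption: the token is exactly two characters and is preceded by '_'.
GemmLayouts layouts_from_kernel_name(const std::string& name) {
  const std::string marker = "_align";
  const std::size_t pos = name.rfind(marker);
  if (pos == std::string::npos || pos < 3 || name[pos - 3] != '_') {
    return {"unknown", "unknown"};
  }
  return {layout_from_code(name[pos - 2]), layout_from_code(name[pos - 1])};
}
```

For cutlass_tensorop_h1688gemm_64x256_32x2_tn_align4 this reports A as row-major and B as column-major, matching the explanation above. For the nn kernel from the original question, both operands come back as column-major, which agrees with the profiler output.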

I hope this helps!

LeiWang1999 Oct 20, 2022
Author

Thanks. I have a few more questions here; could you please help me understand the design?
