Is convolution in CUTLASS on SM90 implemented through im2col instead of implicit precompute GEMM? #1423

luliyucoordinate · 2024-03-25T10:15:08Z


Currently, convolution in CUTLASS on SM90 is implemented through the im2col method, while architectures below SM90 use the implicit precompute GEMM approach. Why? And how can I implement an implicit precompute GEMM convolution using CuTe?

hwu36 (Maintainer) · 2024-04-18T17:46:08Z

SM90 still uses implicit GEMM, but it uses TMA to handle the complicated address computation and boundary checks.

https://github.com/NVIDIA/cutlass/tree/main/examples/59_ampere_gather_scatter_conv shows how to use CuTe to implement a convolution.
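To make the "address computation and boundary check" concrete, here is a minimal scalar sketch of the implicit GEMM view of a forward convolution, written as plain host C++. It is not CUTLASS or CuTe code: `ConvProblem`, `load_a`, and `conv_fprop_implicit_gemm` are hypothetical names chosen for illustration, and the NHWC input / KRSC filter layouts are an assumption. The point is that the im2col matrix is never stored; each GEMM coordinate is mapped back to an input coordinate on the fly, and out-of-bounds (padding) reads return zero. On SM90 this per-element work is what the maintainer says is handed to TMA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical problem description (assumed layouts: NHWC input, KRSC filter, NPQK output).
struct ConvProblem {
  int N, H, W, C;              // input: batch, height, width, channels
  int K, R, S;                 // filter: output channels, filter height, filter width
  int pad_h, pad_w;
  int stride_h, stride_w;
  int dilation_h, dilation_w;

  // Output spatial extents.
  int P() const { return (H + 2 * pad_h - dilation_h * (R - 1) - 1) / stride_h + 1; }
  int Q() const { return (W + 2 * pad_w - dilation_w * (S - 1) - 1) / stride_w + 1; }

  // GEMM extents of the equivalent matrix multiply.
  int gemm_m() const { return N * P() * Q(); }
  int gemm_n() const { return K; }
  int gemm_k() const { return R * S * C; }
};

// Returns element (m, k) of the *virtual* im2col matrix A without materializing it.
float load_a(const ConvProblem& prob, const std::vector<float>& input, int m, int k) {
  assert(m >= 0 && m < prob.gemm_m() && k >= 0 && k < prob.gemm_k());

  // Address computation: GEMM row -> (n, p, q), GEMM column -> (r, s, c).
  int q_out = m % prob.Q();
  int p_out = (m / prob.Q()) % prob.P();
  int n     = m / (prob.Q() * prob.P());
  int c     = k % prob.C;
  int s     = (k / prob.C) % prob.S;
  int r     = k / (prob.C * prob.S);

  int h = p_out * prob.stride_h - prob.pad_h + r * prob.dilation_h;
  int w = q_out * prob.stride_w - prob.pad_w + s * prob.dilation_w;

  // Boundary check: reads that fall in the padding region contribute zero.
  if (h < 0 || h >= prob.H || w < 0 || w >= prob.W) return 0.0f;

  std::size_t idx = ((static_cast<std::size_t>(n) * prob.H + h) * prob.W + w) * prob.C + c;
  return input[idx];
}

// Forward convolution expressed as C[m, n] = sum_k A[m, k] * B[n, k].
std::vector<float> conv_fprop_implicit_gemm(const ConvProblem& prob,
                                            const std::vector<float>& input,
                                            const std::vector<float>& filter) {
  const int M = prob.gemm_m(), Nn = prob.gemm_n(), Kk = prob.gemm_k();
  std::vector<float> output(static_cast<std::size_t>(M) * Nn, 0.0f);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < Nn; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < Kk; ++k) {
        // A KRSC filter is already a row-major (K x RSC) matrix.
        acc += load_a(prob, input, m, k) * filter[static_cast<std::size_t>(n) * Kk + k];
      }
      output[static_cast<std::size_t>(m) * Nn + n] = acc;
    }
  }
  return output;
}
```

A real kernel would tile these loops and stage operands through shared memory; the index decomposition in `load_a` is the part that changes between architectures. Before SM90 it is typically done in software by the kernel's iterators (optionally with precomputed offsets), whereas on SM90 it is delegated to TMA, as described above.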
