Why is convolution in CUTLASS on SM90 implemented through im2col instead of implicit precompute GEMM? #1423
Replies: 1 comment
-
SM90 still uses implicit GEMM, but it relies on TMA to handle the complicated address computation and boundary checks. https://github.com/NVIDIA/cutlass/tree/main/examples/59_ampere_gather_scatter_conv shows how to use CuTe to implement a convolution.
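To make "address computation and boundary check" concrete, here is a minimal host-side sketch of the index mapping that a forward-propagation implicit GEMM has to perform for every activation element it loads. It is not CUTLASS or CuTe code: `ConvShape`, `activation_offset`, and `kPadding` are hypothetical names, and the sketch assumes an NHWC activation tensor, a KRSC filter, and cross-correlation mode. The GEMM problem is taken to be M = N\*P\*Q, N = K, K = R\*S\*C.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical problem description (not a CUTLASS type).
struct ConvShape {
  int N, H, W, C;                 // activation tensor, NHWC layout (assumed)
  int K, R, S;                    // filter tensor, KRSC layout (assumed)
  int pad_h, pad_w;
  int stride_h, stride_w;
  int dilation_h, dilation_w;

  // Output spatial extents.
  int P() const { return (H + 2 * pad_h - dilation_h * (R - 1) - 1) / stride_h + 1; }
  int Q() const { return (W + 2 * pad_w - dilation_w * (S - 1) - 1) / stride_w + 1; }
};

// Sentinel meaning "this element lies in the padding halo, load zero instead".
constexpr std::int64_t kPadding = -1;

// Maps a coordinate (gemm_m, gemm_k) of the implicit GEMM A operand to a
// linear offset into the NHWC activation tensor, or kPadding if the
// coordinate falls outside the tensor.
std::int64_t activation_offset(const ConvShape& s, int gemm_m, int gemm_k) {
  const int P = s.P();
  const int Q = s.Q();
  assert(gemm_m >= 0 && gemm_m < s.N * P * Q);
  assert(gemm_k >= 0 && gemm_k < s.R * s.S * s.C);

  // GEMM row -> output pixel (n, p, q).
  const int n = gemm_m / (P * Q);
  const int p = (gemm_m % (P * Q)) / Q;
  const int q = gemm_m % Q;

  // GEMM reduction index -> filter tap (r, s, c).
  const int r = gemm_k / (s.S * s.C);
  const int t = (gemm_k / s.C) % s.S;  // filter column (usually called s)
  const int c = gemm_k % s.C;

  // Input pixel touched by this output pixel and filter tap.
  const int h = p * s.stride_h - s.pad_h + r * s.dilation_h;
  const int w = q * s.stride_w - s.pad_w + t * s.dilation_w;

  // Boundary check: padding elements are never read from memory.
  if (h < 0 || h >= s.H || w < 0 || w >= s.W) {
    return kPadding;
  }
  return ((static_cast<std::int64_t>(n) * s.H + h) * s.W + w) * s.C + c;
}
```

On pre-SM90 kernels this kind of arithmetic (or a precomputed table of its results) runs in the threads that issue the loads; the reply above says that on SM90 this work is handled by TMA.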
-
Currently, the convolution implemented in CUTLASS on SM90 uses the im2col method, while for architectures earlier than SM90 the implicit precompute GEMM approach is used. Why? And how can I implement the implicit precompute GEMM convolution using CuTe?