[BUG] Cutlass 3.3.0 GEMM performance regression "wgmma.mma_async instructions are serialized due to the presence of Extern calls" #1255

Closed
kadeng opened this issue Dec 7, 2023 · 5 comments
Labels: "? - Needs Triage", "bug" (Something isn't working)

kadeng commented Dec 7, 2023

When I compile and run the exact same GEMM code example against CUTLASS 3.2.2 and CUTLASS 3.3.0,
the 3.3.0 build performs far worse: it takes about 80 ms to run what took about 12 ms before.

During compilation, the following warning appears:

ptxas info    : (C7509) Potential Performance Loss: wgmma.mma_async instructions are serialized due to the presence of Extern calls in the function '_ZN7cutlass13device_kernelI119cutlass3x_sm90_tensorop_s64x32x16gemm_f16_f16_f32_void_f16_64x32x64_1x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tmaEEvNT_6ParamsE'

Problem shape:
m=10240, n=10240, k=2048, batch_size=10

Timings (obtained via nsys profile inspection):

  • Cutlass 3.3.0: ~ 80ms
  • Cutlass 3.2.2: ~ 12ms
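
For a sense of scale, the timings above can be converted to throughput. This is only a rough sketch: it assumes the usual 2*m*n*k FLOP count per GEMM and that each timing covers the whole batch of 10, which the report does not state explicitly. The helper names `gemm_flops` and `tflops` are made up for illustration.

```cpp
#include <cstdint>

// FLOPs for a batched GEMM, counting one multiply and one add per
// inner-product term (the common 2*m*n*k convention; an assumption here).
constexpr double gemm_flops(std::int64_t m, std::int64_t n,
                            std::int64_t k, std::int64_t batch) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
           static_cast<double>(k) * static_cast<double>(batch);
}

// Achieved throughput in TFLOP/s for a run that took `milliseconds`.
constexpr double tflops(double flops, double milliseconds) {
    return flops / (milliseconds * 1e-3) / 1e12;
}
```

With m=10240, n=10240, k=2048, batch_size=10 this gives about 4.29e12 FLOP, so roughly 358 TFLOP/s at 12 ms versus roughly 54 TFLOP/s at 80 ms under those assumptions.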

Environment:

  • CUDA 12.1
  • GPU: H100 / SM90
  • Linux x64

To reproduce, see the code and build/run instructions here:
https://gist.github.com/kadeng/6df8a529dcc2d50c96cbb50fe97c96c0

kadeng added the "? - Needs Triage" and "bug" labels Dec 7, 2023
hwu36 (Collaborator) commented Dec 7, 2023

What about using -O3?

kadeng (Author) commented Dec 7, 2023

> What about using -O3?

I get the same warning during compilation, so I assume it's the same problem.

hwu36 (Collaborator) commented Dec 7, 2023

@IonThruster

IonThruster (Collaborator) commented Dec 7, 2023


Could you try adding -DNDEBUG to your build flags and see if the issue persists? We always recommend using that flag when building CUTLASS kernels (unless you explicitly want to be in debug mode).

kadeng (Author) commented Dec 7, 2023

> Could you try adding -DNDEBUG to your build flags and see if the issue persists? We always recommend using that flag when building CUTLASS kernels (unless you explicitly want to be in debug mode).

That solves it: the compilation warning is gone. Thanks, @IonThruster.
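
For readers wondering why a preprocessor define can matter here, below is a minimal host-side C++ sketch of what NDEBUG does to `assert()`. It is not CUTLASS code, and the thread does not confirm which call triggered the warning; a plausible guess is that assertions left enabled in the kernel compile to calls into an external failure handler, which would match the "Extern calls" wording in the ptxas message. The names `in_bounds`, `load_element`, `checks_evaluated` and `kAssertsEnabled` are hypothetical.

```cpp
#include <cassert>

#ifdef NDEBUG
constexpr bool kAssertsEnabled = false;  // built with -DNDEBUG
#else
constexpr bool kAssertsEnabled = true;   // debug-style build
#endif

// Counts how often the check below actually runs.
static int checks_evaluated = 0;

// Hypothetical stand-in for a sanity check in a hot code path.
inline bool in_bounds(int idx, int extent) {
    ++checks_evaluated;
    return idx >= 0 && idx < extent;
}

// Hypothetical accessor guarded by assert().
inline int load_element(const int* data, int idx, int extent) {
    // Without NDEBUG this evaluates in_bounds() and, on failure, calls the
    // runtime's assertion handler. With NDEBUG it expands to ((void)0),
    // so neither the check nor the handler call is compiled in.
    assert(in_bounds(idx, extent));
    return data[idx];
}
```

Calling `load_element` once leaves `checks_evaluated` at 1 in a default build and at 0 when the same file is compiled with -DNDEBUG, which is the effect the suggested build flag relies on.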

kadeng closed this as completed Dec 7, 2023