When compiling and running the exact same GEMM code example on CUTLASS 3.2.2 and CUTLASS 3.3.0,
CUTLASS 3.3.0 shows severely degraded performance: it takes 80 ms to run what took 12 ms before.
During compilation, the following warning appears:
ptxas info : (C7509) Potential Performance Loss: wgmma.mma_async instructions are serialized due to the presence of Extern calls in the function '_ZN7cutlass13device_kernelI119cutlass3x_sm90_tensorop_s64x32x16gemm_f16_f16_f32_void_f16_64x32x64_1x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tmaEEvNT_6ParamsE'
Problem shape:
m=10240, n=10240, k=2048, batch_size=10
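To put the two timings in perspective, here is a small sketch that converts them into achieved throughput. It assumes the usual convention of 2·m·n·k floating-point operations per GEMM (one multiply plus one add per multiply-accumulate). It also assumes that the reported 12 ms and 80 ms each cover the whole batched GEMM (all 10 batches); the issue does not state this explicitly. `gemm_flops` and `tflops` are hypothetical helper names, not part of CUTLASS.

```python
def gemm_flops(m: int, n: int, k: int, batch: int = 1) -> int:
    """FLOP count of a batched GEMM, counting each multiply-accumulate as 2 FLOPs."""
    return 2 * m * n * k * batch


def tflops(flops: int, runtime_ms: float) -> float:
    """Achieved throughput in TFLOP/s for a given runtime in milliseconds."""
    return flops / (runtime_ms * 1e-3) / 1e12


# Problem shape from the report; assumes the timings cover all 10 batches.
flops = gemm_flops(10240, 10240, 2048, batch=10)

for label, ms in [("CUTLASS 3.2.2", 12.0), ("CUTLASS 3.3.0", 80.0)]:
    print(f"{label}: {ms:5.1f} ms -> {tflops(flops, ms):6.1f} TFLOP/s")
```

Under these assumptions the workload is about 4.29 TFLOP in total, which works out to roughly 358 TFLOP/s at 12 ms and roughly 54 TFLOP/s at 80 ms, a slowdown of about 6.7x.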
Could you try adding -DNDEBUG to your build flags and see if the issue persists? We always recommend using that when building CUTLASS kernels (unless you explicitly want to be in debug mode).
That solves it: the compilation warning is gone. Thanks, @IonThruster!
Timings (obtained via nsys profile inspection):
Environment:
To reproduce, see the code and build/run instructions here:
https://gist.github.com/kadeng/6df8a529dcc2d50c96cbb50fe97c96c0