It seems this example does not implement L2 cache optimization (threadblock swizzle), shared memory bank conflict avoidance, or double buffering. Is it possible to add them? Thank you!
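For context on the first item: an "L2 swizzle" usually means changing the order in which output tiles are assigned to threadblocks, so that blocks running at the same time reuse the same A and B panels while they are still in L2. The sketch below shows one common grouped ordering in plain C++. `grouped_tile`, `covers_all_tiles_once` and `group_m` are hypothetical names for illustration; this is not a CUTLASS or CuTe API, and it is not how `sgemm_nt_1.cu` is written.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: maps a linear block index `pid` to an output-tile
// coordinate (tile_m, tile_n). Instead of walking the output row by row,
// consecutive blocks sweep down a band of `group_m` tile rows, so blocks that
// are resident together touch a small set of A row-panels and B column-panels.
std::pair<int, int> grouped_tile(int pid, int tiles_m, int tiles_n, int group_m) {
  assert(group_m > 0 && pid >= 0 && pid < tiles_m * tiles_n);
  const int tiles_per_group = group_m * tiles_n;          // blocks in one full band
  const int group_id = pid / tiles_per_group;             // which band of tile rows
  const int first_m = group_id * group_m;                 // first tile row of the band
  const int rows = std::min(tiles_m - first_m, group_m);  // last band may be shorter
  const int local = pid % tiles_per_group;                // position inside the band
  return {first_m + local % rows, local / rows};          // column-major within band
}

// Sanity check: the remapping is a permutation, i.e. every tile is visited once.
bool covers_all_tiles_once(int tiles_m, int tiles_n, int group_m) {
  std::vector<int> seen(tiles_m * tiles_n, 0);
  for (int pid = 0; pid < tiles_m * tiles_n; ++pid) {
    std::pair<int, int> t = grouped_tile(pid, tiles_m, tiles_n, group_m);
    if (t.first < 0 || t.first >= tiles_m || t.second < 0 || t.second >= tiles_n)
      return false;
    ++seen[t.first * tiles_n + t.second];
  }
  return std::all_of(seen.begin(), seen.end(), [](int c) { return c == 1; });
}
```

In a kernel, `pid` would come from the block index and the returned pair would select which tile of C the block computes. With a 4x4 tile grid and `group_m = 2`, blocks 0 through 7 cover tile rows 0 and 1 column by column before any block touches rows 2 and 3.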
This example was meant to be the first of a four-part series of tutorials. The remaining parts will be released over time, with part four approaching speed-of-light performance. That said, all of the optimizations you mention can certainly be implemented in this example as well.
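As an illustration of the bank-conflict item: shared memory is split into banks (32 banks of 4-byte words on recent NVIDIA GPUs), and a warp whose lanes hit different words in the same bank is serialized. With a plain row-major 32x32 float tile, reading one column puts all 32 lanes in the same bank. XOR-ing the column index with the row index permutes each row so that both row and column reads are conflict-free, without the wasted space of padding. CuTe expresses this kind of permutation with its `Swizzle` layout functor; the sketch below does not use CuTe and only models the address arithmetic. The tile size, bank count and function names are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

// Assumptions: 32 banks of 4-byte words, a 32x32 tile of 4-byte floats,
// and one warp of 32 lanes where each lane touches one element.
constexpr int kBanks = 32;
constexpr int kTile = 32;

// Plain row-major placement: element (row, col) lives at word row*32 + col.
int naive_offset(int row, int col) { return row * kTile + col; }

// XOR swizzle: within each row, logical column `col` is stored at `col ^ row`.
// This is a bijection on each row, so the tile still occupies 32*32 words.
int swizzled_offset(int row, int col) { return row * kTile + (col ^ row); }

// Bank-conflict degree of one warp access: the largest number of lanes that
// land in the same bank (1 means conflict-free). If read_column is true,
// lane i reads (i, fixed); otherwise lane i reads (fixed, i). All lanes read
// distinct words, so no two lanes share a broadcast.
int worst_bank_hits(int (*offset)(int, int), int fixed, bool read_column) {
  assert(fixed >= 0 && fixed < kTile);
  std::array<int, kBanks> hits{};
  for (int lane = 0; lane < 32; ++lane) {
    const int row = read_column ? lane : fixed;
    const int col = read_column ? fixed : lane;
    ++hits[offset(row, col) % kBanks];
  }
  int worst = 0;
  for (int h : hits) worst = h > worst ? h : worst;
  return worst;
}
```

With `naive_offset`, a column read has degree 32 (fully serialized) while a row read has degree 1; with `swizzled_offset`, both have degree 1. The same idea matters whenever one stage writes a tile along rows and a later stage reads it along columns.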
https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu
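For the double-buffering item, the core idea is to keep two shared-memory stages and, while computing on K-tile `t` from one stage, start loading K-tile `t + 1` into the other. The sketch below is a sequential host-side emulation of that schedule on a dot product, not GPU code: on a device the copy would be asynchronous (for example with `cp.async`) and a barrier such as `__syncthreads()` would separate the stages. `kTileK`, `load_tile` and `dot_double_buffered` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kTileK = 4;  // hypothetical K-tile size

// Stands in for the global->shared copy of one K-tile of A and B.
void load_tile(const std::vector<float>& a, const std::vector<float>& b, int tile,
               std::array<float, kTileK>& sa, std::array<float, kTileK>& sb) {
  for (int i = 0; i < kTileK; ++i) {
    sa[i] = a[tile * kTileK + i];
    sb[i] = b[tile * kTileK + i];
  }
}

// Double-buffered reduction over K. Assumes equal lengths that are a multiple of kTileK.
float dot_double_buffered(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size() && a.size() % kTileK == 0);
  const int num_tiles = static_cast<int>(a.size()) / kTileK;
  std::array<float, kTileK> sa[2], sb[2];  // two stages ("double buffer")
  float acc = 0.0f;
  if (num_tiles == 0) return acc;

  load_tile(a, b, 0, sa[0], sb[0]);  // prologue: fill stage 0
  for (int t = 0; t < num_tiles; ++t) {
    const int cur = t % 2;  // stage holding tile t
    if (t + 1 < num_tiles)  // prefetch tile t+1 into the other stage
      load_tile(a, b, t + 1, sa[cur ^ 1], sb[cur ^ 1]);
    for (int i = 0; i < kTileK; ++i)  // compute on the current stage
      acc += sa[cur][i] * sb[cur][i];
    // On a GPU, a wait plus barrier here would ensure tile t+1 has landed and
    // that all threads are done reading stage `cur` before it is overwritten.
  }
  return acc;
}
```

The design choice is that the load for the next tile is issued before the math on the current one, so on real hardware the memory latency can overlap with computation; the cost is twice the shared memory per operand.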