[SGEMM] Update SGEMM TF32 Benchmark (#87)
* Update README.md

* Update hgemm_wmma_stage.cu

* Update README.md

* Update README.md

* Update sgemm.py
DefTruth authored Oct 17, 2024
1 parent c4db4f8 commit 8c6922b
Showing 3 changed files with 491 additions and 339 deletions.
7 changes: 3 additions & 4 deletions hgemm/hgemm_wmma_stage.cu
@@ -31,7 +31,6 @@ using namespace nvcuda;
// Support A and B matrix with row-major in order to compare with the kernels using CUDA Cores in
// hgemm.cu and hgemm_async.cu.


HOST_DEVICE_INLINE
int div_ceil(int a, int b) { return (a % b != 0) ? (a / b + 1) : (a / b); }
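
Note on `div_ceil`: the helper above rounds an integer division up, which is the usual way to size a launch grid when the matrix dimensions are not multiples of the tile size. The sketch below is only an illustration, not code from this commit. `GridShape`, `grid_for`, `check_grid_examples`, the tile sizes, and the convention that x covers columns (N) and y covers rows (M) are all hypothetical, and the fallback definition of `HOST_DEVICE_INLINE` is an assumption so the snippet also builds with a plain C++ compiler.

```cpp
#include <cassert>

// Assumed definition: the macro is not shown in this diff. Under nvcc it would
// mark the function as callable from host and device; otherwise plain inline.
#ifdef __CUDACC__
#define HOST_DEVICE_INLINE __host__ __device__ inline
#else
#define HOST_DEVICE_INLINE inline
#endif

// Same body as the helper in hgemm_wmma_stage.cu: integer division rounded up.
HOST_DEVICE_INLINE
int div_ceil(int a, int b) { return (a % b != 0) ? (a / b + 1) : (a / b); }

// Hypothetical stand-in for dim3, so the sketch has no CUDA dependency.
struct GridShape {
  int x;  // blocks along N (columns)
  int y;  // blocks along M (rows)
};

// Hypothetical helper: one block per BM x BN output tile, with partial tiles
// at the edges still getting their own block.
inline GridShape grid_for(int M, int N, int BM, int BN) {
  return GridShape{div_ceil(N, BN), div_ceil(M, BM)};
}

// Small self-check of the rounding behaviour.
inline void check_grid_examples() {
  assert(div_ceil(256, 128) == 2);  // exact multiple: no extra block
  assert(div_ceil(257, 128) == 3);  // one leftover element needs a third block
  GridShape g = grid_for(1000, 500, 128, 128);
  assert(g.x == 4 && g.y == 8);
}
```

Because the edge blocks cover less than a full tile, a kernel launched this way would still need bounds checks (or padded inputs) for the last row and column of blocks.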

@@ -41,7 +40,7 @@ int div_ceil(int a, int b) { return (a % b != 0) ? (a / b + 1) : (a / b); }
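// shared memory; when launching the kernel, the dynamic shared memory size must be
// specified, and smem must be addressed as a one-dimensional array. 2. Improve L2 cache
// locality (Thread Block Swizzle): https://zhuanlan.zhihu.com/p/555339335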
- // 3. nedd __launch_bounds__ to avoid error 'too many resources required for launch'
+ // 3. __launch_bounds__: avoid error 'too many resources required for launch'
// reference: https://blog.csdn.net/feng__shuai/article/details/124395023
template<const int WMMA_M=16, const int WMMA_N=16, const int WMMA_K=16,
const int WMMA_TILE_M=4, const int WMMA_TILE_N=2,
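
Note on the dynamic shared memory comment: a dynamically sized shared memory buffer is declared as a single `extern __shared__` array, and its byte count is passed as the third kernel launch parameter, so multi-stage tiles have to be indexed through a flattened offset. The sketch below shows one way that offset and the byte count could be computed. It is an assumption-laden illustration, not the layout used by the kernels in this commit: `smem_offset`, `smem_bytes`, `check_smem_examples`, the `[stage][row][col]` ordering, the `pad` parameter, and the example sizes are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed definition: the macro is not shown in this diff.
#ifdef __CUDACC__
#define HOST_DEVICE_INLINE __host__ __device__ inline
#else
#define HOST_DEVICE_INLINE inline
#endif

// Hypothetical layout: `stages` tiles stored back to back, each tile holding
// `rows` rows of (cols + pad) elements. Returns the 1D element offset of
// (stage, row, col) inside the flat buffer.
HOST_DEVICE_INLINE
int smem_offset(int stage, int row, int col, int rows, int cols, int pad) {
  const int row_stride = cols + pad;            // elements per padded row
  const int stage_stride = rows * row_stride;   // elements per stage
  return stage * stage_stride + row * row_stride + col;
}

// Total bytes to request for the flat buffer at launch time.
HOST_DEVICE_INLINE
std::size_t smem_bytes(int stages, int rows, int cols, int pad,
                       std::size_t elem_size) {
  return static_cast<std::size_t>(stages) * rows * (cols + pad) * elem_size;
}

// Small self-check with hypothetical sizes: 2 stages of 64x16 two-byte elements.
inline void check_smem_examples() {
  assert(smem_offset(0, 0, 0, 64, 16, 0) == 0);
  assert(smem_offset(1, 0, 0, 64, 16, 0) == 1024);  // second stage starts here
  assert(smem_offset(1, 3, 5, 64, 16, 0) == 1077);  // 1024 + 3 * 16 + 5
  assert(smem_bytes(2, 64, 16, 0, 2) == 4096);
}
```

In a CUDA build, the value from `smem_bytes` would be the third argument in `kernel<<<grid, block, bytes>>>(...)`. Requests above the default 48 KB per block additionally require opting in with `cudaFuncSetAttribute` and `cudaFuncAttributeMaxDynamicSharedMemorySize`.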
@@ -257,7 +256,7 @@ hgemm_wmma_m16n16k16_mma4x2_warp2x4_stages_kernel(
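// shared memory; when launching the kernel, the dynamic shared memory size must be
// specified, and smem must be addressed as a one-dimensional array. 2. Improve L2 cache
// locality (Thread Block Swizzle): https://zhuanlan.zhihu.com/p/555339335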
- // 3. nedd __launch_bounds__ to avoid error 'too many resources required for launch'
+ // 3. __launch_bounds__: avoid error 'too many resources required for launch'
// reference: https://blog.csdn.net/feng__shuai/article/details/124395023
template<const int WMMA_M=16, const int WMMA_N=16, const int WMMA_K=16,
const int WMMA_TILE_M=4, const int WMMA_TILE_N=2,
@@ -476,7 +475,7 @@ hgemm_wmma_m16n16k16_mma4x2_warp2x4_stages_dsmem_kernel(
}

// stage with 256x256 block, dynamic smem
- // nedd __launch_bounds__ to avoid error 'too many resources required for launch'
+ // __launch_bounds__: avoid error 'too many resources required for launch'
// reference: https://blog.csdn.net/feng__shuai/article/details/124395023
template<const int WMMA_M=16, const int WMMA_N=16, const int WMMA_K=16,
const int WMMA_TILE_M=4, const int WMMA_TILE_N=4,