forked from pytorch/FBGEMM
Commit
Improve FP8 grouped GEMM perf via tileshape and cooperative (pytorch#3653)

Summary:
X-link: facebookresearch/FBGEMM#729

Pull Request resolved: pytorch#3653

Tuning the tile shape and leveraging the cooperative kernel schedule bring **up to an additional 1.4x speedup** over the existing FP8 grouped GEMM kernel configs for non-memory-bound shapes.

Reviewed By: jianyuh, jwfromm

Differential Revision: D68609019

fbshipit-source-id: e5c5680d30b60a97d0bfe50906600f133e0f2391
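
The summary limits the speedup to non-memory-bound shapes. One common way to reason about that distinction is a roofline-style estimate of arithmetic intensity; the sketch below is illustrative only and is not taken from this commit. The names `GemmShape`, `arithmetic_intensity`, and `is_memory_bound` are hypothetical, the byte sizes assume 1-byte FP8 inputs and a 2-byte output while ignoring scale tensors, and the peak throughput and bandwidth figures in the usage lines are placeholders rather than real hardware specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmShape:
    """One GEMM problem in a group: C[M, N] = A[M, K] @ B[N, K]^T."""
    m: int
    n: int
    k: int


def arithmetic_intensity(shape: GemmShape, in_bytes: int = 1, out_bytes: int = 2) -> float:
    """FLOPs per byte of global-memory traffic, assuming each operand is
    read once and the output is written once (an idealized assumption)."""
    flops = 2 * shape.m * shape.n * shape.k
    bytes_moved = (
        in_bytes * (shape.m * shape.k + shape.n * shape.k)
        + out_bytes * shape.m * shape.n
    )
    return flops / bytes_moved


def is_memory_bound(shape: GemmShape, peak_flops: float, peak_bytes_per_s: float) -> bool:
    """A shape is treated as memory-bound when its arithmetic intensity
    falls below the roofline ridge point peak_flops / peak_bytes_per_s."""
    ridge = peak_flops / peak_bytes_per_s
    return arithmetic_intensity(shape) < ridge


# Placeholder peaks (NOT measured values for any specific GPU).
PEAK_FLOPS = 1e15
PEAK_BYTES_PER_S = 2e12

small = GemmShape(m=1, n=4096, k=4096)     # skinny, decode-like shape
large = GemmShape(m=4096, n=4096, k=4096)  # square, compute-heavy shape

print(is_memory_bound(small, PEAK_FLOPS, PEAK_BYTES_PER_S))
print(is_memory_bound(large, PEAK_FLOPS, PEAK_BYTES_PER_S))
```

Under these assumptions, shapes with a small M land on the memory-bound side of the ridge, where changes to tile shape or kernel schedule are less likely to help, which is consistent with the scope stated in the summary.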
1 parent 4957ca1, commit 98d54f7
Showing 1 changed file with 19 additions and 22 deletions.