
Questions about GemmWithReduceK #723

Answered by hwu36
Enter-tainer asked this question in Q&A

The gemm+reduce_k kernel first performs a partial reduction while doing the threadblock-scope mma, and stores the result of that partial reduction in gemm_k_accumulators. After the epilogue finishes, the kernel sums gemm_k_accumulators to get the final reduction result.
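To make the two phases concrete, here is a minimal host-side sketch in plain C++. It is not CUTLASS code: the function name `gemm_with_k_reduction`, the `tile_k` parameter, the row-major layout, and the choice to reduce the A operand are all assumptions made for illustration. It only mirrors the idea described above: accumulate a partial sum per K tile alongside the multiply-accumulate, then fold the partials into the final reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference, not the CUTLASS implementation.
// A is M x K and B is K x N, both row-major.
struct GemmReduceResult {
    std::vector<float> D;        // M x N product A * B
    std::vector<float> row_sum;  // length M, row_sum[m] = sum over k of A[m][k]
};

GemmReduceResult gemm_with_k_reduction(const std::vector<float>& A,
                                       const std::vector<float>& B,
                                       std::size_t M, std::size_t N,
                                       std::size_t K, std::size_t tile_k) {
    assert(A.size() == M * K && B.size() == K * N && tile_k > 0);
    GemmReduceResult out{std::vector<float>(M * N, 0.0f),
                         std::vector<float>(M, 0.0f)};

    // Walk K one tile at a time, as a mainloop would.
    for (std::size_t k0 = 0; k0 < K; k0 += tile_k) {
        const std::size_t k1 = (k0 + tile_k < K) ? k0 + tile_k : K;
        for (std::size_t m = 0; m < M; ++m) {
            float partial = 0.0f;  // partial reduction for this tile
            for (std::size_t k = k0; k < k1; ++k) {
                const float a = A[m * K + k];
                partial += a;  // reduce A along K while it is already loaded
                for (std::size_t n = 0; n < N; ++n) {
                    out.D[m * N + n] += a * B[k * N + n];  // the usual mma
                }
            }
            out.row_sum[m] += partial;  // fold the tile's partial into the total
        }
    }
    return out;
}
```

Calling `gemm_with_k_reduction(A, B, M, N, K, tile_k)` returns both the product and the per-row sums of A. The reduction reuses the A values that the multiply already reads, which is the reason for doing it inside the mainloop.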

Only split-K does that. If you don't use split-K, we don't output partial sums.
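As a rough illustration of why partial sums only show up with split-K, the sketch below splits the K dimension into slices, lets each slice write its own partial row sums of A to a workspace, and then reduces across slices. The names `split_k_partial_row_sums` and `reduce_partials`, the workspace layout, and the even chunking of K are assumptions for this example, not CUTLASS APIs. With a single slice the workspace already holds the final result, so there are no separate partials to output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: each of `slices` K-slices reduces its own part of A.
// A is M x K, row-major. workspace[s * M + m] is slice s's partial sum for row m.
std::vector<float> split_k_partial_row_sums(const std::vector<float>& A,
                                            std::size_t M, std::size_t K,
                                            std::size_t slices) {
    assert(A.size() == M * K && slices > 0);
    std::vector<float> workspace(slices * M, 0.0f);
    const std::size_t chunk = (K + slices - 1) / slices;  // ceil(K / slices)
    for (std::size_t s = 0; s < slices; ++s) {
        const std::size_t k0 = std::min(K, s * chunk);
        const std::size_t k1 = std::min(K, k0 + chunk);
        for (std::size_t m = 0; m < M; ++m) {
            for (std::size_t k = k0; k < k1; ++k) {
                workspace[s * M + m] += A[m * K + k];
            }
        }
    }
    return workspace;
}

// Hypothetical final pass: sum the per-slice partials into one vector of length M.
std::vector<float> reduce_partials(const std::vector<float>& workspace,
                                   std::size_t M, std::size_t slices) {
    assert(workspace.size() == slices * M);
    std::vector<float> total(M, 0.0f);
    for (std::size_t s = 0; s < slices; ++s) {
        for (std::size_t m = 0; m < M; ++m) {
            total[m] += workspace[s * M + m];
        }
    }
    return total;
}
```

The two-step shape (per-slice partials, then a separate reduction) is a design choice of this sketch; it keeps slices independent so they could run concurrently.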

And I just found that maybe I should also add code here to perform partial reduction while doing mma.

Yes, that is the right place. Suppose you want to reduce the A operand; your CUDA code would be:

            gemm_k_reduction[m * 2] += float(A[m * 4]);
            gemm_k_reduction[m * 2] += float(A[m * 4 + 2]);
  
            …

Answer selected by Enter-tainer