Hello, I have several questions about using CUTLASS. I would very much appreciate answers.
How can I tell whether I am calling CUTLASS correctly?
I have matrix A (M x K), matrix B (N x K), and matrix C (M x N), and I call CUTLASS as shown below. I transposed B and passed A, B^T, and C as arguments, which matches the layout in the picture. A, B, and C are all half precision, and the accumulator is fp32.
I also call a cuBLAS kernel, but there I pass A and B directly and let the kernel perform the transpose itself.
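For reference, the layout question can be checked on the host independently of either library. The sketch below is an illustration, not code from CUTLASS or cuBLAS, and all function names in it are hypothetical. It shows that a row-major N x K buffer for B has exactly the same memory layout as a column-major K x N view of B^T with leading dimension K, so no physical transpose is needed to read B^T. In CUTLASS terms this would presumably correspond to choosing `cutlass::layout::ColumnMajor` for the B operand with `ldb = K`, but that mapping is an assumption about the call in the picture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a row-major matrix with leading dimension ld.
inline std::size_t row_major_offset(std::size_t row, std::size_t col, std::size_t ld) {
    return row * ld + col;
}

// Offset of element (row, col) in a column-major matrix with leading dimension ld.
inline std::size_t col_major_offset(std::size_t row, std::size_t col, std::size_t ld) {
    return row + col * ld;
}

// Host reference for C = A * B^T.
// A is M x K row-major, B is N x K row-major, C is M x N row-major.
// B^T(k, n) is read from B's buffer through a column-major K x N view (ld = K).
// Accumulation is done in float, mirroring an fp32 accumulator.
inline std::vector<float> gemm_a_bt_reference(const std::vector<float>& A,
                                              const std::vector<float>& B,
                                              std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == N * K);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[row_major_offset(m, k, K)] * B[col_major_offset(k, n, K)];
            }
            C[row_major_offset(m, n, N)] = acc;
        }
    }
    return C;
}
```

Comparing both GPU results against a host reference like this one on a small problem (with inputs rounded to half precision first) separates layout mistakes, which produce large errors, from rounding differences, which produce small ones.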
When I compare the results for M = N = K = 4096, I see a maximum error of 0.25, which is above the 0.05 default used by the CUTLASS profiler. I wonder whether changing the instruction shape, warp size, or thread block size affects the error. I am using the shapes shown in this picture.
Do I need to see zero error to confirm that I am calling the same kernel with the same layout settings?
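Whether an absolute error of 0.25 is large depends on the magnitude of the outputs, which is not stated here. Half precision has 10 explicit mantissa bits, so the spacing between adjacent fp16 values in [256, 512) is exactly 0.25. If the entries of C are in that range, a difference of 0.25 is a single unit in the last place, which two kernels that accumulate in a different order may plausibly produce. The sketch below computes that spacing on the host; the function names are hypothetical and only normal (non-subnormal) fp16 magnitudes are handled.

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent normal fp16 values at magnitude |x|.
// fp16 has 10 explicit mantissa bits, so the spacing is 2^(floor(log2|x|) - 10).
inline double half_ulp(double x) {
    const double ax = std::fabs(x);
    assert(ax >= 6.103515625e-05 && ax <= 65504.0);  // normal fp16 range only
    int exp = 0;
    std::frexp(ax, &exp);                   // ax = f * 2^exp with f in [0.5, 1)
    return std::ldexp(1.0, exp - 1 - 10);   // exp - 1 == floor(log2(ax))
}

// Absolute difference between two results, expressed in fp16 ulps
// at the larger of the two magnitudes.
inline double error_in_half_ulps(double a, double b) {
    const double scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) / half_ulp(scale);
}
```

For example, `half_ulp(300.0)` is 0.25, so results of 300.0 and 300.25 differ by one ulp, while the same 0.25 gap between values near 1.0 would be 256 ulps and would point to a real problem.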
What are epsilon and the non-zero floor in this function used by the CUTLASS profiler?
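One common way to use these two parameters is a relative comparison with a fallback near zero: epsilon scales the tolerance by the magnitude of the operands, and the non-zero floor defines when values are too small for a relative test to be meaningful. The sketch below illustrates that idea. It is a hypothetical stand-in, not copied from CUTLASS, and whether it matches the `relatively_equal` helper CUTLASS ships in every detail is an assumption worth checking against the source of the version in use.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical relative-equality check with a near-zero fallback.
// epsilon:       relative tolerance applied to |a| + |b|.
// nonzero_floor: differences below this are judged against a fixed absolute bound.
inline bool relatively_equal_sketch(float a, float b, float epsilon, float nonzero_floor) {
    assert(epsilon >= 0.0f && nonzero_floor >= 0.0f);
    if (a == b) {
        return true;
    }
    const float diff = std::fabs(a - b);
    // Near zero a relative test is meaningless, so use an absolute bound
    // derived from the floor instead.
    if (a == 0.0f || b == 0.0f || diff < nonzero_floor) {
        return diff < epsilon * nonzero_floor;
    }
    // Otherwise scale the tolerance by the magnitude of the operands.
    return diff < epsilon * (std::fabs(a) + std::fabs(b));
}
```

Under this reading, epsilon = 0.05 is a relative tolerance rather than an absolute one, so a maximum absolute error of 0.25 is not directly comparable to it. With an illustrative floor of 1/256, the values 300.0 and 300.25 pass the check, while 0.0 and 0.01 do not.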
What is the relationship between thread block shape, warp shape, and instruction shape?
If I use any of the warp shapes listed in the documentation (first picture below), I get the error shown in the second picture.
As a result, I cannot use any of these warp shapes directly when calling GemmUniversal.
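For context, in the CUTLASS 2.x style of tiling the three shapes nest: the thread block tile is partitioned into warp tiles, and each warp tile is covered by repeated tensor core instructions. That gives two necessary conditions: the thread block shape must be a multiple of the warp shape in M, N, and K, and the warp shape must be a multiple of the instruction shape. The sketch below checks those conditions at compile time. The struct, the function names, and the example shapes are illustrative assumptions, not taken from the pictures, and passing these checks does not guarantee that a given combination is supported for a particular architecture and data type.

```cpp
// Illustrative M x N x K tile shape (not a CUTLASS type).
struct TileShape {
    int m, n, k;
};

// True when `outer` is an exact multiple of `inner` in every dimension.
constexpr bool divides(TileShape outer, TileShape inner) {
    return inner.m > 0 && inner.n > 0 && inner.k > 0 &&
           outer.m % inner.m == 0 && outer.n % inner.n == 0 && outer.k % inner.k == 0;
}

// Number of warps needed to cover one thread block tile.
constexpr int warps_per_threadblock(TileShape threadblock, TileShape warp) {
    return (threadblock.m / warp.m) * (threadblock.n / warp.n) * (threadblock.k / warp.k);
}

// Necessary (not sufficient) nesting conditions for the three shapes.
constexpr bool shapes_are_consistent(TileShape threadblock, TileShape warp, TileShape instruction) {
    return divides(threadblock, warp) && divides(warp, instruction);
}

// Example values chosen for illustration only.
constexpr TileShape kThreadblock{128, 128, 32};
constexpr TileShape kWarp{64, 64, 32};
constexpr TileShape kInstruction{16, 8, 16};

static_assert(shapes_are_consistent(kThreadblock, kWarp, kInstruction),
              "thread block, warp, and instruction shapes must nest");
static_assert(warps_per_threadblock(kThreadblock, kWarp) == 4,
              "128x128x32 over 64x64x32 uses 4 warps (128 threads)");
```

A warp shape that is valid for one thread block shape can therefore fail for another, for example a 64x64x32 warp tile inside a 128x96x32 thread block tile, because 96 is not a multiple of 64.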
Thanks in advance!
haeunlee99 changed the title from "[QST] Questions about correctness, layout and cutlass usage on H100" to "[QST] Questions about correctness test and layout" on Aug 29, 2024
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.