[QST] Incorrect matrix multiplication result with CuTe library #1866
Comments
I believe the error is due to the small size of your input matrices (M = N = K = 4). When gA is partitioned, mA has shape (4, 4) while cta_tiler is (128, 8). Because the cta_tiler shape is larger than mA, gA still has shape (128, 8), but only its top-left 4x4 corner holds valid data.
After partitioning, you declare tiled_copy and use it to perform a memory copy. This copy ignores the boundaries of mA (4, 4) and reads a full (128, 8) submatrix through gA, so most of the loaded data lies outside the matrix and you will get an incorrect result. Additionally, I have a question about the copy and MMA layouts in your code.
I don't know why these configurations compile successfully. These copy and MMA layouts don't seem to match any ldmatrix or mma instruction, so I have no idea how CuTe handles them. @thakkarV, can you give us more information?
The kernel you have included does not support predication for incomplete tiles, so you will not see correct results for a 4x4x4 input shape, which is not aligned to the tile shape used.
Description:
I encountered an issue when using the CuTe library for matrix multiplication. The output does not match the expected values, and the result matrix contains unexpected odd numbers such as 27 and 33, which should not occur in this case.
Reproduction
Below is the terminal output:
Observations:
The result matrix is incorrect; it even contains odd values such as 27 and 33, which should not appear given the input matrices.
Below is my code:
I made only a small change to the original sgemm_sm80.cu.
Changes:
Could you help me determine whether this is a bug or whether I am missing something in the setup? I can provide additional details if necessary.
Thank you for your assistance!
Whole code: