[QST] How to avoid too many resources requested #1166
Comments
What shapes do you want to use? We don't want to use too many resources, which can hurt performance.
@hwu36 Hi, the ThreadblockShape is 128x128 and the WarpShape is 32x32.
What is the data type and architecture?
The data type is float16, the internal accumulation type is float32, and the GPU is a 2080 Ti.
Your threadblock size is 128x128 and your warp size is 32x32, so you need (128/32) x (128/32) = 16 warps. We usually use 4 or 8 warps, so you'd better use a 64x64 warp size. If I am not wrong, the 2080 Ti is a Turing card, so it is better to use instruction shape 16x8x8. You could use threadblock shape 128x128x32, warp shape 64x64x32, and instruction shape 16x8x8. You can find more plausible tile sizes in our profiler generator: https://github.com/NVIDIA/cutlass/blob/main/python/cutlass_library/generator.py#L1375-L1383
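As a rough illustration, here is a minimal sketch of how those three shapes plug into a CUTLASS 2.x device-level template. It uses a plain GEMM rather than the asker's convolution, with assumed fp16 inputs, fp32 accumulation, and row/column-major layouts; the same ThreadblockShape, WarpShape, and InstructionShape parameters appear in the convolution templates. With a 64x64 warp tile, the 128x128 threadblock needs only (128/64) x (128/64) = 4 warps.

```cpp
#include "cutlass/cutlass.h"
#include "cutlass/gemm/device/gemm.h"

// Sketch of a GEMM instantiation using the suggested tile sizes:
// threadblock 128x128x32, warp 64x64x32, instruction 16x8x8 (Turing / SM75).
// 4 warps per threadblock keeps register and shared-memory usage within
// the limits of a Turing SM. Element types and layouts are assumptions.
using GemmFp16 = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,     // A: fp16
    cutlass::half_t, cutlass::layout::ColumnMajor,  // B: fp16
    cutlass::half_t, cutlass::layout::RowMajor,     // C: fp16
    float,                                          // internal accumulation in fp32
    cutlass::arch::OpClassTensorOp,                 // use Tensor Cores
    cutlass::arch::Sm75,                            // Turing (e.g. 2080 Ti)
    cutlass::gemm::GemmShape<128, 128, 32>,         // ThreadblockShape
    cutlass::gemm::GemmShape<64, 64, 32>,           // WarpShape
    cutlass::gemm::GemmShape<16, 8, 8>>;            // InstructionShape
```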
How can I convert these tile descriptions to the TVM tile size shape? Thanks.
What do you mean by "TVM tile size shape"? |
What is your question?
I am trying to use cutlass::conv::device::Convolution with fixed ThreadblockShape, WarpShape, and InstructionShape, and the launch fails with a "too many resources requested" error. Modifying the ThreadblockShape or WarpShape may help, but is there any other solution? For example, __launch_bounds__ can be useful in such cases for a CUDA kernel.
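For reference, __launch_bounds__ is a general CUDA mechanism rather than a CUTLASS-specific fix: it tells the compiler the maximum block size (and optionally a minimum number of resident blocks per SM) so it can cap registers per thread and avoid exceeding SM resource limits at launch. A minimal sketch on a hypothetical hand-written kernel (the kernel name and bounds here are assumptions for illustration):

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel: __launch_bounds__(256, 2) tells the compiler the kernel
// is launched with at most 256 threads per block and should allow at least
// 2 resident blocks per SM, so it limits registers per thread accordingly and
// the launch does not fail with "too many resources requested for launch".
__global__ void __launch_bounds__(256, 2)
scale_kernel(float* data, float alpha, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    data[idx] *= alpha;
  }
}
```

For the CUTLASS templates themselves, though, the practical fix is the one suggested above: pick a smaller warp tile (fewer warps per threadblock) so the kernel's register and shared-memory footprint fits the target architecture.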