[FEA] Could the developers optimize the CuTe example for L2 cache, bank conflicts, double buffering, and so on? Thank you! #1229

Closed
ziyuhuang123 opened this issue Dec 4, 2023 · 3 comments
Labels
feature request New feature or request

Comments

@ziyuhuang123

https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/sgemm_nt_1.cu

It seems this example does not implement L2 cache optimization (threadblock swizzle), shared memory bank conflict avoidance, or double buffering. Is it possible to use these optimizations here? Thank you!
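For context on the bank conflict part of the question, here is a minimal host-side sketch, not taken from the tutorial. It assumes 32 shared memory banks of 4-byte words and a 32x32 row-major `float` tile; the names `plain_offset`, `swizzled_offset`, and `max_bank_conflict` are hypothetical. It models a warp in which thread `t` reads row `t` of one column, and compares a plain layout against an XOR swizzle of the column index.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

constexpr int kBanks = 32;  // assumption: 32 banks, each one 4-byte word wide
constexpr int kTile  = 32;  // assumption: 32x32 tile of 4-byte elements

// Word offset of (row, col) in a plain row-major tile.
inline int plain_offset(int row, int col) { return row * kTile + col; }

// Word offset with the column XOR-ed by the row, so that the elements of one
// logical column land in different banks.
inline int swizzled_offset(int row, int col) { return row * kTile + (col ^ row); }

// Largest number of threads that hit the same bank when the 32 threads of a
// warp read logical column `col`, with thread t reading row t.
template <typename OffsetFn>
int max_bank_conflict(OffsetFn offset, int col) {
  std::array<int, kBanks> hits{};
  for (int t = 0; t < kTile; ++t) {
    ++hits[offset(t, col) % kBanks];
  }
  return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, the plain layout puts every element of a column in the same bank, while the swizzled layout spreads them across all 32 banks. CuTe has its own swizzle machinery for shared memory layouts; this sketch only illustrates the idea and does not use it.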

@thakkarV
Collaborator

thakkarV commented Dec 4, 2023

The goal of this example is to be a four-part series of tutorials. The parts will be released over time, with part four approaching speed-of-light performance. That said, all of the optimizations you mention can certainly be implemented in this example as well.
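As an illustration of the L2-oriented "swizzle" mentioned in the question, here is a minimal host-side sketch of grouped tile ordering. It is one common way to remap a linear block index to an output tile, not the scheme used by the tutorial or by CUTLASS. The names `TileCoord`, `grouped_tile`, and `group_m` are hypothetical, and the sketch assumes a 1-D grid in which blocks with nearby indices tend to run at about the same time.

```cpp
#include <algorithm>
#include <cassert>

struct TileCoord {
  int m;  // tile index along M
  int n;  // tile index along N
};

// Map a linear block index to an output tile so that consecutive blocks walk
// down a band of `group_m` tile rows before moving to the next tile column.
// Blocks that run together then share A and B tiles, which may improve L2 reuse.
inline TileCoord grouped_tile(int block_id, int tiles_m, int tiles_n, int group_m) {
  const int blocks_per_group = group_m * tiles_n;
  const int group_id = block_id / blocks_per_group;
  const int first_m = group_id * group_m;
  // The last group may have fewer than group_m rows.
  const int rows_in_group = std::min(tiles_m - first_m, group_m);
  const int local = block_id % blocks_per_group;
  return TileCoord{first_m + local % rows_in_group, local / rows_in_group};
}
```

In a kernel, `block_id` would typically come from `blockIdx.x`, and the returned coordinates would replace the direct use of `blockIdx.x` and `blockIdx.y` when selecting the output tile. Whether this helps depends on the problem size and the GPU, so it should be measured rather than assumed.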

@ziyuhuang123
Author

Well, thank you! It's just that I'm not skilled enough to write that code myself, so I will wait for your update. Thanks!

@mnicely
Collaborator

mnicely commented Jan 2, 2024

Closing due to inactivity. Feel free to reopen if needed.

@mnicely mnicely closed this as completed Jan 2, 2024