No special copy atoms are needed for Volta, because all copies on that architecture are ordinary per-thread copies that CUDA C++ already exposes. The Default/Universal copy is sufficient pre-Ampere.
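To illustrate what a "thread copy" means here, below is a minimal sketch in plain CUDA C++ (not CuTe) of staging a tile from global memory into shared memory with ordinary per-thread loads and stores. The names `stage_tile`, `run_stage_tile_check`, and `kTileElems` are hypothetical and exist only for this example; it is not the code behind the slides or the CUTLASS implementation.

```cpp
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tile size, chosen only for this sketch.
constexpr int kTileElems = 1024;

// Each block stages one tile of `in` into shared memory using ordinary
// per-thread loads and stores, then writes the tile back to `out` so the
// round trip can be checked on the host.
__global__ void stage_tile(const float* in, float* out, int n) {
  __shared__ float smem[kTileElems];
  const int base = blockIdx.x * kTileElems;

  // Thread copy: consecutive threads touch consecutive addresses,
  // so the global loads are coalesced.
  for (int i = threadIdx.x; i < kTileElems; i += blockDim.x) {
    const int g = base + i;
    smem[i] = (g < n) ? in[g] : 0.0f;
  }
  __syncthreads();  // the tile must be complete before any thread reads it

  for (int i = threadIdx.x; i < kTileElems; i += blockDim.x) {
    const int g = base + i;
    if (g < n) out[g] = smem[i];
  }
}

// Host-side check (assumes n > 0): returns true if every element
// survived the global -> shared -> global round trip.
bool run_stage_tile_check(int n) {
  std::vector<float> h_in(n), h_out(n, -1.0f);
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

  const size_t bytes = static_cast<size_t>(n) * sizeof(float);
  float* d_in = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

  const int blocks = (n + kTileElems - 1) / kTileElems;
  stage_tile<<<blocks, 256>>>(d_in, d_out, n);

  // This blocking cudaMemcpy also waits for the kernel to finish.
  bool ok = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_in);
  cudaFree(d_out);

  for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == h_in[i]);
  return ok;
}
```

Calling `run_stage_tile_check(3000)` from `main` exercises a partially filled last tile. Nothing in the kernel is architecture-specific, which is consistent with the point above that Volta needs no dedicated copy atom.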
Hi, I am currently studying these slides: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9593-cutensor-high-performance-tensor-operations-in-cuda-v2.pdf
I was wondering whether there is any sample code for loading data from global memory to shared memory, as shown on page 23?
Can I also ask why there is no `copy_sm70.hpp` in `cutlass/include/cute/arch` and no `copy_traits_sm70.hpp` in `cutlass/include/cute/atom`?

Thanks!