[QST] Question About Cute::Tied_Copy #1859

Closed
ZhangZhiPku opened this issue Oct 10, 2024 · 1 comment

Comments

@ZhangZhiPku

Please help me to understand how TiledCopy works.

In the example at

copy_kernel_vectorized<<< gridDim, blockDim >>>(

I see that we declare a thr_layout of shape {32, 8} and a vec_layout of shape {4, 1}, while block_shape (line 182) is declared as {128, 64}. I think this means each thread block is responsible for moving a {128, 64} tile of data. However, we only have 32*8 = 256 threads, each processing 4 elements, so each copy operation seems able to move only 1024 elements at a time, and some looping must be performed to move the entire {128, 64} block.
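Written out, the arithmetic in this paragraph is as follows, assuming one pass covers the thread layout multiplied elementwise by the per-thread vector layout:

$$
\underbrace{(32 \cdot 4)}_{M} \times \underbrace{(8 \cdot 1)}_{N} = 128 \times 8 = 1024 \text{ elements per pass},
\qquad
\frac{128 \times 64}{1024} = \frac{128}{128} \cdot \frac{64}{8} = 8 \text{ passes}.
$$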

But why does the code call copy(tiled_copy, thr_tile_S, fragment) only once? Does this mean that copy(tiled_copy, ...) internally copies the data in a loop? If so, could you explain how this loop is implemented and how its number of iterations is determined?

Additionally, since there are 256 threads in my thread block, why is thr_layout defined as {32, 8} rather than another form such as {8, 32} or {256, 1}? How do these different forms affect tiled_copy?

@thakkarV
Collaborator

Does this mean that copy(tiled_copy, ...) internally copies the data in a loop?

Yes

If so, could you explain how this loop is implemented, and how its number of iterations is determined?

It is determined by the size of the input tensors passed to the copy operation.

thr_layout is defined as {32, 8} instead of other forms like {8, 32}, {256, 1}, etc.

To promote coalesced memory accesses.
