[BUG?] Behavior change of local_tile from 3.3.0 #1201

Closed
cloudhan opened this issue Nov 19, 2023 · 1 comment
Labels
? - Needs Triage bug Something isn't working

Comments

@cloudhan

Describe the bug
I am not sure whether this is a bug or a design change, but I observe a behavior change as of commit c008b4a (cutlass 3.3.0).

Steps/Code to reproduce bug

#include <vector>

#include <cute/tensor.hpp>
#include <cute/layout.hpp>

using namespace cute;

int main() {
  // A tensor of shape (128, 4, 2), think of it as a double buffered smem tensor 128x4
  auto layout = make_layout(make_shape(Int<128>{}, Int<4>{}, Int<2>{})); 
  std::vector<int> buffer(size(layout));
  auto tensor = make_tensor(buffer.data(), layout);
  for (int i = 0; i < size(tensor); i++) {
    tensor(i) = i;
  }

  const auto stripe_of_tensor = local_tile(tensor, make_tile(Int<8>{}, Int<4>{}), 0);
  print(stripe_of_tensor.layout());
}

Before the commit, the code prints the layout (_8,_4,_1,_2):(_1,_128,_0,_512).
Since the commit, the code prints (_8,_4):(_1,_128).

Expected behavior
The old layout should be printed. Or maybe not? Please elaborate.

@cloudhan cloudhan added ? - Needs Triage bug Something isn't working labels Nov 19, 2023

ccecka commented Nov 19, 2023

There was a subtle design change for consistency and correctness, yes.

The implementation of local_tile(tensor, tiler, coord) is essentially two lines, the divide and the slice:

// Divide the tensor into rank-2 according to tiler
Tensor tiled_tensor = zipped_divide(tensor, tiler);           // ((TileM,TileN,...),(RestM,RestN,...))
// Slice into the Rest mode with coord
Tensor result = tiled_tensor(repeat<rank(tiler)>(_), coord);  // (TileM,TileN,...)

Previously, the coord was always appended to, which gave you a slice into only the RestM mode. We now treat coord more faithfully to the above: if it is integral, it directly slices into all of the Rest modes.

Thus, you're getting the 0th 8x4 tile of the 128x4x2 tensor.

You can retrieve the old behavior by using one of the following:

// Explicit coord that slices into RestM and keeps RestN and RestP
Tensor stripe = local_tile(tensor, make_tile(Int<8>{},Int<4>{}), make_coord(0,_,_)); // (8,4,1,2)

// Explicit coord that slices into only RestM (and keeps the others)
Tensor stripe = local_tile(tensor, make_tile(Int<8>{},Int<4>{}), make_coord(0));     // (8,4,1,2)

cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Nov 28, 2023
cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Dec 4, 2023
cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Dec 12, 2023
cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Dec 12, 2023