[BUG?] Behavior change of `local_tile` from 3.3.0 #1201

cloudhan · 2023-11-19T02:18:32Z

Describe the bug
I am not sure whether this is a bug or a design change, but I observe a behavior change starting with commit c008b4a (CUTLASS 3.3.0).

Steps/Code to reproduce bug

#include <vector>

#include <cute/tensor.hpp>
#include <cute/layout.hpp>

using namespace cute;

int main() {
  // A tensor of shape (128, 4, 2), think of it as a double buffered smem tensor 128x4
  auto layout = make_layout(make_shape(Int<128>{}, Int<4>{}, Int<2>{})); 
  std::vector<int> buffer(size(layout));
  auto tensor = make_tensor(buffer.data(), layout);
  for (int i = 0; i < size(tensor); i++) {
    tensor(i) = i;
  }

  const auto stripe_of_tensor = local_tile(tensor, make_tile(Int<8>{}, Int<4>{}), 0);
  print(stripe_of_tensor.layout());
}

Before the commit, the code prints the layout (_8,_4,_1,_2):(_1,_128,_0,_512).
Since the commit, the code prints (_8,_4):(_1,_128).

Expected behavior
The old layout should be printed. Or maybe not? Please elaborate.


ccecka · 2023-11-19T03:13:57Z

There was a subtle design change for consistency and correctness, yes.

The implementation of local_tile(tensor, tiler, coord) is essentially two lines, the divide and the slice:

// Divide the tensor into rank-2 according to tiler
Tensor tiled_tensor = zipped_divide(tensor, tiler);           // ((TileM,TileN,...),(RestM,RestN,...))
// Slice into the Rest mode with coord
Tensor result = tiled_tensor(repeat<rank(tiler)>(_), coord);  // (TileM,TileN,...)

Previously, the coord was always appended to, which gave you a slice into only the RestM mode. We now treat coord more faithfully to the above: if it is integral, it directly slices into all of the Rest modes.

Thus, you're getting the 0th 8x4 tile of the 128x4x2 tensor.

You can retrieve the old behavior by using one of the following:

// Explicit coord that slices into RestM and keeps RestN and RestP
Tensor stripe = local_tile(tensor, make_tile(Int<8>{},Int<4>{}), make_coord(0,_,_)); // (8,4,1,2)

// Explicit coord that slices into only RestM (and keeps the others)
Tensor stripe = local_tile(tensor, make_tile(Int<8>{},Int<4>{}), make_coord(0));     // (8,4,1,2)

cloudhan added the "? - Needs Triage" and "bug" labels on Nov 19, 2023

cloudhan closed this as completed Nov 19, 2023

cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Nov 28, 2023

Fix local_tile for version 3.3 and newer, NVIDIA/cutlass#1201

3e0e416

cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Dec 4, 2023

Fix local_tile for version 3.3 and newer, NVIDIA/cutlass#1201

73a1f91

cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Dec 12, 2023

Fix local_tile for version 3.3 and newer, NVIDIA/cutlass#1201

0ad9f48

cloudhan added a commit to cloudhan/rabbit-hole that referenced this issue Dec 12, 2023

Fix local_tile for version 3.3 and newer, NVIDIA/cutlass#1201

a26f6ad
