ttnn pad is not working for some inputs and seems to ignore pad value #15511

Closed
bbradelTT opened this issue Nov 27, 2024 · 20 comments
Labels: bug, op_cat: TM, P0

@bbradelTT (Contributor)

The pad value seems to be ignored. Also, some shapes do not seem to work at all.

Repro steps:

Run the following pytests:

import pytest
import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc
from models.utility_functions import torch_random
    
def test_pad(device):
    input_shape = (1, 1, 12, 8)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 1, 12, 64], [0, 0, 0, 0], 3)
    print(c)   

def test_pad2(device):
    input_shape = (1, 2, 3, 4)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 2, 32, 32], [0, 0, 0, 0], value=3)
    print(c)

Expected:
The tensors should print successfully, have the requested output shape, and contain 3 in the padded region.

Actual:

The first test throws an error:

E       RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/pad/device/pad_op.cpp:26: input_tensor.get_legacy_shape()[2] + this->input_tensor_start[2] <= this->output_tensor_shape[2]
E       info:
E       Output size cannot fit input with offset

The second test prints:

ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               [ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]],

              [[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               [ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 2, 32, 32]), dtype=DataType::FLOAT32, layout=Layout::TILE)
@TT-BrianLiu (Contributor)

ttnn.from_torch automatically pads to the tile size if you specify TILE layout. So in your cases:

  • Case 1: b is already shape [1, 1, 12[32], 8[32]] after ttnn.from_torch, and the op is complaining that your requested output shape in dim 2 is < 32. If you change that output shape to [1, 1, 32, 64], it should work (see the sketch below).
  • Case 2: b is already shape [1, 2, 3[32], 4[32]] after ttnn.from_torch, so the pad doesn't actually run.
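
For reference, a minimal sketch of the adjusted case-1 call, using the [1, 1, 32, 64] output shape suggested above (a sketch of that suggestion, not an independently verified fix):

import torch
import ttnn

def test_pad_tile_aligned(device):
    # ttnn.from_torch tile-pads the input to [1, 1, 32, 32] when TILE_LAYOUT is
    # requested, so the requested output shape must be at least 32 in dim 2 as well.
    a = torch.ones((1, 1, 12, 8))
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 1, 32, 64], [0, 0, 0, 0], 3)
    print(c)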

@TT-BrianLiu (Contributor)

This is a symptom of our pad/unpad ops not being treated as actual ops that return a new logical tensor. @ayerofieiev-tt

@bbradelTT (Contributor, Author) commented Nov 28, 2024

@TT-BrianLiu then how do I fill the region of the tensor outside the logical shape with the pad value?

def test_pad(device):
    input_shape = (1, 1, 12, 8)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 1, 32, 32], [0, 0, 0, 0], 3)
    print(c)

produces

ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               [ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::FLOAT32, layout=Layout::TILE)

@bbradelTT (Contributor, Author)

The following also doesn't work:

def test_pad(device):
    input_shape = (1, 1, 12, 8)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    b2 = ttnn.to_layout(b, layout=ttnn.ROW_MAJOR_LAYOUT)
    c = ttnn.pad(b2, [1, 1, 32, 32], [0, 0, 0, 0], 3)
    print(c)

There are still 0s in the padded region:

ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  3.00392,  3.00392],
               [ 1.00000,  1.00000,  ...,  3.00392,  3.00392],
               ...,
               [ 3.00392,  3.00392,  ...,  0.00000,  0.00000],
               [ 3.00392,  3.00392,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::FLOAT32, layout=Layout::ROW_MAJOR)

@bbradelTT (Contributor, Author)

Update: the issue with row-major layout is due to a known bug: #12896

I will see whether I can convert back and forth between tile and row-major layouts as a workaround (see the sketch below), and hope that bug is fixed soon.
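
A minimal sketch of that round-trip workaround, assuming the row-major pad path honors the pad value once #12896 is fixed (currently it does not, as shown above):

import torch
import ttnn

def test_pad_via_row_major(device):
    # Convert to ROW_MAJOR, pad there with an explicit value, then convert back
    # to TILE. Blocked on #12896 until the row-major pad honors the pad value.
    a = torch.ones((1, 1, 12, 8))
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    b_rm = ttnn.to_layout(b, layout=ttnn.ROW_MAJOR_LAYOUT)
    c_rm = ttnn.pad(b_rm, [1, 1, 32, 32], [0, 0, 0, 0], 3)
    c = ttnn.to_layout(c_rm, layout=ttnn.TILE_LAYOUT)
    print(c)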

@jaykru-tt (Contributor)

Wanted to give an update on this (I will give daily updates going forward). I'm starting with the row-major issue (last-dim padding). I met with @yugaoTT yesterday to chat about this, which revealed that we will need new kernels to support that functionality. I'm starting on those today and hope to be done by end of day on Friday.

@jaykru-tt (Contributor)

Update on progress from yesterday: I've got the new stick-wise sharded reader written out on paper. I just need to transcribe it to code this morning. I'll also be adding a new program factory for these cases. Fortunately the writer I've already written should work just fine. After that, the rest of the week will hopefully be devoted to debugging. Thanks all for your patience! :)

@jaykru-tt (Contributor)

This is taking a little longer than expected due to some unforeseen issues with unaligned reads/writes. I'm still aiming to complete this by tomorrow as a stretch goal, but a more conservative estimate is mid-to-late next week, with time for debug and any other unforeseen wrenches that may crop up.

@jaykru-tt (Contributor)

Today's update: I've got the kernels written and 99% of the program factory written. The rest is cleanup work and debug. The alignment complexities are mostly resolved, but I may still need to make some tweaks for certain classes of shapes. I'd like to be done by end of day today if possible, but that is highly ambitious. I've gotten some pushback on my mid-to-late next week estimate, so I'll revise it to Monday by enlisting some help from the TM team if the debugging stage doesn't go perfectly.

@jaykru-tt (Contributor) commented Dec 6, 2024

I've got the program factory and kernels cleaned up. Now working on modifying the pytests Borys has provided to confirm that this fixes his cases. @bbradelTT one thing to confirm with you: the solution I've been working on will only work on row-major height-sharded tensors; is this a problem for your use case?

@bbradelTT (Contributor, Author)

@jaykru-tt Hopefully. I'll try to make it work. We need to have a general solution though.

@jaykru-tt (Contributor)

I've spoken a bit with @bbradelTT about this offline. The general solution is basically a rewrite of the tiled kernels to accommodate bf8_b, last-dim padding, and padding by shapes that aren't full tiles. I agree that we need this. Pad currently has very poor generality. We'll need to come up with a plan to tackle this and, in particular, priorities for the requisite new kernels. cc @ntarafdar

@jaykru-tt (Contributor)

@bbradelTT I've got this working on one of your cases above now, and on Evan's case from #12896 as well as some others. I am working on a fix for a bug exposed by the shapes from the test_pad2 case above. In particular, padding that adds extra rows (padding along the N/C/H dimensions) is currently messed up in my implementation. I think I'm nearly done with the fix. If I'm right, I should have a PR up some time tomorrow. Thanks again for your patience! This wound up being a lot trickier than expected.

@jaykru-tt (Contributor)

I have happy news sooner than expected 😄 I've got it working on that final test, so I should definitely be in shape to have the PR up tomorrow.

@jaykru-tt (Contributor)

I spoke too soon :( My approach works on a minimized version of test_pad2 but not the full case (padding to 32x32). I've got an idea for how to patch around this which I'll be able to implement pretty quickly in the morning.

@jaykru-tt (Contributor)

Update for today: I've got my workaround implemented and working for most cases. Working on a bug in one case and should hopefully be able to wrap up the rest quickly after that.

@cmaryanTT

@jaykru-tt - please provide an update today.

@ntarafdar (Contributor)

Hey @cmaryanTT, @jaykru-tt has a PR addressing the issue that he is going to merge after people approve.

#15985

jaykru-tt added a commit that referenced this issue Dec 14, 2024
### Tickets
- #15511
- #15603 (90% resolved with these changes and to be fully resolved in a future PR)
- #12896

### Problem description
ttnn.pad's RM sharded implementation only supports padding along the non-width dimensions. The row-major implementation additionally is not fully general with respect to the width dimension, so until now there have been no great options for padding along width. In a future PR coming tomorrow, I'll add input massaging code to convert to row-major and shard as needed for input configurations that aren't currently supported by pad.

### What's changed
- Adds new kernels to support padding along the width dimension.
- For pad operations requiring both NCH and width padding, we use a fused op using the original height-padding kernels and the new width kernels.
- The previous point required extensive refactoring to the host code. I would like eyes on pad.cpp please @yugaoTT @sminakov-tt.
- Also adds a bunch of common utility functions for working with sharded tensors:
  - A function for easily creating sharded memory configs from C++ (analogous to the Python `create_sharded_memory_config` utility function created by @ntarafdar).
  - A function for locating elements of a shard by their coordinates within the tensor. I've tested this one in the context of this PR, but it didn't end up being necessary in the final implementation.

### Checklist
- [~] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12327681570)
- [x] [Model regression CI testing
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12308045581)
- [x] [Device performance regression CI testing
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12308046347)
- [ ] Blackhole Post commit (if applicable)
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [ ] New/Existing tests provide coverage for changes

---------

Co-authored-by: tarafdarTT <[email protected]>
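
As an illustration of the width-padding path this PR describes, here is a minimal, hypothetical sketch of padding a row-major height-sharded tensor along the last dimension; the shapes, core grid, and shard strategy are assumptions for the example, not values taken from the PR or its tests:

import torch
import ttnn

def test_pad_rm_sharded_width(device):
    # Hypothetical example: height-shard a row-major tensor across a small core
    # grid, then pad its last (width) dimension with an explicit pad value.
    a = torch.ones((1, 1, 64, 8))
    shard_config = ttnn.create_sharded_memory_config(
        shape=(1, 1, 64, 8),
        core_grid=ttnn.CoreGrid(y=2, x=1),
        strategy=ttnn.ShardStrategy.HEIGHT,
        orientation=ttnn.ShardOrientation.ROW_MAJOR,
    )
    b = ttnn.from_torch(
        a,
        dtype=ttnn.float32,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=shard_config,
    )
    c = ttnn.pad(b, [1, 1, 64, 32], [0, 0, 0, 0], 3)
    print(c)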
@jaykru-tt (Contributor)

@bbradelTT see the linked PR. Let me know if I can close this out.

@bbradelTT (Contributor, Author)

I think so. I'll close this.
Thanks.
