ttnn pad is not working for some inputs and seems to ignore pad value #15511

Closed
bbradelTT opened this issue Nov 27, 2024 · 20 comments
Labels: bug, op_cat: TM, P0

@bbradelTT (Contributor)

The pad value seems to be ignored. Also, some shapes do not seem to work at all.

Repro steps:

Run the following pytests:

import pytest
import torch
import ttnn
from tests.ttnn.utils_for_testing import assert_with_pcc
from models.utility_functions import torch_random
    
def test_pad(device):
    input_shape = (1, 1, 12, 8)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 1, 12, 64], [0, 0, 0, 0], 3)
    print(c)   

def test_pad2(device):
    input_shape = (1, 2, 3, 4)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 2, 32, 32], [0, 0, 0, 0], value=3)
    print(c)

Expected:
The tensors should print successfully, have the requested output shape, and contain 3 in the padded region.

Actual:

The first test throws an error:

E       RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/data_movement/pad/device/pad_op.cpp:26: input_tensor.get_legacy_shape()[2] + this->input_tensor_start[2] <= this->output_tensor_shape[2]
E       info:
E       Output size cannot fit input with offset

The second test prints:

ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               [ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]],

              [[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               [ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 2, 32, 32]), dtype=DataType::FLOAT32, layout=Layout::TILE)
@TT-BrianLiu (Contributor)

ttnn.from_torch automatically pads to the tile size if you specify TILE layout. So in your cases:

  • Case 1: b is already shape [1, 1, 12[32], 8[32]] after ttnn.from_torch, and the op is complaining that your requested output shape in dim 2 is < 32. If you change that output shape to [1, 1, 32, 64], it should work (see the sketch below).
  • Case 2: b is already shape [1, 2, 3[32], 4[32]] after ttnn.from_torch, so the pad doesn't actually run.
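
For reference, a minimal sketch of the adjusted case-1 call, using the [1, 1, 32, 64] output shape suggested above (a sketch of that suggestion, not an independently verified fix):

import torch
import ttnn

def test_pad_tile_aligned(device):
    # ttnn.from_torch tile-pads the input to [1, 1, 32, 32] when TILE_LAYOUT is
    # requested, so the requested output shape must be at least 32 in dim 2 as well.
    a = torch.ones((1, 1, 12, 8))
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 1, 32, 64], [0, 0, 0, 0], 3)
    print(c)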

@TT-BrianLiu (Contributor)

This is a symptom of our pad/unpad ops not being treated as actual ops that return a new logical tensor. @ayerofieiev-tt

@bbradelTT (Contributor, Author) commented Nov 28, 2024

@TT-BrianLiu then how do I fill the region of the tensor outside the logical shape with the pad value?

def test_pad(device):
    input_shape = (1, 1, 12, 8)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    c = ttnn.pad(b, [1, 1, 32, 32], [0, 0, 0, 0], 3)
    print(c)

produces

ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               [ 1.00000,  1.00000,  ...,  0.00000,  0.00000],
               ...,
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000],
               [ 0.00000,  0.00000,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::FLOAT32, layout=Layout::TILE)

@bbradelTT (Contributor, Author)

The following also doesn't work:

def test_pad(device):
    input_shape = (1, 1, 12, 8)
    a = torch.ones(input_shape)
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    b2 = ttnn.to_layout(b, layout=ttnn.ROW_MAJOR_LAYOUT)
    c = ttnn.pad(b2, [1, 1, 32, 32], [0, 0, 0, 0], 3)
    print(c)

There are still 0s in the padded region:

ttnn.Tensor([[[[ 1.00000,  1.00000,  ...,  3.00392,  3.00392],
               [ 1.00000,  1.00000,  ...,  3.00392,  3.00392],
               ...,
               [ 3.00392,  3.00392,  ...,  0.00000,  0.00000],
               [ 3.00392,  3.00392,  ...,  0.00000,  0.00000]]]], shape=Shape([1, 1, 32, 32]), dtype=DataType::FLOAT32, layout=Layout::ROW_MAJOR)

@bbradelTT (Contributor, Author)

Update: the issue with row-major layout is due to a known bug: #12896

I will see whether I can convert back and forth between tile and row-major layouts as a workaround (see the sketch below), and hope that bug is fixed soon.
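
A minimal sketch of that round-trip workaround, assuming the row-major pad path honors the pad value once #12896 is fixed (currently it does not, as shown above):

import torch
import ttnn

def test_pad_via_row_major(device):
    # Convert to ROW_MAJOR, pad there with an explicit value, then convert back
    # to TILE. Blocked on #12896 until the row-major pad honors the pad value.
    a = torch.ones((1, 1, 12, 8))
    b = ttnn.from_torch(a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
    b_rm = ttnn.to_layout(b, layout=ttnn.ROW_MAJOR_LAYOUT)
    c_rm = ttnn.pad(b_rm, [1, 1, 32, 32], [0, 0, 0, 0], 3)
    c = ttnn.to_layout(c_rm, layout=ttnn.TILE_LAYOUT)
    print(c)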

@jaykru-tt (Contributor)

Wanted to give an update on this (I will give daily updates going forward). I'm starting with the row-major issue (last-dim padding). I met with @yugaoTT yesterday to chat about this, which revealed that we will need new kernels to support that functionality. I'm starting on those today and hope to be done by end of day on Friday.

@jaykru-tt (Contributor)

Update on progress from yesterday: I've got the new stick-wise sharded reader written out on paper. I just need to transcribe it to code this morning. I'll also be adding a new program factory for these cases. Fortunately the writer I've already written should work just fine. After that, the rest of the week will hopefully be devoted to debugging. Thanks all for your patience! :)

@jaykru-tt (Contributor)

This is taking a little longer than expected due to some unforeseen issues with unaligned reads/writes. I'm still aiming to complete this by tomorrow as a stretch goal, but a more conservative estimate is mid-to-late next week, with time for debug and any other unforeseen wrenches that may crop up.

@jaykru-tt (Contributor)

Today's update: I've got the kernels written and 99% of the program factory written. The rest is cleanup work and debug. The alignment complexities are mostly resolved, but I may still need to make some tweaks for certain classes of shapes. I'd like to be done by end of day today if possible, but that is highly ambitious. I've gotten some pushback on my mid-to-late next week estimate, so I'll revise it to Monday by enlisting some help from the TM team if the debugging stage doesn't go perfectly.

@jaykru-tt (Contributor) commented Dec 6, 2024

I've got the program factory and kernels cleaned up. Now working on modifying the pytests Borys has provided to confirm that this fixes his cases. @bbradelTT one thing to confirm with you: the solution I've been working on will only work on row-major height-sharded tensors; is this a problem for your use case?

@bbradelTT (Contributor, Author)

@jaykru-tt Hopefully. I'll try to make it work. We need to have a general solution though.

@jaykru-tt (Contributor)

I've spoken a bit with @bbradelTT about this offline. The general solution is basically a rewrite of the tiled kernels to accommodate bf8_b, last-dim padding, and padding by shapes that aren't full tiles. I agree that we need this. Pad currently has very poor generality. We'll need to come up with a plan to tackle this and, in particular, priorities for the requisite new kernels. cc @ntarafdar

@jaykru-tt (Contributor)

@bbradelTT I've got this working on one of your cases above now, and on Evan's case from #12896 as well as some others. I am working on a fix for a bug exposed by the shapes from the test_pad2 case above. In particular, padding that adds extra rows (padding along the N/C/H dimensions) is currently messed up in my implementation. I think I'm nearly done with the fix. If I'm right, I should have a PR up some time tomorrow. Thanks again for your patience! This wound up being a lot trickier than expected.

@jaykru-tt (Contributor)

I have happy news sooner than expected 😄 I've got it working on that final test, so I should definitely be in shape to have the PR up tomorrow.

@jaykru-tt (Contributor)

I spoke too soon :( My approach works on a minimized version of test_pad2 but not the full case (padding to 32x32). I've got an idea for how to patch around this which I'll be able to implement pretty quickly in the morning.

@jaykru-tt (Contributor)

Update for today: I've got my workaround implemented and working for most cases. Working on a bug in one case and should hopefully be able to wrap up the rest quickly after that.

@cmaryanTT

@jaykru-tt - please provide an update today.

@ntarafdar (Contributor)

Hey @cmaryanTT, @jaykru-tt has a PR addressing the issue that he is going to merge after people approve.

#15985

jaykru-tt added a commit that referenced this issue Dec 14, 2024
### Tickets
- #15511
- #15603 (90% resolved with these changes and to be fully resolved in a future PR)
- #12896

### Problem description
ttnn.pad's RM sharded implementation only supports padding along the non-width dimensions. The row-major implementation additionally is not fully general with respect to the width dimension, so until now there have been no great options for padding along width. In a future PR coming tomorrow, I'll add input massaging code to convert to row-major and shard as needed for input configurations that aren't currently supported by pad.

### What's changed
- Adds new kernels to support padding along the width dimension.
- For pad operations requiring both NCH and width padding, we use a fused op using the original height-padding kernels and the new width kernels.
- The previous point required extensive refactoring to the host code. I would like eyes on pad.cpp please @yugaoTT @sminakov-tt.
- Also adds a bunch of common utility functions for working with sharded tensors:
  - A function for easily creating sharded memory configs from C++ (analogous to the Python `create_sharded_memory_config` utility function created by @ntarafdar).
  - A function for locating elements of a shard by their coordinates within the tensor. I've tested this one in the context of this PR, but it didn't end up being necessary in the final implementation.

### Checklist
- [~] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12327681570)
- [x] [Model regression CI testing
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12308045581)
- [x] [Device performance regression CI testing
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12308046347)
- [ ] Blackhole Post commit (if applicable)
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests passes
- [ ] New/Existing tests provide coverage for changes

---------

Co-authored-by: tarafdarTT <[email protected]>
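
As an illustration of the width-padding path this PR describes, here is a minimal, hypothetical sketch of padding a row-major height-sharded tensor along the last dimension; the shapes, core grid, and shard strategy are assumptions for the example, not values taken from the PR or its tests:

import torch
import ttnn

def test_pad_rm_sharded_width(device):
    # Hypothetical example: height-shard a row-major tensor across a small core
    # grid, then pad its last (width) dimension with an explicit pad value.
    a = torch.ones((1, 1, 64, 8))
    shard_config = ttnn.create_sharded_memory_config(
        shape=(1, 1, 64, 8),
        core_grid=ttnn.CoreGrid(y=2, x=1),
        strategy=ttnn.ShardStrategy.HEIGHT,
        orientation=ttnn.ShardOrientation.ROW_MAJOR,
    )
    b = ttnn.from_torch(
        a,
        dtype=ttnn.float32,
        layout=ttnn.ROW_MAJOR_LAYOUT,
        device=device,
        memory_config=shard_config,
    )
    c = ttnn.pad(b, [1, 1, 64, 32], [0, 0, 0, 0], 3)
    print(c)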
@jaykru-tt (Contributor)

@bbradelTT see the linked PR. Let me know if I can close this out.

@bbradelTT (Contributor, Author)

I think so. I'll close this.
Thanks.
