Convert aten.ceil to ttnn.ceil #198

Merged
merged 20 commits into from
Dec 19, 2024

jdh8
Contributor

@jdh8 jdh8 commented Sep 13, 2024

Ticket

Problem description

Convert aten.ceil to ttnn.ceil and probably other rounding ops

Currently, 1-D cases fail because ttnn.ceil produces 2-D results. For instance, ttnn.ceil takes a (1066,) tensor but produces a (1, 1066) tensor.

FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape0] - AssertionError: list(expected_pytorch_result.shape)=[1066] vs list(actual_pytorch_result.shape)=[1, 1066]
FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape1] - AssertionError: list(expected_pytorch_result.shape)=[120] vs list(actual_pytorch_result.shape)=[1, 120]
FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape2] - AssertionError: list(expected_pytorch_result.shape)=[128] vs list(actual_pytorch_result.shape)=[1, 128]
FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape3] - AssertionError: list(expected_pytorch_result.shape)=[160] vs list(actual_pytorch_result.shape)=[1, 160]

I suspect this also happens with other elementwise ops.

What's changed

  • Convert aten.ceil to ttnn.ceil
  • Test conversion of aten.ceil
  • Convert aten.round to ttnn.round
  • Test conversion of aten.round
  • Convert aten.trunc to ttnn.trunc
  • Test conversion of aten.trunc

@ayerofieiev-tt
Member

@jdh8, let's file a ticket and work around this in the compiler by squeezing the dimension?

@jdh8
Contributor Author

jdh8 commented Sep 13, 2024

I'll probably return to this PR after #113 and #170. Since there are many univariate functions, I suggest patching them all at once.

@jdh8
Contributor Author

jdh8 commented Oct 3, 2024

Unfortunately, the workaround of squeezing out the extraneous dimension for 1-D tensors (e6e9ef8) still leaves errors; it does not get around tenstorrent/tt-metal#12671.

FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape0] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape2] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_ceil.py::test_ceil[input_shape3] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_cos.py::test_cos[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_erf.py::test_erf[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_exp.py::test_exp[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_floor.py::test_floor[input_shape4] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_gelu.py::test_gelu[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_neg.py::test_neg[input_shapes0] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_remainder.py::test_remainder[input_shape0-7-True] - AssertionError: assert 0 == 1
FAILED tests/lowering/eltwise/unary/test_remainder.py::test_remainder[input_shape1-3-True] - AssertionError: assert 0 == 1
FAILED tests/lowering/eltwise/unary/test_remainder.py::test_remainder[input_shape2-5-True] - AssertionError: assert 0 == 1
FAILED tests/lowering/eltwise/unary/test_remainder.py::test_remainder[input_shape3-1-True] - AssertionError: assert 0 == 1
FAILED tests/lowering/eltwise/unary/test_remainder.py::test_remainder[input_shape4-1-True] - AssertionError: assert 0 == 1
FAILED tests/lowering/eltwise/unary/test_rsqrt.py::test_rsqrt[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
FAILED tests/lowering/eltwise/unary/test_sqrt.py::test_sqrt[input_shape1] - RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/tensor/types.cpp:170: normalized_index >= 0 and normalized_index < rank
========================================================================= 17 failed, 43 passed in 39.79s ==========================================================================

@jerrysky3
Contributor

jerrysky3 commented Oct 4, 2024

I suspect the problem is that the tensor returned by ttnn.ceil is in TILE_LAYOUT, and ttnn.squeeze doesn't know how to turn a 2-D tiled tensor into a 1-D one. It works for me if the tensor is first converted to ROW_MAJOR_LAYOUT:

result_after = ttnn.ceil(input_tensor)
result_after = ttnn.to_layout(result_after, ttnn.ROW_MAJOR_LAYOUT)
result_after = ttnn.squeeze(result_after, 0)
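For context on the TILE_LAYOUT suspicion: assuming the default 32×32 tile, a tiled tensor's last two dimensions are padded up to whole tiles. The pure-Python arithmetic below (the helper names are hypothetical and nothing here calls ttnn) shows the padded shapes for the failing 1-D test sizes once they are promoted to (1, n). It only illustrates the suspicion; it is not a confirmed diagnosis.

```python
TILE = 32  # assumed default TILE_LAYOUT tile height and width


def round_up(n, multiple=TILE):
    """Round n up to the nearest multiple of `multiple` (integer arithmetic)."""
    return -(-n // multiple) * multiple


def padded_tile_shape(shape):
    """Shape after padding the last two dims to whole tiles (illustration only)."""
    if len(shape) < 2:
        raise ValueError("a tiled tensor needs at least 2 dimensions")
    *batch, h, w = shape
    return (*batch, round_up(h), round_up(w))


# The 1-D sizes from the failing ceil tests, after promotion to (1, n).
for n in (1066, 120, 128, 160):
    print((1, n), "->", padded_tile_shape((1, n)))
```

Under this assumption, for example, a (1, 1066) result occupies a (32, 1088) padded tile grid, so the leading dimension is no longer physically of size 1.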

@jdh8
Contributor Author

jdh8 commented Oct 24, 2024

My test results:

(python_env) jdh8@tt-loudbox:~/pytorch2.0_ttnn$ pytest tests/lowering/eltwise/unary/test_round.py 
=============================== test session starts ================================
platform linux -- Python 3.8.10, pytest-7.2.2, pluggy-1.5.0
rootdir: /home/jdh8/pytorch2.0_ttnn/tests, configfile: pytest.ini
plugins: split-0.8.2, anyio-4.5.2, xdist-3.6.1, dash-2.15.0, timeout-2.2.0
collected 5 items                                                                  

tests/lowering/eltwise/unary/test_round.py .....                             [100%]

================================ 5 passed in 3.40s =================================
                 Device | INFO     | Closing user mode device drivers

@jdh8 jdh8 added enhancement New feature or request and removed enhancement New feature or request labels Nov 22, 2024
@jdh8
Contributor Author

jdh8 commented Nov 23, 2024

All the failing ops are aten.max_pool2d_with_indices_backward.default from U-Net and related models. We can select these tests with:

pytest tests/autogen_op/U*-train/*aten_max_pool2d_with_indices_backward_default.py

@jdh8
Contributor Author

jdh8 commented Nov 26, 2024

@jdh8 jdh8 force-pushed the feature/rounding branch 2 times, most recently from 204dfe0 to 4c07949 Compare November 26, 2024 20:00
@jdh8
Contributor Author

jdh8 commented Nov 26, 2024

Tests are running at https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/12038678310

return unsqueeze_to_2d(TTNN_POINTWISE_UNARY_OPS[node.target])

if node.target == torch.ops.aten.round.default:
    return unsqueeze_to_2d(ttnn.round, (args[0],), {"decimals": 0})
Member

Should we update the ttnn op binding so that the argument defaults to 0?

Contributor Author

That would be great! I tried that once in tenstorrent/tt-metal#13851, but without success.
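aten.round.default takes no `decimals` argument, which is why the lowering above passes `{"decimals": 0}` explicitly. Until the binding provides a default, one way to keep a lowering table uniform is to pre-bind the keyword with `functools.partial`. In this sketch Python's built-in `round` stands in for ttnn.round (an assumption; nothing here touches ttnn), and the table name is hypothetical.

```python
from functools import partial


def fake_round(x, decimals):
    """Stand-in for a rounding op that requires `decimals`.

    Python's built-in round() rounds halves to even; ttnn.round's
    tie-breaking behavior is not assumed here.
    """
    return round(x, decimals)


# Hypothetical lowering table: pre-bind decimals=0 so aten.round.default,
# which has no decimals argument, dispatches like any other unary op.
UNARY_LOWERINGS = {
    "aten.round.default": partial(fake_round, decimals=0),
}

round_default = UNARY_LOWERINGS["aten.round.default"]
print(round_default(2.5), round_default(3.7), round_default(-1.5))
```

Pre-binding keeps the call site identical for all unary ops; a default in the binding itself would make the `partial` unnecessary.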

return result

if node.target in TTNN_POINTWISE_UNARY_OPS:
    return unsqueeze_to_2d(TTNN_POINTWISE_UNARY_OPS[node.target])
Member

What do we unsqueeze here? Why only for unary?

Contributor Author

Thanks, that inspired a simpler approach: reshape the result back when ndim < 2.
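The reshape-back idea can be sketched in pure Python. Everything here is hypothetical: `FakeTensor` and the fake ops stand in for ttnn tensors and ttnn.ceil/ttnn.trunc, only shapes are modeled (not tile layout), and 0-D inputs are assumed to be promoted to (1, 1). The wrapper records the input shape, calls the op, and reshapes the result back when the input had fewer than two dimensions; applying it to a whole op table patches every unary op at once.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeTensor:
    """Pure-Python stand-in for a device tensor: flat data plus a shape."""

    data: list
    shape: Tuple[int, ...]

    def reshape(self, shape):
        return FakeTensor(list(self.data), tuple(shape))


def promote_to_2d(fn):
    """Build a fake unary op that returns rank < 2 inputs as 2-D.

    Mimics the reported (n,) -> (1, n) behavior; 0-D -> (1, 1) is an
    assumption made only for illustration.
    """

    def op(t):
        out = FakeTensor([fn(x) for x in t.data], t.shape)
        if len(t.shape) < 2:
            out = out.reshape((1,) * (2 - len(t.shape)) + t.shape)
        return out

    return op


# Hypothetical stand-ins for ttnn.ceil and ttnn.trunc.
FAKE_TTNN_UNARY_OPS = {
    "aten.ceil.default": promote_to_2d(math.ceil),
    "aten.trunc.default": promote_to_2d(math.trunc),
}


def restore_rank(op):
    """Wrap a unary op so its result is reshaped back when the input had ndim < 2."""

    def wrapped(t):
        result = op(t)
        if len(t.shape) < 2:
            result = result.reshape(t.shape)
        return result

    return wrapped


# Patch every unary op once instead of special-casing each one.
LOWERINGS = {name: restore_rank(op) for name, op in FAKE_TTNN_UNARY_OPS.items()}

x = FakeTensor([0.2, 1.5, -1.5], (3,))
print(FAKE_TTNN_UNARY_OPS["aten.ceil.default"](x).shape)  # (1, 3)
print(LOWERINGS["aten.ceil.default"](x).shape)  # (3,)
```

On real ttnn tensors the reshape may still need the ROW_MAJOR_LAYOUT conversion discussed earlier; this sketch does not model layouts.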

@jdh8
Contributor Author

jdh8 commented Nov 29, 2024

I cannot reproduce the CI errors locally:
https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/12073595035/job/33670553655#step:5:26198

==== 4 passed, 156 deselected, 2 xfailed, 8 warnings in 1076.48s (0:17:56) =====
|                 path                 |     passed     |   xfailed    | subtotal |
| ------------------------------------ | -------------- | ------------ | -------: |
| models/falcon/test_falcon.py         | :green_circle: |              |        1 |
| models/flan_t5/test_flan_t5.py       | :green_circle: | :red_circle: |        2 |
| models/glpn_kitti/test_glpn_kitti.py | :green_circle: | :red_circle: |        2 |
| models/gpt2/test_gpt2.py             | :green_circle: |              |        1 |
| TOTAL                                |              4 |            2 |        6 |

However, ResNet raises these errors only locally.

ERROR tests/models/resnet/test_resnet.py::test_resnet[train] - TypeError: ResNet18-train compiled failed to run.
ERROR tests/models/resnet/test_resnet.py::test_resnet[eval] - TypeError: ResNet18 compiled failed to run.

@ayerofieiev-tt
Member

@kevinwuTT, can you suggest anything to help debug this case?

@jdh8 jdh8 force-pushed the feature/rounding branch 2 times, most recently from fe9f57c to 59c0fb2 Compare December 12, 2024 21:56
@jdh8
Contributor Author

jdh8 commented Dec 19, 2024

@ayerofieiev-tt
Member

@ayerofieiev-tt ayerofieiev-tt merged commit 75ef405 into main Dec 19, 2024
1 check passed
@ayerofieiev-tt ayerofieiev-tt deleted the feature/rounding branch December 19, 2024 22:50
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

aten.ceil.default
3 participants