
ttnn.moreh_cumsum causes low accuracy on Bloom model #17594

Open
amalbasaTT opened this issue Feb 5, 2025 · 6 comments
@amalbasaTT
Contributor

Ticket

Link to Github Issue

Describe the bug
ttnn.moreh_cumsum causes low PCC (0.8703882797784891) on Bloom model:
- input: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
- shape: [1, 32, 1, 1]
- dtype: ttnn.uint32
- layout: Layout::TILE
The output tensor contains 8388608 at a position where the input tensor has 1.
The issue occurs at ttnn_moreh_cumsum = ttnn.moreh_cumsum(ttnn_to_device, 1, )
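
For reference, a minimal torch-only sketch (not part of the original report) of what the cumsum over dim 1 should produce for this input:

import torch

# the reported input: 25 zeros followed by 7 ones, shape [1, 32, 1, 1]
x = torch.tensor([0] * 25 + [1] * 7, dtype=torch.int32).reshape(1, 32, 1, 1)
print(torch.cumsum(x, dim=1).flatten().tolist())
# expected: 25 zeros followed by [1, 2, 3, 4, 5, 6, 7]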

To Reproduce
Steps to reproduce the behavior:

  1. Install torch-ttnn as explained in https://github.com/tenstorrent/pytorch2.0_ttnn?tab=readme-ov-file
  2. Change to the pytorch2.0_ttnn directory
  3. Generate Bloom_code.py and Bloom_inputs.pickle by running:
pytest tests/models/bloom/test_bloom.py --gen_op_accuracy_tests
  4. In Bloom_code.py, change line 109 from _tensor_constant0 = 0.7071067690849304 to _tensor_constant0 = torch.tensor(0.7071067690849304) (see pytorch2.0_ttnn#743: "Bloom_code.py forward method fails at aten.lift_fresh_copy.default(_tensor_constant0, ) because _tensor_constant0 is not a Tensor"); the edit is sketched after these steps
  5. Run Bloom_code.py:
python3 tests/autogen_accuracy_tests/Bloom_code.py

A low accuracy error will be reported at test_accuracy(cumsum, ttnn_reshape_1).
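
The edit from step 4, sketched in place (only the affected line of the generated Bloom_code.py is shown):

# Bloom_code.py, line 109
# before:
_tensor_constant0 = 0.7071067690849304
# after: wrap the constant in a tensor so aten.lift_fresh_copy.default accepts it
_tensor_constant0 = torch.tensor(0.7071067690849304)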

@ayerofieiev-tt
Member

@amalbasaTT can we create a simple TT-NN test that fails?

@ayerofieiev-tt added accuracy and removed accuracy, bug labels Feb 12, 2025
@ayerofieiev-tt
Member

Also, now that we know about this issue, can someone from the team start looking into it more deeply? Is there expertise in the team?

@amalbasaTT
Contributor Author

Also, now that we know about this issue, can someone from the team start looking into it more deeply? Is there expertise in the team?

We have some experience debugging composite ops and sharding in C++, but not as much with kernels per se. We can certainly look deeper into it, but we would appreciate some docs that explain in more detail how the lower-level functions that appear in the kernel implementations work.

@amalbasaTT
Contributor Author

Unit test:

# SPDX-FileCopyrightText: © 2025 Tenstorrent Inc.

# SPDX-License-Identifier: Apache-2.0

from loguru import logger
import random
import pytest
import torch
import ttnn
import traceback

from tests.ttnn.utils_for_testing import assert_with_pcc, check_with_pcc

aten = torch.ops.aten


def run_moreh_cumsum_tests(
    x,
    device,
):
    x = torch.tensor(x).to(torch.int32)
    
    try:
        ref_value = aten.cumsum.default(x, -1, )
        
        tt_x = ttnn.from_torch(
            x,
            dtype=ttnn.uint32,
            layout=ttnn.ROW_MAJOR_LAYOUT,
        )
        tt_x = ttnn.reshape(tt_x, (1, 32, 1, 1), )
        passed, message = check_with_pcc(x.reshape(1, 32, 1, 1), ttnn.to_torch(tt_x))
        assert passed, message
        tt_x = ttnn.from_device(tt_x, )
        passed, message = check_with_pcc(x.reshape(1, 32, 1, 1), ttnn.to_torch(tt_x))
        assert passed, message
        tt_x = ttnn.to_layout(tt_x, ttnn.TILE_LAYOUT, )
        passed, message = check_with_pcc(x.reshape(1, 32, 1, 1), ttnn.to_torch(tt_x))
        assert passed, message
        tt_x = ttnn.to_device(tt_x, device=device)
        passed, message = check_with_pcc(x.reshape(1, 32, 1, 1), ttnn.to_torch(tt_x))
        assert passed, message
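        # The step under investigation: moreh_cumsum over dim 1 on the uint32 tensor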
        tt_result = ttnn.moreh_cumsum(tt_x, 1, )
        passed, message = check_with_pcc(ref_value.reshape(1, 32, 1, 1), ttnn.to_torch(tt_result))
        assert passed, f"{message, tt_x}"
        tt_result = ttnn.to_layout(tt_result, ttnn.ROW_MAJOR_LAYOUT, )
        tt_result = ttnn.reshape(tt_result, (1, 32), )
        
        tt_result = ttnn.to_torch(tt_result)
        
    except Exception as e:
        logger.warning(f"Test execution crashed: {e}")
        print(traceback.format_exc())
        raise e

    #assert len(tt_result.shape) == len(ref_value.shape)
    assert tt_result.shape == ref_value.shape
    assert_with_pcc(ref_value, tt_result, 0.999)


test_sweep_args = [
    (
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 1, 1, 1, 1, 1, 1, 1],
    ),
]


@pytest.mark.parametrize(
    "x",
    (test_sweep_args),
)
def test_moreh_cumsum(x, device):
    run_moreh_cumsum_tests(x, device)
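
A hypothetical standalone runner for the test above (assumption: appended to the same file, with Tenstorrent device 0 available; under pytest the device fixture provides device instead):

if __name__ == "__main__":
    device = ttnn.open_device(device_id=0)
    try:
        run_moreh_cumsum_tests(test_sweep_args[0], device)
    finally:
        ttnn.close_device(device)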

@bfilipovicTT
Contributor

@ayerofieiev-tt I looked into it and found that we hit a similar issue while testing the "add" operation when the second operand was an int scalar (#17019 (comment)).
Since this op also involves summing integers, and since Andrija's unit test passes when I change int to float, I suspect the two issues are related.
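
A minimal sketch of the int-to-float substitution mentioned above (assumption: these are the two lines changed in the unit test, with bfloat16 standing in for the float dtype):

x = torch.tensor(x).to(torch.float32)  # was torch.int32

tt_x = ttnn.from_torch(
    x,
    dtype=ttnn.bfloat16,  # was ttnn.uint32
    layout=ttnn.ROW_MAJOR_LAYOUT,
)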

@nemanjagrujic
Contributor

@ayerofieiev-tt

We looked into this a bit deeper. It turns out that ttnn.add and ttnn.experimental.add also don't work with uint32.

A workaround for now is to change the lowering to include a typecast to and from bfloat16. That will improve the model's accuracy so we can continue; it is sketched below.

As for moreh_cumsum, it uses ckernel::add_tiles and I'm not sure how to make that work with uint32. There is an SFPU operation for adding integers, and we are looking into that now.
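
A minimal sketch of that workaround (assumption: ttnn.typecast handles the round trip through bfloat16; this is not the actual lowering code):

tt_x_bf16 = ttnn.typecast(tt_x, ttnn.bfloat16)        # uint32 -> bfloat16
tt_sum_bf16 = ttnn.moreh_cumsum(tt_x_bf16, 1)         # cumsum in bfloat16
tt_result = ttnn.typecast(tt_sum_bf16, ttnn.uint32)   # back to uint32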

@umadevimcw any insight?
