Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SFPU shift operator issue when using sfpi #15514

Open
rdjogoTT opened this issue Nov 27, 2024 · 11 comments
Open

SFPU shift operator issue when using sfpi #15514

rdjogoTT opened this issue Nov 27, 2024 · 11 comments
Assignees
Labels
bug Something isn't working LLK P2

Comments

@rdjogoTT
Copy link
Contributor

Describe the bug
The following SFPU llk, implemented using sfpi, does not work:

template <bool APPROXIMATION_MODE, int ITERATIONS = 8>
inline void calculate_left_shift(const uint shift_amt) {
#pragma GCC unroll 0
    for (int d = 0; d < ITERATIONS; d++) {
        vInt val = dst_reg[0];
        val = val << shift_amt;
        dst_reg[0] = val;

        dst_reg++;
    }
}

But this one does, and it uses SFPU instructions directly:

template <bool APPROXIMATION_MODE, int ITERATIONS = 8>
inline void calculate_left_shift(const uint shift_amt) {
#pragma GCC unroll 0
    for (int d = 0; d < ITERATIONS; d++) {
        TTI_SFPLOAD(0,4,3,0);
        TT_SFPSHFT(shift_amt,0,0,1);
        TTI_SFPSTORE(0,4,3,0);
        dst_reg++;
    }
}

Both simply load 32 bit values from Dest, left shift by shift_amt, and store back. Sfpi version does not work when bit 31 (MSB) has to change from 0->1 or 1->0 as a result of shift operation.
See this issue for more details: #13415

To Reproduce
Use this test:

from loguru import logger
import random
import pytest
import torch
import ttnn

from tests.ttnn.utils_for_testing import assert_with_pcc
from tests.ttnn.python_api_testing.sweep_tests import ttnn_ops


def run_bitwise_left_shift_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits, device):
    torch.manual_seed(data_seed)

    x = torch.randint(-2147483647, 2147483648, input_shape[0], dtype=torch.int32)
    
    changebit = 31 - shift_bits
    exludebit_mask = torch.bitwise_not(torch.tensor(2**changebit).to(torch.int32))
    includebit_mask = torch.tensor(2**changebit).to(torch.int32)

    x = torch.where(x < 0, torch.bitwise_and(x, exludebit_mask), torch.bitwise_or(x, includebit_mask))
    # Uncomment the line bellow and comment the line above for the unit test to pass
    #x = torch.where(x < 0, torch.bitwise_or(x, includebit_mask), torch.bitwise_and(x, exludebit_mask))
    
    try:
        # get ref result
        ref_value = torch.bitwise_left_shift(x, shift_bits)
        
        tt_x = ttnn_ops.setup_ttnn_tensor(x, device, dlayout[0], in_mem_config[0], dtype[0])
        
        tt_result = ttnn.bitwise_left_shift(tt_x, shift_bits)
        tt_result = ttnn_ops.ttnn_tensor_to_torch(tt_result, output_mem_config[0])

    except Exception as e:
        logger.warning(f"Operation execution crashed")
        raise e

    assert len(tt_result.shape) == len(ref_value.shape)
    assert tt_result.shape == ref_value.shape
    assert_with_pcc(ref_value, tt_result, 0.99)


test_sweep_args = [
    (
        [(32, 96)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.DRAM_MEMORY_CONFIG],
        [ttnn.L1_MEMORY_CONFIG],
        17799073,
        1,
    ),
    (
        [(3, 160, 224)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.L1_MEMORY_CONFIG],
        [ttnn.DRAM_MEMORY_CONFIG],
        3121221,
        11,
    ),
    (
        [(6, 6, 224, 256)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.DRAM_MEMORY_CONFIG],
        [ttnn.DRAM_MEMORY_CONFIG],
        10286194,
        23,
    ),
    (
        [(6, 10, 256, 256)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.L1_MEMORY_CONFIG],
        [ttnn.L1_MEMORY_CONFIG],
        10286194,
        30,
    ),
]

@pytest.mark.parametrize(
    "input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits",
    (test_sweep_args),
)
def test_bitwise_left_shift(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits, device):
    run_bitwise_left_shift_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits, device)

Expected behavior
Sfpi version of the LLK should pass the test as well.

@rdjogoTT rdjogoTT added bug Something isn't working LLK labels Nov 27, 2024
VirdhatchaniKN added a commit that referenced this issue Dec 1, 2024
### Ticket
#13415

### Problem description
Update left shift

### What's changed
Updated file as mentioned in Issue
#15514.

### Checklist
- [x] [Post commit
CI](https://github.com/tenstorrent/tt-metal/actions/runs/12074417574)
passes
- [x] [Blackhole post-commit
tests](https://github.com/tenstorrent/tt-metal/actions/runs/12074418588)
passes
@cmaryanTT
Copy link

@VirdhatchaniKN - I see you referenced this in your changes. Can you please provide an update here.

@VirdhatchaniKN
Copy link
Contributor

@VirdhatchaniKN - I see you referenced this in your changes. Can you please provide an update here.

I have merged the code that is mentioned in this issue, which is a temporary fix to the main branch. However, I believe that the issue is with the sfpi as the SFPU instructions work.

@prajaramanTT
Copy link

@rdjogoTT @VirdhatchaniKN is this still an open issue ? If not, can you please close this ? Thanks

@eyonland
Copy link
Contributor

@VirdhatchaniKN is calling these instructions directly so it is not a blocker. However it would be great to have the sfpi version updated.

@nathan-TT
Copy link
Contributor

nathan-TT commented Feb 4, 2025

The code changed by @VirdhatchaniKN 's fix is not the same as mentioned above. it is replacing:

        vInt val = dst_reg[0];
        vInt v = val;

        val = val << shift_amt;
        val = setsgn(val, v);
        dst_reg[0] = val;

which is intentionally preserving the sign bit from the original value. The replacement code is not equivalent.

@rdjogoTT
Copy link
Contributor Author

rdjogoTT commented Feb 4, 2025

@nathan-TT it is true that the code is not equivalent. @VirdhatchaniKN was trying to get rid of the part that preserves the sign bit of the original value by writing:

        vInt val = dst_reg[0];
        val = val << shift_amt;
        dst_reg[0] = val;

but that was not producing correct results. So I was asked to help debug and when I wrote it out using TTI instructions as:

        TTI_SFPLOAD(0,4,3,0);
        TT_SFPSHFT(shift_amt,0,0,1);
        TTI_SFPSTORE(0,4,3,0);

I found that it worked.

I thought that these two code blocks should provide the same functionality, so that's why I made this issue.

@nathan-TT
Copy link
Contributor

update: The observed behavior is because of juggling 2's complement and sign-magnitude data representations. It is unclear what the 'right' answer should be.

@rdjogoTT
Copy link
Contributor Author

rdjogoTT commented Feb 7, 2025

Can you please explain a bit how 2's complement and sign-magnitude are handled differently, for future reference if something similar pops up?

@nathan-TT
Copy link
Contributor

wait a moment, signed integer overflow is undefined behavior -- why are you expecting any particular behavior in these circumstances?

@nathan-TT
Copy link
Contributor

sigh, in C++17 such shifts were UB, in C++20, 2's complement representation is required and IIUC the shift is not UB. But the issue we have is that we use sign-magnitude internal representation, and there appear to be inconsistencies in how that is handled. This will take some care unraveling.

@nathan-TT
Copy link
Contributor

In investigating this I implemented .ttinsn pseudo, which makes disassembling synthesized sfpu insts better. The compiler landed at #17800, but the llk headers need updating too: tenstorrent/tt-llk#22

Documenting internal data type representations to progress this issue itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working LLK P2
Projects
None yet
Development

No branches or pull requests

7 participants