SFPU shift operator issue when using sfpi #15514

rdjogoTT · 2024-11-27T21:01:43Z

Describe the bug
The following SFPU llk, implemented using sfpi, does not work:

template <bool APPROXIMATION_MODE, int ITERATIONS = 8>
inline void calculate_left_shift(const uint shift_amt) {
#pragma GCC unroll 0
    for (int d = 0; d < ITERATIONS; d++) {
        vInt val = dst_reg[0];
        val = val << shift_amt;
        dst_reg[0] = val;

        dst_reg++;
    }
}

But this one does, and it uses SFPU instructions directly:

template <bool APPROXIMATION_MODE, int ITERATIONS = 8>
inline void calculate_left_shift(const uint shift_amt) {
#pragma GCC unroll 0
    for (int d = 0; d < ITERATIONS; d++) {
        TTI_SFPLOAD(0,4,3,0);
        TT_SFPSHFT(shift_amt,0,0,1);
        TTI_SFPSTORE(0,4,3,0);
        dst_reg++;
    }
}

Both simply load 32 bit values from Dest, left shift by shift_amt, and store back. Sfpi version does not work when bit 31 (MSB) has to change from 0->1 or 1->0 as a result of shift operation.
See this issue for more details: #13415

To Reproduce
Use this test:

from loguru import logger
import random
import pytest
import torch
import ttnn

from tests.ttnn.utils_for_testing import assert_with_pcc
from tests.ttnn.python_api_testing.sweep_tests import ttnn_ops


def run_bitwise_left_shift_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits, device):
    torch.manual_seed(data_seed)

    x = torch.randint(-2147483647, 2147483648, input_shape[0], dtype=torch.int32)
    
    changebit = 31 - shift_bits
    exludebit_mask = torch.bitwise_not(torch.tensor(2**changebit).to(torch.int32))
    includebit_mask = torch.tensor(2**changebit).to(torch.int32)

    x = torch.where(x < 0, torch.bitwise_and(x, exludebit_mask), torch.bitwise_or(x, includebit_mask))
    # Uncomment the line bellow and comment the line above for the unit test to pass
    #x = torch.where(x < 0, torch.bitwise_or(x, includebit_mask), torch.bitwise_and(x, exludebit_mask))
    
    try:
        # get ref result
        ref_value = torch.bitwise_left_shift(x, shift_bits)
        
        tt_x = ttnn_ops.setup_ttnn_tensor(x, device, dlayout[0], in_mem_config[0], dtype[0])
        
        tt_result = ttnn.bitwise_left_shift(tt_x, shift_bits)
        tt_result = ttnn_ops.ttnn_tensor_to_torch(tt_result, output_mem_config[0])

    except Exception as e:
        logger.warning(f"Operation execution crashed")
        raise e

    assert len(tt_result.shape) == len(ref_value.shape)
    assert tt_result.shape == ref_value.shape
    assert_with_pcc(ref_value, tt_result, 0.99)


test_sweep_args = [
    (
        [(32, 96)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.DRAM_MEMORY_CONFIG],
        [ttnn.L1_MEMORY_CONFIG],
        17799073,
        1,
    ),
    (
        [(3, 160, 224)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.L1_MEMORY_CONFIG],
        [ttnn.DRAM_MEMORY_CONFIG],
        3121221,
        11,
    ),
    (
        [(6, 6, 224, 256)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.DRAM_MEMORY_CONFIG],
        [ttnn.DRAM_MEMORY_CONFIG],
        10286194,
        23,
    ),
    (
        [(6, 10, 256, 256)],
        [ttnn.int32],
        [ttnn.TILE_LAYOUT],
        [ttnn.L1_MEMORY_CONFIG],
        [ttnn.L1_MEMORY_CONFIG],
        10286194,
        30,
    ),
]

@pytest.mark.parametrize(
    "input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits",
    (test_sweep_args),
)
def test_bitwise_left_shift(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits, device):
    run_bitwise_left_shift_tests(input_shape, dtype, dlayout, in_mem_config, output_mem_config, data_seed, shift_bits, device)

Expected behavior
Sfpi version of the LLK should pass the test as well.

The text was updated successfully, but these errors were encountered:

### Ticket #13415 ### Problem description Update left shift ### What's changed Updated file as mentioned in Issue #15514. ### Checklist - [x] [Post commit CI](https://github.com/tenstorrent/tt-metal/actions/runs/12074417574) passes - [x] [Blackhole post-commit tests](https://github.com/tenstorrent/tt-metal/actions/runs/12074418588) passes

cmaryanTT · 2024-12-20T15:43:04Z

@VirdhatchaniKN - I see you referenced this in your changes. Can you please provide an update here.

VirdhatchaniKN · 2024-12-23T13:49:37Z

@VirdhatchaniKN - I see you referenced this in your changes. Can you please provide an update here.

I have merged the code that is mentioned in this issue, which is a temporary fix to the main branch. However, I believe that the issue is with the sfpi as the SFPU instructions work.

prajaramanTT · 2025-01-13T20:24:12Z

@rdjogoTT @VirdhatchaniKN is this still an open issue ? If not, can you please close this ? Thanks

eyonland · 2025-01-27T15:34:01Z

@VirdhatchaniKN is calling these instructions directly so it is not a blocker. However it would be great to have the sfpi version updated.

nathan-TT · 2025-02-04T19:31:35Z

The code changed by @VirdhatchaniKN 's fix is not the same as mentioned above. it is replacing:

        vInt val = dst_reg[0];
        vInt v = val;

        val = val << shift_amt;
        val = setsgn(val, v);
        dst_reg[0] = val;

which is intentionally preserving the sign bit from the original value. The replacement code is not equivalent.

rdjogoTT · 2025-02-04T19:43:44Z

@nathan-TT it is true that the code is not equivalent. @VirdhatchaniKN was trying to get rid of the part that preserves the sign bit of the original value by writing:

        vInt val = dst_reg[0];
        val = val << shift_amt;
        dst_reg[0] = val;

but that was not producing correct results. So I was asked to help debug and when I wrote it out using TTI instructions as:

        TTI_SFPLOAD(0,4,3,0);
        TT_SFPSHFT(shift_amt,0,0,1);
        TTI_SFPSTORE(0,4,3,0);

I found that it worked.

I thought that these two code blocks should provide the same functionality, so that's why I made this issue.

nathan-TT · 2025-02-06T19:39:44Z

update: The observed behavior is because of juggling 2's complement and sign-magnitude data representations. It is unclear what the 'right' answer should be.

rdjogoTT · 2025-02-07T15:59:40Z

Can you please explain a bit how 2's complement and sign-magnitude are handled differently, for future reference if something similar pops up?

nathan-TT · 2025-02-09T17:45:37Z

wait a moment, signed integer overflow is undefined behavior -- why are you expecting any particular behavior in these circumstances?

nathan-TT · 2025-02-10T16:26:33Z

sigh, in C++17 such shifts were UB, in C++20, 2's complement representation is required and IIUC the shift is not UB. But the issue we have is that we use sign-magnitude internal representation, and there appear to be inconsistencies in how that is handled. This will take some care unraveling.

nathan-TT · 2025-02-28T13:46:11Z

In investigating this I implemented .ttinsn pseudo, which makes disassembling synthesized sfpu insts better. The compiler landed at #17800, but the llk headers need updating too: tenstorrent/tt-llk#22

Documenting internal data type representations to progress this issue itself

rdjogoTT added bug Something isn't working LLK labels Nov 27, 2024

This was referenced Nov 28, 2024

ttnn.bitwise_left_shift gives results different from the torch version when the bit used to represent a sign differs from the bit on position 31 - shift_bits #13415

Closed

#13415: Update left shift #15538

Merged

eyonland assigned pgkeller Jan 27, 2025

eyonland added P1 P2 and removed P1 labels Jan 27, 2025

pgkeller assigned nathan-TT Jan 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SFPU shift operator issue when using sfpi #15514

SFPU shift operator issue when using sfpi #15514

rdjogoTT commented Nov 27, 2024

cmaryanTT commented Dec 20, 2024

VirdhatchaniKN commented Dec 23, 2024

prajaramanTT commented Jan 13, 2025

eyonland commented Jan 27, 2025

nathan-TT commented Feb 4, 2025 •

edited

Loading

rdjogoTT commented Feb 4, 2025

nathan-TT commented Feb 6, 2025

rdjogoTT commented Feb 7, 2025

nathan-TT commented Feb 9, 2025

nathan-TT commented Feb 10, 2025

nathan-TT commented Feb 28, 2025

SFPU shift operator issue when using sfpi #15514

SFPU shift operator issue when using sfpi #15514

Comments

rdjogoTT commented Nov 27, 2024

cmaryanTT commented Dec 20, 2024

VirdhatchaniKN commented Dec 23, 2024

prajaramanTT commented Jan 13, 2025

eyonland commented Jan 27, 2025

nathan-TT commented Feb 4, 2025 • edited Loading

rdjogoTT commented Feb 4, 2025

nathan-TT commented Feb 6, 2025

rdjogoTT commented Feb 7, 2025

nathan-TT commented Feb 9, 2025

nathan-TT commented Feb 10, 2025

nathan-TT commented Feb 28, 2025

nathan-TT commented Feb 4, 2025 •

edited

Loading