Try lowering aten._scaled_dot_product_flash_attention #569

Open
jdh8 wants to merge 3 commits into main from jdh8/scaled_dot_product_flash_attention
Conversation

@jdh8 (Contributor) commented Dec 8, 2024

Ticket

Problem description

Convert aten._scaled_dot_product_flash_attention into a series of ops. A future goal might be to implement it as a composite kernel op instead.

The source op is functionally equivalent to its high-level counterpart:
https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
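
For reference, a minimal PyTorch sketch of the series of ops this decomposition corresponds to. It mirrors the functional API linked above, not the PR's actual ttnn lowering; the masking and dropout handling here are illustrative only.

    import math
    import torch

    # Reference-only sketch of the same series of ops (matmul, scale, mask,
    # softmax, dropout, matmul); shapes follow (batch, heads, seq_len, head_dim).
    def sdpa_decomposed(query, key, value, dropout_p=0.0, is_causal=False):
        scale = 1.0 / math.sqrt(query.size(-1))
        scores = torch.matmul(query, key.transpose(-2, -1)) * scale
        if is_causal:
            # Mask out future key positions before the softmax.
            L, S = query.size(-2), key.size(-2)
            keep = torch.ones(L, S, dtype=torch.bool, device=query.device).tril()
            scores = scores.masked_fill(~keep, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        if dropout_p > 0.0:
            attn = torch.nn.functional.dropout(attn, p=dropout_p)
        return torch.matmul(attn, value)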

What's changed

@jdh8 jdh8 requested review from swimdi and ayerofieiev-tt December 8, 2024 10:48
@jdh8 jdh8 self-assigned this Dec 8, 2024
@jerrysky3 (Contributor)

ttnn already seems to have a corresponding API (https://docs.tenstorrent.com/ttnn/latest/ttnn/api/ttnn.transformer.scaled_dot_product_attention.html); can it be used here?
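
For context, a rough sketch of how that op might be called. The argument names are assumed from the linked documentation and may differ between ttnn versions, so treat this as an assumption rather than a verified signature.

    import ttnn

    # q, k, v are ttnn tensors of shape (batch, heads, seq_len, head_dim);
    # is_causal mirrors the aten op's flag. Names are assumed from the linked
    # docs and should be checked against the installed ttnn version.
    output = ttnn.transformer.scaled_dot_product_attention(q, k, v, is_causal=True)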

@swimdi (Contributor) commented Dec 9, 2024

@ayerofieiev-tt (Member)

@jdh8 please let us know whatever questions you have; I will help connect you with the right stakeholder to resolve this quickly.

@jdh8 jdh8 force-pushed the jdh8/scaled_dot_product_flash_attention branch from 0f79229 to 7df6ee4 Compare December 10, 2024 02:25
((1, 12, 50, 64), False),
((1, 16, 1370, 80), False),
((1, 12, 1, 64), False),
((1, 12, 4, 64), True),
@jdh8 (Contributor Author), Dec 10, 2024

A batch size of 0 gets inferred 🤔

FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape0-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 16, 197, 64] vs list(actual_pytorch_result.shape)=[0, 16, 197, 64]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape1-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 12, 197, 64] vs list(actual_pytorch_result.shape)=[0, 12, 197, 64]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape2-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 16, 50, 64] vs list(actual_pytorch_result.shape)=[0, 16, 50, 64]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape3-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 8, 4096, 40] vs list(actual_pytorch_result.shape)=[0, 8, 4096, 40]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape4-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 8, 1024, 80] vs list(actual_pytorch_result.shape)=[0, 8, 1024, 80]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape5-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 8, 256, 160] vs list(actual_pytorch_result.shape)=[0, 8, 256, 160]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape6-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 8, 64, 160] vs list(actual_pytorch_result.shape)=[0, 8, 64, 160]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape7-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 12, 50, 64] vs list(actual_pytorch_result.shape)=[0, 12, 50, 64]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape8-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 16, 1370, 80] vs list(actual_pytorch_result.shape)=[0, 16, 1370, 80]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape9-False] - AssertionError: list(expected_pytorch_result.shape)=[1, 12, 1, 64] vs list(actual_pytorch_result.shape)=[0, 12, 1, 64]
FAILED tests/lowering/misc/test_scaled_dot_product_attention.py::test_sdpa[input_shape10-True] - AssertionError: list(expected_pytorch_result.shape)=[1, 12, 4, 64] vs list(actual_pytorch_result.shape)=[0, 12, 4, 64]
=================================================================== 11 failed in 14.22s ====================================================================
                 Device | INFO     | Closing user mode device drivers
    def assert_with_pcc(expected_pytorch_result, actual_pytorch_result, pcc=0.999):
>       assert list(expected_pytorch_result.shape) == list(
            actual_pytorch_result.shape
        ), f"list(expected_pytorch_result.shape)={list(expected_pytorch_result.shape)} vs list(actual_pytorch_result.shape)={list(actual_pytorch_result.shape)}"
E       AssertionError: list(expected_pytorch_result.shape)=[1, 12, 4, 64] vs list(actual_pytorch_result.shape)=[0, 12, 4, 64]

(Member)

Is this still an issue?

(Contributor Author)

I just confirmed that this issue still lingers. I filed tenstorrent/tt-metal#16021 to track it.

@jdh8 jdh8 marked this pull request as ready for review December 10, 2024 02:28
{"is_causal": is_causal},
)

return select(*args[3:])
(Member)

Do we need to file a ticket for the unsupported cases?

(Contributor Author)

I'm marking this as a feature request: tenstorrent/tt-metal#16022. No input variation currently uses a nonzero dropout_p, but it's still worth keeping an eye on.
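
A hypothetical guard for that unsupported case (illustrative only, not this PR's actual code; how a lowering pass declines to rewrite a node depends on the pass infrastructure):

    # Illustrative sketch: refuse to lower when a nonzero dropout probability
    # is requested, leaving the original aten op in the graph instead.
    if dropout_p != 0.0:
        return None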

Doesn't this logic drop the attention mask, which must be provided if is_causal == False?

(Contributor Author)

aten._scaled_dot_product_flash_attention does not take an attention mask, as far as I know. I have not yet found better documentation; please correct me if I am wrong.
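
For reference, one way to check this directly from the dispatcher; the exact schema depends on the installed PyTorch version, so verify locally.

    import torch

    # Print the registered schema of the aten overload. Recent PyTorch versions
    # list query, key, value, dropout_p, is_causal, return_debug_mask and scale,
    # with no attention-mask argument -- but confirm against your installation.
    print(torch.ops.aten._scaled_dot_product_flash_attention.default._schema)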

I see. I am unfamiliar with the aten API; my understanding of the op comes from the functional API: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html

@jdh8 jdh8 force-pushed the jdh8/scaled_dot_product_flash_attention branch from ea1841f to 866703f Compare December 20, 2024 09:01
Development

Successfully merging this pull request may close the issue: aten._scaled_dot_product_flash_attention.default