-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Hoist Passes to TTIR to TTNN Pipeline #2148
base: main
Are you sure you want to change the base?
Conversation
…e1 2025-02-07 (#2139) This PR uplifts the third_party/tt-metal to the b552fb8ae14236777983e3521a312dd78432f0e1 - Update include locations for small_vector after metal commit b552fb8ae1 --------- Co-authored-by: kmabeeTT <[email protected]> Co-authored-by: brataTT <[email protected]>
### Problem description This PR adds support for CPU hoisting changes to our TTNNToFlatbuffer translation. ### What's changed This PR introduces new flatbuffer fields for CPU hoisted ops. It hooks up the TTNNToFlatbuffer pass to properly utilize passes introduced in other PRs s.t. a CPU dylib can be embedded in our flatbuffer. Note: this PR is pretty much impossible to test standalone, unfortunately. 2 following PRs will be strictly dependent on this PR merging: 1. #2148 this PR adds earlier support for hoisting in the TTNNPipeline for TTIRToTTNN. However, this generates IR that TTNNToFlatbuffer pass cannot parse until this PR lands. (So this TTNNToFlatbuffer PR is useless without TTIRToTTNN PR, but TTIRToTTNN breaks ttmlir-translate without this PR 😄 ). 2. #2152 Runtime PR, which will add support for actually executing new flatbuffers. This is obviously dependent on this PR + its flatbuffer changes as well.
### Problem description This PR adds support for CPU hoisting changes to our TTNNToFlatbuffer translation. ### What's changed This PR introduces new flatbuffer fields for CPU hoisted ops. It hooks up the TTNNToFlatbuffer pass to properly utilize passes introduced in other PRs s.t. a CPU dylib can be embedded in our flatbuffer. Note: this PR is pretty much impossible to test standalone, unfortunately. 2 following PRs will be strictly dependent on this PR merging: 1. #2148 this PR adds earlier support for hoisting in the TTNNPipeline for TTIRToTTNN. However, this generates IR that TTNNToFlatbuffer pass cannot parse until this PR lands. (So this TTNNToFlatbuffer PR is useless without TTIRToTTNN PR, but TTIRToTTNN breaks ttmlir-translate without this PR 😄 ). 2. #2152 Runtime PR, which will add support for actually executing new flatbuffers. This is obviously dependent on this PR + its flatbuffer changes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me. Just not sure if it's okay to turn off the canonicalizer.
test/ttmlir/Dialect/TTNN/Transforms/ttnn_modify_signatures_for_dylib_0.mlir
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome! Thanks for taking a look through this and adding the functionality :)
) | ||
output_connections[source_node.id] += 1 | ||
# Process the module hierarchy recursively | ||
process_module( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logically, these changes are basically just to make it so we can recursively call process_operations
on nested submodules, since I can't find a graceful way to convert a python OpView
into a Module, annoyingly. But this func was kind of convoluted to start with, so I broke it up into pieces to make it simpler to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big change and requires some in depth testing of of how the resulting graphs look on the frontend since we don't have automated UI tests for this.
Could this be a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot merge the rest of the PR without some such change, since it will fail tt-explorer CI test. I think this change to tt-explorer could merge before the rest of the PR in theory, but I'm not sure that makes sense, since w/o other changes, we won't really be exercising new logic in meaningful ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's just hold of until Monday.
) | ||
output_connections[source_node.id] += 1 | ||
# Process the module hierarchy recursively | ||
process_module( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big change and requires some in depth testing of of how the resulting graphs look on the frontend since we don't have automated UI tests for this.
Could this be a separate PR?
Problem description
Modify TTIR -> TTNN pipelines to handle hoisting
What's changed
Modify TTNN pipeline to split out hoisted code into new CPUModule + other code into DeviceModule, run 2 separate pipelines on each of these. Also make small change to TTIR Pipeline for linalg -> LLVM s.t. we don't run canonicalizer anymore (since this doesn't work well w/o return values).
Checklist
ttmlir-translate
tests will break.