Runtime support for CPU hoist ops #2152
### Problem description

This PR adds support for CPU hoisting changes to our TTNNToFlatbuffer translation.

### What's changed

This PR introduces new flatbuffer fields for CPU hoisted ops. It hooks up the TTNNToFlatbuffer pass to properly utilize passes introduced in other PRs so that a CPU dylib can be embedded in our flatbuffer.

Note: this PR is pretty much impossible to test standalone, unfortunately. 2 following PRs will be strictly dependent on this PR merging:

1. #2148: this PR adds earlier support for hoisting in the TTNNPipeline for TTIRToTTNN. However, this generates IR that the TTNNToFlatbuffer pass cannot parse until this PR lands. (So this TTNNToFlatbuffer PR is useless without the TTIRToTTNN PR, but TTIRToTTNN breaks ttmlir-translate without this PR 😄 ).
2. #2152: Runtime PR, which will add support for actually executing new flatbuffers. This is obviously dependent on this PR + its flatbuffer changes as well.
@@ -36,21 +37,6 @@ static std::string asJson(void const *fbb, uint8_t const *binarySchema,
return text;
}

static std::vector<uint32_t>
Moving this to a common header because I need it for my cpu.cpp file
@@ -141,7 +140,12 @@ class ProgramContext {
const std::unordered_map<uint32_t, ::ttnn::Tensor *> &liveTensors,
const std::vector<uint32_t> &programInputs,
const std::vector<uint32_t> &programOutputs,
::ttnn::MeshDevice *parentMesh);
const DylibHandleMap *programDylibs, ::ttnn::MeshDevice *parentMesh)
imo this ctor should be defined in the header, since 1. it's trivial and 2. other ctors are defined in the header here (via `= default`, admittedly)
sizes[j] = ins->Get(i)->desc()->shape()->Get(j);
}
std::vector<int64_t> strides = common::calculateStride(sizes);
int64_t *sizes_and_strides = new int64_t[2 * rank];
Don't like using manual `new[]` calls, but this is annoying because we can't use a std::vector in this function (since the memory needs to outlive this func). I guess we could pass in a 2D vector (one for each of the input tensors) which exists in the calling func instead, but that's sort of ugly too imo
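For reference, the caller-owned alternative mentioned in this comment could look roughly like the sketch below. Everything in it is hypothetical and not taken from the PR: `rowMajorStrides` is a stand-in that assumes `common::calculateStride` produces contiguous row-major strides, and `SizesAndStridesStorage` is an illustrative name for the "2D vector" that would live in the calling function. Each input tensor gets one packed buffer laid out as sizes followed by strides, matching the `2 * rank` layout in the hunk above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for common::calculateStride: assumes contiguous,
// row-major strides measured in elements (innermost dimension has stride 1).
std::vector<int64_t> rowMajorStrides(const std::vector<int64_t> &sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  // Walk from the innermost dimension outwards, accumulating the product.
  for (std::size_t i = sizes.size(); i-- > 1;) {
    strides[i - 1] = strides[i] * sizes[i];
  }
  return strides;
}

// Hypothetical owner for the packed buffers. It would be declared in the
// calling function so the raw pointers it hands out stay valid for as long
// as the object is alive, without any manual new[] / delete[].
class SizesAndStridesStorage {
public:
  // Packs [sizes..., strides...] for one tensor and returns a pointer to
  // the 2 * rank values. Moving the inner vector into buffers_ keeps its
  // heap allocation, so earlier pointers survive later add() calls.
  int64_t *add(const std::vector<int64_t> &sizes) {
    std::vector<int64_t> packed(sizes);
    const std::vector<int64_t> strides = rowMajorStrides(sizes);
    assert(strides.size() == sizes.size());
    packed.insert(packed.end(), strides.begin(), strides.end());
    buffers_.push_back(std::move(packed));
    return buffers_.back().data();
  }

private:
  std::vector<std::vector<int64_t>> buffers_;
};
```

The trade-off is the one already noted: the storage object has to be threaded through from the caller and must outlive the call into the hoisted function, in exchange for automatic cleanup.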
@@ -0,0 +1,24 @@
// SPDX-FileCopyrightText: (c) 2024 Tenstorrent AI ULC
Not sure if some existing header here would be more appropriate than a standalone for this--would've said it should maybe go in workarounds.h, but I don't think we have any similar funcs there
### Problem description

Add runtime support for executing hoisted CPU ops.

### What's changed

Add a mechanism for reading and opening dylibs from the flatbuffer, and for executing CPU ops by calling the appropriate function within a given dylib.