Runtime support for CPU hoist ops #2152

Draft · wants to merge 4 commits into main from vwells/cpu_hoist_runtime
Conversation

@vwellsTT (Contributor) commented Feb 7, 2025

Problem description

Add runtime support for executing hoisted CPU ops.

What's changed

Adds a mechanism for reading and opening dylibs from the flatbuffer, and for executing CPU ops by invoking the appropriate function within a given dylib.
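
For illustration, a minimal sketch of that flow (not this PR's actual code), assuming POSIX dlopen/dlsym and placeholder helper names:

```cpp
// Hypothetical sketch: spill the dylib bytes embedded in the flatbuffer to
// a temp file, dlopen it, and look up the hoisted function by symbol name.
// Only dlopen/dlsym are real APIs here; everything else is assumed.
#include <dlfcn.h>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Signature the hoisted CPU functions are assumed to have.
using HoistedFn = void (*)(void **buffers, int64_t *sizesAndStrides);

void *openEmbeddedDylib(const std::vector<uint8_t> &dylibBytes,
                        const std::string &tmpPath) {
  // dlopen needs a file path, so write the embedded bytes out first.
  std::ofstream out(tmpPath, std::ios::binary);
  out.write(reinterpret_cast<const char *>(dylibBytes.data()),
            static_cast<std::streamsize>(dylibBytes.size()));
  out.close();
  return dlopen(tmpPath.c_str(), RTLD_NOW | RTLD_LOCAL);
}

HoistedFn lookupHoistedFn(void *dylibHandle, const std::string &funcName) {
  // Returns nullptr if the symbol isn't present in the dylib.
  return reinterpret_cast<HoistedFn>(dlsym(dylibHandle, funcName.c_str()));
}
```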


@vwellsTT changed the title from "copy changes from e2e branch" to "Runtime support for CPU hoist ops" on Feb 7, 2025
vwellsTT added a commit that referenced this pull request Feb 19, 2025
### Problem description
This PR adds support for CPU hoisting changes to our TTNNToFlatbuffer
translation.

### What's changed
This PR introduces new flatbuffer fields for CPU hoisted ops. It hooks
up the TTNNToFlatbuffer pass to properly utilize passes introduced in
other PRs so that a CPU dylib can be embedded in our flatbuffer.

Note: this PR is pretty much impossible to test standalone,
unfortunately. Two follow-up PRs will be strictly dependent on this PR
merging:
1. #2148, which adds earlier support for hoisting in the TTNNPipeline
for TTIRToTTNN. However, it generates IR that the TTNNToFlatbuffer pass
cannot parse until this PR lands. (So this TTNNToFlatbuffer PR is
useless without the TTIRToTTNN PR, but TTIRToTTNN breaks
ttmlir-translate without this PR 😄.)
2. #2152, the runtime PR, which will add support for actually executing
the new flatbuffers. This is obviously dependent on this PR and its
flatbuffer changes as well.
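
For context, a hedged sketch of how a compiled dylib could be embedded into a flatbuffer using the stock FlatBuffers C++ API; the schema-specific table and field names this PR actually adds are not reproduced here:

```cpp
// Sketch only: deep-copies raw .so bytes into a buffer under construction.
// Which field the resulting offset is stored into is defined by this PR's
// schema changes and is not shown here.
#include <cstdint>
#include <vector>
#include "flatbuffers/flatbuffers.h"

flatbuffers::Offset<flatbuffers::Vector<uint8_t>>
embedDylibBytes(flatbuffers::FlatBufferBuilder &fbb,
                const std::vector<uint8_t> &dylibBytes) {
  // CreateVector copies the bytes into the flatbuffer; the runtime later
  // reads them back out and dlopen()s them (see PR #2152).
  return fbb.CreateVector(dylibBytes);
}
```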
vwellsTT added a commit that referenced this pull request Feb 20, 2025
@vwellsTT force-pushed the vwells/cpu_hoist_runtime branch from 1ea72a8 to 2fbe384 on February 21, 2025 at 15:11
@vwellsTT force-pushed the vwells/cpu_hoist_runtime branch from 2fbe384 to d947859 on February 21, 2025 at 15:15
@@ -36,21 +37,6 @@ static std::string asJson(void const *fbb, uint8_t const *binarySchema,
return text;
}

static std::vector<uint32_t>
@vwellsTT (author) commented:
Moving this to a common header because I need it for my cpu.cpp file
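
The usual shape of that refactor, sketched with placeholder names (the real helper's signature is truncated in the diff above, so nothing here is taken from it):

```cpp
// common.h -- illustrative pattern only; the actual helper name and
// arguments are placeholders, not this PR's code.
#pragma once
#include <cstdint>
#include <vector>

// Formerly a file-local 'static' function in one translation unit; marked
// 'inline' so the definition can live in a shared header without ODR
// violations when both the translator and cpu.cpp include it.
inline std::vector<uint32_t> collectProgramIds(/* real args elided */) {
  return {};
}
```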

@@ -141,7 +140,12 @@ class ProgramContext {
      const std::unordered_map<uint32_t, ::ttnn::Tensor *> &liveTensors,
      const std::vector<uint32_t> &programInputs,
      const std::vector<uint32_t> &programOutputs,
-     ::ttnn::MeshDevice *parentMesh);
+     const DylibHandleMap *programDylibs, ::ttnn::MeshDevice *parentMesh)
@vwellsTT (author) commented:
IMO this ctor should be defined in the header, since 1. it's trivial and 2. the other ctors here are defined in the header (via = default, admittedly).
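
Concretely, defining it in the header would look something like this; the member layout and the shape of DylibHandleMap are assumptions, not this PR's code:

```cpp
// ProgramContext.h -- sketch under assumed member types; the real
// DylibHandleMap comes from this PR and is only approximated here.
#include <cstdint>
#include <unordered_map>
#include <vector>

namespace ttnn { class MeshDevice; }                         // stand-in decl
using DylibHandleMap = std::unordered_map<uint32_t, void *>; // assumed shape

class ProgramContext {
public:
  ProgramContext(const std::vector<uint32_t> &programInputs,
                 const std::vector<uint32_t> &programOutputs,
                 const DylibHandleMap *programDylibs,
                 ::ttnn::MeshDevice *parentMesh)
      : programInputs(programInputs), programOutputs(programOutputs),
        programDylibs(programDylibs), parentMesh(parentMesh) {}

private:
  std::vector<uint32_t> programInputs;
  std::vector<uint32_t> programOutputs;
  const DylibHandleMap *programDylibs; // non-owning
  ::ttnn::MeshDevice *parentMesh;      // non-owning
};
```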

sizes[j] = ins->Get(i)->desc()->shape()->Get(j);
}
std::vector<int64_t> strides = common::calculateStride(sizes);
int64_t *sizes_and_strides = new int64_t[2 * rank];
@vwellsTT (author) commented:
Don't like using manual new[] calls, but this is annoying because we can't use an std::vector in this function (since the memory needs to outlive this func). I guess we could instead pass in a 2D vector (one buffer for each of the input tensors) that lives in the calling func, but that's sort of ugly too imo.
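
For comparison, a sketch of that caller-owned alternative; the stride computation assumes row-major layout (matching what common::calculateStride presumably does), and all names here are illustrative:

```cpp
#include <cstdint>
#include <vector>

// Row-major strides: innermost dimension has stride 1.
static std::vector<int64_t>
rowMajorStrides(const std::vector<int64_t> &sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  for (int64_t i = static_cast<int64_t>(sizes.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * sizes[i + 1];
  return strides;
}

// Caller keeps 'storage' alive across the hoisted-op call, so no manual
// new[]/delete[] is needed; each entry packs [sizes..., strides...].
int64_t *packSizesAndStrides(std::vector<std::vector<int64_t>> &storage,
                             const std::vector<int64_t> &sizes) {
  std::vector<int64_t> packed(sizes);
  std::vector<int64_t> strides = rowMajorStrides(sizes);
  packed.insert(packed.end(), strides.begin(), strides.end());
  storage.push_back(std::move(packed));
  // The returned pointer stays valid even if 'storage' reallocates, since
  // the inner vector's heap buffer moves with it.
  return storage.back().data();
}
```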

@@ -0,0 +1,24 @@
// SPDX-FileCopyrightText: (c) 2024 Tenstorrent AI ULC
@vwellsTT (author) commented:
Not sure if some existing header here would be more appropriate than a standalone one for this--I would've said it should maybe go in workarounds.h, but I don't think we have any similar funcs there.
