[WIP][DRAFT] Use MeshDevice 1x1 instead of Device #18470

sminakov-tt · 2025-02-28T04:07:01Z

Ticket

Link to Github Issue

Problem description

Provide context for the problem.

What's changed

Describe the approach used to solve the problem.
Summarize the changes made and their impact.

Checklist

All post commit CI passes
Blackhole Post commit CI passes (if applicable)
Model regression CI passes (if applicable)
Device performance regression CI passes (if applicable)
(For models and ops writers) Full new models tests CI passes (if applicable)
New/Existing tests provide coverage for changes


…d add to pipeline select (#18348)

### Ticket
#17496

### Problem description
We did not have TG nightly tests as a selectable pipeline in "choose your pipeline". This was inconvenient for developers.

### What's changed
* Split out the nightly tests workflow into a wrapper + impl
* Add the impl to "choose your pipeline"

### Checklist
- [ ] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) CI passes
- [ ] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml) CI passes (if applicable)
- [ ] [Model regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) CI passes (if applicable)
- [ ] [Device performance regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) CI passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models tests](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) CI passes (if applicable)
- [ ] New/Existing tests provide coverage for changes

---------

Co-authored-by: Andrew Fuller <[email protected]>

### Ticket
N/A

### Problem description
Expose mesh events to TTNN, further integrating TT-distributed with TTNN.

### What's changed
Related cleanups bundled in this PR:
* Make `EnqueueRecordEvent` and `EnqueueRecordEventToHost` return `MeshEvent` by value, instead of accepting a `std::shared_ptr<MeshEvent>` and mutating it internally.
* Make `EnqueueWaitForEvent` and `EventSynchronize` accept `MeshEvent` by constant reference.
* Expose `MeshCoordinateRange` to TTNN; this is needed for the `MeshEvent` APIs.

### Checklist
- [X] [All post commit](https://github.com/tenstorrent/tt-metal/actions/runs/13577312078)
- [X] New/Existing tests provide coverage for changes: ran `MeshEvents` tests from `distributed_unit_tests`
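A minimal C++ sketch of the API-shape change described above: an out-parameter that the API mutates versus an event returned by value and consumed by const reference. `sketch::MeshEvent`, `sketch::MockQueue`, `RecordEventOld`, `RecordEvent`, and `WaitForEvent` are hypothetical stand-ins, not the real tt-metal types or function signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

namespace sketch {

// Hypothetical stand-in for a mesh event; not the real tt-metal MeshEvent.
struct MeshEvent {
    std::uint32_t id = 0;     // monotonically increasing event id
    std::uint32_t cq_id = 0;  // command queue the event was recorded on
};

// Hypothetical stand-in for a command queue that hands out event ids.
struct MockQueue {
    std::uint32_t id = 0;
    std::uint32_t next_event_id = 1;
    std::uint32_t last_completed = 0;
};

// Old shape: the caller allocates a shared_ptr and the API mutates it in place.
void RecordEventOld(MockQueue& cq, const std::shared_ptr<MeshEvent>& event) {
    event->id = cq.next_event_id++;
    event->cq_id = cq.id;
}

// New shape: the API creates the event and returns it by value.
MeshEvent RecordEvent(MockQueue& cq) {
    MeshEvent event;
    event.id = cq.next_event_id++;
    event.cq_id = cq.id;
    return event;
}

// New shape: waiting only reads the event, so it takes a const reference.
void WaitForEvent(MockQueue& cq, const MeshEvent& event) {
    assert(event.cq_id == cq.id);  // this sketch only models a single queue
    if (event.id > cq.last_completed) {
        cq.last_completed = event.id;  // pretend the device has caught up
    }
}

}  // namespace sketch
```

Returning by value makes ownership obvious at the call site (`auto ev = RecordEvent(cq);`) and removes the need for the caller to pre-allocate a shared object for the API to fill in.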

### Problem description
It is not easy to customize different neural networks.

### What's changed
All created NNs are now immutable by design. Added `operator()` to the base class; all modules should derive from it. Where possible, always store `std::shared_ptr<ModuleBase>` instead of your concrete class type.

### Checklist
- [x] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) CI passes
- [x] New/Existing tests provide coverage for changes
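A minimal C++ sketch of the pattern described above, assuming a simplified setting: modules are configured once at construction, invoked through `operator()` on a common base, and composed via `std::shared_ptr<ModuleBase>`. `Tensor` (here a `std::vector<float>`), `Scale`, `AddBias`, and `Sequential` are hypothetical illustrations, not the actual classes from that PR.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical tensor type; real modules operate on device tensors.
using Tensor = std::vector<float>;

// Hypothetical base class: every module is invoked through operator().
class ModuleBase {
public:
    virtual ~ModuleBase() = default;
    virtual Tensor operator()(const Tensor& input) const = 0;
};

// Multiplies every element by a factor fixed at construction.
class Scale final : public ModuleBase {
public:
    explicit Scale(float factor) : factor_(factor) {}
    Tensor operator()(const Tensor& input) const override {
        Tensor out(input);
        for (float& v : out) v *= factor_;
        return out;
    }
private:
    const float factor_;  // const: the module cannot be reconfigured later
};

// Adds a bias fixed at construction.
class AddBias final : public ModuleBase {
public:
    explicit AddBias(float bias) : bias_(bias) {}
    Tensor operator()(const Tensor& input) const override {
        Tensor out(input);
        for (float& v : out) v += bias_;
        return out;
    }
private:
    const float bias_;
};

// Holds children as std::shared_ptr<ModuleBase>, so any derived module fits.
class Sequential final : public ModuleBase {
public:
    explicit Sequential(std::vector<std::shared_ptr<ModuleBase>> layers)
        : layers_(std::move(layers)) {
        for (const auto& layer : layers_) assert(layer != nullptr);
    }
    Tensor operator()(const Tensor& input) const override {
        Tensor x = input;
        for (const auto& layer : layers_) x = (*layer)(x);
        return x;
    }
private:
    const std::vector<std::shared_ptr<ModuleBase>> layers_;
};

}  // namespace sketch
```

Because `Sequential` only sees `ModuleBase`, customizing a network means swapping one `shared_ptr` for another at construction time rather than editing a concrete class.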


…17790)

### Ticket
#15136

### Problem description
Tilize/untilize was parallelized along only one dimension, which limited performance.

### What's changed
This PR adds parallelization along the last two dimensions for tilize/untilize with padding. For large tensors, the operations now use more cores, improving performance by roughly 30x in the added tests. Average device samples/s is more than 3.5x higher for models such as vgg11 and vgg16, and it also improves for some Bert tiny tests.

### Checklist
- [x] [All post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml) CI passes https://github.com/tenstorrent/tt-metal/actions/runs/13243407566
- [ ] [Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml) CI passes (if applicable)
- [ ] [Model regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) CI passes (if applicable)
- [x] [Device performance regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) CI passes (if applicable) https://github.com/tenstorrent/tt-metal/actions/runs/13249354224
- [ ] **(For models and ops writers)** Full [new models tests](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) CI passes (if applicable)
- [ ] New/Existing tests provide coverage for changes
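A simplified C++ sketch of why splitting along two dimensions exposes more parallelism: it only counts work units, assuming 32x32 tiles, a tensor flattened to height x width, and a hypothetical column-block size. `WorkUnits1D`, `WorkUnits2D`, `CoresUsed`, and `block_w_tiles` are illustrative names, not the real tilize/untilize program factory logic, and the sketch makes no claim about the measured speedups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

namespace sketch {

constexpr std::uint32_t kTile = 32;  // assumed tile edge length in elements

constexpr std::uint32_t CeilDiv(std::uint32_t a, std::uint32_t b) { return (a + b - 1) / b; }

// 1D split: one work unit per row of tiles; each unit spans the full (padded) width.
std::uint32_t WorkUnits1D(std::uint32_t height, std::uint32_t /*width*/) {
    return CeilDiv(height, kTile);
}

// 2D split: each row of tiles is further cut into blocks of `block_w_tiles` tile columns.
// Padding is modeled by rounding both dimensions up to whole tiles.
std::uint32_t WorkUnits2D(std::uint32_t height, std::uint32_t width, std::uint32_t block_w_tiles) {
    assert(block_w_tiles > 0);
    return CeilDiv(height, kTile) * CeilDiv(CeilDiv(width, kTile), block_w_tiles);
}

// A core cannot be busy without a work unit, so usage is capped by both quantities.
std::uint32_t CoresUsed(std::uint32_t work_units, std::uint32_t available_cores) {
    return std::min(work_units, available_cores);
}

}  // namespace sketch
```

For a short, wide tensor (64 x 4096) on 64 cores, the 1D split yields only 2 units and so keeps 2 cores busy, while the 2D split with 4-tile column blocks yields 64 units and can occupy every core.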


### Ticket
None

### Problem description
A dependency on ASIO was introduced in #18395, but it was only "accidentally" available because it (sometimes) comes along with other Boost headers. CI passed because ASIO happened to be present, but in some environments it is missing (or too old).

### What's changed
Declared our dependency on ASIO explicitly.


sminakov-tt added 20 commits February 25, 2025 21:29

Always use MeshDevice instead of SingleDevice

31e0fb0

Merge branch 'jchu/ttnn-integration-with-mesh' into sminakov/all-mesh2

b300883

Merge remote-tracking branch 'origin/jchu/ttnn-integration-with-mesh' into sminakov/all-mesh2

5d1daef

Fixes

cf26c54

Change tests

5281a83

Variety of fixes

4f1d5be

Expose constants from MeshDevice

6156980

Added multi host buffer support

e855241

Creation fixes

45fe6e8

Remove program cache checks

e092c0e

Merge branch 'jchu/ttnn-integration-with-mesh' into sminakov/all-mesh2

9d6a374

Revert "Remove program cache checks"

8f4caf4

This reverts commit e092c0e.

Revert "Change tests"

b12c32c

This reverts commit 5281a83.

Tests fix

f5c70d3

to_device fix

5def98f

Cache test fix

e57bab4

Partial tests fixup

1e76525

Revert "Revert "Change tests""

01ca8ac

This reverts commit b12c32c.

Revert "Revert "Remove program cache checks""

351edaf

This reverts commit 8f4caf4.

Expose num_program_cache_entries

6ef2fd8

ayerofieiev-tt reviewed Feb 28, 2025


models/demos/metal_BERT_large_11/tests/test_bert_batch_dram.py (outdated)

tt-rkim and others added 9 commits February 28, 2025 16:46

Merge remote-tracking branch 'origin/jchu/ttnn-integration-with-mesh' into sminakov/all-mesh2

d298eef

Revert "Revert "Revert "Remove program cache checks"""

8475369

This reverts commit 351edaf.

Revert "Revert "Revert "Change tests"""

4de44fe

This reverts commit 01ca8ac.

Refactor sliding window shard boundary and tensor metadata types

3971289

This change updates the sliding window config tensor generation to use structs instead of tuples to improve readability and type safety.
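A minimal C++ sketch of the tuple-to-struct refactor described above. The type names `ShardBoundaryTuple`, `IndexRange`, and `ShardBoundary`, their field names, and the inclusive-range convention are hypothetical; the actual sliding window types may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

namespace sketch {

// Before: a positional tuple; callers must remember what each index means.
using ShardBoundaryTuple = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t, std::uint32_t>;

// After: named fields document intent and make mix-ups visible in code review.
struct IndexRange {
    std::uint32_t start;
    std::uint32_t end;  // inclusive in this sketch
};

struct ShardBoundary {
    IndexRange output;  // output rows owned by a shard
    IndexRange input;   // input rows needed to produce them
};

// Adapter from the old representation to the new one.
ShardBoundary FromTuple(const ShardBoundaryTuple& t) {
    return ShardBoundary{{std::get<0>(t), std::get<1>(t)}, {std::get<2>(t), std::get<3>(t)}};
}

// Reads clearly compared with std::get<3>(t) - std::get<2>(t) + 1.
std::uint32_t InputRows(const ShardBoundary& b) {
    assert(b.input.end >= b.input.start);
    return b.input.end - b.input.start + 1;
}

}  // namespace sketch
```

With named fields, swapping input and output ranges becomes an obvious `b.input` versus `b.output` mistake instead of a silent `std::get<0>` versus `std::get<2>` one.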

warthog9 and others added 14 commits February 28, 2025 16:41

Adding LICENSE_understanding.txt

ce17124

This is in addition to the existing LICENSE file to help clarify Tenstorrent's position and understanding of specific aspects of the existing LICENSE.

Signed-off-by: John 'Warthog9' Hawley <[email protected]>

[skip ci] Update Yolov4 model README.md (#18526)

719cbcb

Fix tensor deallocate test

d3a327f

Fix move

1393403

Fix group_attn_matmul

8aad141

Fix bad merge

83f892d

Comment out buffer pages len check

370c5e1

Revert "#17687: Add data_type checker" (#18503)

b309929

Reverts #17828. Post-commit: https://github.com/tenstorrent/tt-metal/actions/runs/13592890232

Merge remote-tracking branch 'origin/jchu/ttnn-integration-with-mesh' into sminakov/all-mesh2

220bc24

Merge remote-tracking branch 'origin/main' into sminakov/all-mesh2

76f8840

Fix deserialization crash

575b35e

Fix tensor allocation, convert reads and writes to support mesh device

d82c0c1

Fix lost tile size in to_layout for MeshDevice

e61f604

Add downcast to MeshDevice

3e20628
