Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#17477: Adopt ND coordinate system in system mesh, coordinate translation #17926

Merged
merged 2 commits into from
Feb 18, 2025

Conversation

omilyutin-tt
Copy link
Contributor

@omilyutin-tt omilyutin-tt commented Feb 18, 2025

Ticket

#17477

Problem description

TT-distributed needs to adopt ND coordinate system for mesh primitives.

What's changed

Plumbed SimpleMeshShape in SystemMesh, logical to physical coordinate translation mapping.

Checklist

Comment on lines +63 to +67
{1, std::make_pair("device.json", SimpleMeshShape(1, 1))},
{2, std::make_pair("N300.json", SimpleMeshShape(1, 2))},
{8, std::make_pair("T3000.json", SimpleMeshShape(2, 4))},
{32, std::make_pair(is_qg ? "QG.json" : "TG.json", SimpleMeshShape(8, 4))},
{64, std::make_pair("TGG.json", SimpleMeshShape(8, 8))},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping these 2 together to ensure things are in sync.

for (auto& [device_range, program] : programs_) {
if (program_idx) {
TT_ASSERT(sem_size == program.get_sem_size(device, logical_core, core_type));
TT_ASSERT(sem_size == program.get_sem_size(mesh_device.get(), logical_core, core_type));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to expose get_device_index directly any more.

@@ -112,7 +115,8 @@ std::vector<chip_id_t> SystemMesh::Impl::get_mapped_physical_device_ids(const Me
auto line_coords = MeshDeviceView::get_line_coordinates(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfjchu is 1D case fundamentally special? Below, the 2D code attempts to "rotate" the shape and the constraint is that every dim has to fit inside SystemMesh. In ND case, do we keep the same rotating logic + keep the same special case for 1D? Would 2D (or any reduced dimensionality) be a special case too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our WH mesh is 2D physically connected. So for example, when a user requests a 1x8 on a 2x4 physically connected T3000 mesh, we don't constrain the line to be bounded by the max-dim on row/col (i.e. 2, 4). Instead, we do return a line back to the user that snakes the 2D physical mesh and we follow a convention where we snake from top-left going right and then left.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So we need the special case for 1D representation as there are dependencies, and in the future we might do something similar for collapsing 3D into 2D and so on.

Copy link
Contributor

@cfjchu cfjchu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 . Are these the remaining set of touch points that still use MeshShape? Can we swap out MeshShape and SimpleMeshShape after this?

@omilyutin-tt
Copy link
Contributor Author

Are these the remaining set of touch points that still use MeshShape? Can we swap out MeshShape and SimpleMeshShape after this?

There is some more stuff in MeshWorkload for computing ranges of devices, MeshBuffer read/write APIs, and MeshDeviceView... Overall, relatively straightforward, but some algorithms need to be generalized to ND, so it makes it challenging to do in one giant PR.

@tt-asaigal
Copy link
Contributor

Are these the remaining set of touch points that still use MeshShape? Can we swap out MeshShape and SimpleMeshShape after this?

There is some more stuff in MeshWorkload for computing ranges of devices, MeshBuffer read/write APIs, and MeshDeviceView... Overall, relatively straightforward, but some algorithms need to be generalized to ND, so it makes it challenging to do in one giant PR.

I have changes in flight for MeshTrace that rely on the 2D intersection, adjacency, etc. algorithms. We'll also need these for heterogenous RTA support. My preference would be to get functionality in for these features, see what the additional touch-points are, and then add MeshCoordinate and MeshShape support. Does this sound reasonable to you guys?

@omilyutin-tt
Copy link
Contributor Author

My preference would be to get functionality in for these features, see what the additional touch-points are, and then add MeshCoordinate and MeshShape support. Does this sound reasonable to you guys?

Yep, sg.

@omilyutin-tt omilyutin-tt merged commit 911e5c8 into main Feb 18, 2025
236 of 252 checks passed
@omilyutin-tt omilyutin-tt deleted the omilyutin/nd-adopt1 branch February 18, 2025 21:43
dgomezTT pushed a commit that referenced this pull request Feb 19, 2025
…tion (#17926)

### Ticket
#17477

### Problem description
TT-distributed needs to adopt ND coordinate system for mesh primitives.

### What's changed
Plumbed `SimpleMeshShape` in `SystemMesh`, logical to physical
coordinate translation mapping.

### Checklist
- [X] [All post
commit](https://github.com/tenstorrent/tt-metal/actions/runs/13395057290)
- [X] New/Existing tests provide coverage for changes
hschoi4448 pushed a commit that referenced this pull request Feb 20, 2025
…tion (#17926)

### Ticket
#17477

### Problem description
TT-distributed needs to adopt ND coordinate system for mesh primitives.

### What's changed
Plumbed `SimpleMeshShape` in `SystemMesh`, logical to physical
coordinate translation mapping.

### Checklist
- [X] [All post
commit](https://github.com/tenstorrent/tt-metal/actions/runs/13395057290)
- [X] New/Existing tests provide coverage for changes
TT-billteng pushed a commit that referenced this pull request Feb 21, 2025
…tion (#17926)

### Ticket
#17477

### Problem description
TT-distributed needs to adopt ND coordinate system for mesh primitives.

### What's changed
Plumbed `SimpleMeshShape` in `SystemMesh`, logical to physical
coordinate translation mapping.

### Checklist
- [X] [All post
commit](https://github.com/tenstorrent/tt-metal/actions/runs/13395057290)
- [X] New/Existing tests provide coverage for changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants