Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EDM Packet Header Optimization: Simplify packet size storage and access (+other misc low-level optimizations) #17724

Conversation

SeanNijjar
Copy link
Contributor

@SeanNijjar SeanNijjar commented Feb 7, 2025

Ticket

#17686

What's changed

Several low-level optimizations in EDM fabric:

  • Simplify packet size storage and access

    • promote to "top-level" of packet to remove conditionality previously needed to get size info from packet
    • NOTE: packet size now specifies PAYLOAD SIZE ONLY!!! The header size must be implicitly added by fabric.
      • net this is still fine because we had to previous subtract header size when writing out to noc.
  • Migrate eth_send_packet calls to new version that takes size in bytes.

    • This version avoids a number of shift operations that were present in the previously used version.
  • Add and using new eth write remote reg (eth_write_remote_reg_no_txq_check) that doesn't have conditional context switch in body of function

Extra functionality: Added NOC_UNICAST_INLINE_WRITE eth packet command type to address a regression as a result of the above change (if the command type wasn't added)

Checklist

@SeanNijjar SeanNijjar force-pushed the snijjar/optimize-edm-fabric-packet-size-repr-issue-17686 branch from 960bbde to 313f88a Compare February 7, 2025 17:53
@@ -7,6 +7,7 @@
#include <tt-metalium/global_semaphore.hpp>
#include "cpp/ttnn/global_semaphore.hpp"
#include "pybind11/pybind11.h"
#include "pybind11/stl.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to call this out. It's a bit suspicious that I had to start including this after a rebase even though it appears nothing fundamentally changed here or at the caller. @tt-aho, @ayerofieiev-tt - any ideas why or objections to the new inclusion here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference this is the error:

mesh_device = MeshDevice(2x4 grid, 8 devices), cores = {[(x=0,y=0) - (x=7,y=6)]}, initial_value = 0

    def create_global_semaphore_with_same_address(mesh_device, cores, initial_value):
        semaphore_handles = ttnn.create_global_semaphore_with_same_address(mesh_device, cores, initial_value)
>       addrs = ttnn.get_global_semaphore_address(semaphore_handles)
E       TypeError: Unable to convert function return value to a Python type! The signature was
E           (global_semaphore: ttnn._ttnn.global_semaphore.multi_device_global_semaphore) -> std::__1::vector<unsigned long, std::__1::allocator<unsigned long>>
E       
E       Did you forget to `#include <pybind11/stl.h>`? Or <pybind11/complex.h>,
E       <pybind11/functional.h>, <pybind11/chrono.h>, etc. Some automatic
E       conversions are optional and require extra headers to be included
E       when compiling your pybind11 module.

tests/ttnn/unit_tests/operations/ccl/test_ccl_common.py:35: TypeError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is fine. It's needed for binding vector. Might be that this was included indirectly from a different file and that this is no longer the case now.

Make it in the same spot for all command types to remove any sort of conditional lookup to determin packet size.

Also switch the size to now represent payload size only (excludes header). This simplifies some caller code as well.

> Conflicts:
>	ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_edm_packet_header.hpp
>	ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_edm_packet_transmission.hpp
>	ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_erisc_datamover.cpp
@SeanNijjar SeanNijjar force-pushed the snijjar/optimize-edm-fabric-packet-size-repr-issue-17686 branch from 313f88a to 9902c15 Compare February 7, 2025 18:26
@@ -7,6 +7,7 @@
#include <tt-metalium/global_semaphore.hpp>
#include "cpp/ttnn/global_semaphore.hpp"
#include "pybind11/pybind11.h"
#include "pybind11/stl.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is fine. It's needed for binding vector. Might be that this was included indirectly from a different file and that this is no longer the case now.

@SeanNijjar
Copy link
Contributor Author

Going to close this and migrate to this other PR which has some additional changes.

Reason: There was a regression in this PR that was fixed there. Additionally this other PR heavily modifies the same files and I wanted to avoid headaches with rebasing conflicts. It's all the same type of change (optimizing EDM fabric)
#17749

@SeanNijjar SeanNijjar closed this Feb 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants