Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional EDM fabric optimizations (mix of low level and experimental flow control protocol trimming) #17749

Merged
merged 7 commits into from
Feb 10, 2025

Conversation

SeanNijjar
Copy link
Contributor

@SeanNijjar SeanNijjar commented Feb 7, 2025

Ticket

#17686

Problem description

EDM fabric perf is 💩 , make it better. We're SW bound. Put in known/obvious optimizations to make analysis less noisy.

What's changed

High level changes:

  1. Optimize size information in packet header.
  • simplifies packet processing and setup
  1. Optimize routing information storage in packet header
  • simplifies packet processing
  1. Added missing inline write command type which is required after these changes
  2. Migrate to more optimized eth APIs
  • eth_write_reg and eth_send_packet that omit bit shifts and omit context switch calls
  1. Trimming flow control protocols further
  • various force inlines for tiny getter functions

Packet Header Size Field Optimization

  • Simplify packet size storage and access
    • promote to "top-level" of packet to remove conditionality previously needed to get size info from packet
    • NOTE: packet size now specifies PAYLOAD SIZE ONLY!!! The header size must be implicitly added by fabric.
      • net this is still fine because we had to previous subtract header size when writing out to noc.

Packet Header Routing Info Optimization

Merged the mcast and unicast representation to match so I can uniformly process the packet to decide the following:

  • Does packet get sent to local device noc?
  • Does packet get forwarded through the fabric?

The previous implementation was required to first check the fabric send type before being able to do further inspection to answer the above questions. Now the code is much simpler - no fabric type info checked - single code path to check both. Additionally the check logic is also streamlined.

New packet command type

Extra functionality: Added NOC_UNICAST_INLINE_WRITE eth packet command type to address a regression as a result of the above change (if the command type wasn't added)

Optimized Eth send APIs

  • Migrate eth_send_packet calls to new version that takes size in bytes.
    • This version avoids a number of shift operations that were present in the previously used version.
  • Add and using new eth write remote reg (eth_write_remote_reg_no_txq_check) that doesn't have conditional context switch in body of function

Flow Control Protocol Trimming

  • Enabled (by default) a less granular syncing mode between sender and receiver channels. Overall, in a theoretical sense, this is suboptimal. However, in a severely SW bound implementation like present, this will save on instruction count.

We disable the following:

  • first level ack (i.e. when receiver gets the packet and notifies sender of packet received)
  • separate pointer management for write flush ptr and completion pointer send on receiver channel
    • Flush ptr now merged with completion pointer so we cut down on processing.

Checklist

Make it in the same spot for all command types to remove any sort of conditional lookup to determin packet size.

Also switch the size to now represent payload size only (excludes header). This simplifies some caller code as well.

> Conflicts:
>	ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_edm_packet_header.hpp
>	ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_edm_packet_transmission.hpp
>	ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_erisc_datamover.cpp
@SeanNijjar SeanNijjar force-pushed the snijjar/opt-edm-fabric-chip-cmd-type branch from 343fc44 to 78f8965 Compare February 9, 2025 18:22
Copy link
Contributor

@tt-aho tt-aho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@TT-BrianLiu TT-BrianLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files owned by me looks good

@SeanNijjar SeanNijjar merged commit 66f0c03 into main Feb 10, 2025
247 of 257 checks passed
@SeanNijjar SeanNijjar deleted the snijjar/opt-edm-fabric-chip-cmd-type branch February 10, 2025 23:03
hschoi4448 pushed a commit that referenced this pull request Feb 20, 2025
…l flow control protocol trimming) (#17749)

High level changes:
1) Optimize size information in packet header. 
 - simplifies packet processing and setup
2) Optimize routing information storage in packet header
 - simplifies packet processing
3) Added missing inline write command type which is required after these
changes
4) Migrate to more optimized eth APIs
- eth_write_reg and eth_send_packet that omit bit shifts and omit
context switch calls
5) Trimming flow control protocols further

+ various force inlines for tiny getter functions

## Packet Header Size Field Optimization
- Simplify packet size storage and access
- promote to "top-level" of packet to remove conditionality previously
needed to get size info from packet
- NOTE: packet size now specifies PAYLOAD SIZE ONLY!!! The header size
must be implicitly added by fabric.
- net this is still fine because we had to previous subtract header size
when writing out to noc.

## Packet Header Routing Info Optimization
Merged the mcast and unicast representation to match so I can uniformly
process the packet to decide the following:
- Does packet get sent to local device noc?
- Does packet get forwarded through the fabric?

The previous implementation was required to first check the fabric send
type before being able to do further inspection to answer the above
questions. Now the code is much simpler - no fabric type info checked -
single code path to check both. Additionally the check logic is also
streamlined.

## New packet command type
Extra functionality: Added `NOC_UNICAST_INLINE_WRITE` eth packet command
type to address a regression as a result of the above change (if the
command type wasn't added)

## Optimized Eth send APIs
- Migrate `eth_send_packet` calls to new version that takes size in
bytes.
- This version avoids a number of shift operations that were present in
the previously used version.
- Add and using new eth write remote reg
(`eth_write_remote_reg_no_txq_check`) that doesn't have conditional
context switch in body of function

## Flow Control Protocol Trimming
- Enabled (by default) a less granular syncing mode between sender and
receiver channels. Overall, in a theoretical sense, this is suboptimal.
However, in a severely SW bound implementation like present, this will
save on instruction count.

We disable the following:
- first level ack (i.e. when receiver gets the packet and notifies
sender of packet received)
- separate pointer management for write flush ptr and completion pointer
send on receiver channel
- Flush ptr now merged with completion pointer so we cut down on
processing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants