-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional EDM fabric optimizations (mix of low level and experimental flow control protocol trimming) #17749
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
8 tasks
Make it in the same spot for all command types to remove any sort of conditional lookup to determin packet size. Also switch the size to now represent payload size only (excludes header). This simplifies some caller code as well. > Conflicts: > ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_edm_packet_header.hpp > ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_edm_packet_transmission.hpp > ttnn/cpp/ttnn/operations/ccl/kernels/edm_fabric/fabric_erisc_datamover.cpp
343fc44
to
78f8965
Compare
tt-aho
approved these changes
Feb 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
aliuTT
approved these changes
Feb 10, 2025
TT-BrianLiu
approved these changes
Feb 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Files owned by me looks good
hschoi4448
pushed a commit
that referenced
this pull request
Feb 20, 2025
…l flow control protocol trimming) (#17749) High level changes: 1) Optimize size information in packet header. - simplifies packet processing and setup 2) Optimize routing information storage in packet header - simplifies packet processing 3) Added missing inline write command type which is required after these changes 4) Migrate to more optimized eth APIs - eth_write_reg and eth_send_packet that omit bit shifts and omit context switch calls 5) Trimming flow control protocols further + various force inlines for tiny getter functions ## Packet Header Size Field Optimization - Simplify packet size storage and access - promote to "top-level" of packet to remove conditionality previously needed to get size info from packet - NOTE: packet size now specifies PAYLOAD SIZE ONLY!!! The header size must be implicitly added by fabric. - net this is still fine because we had to previous subtract header size when writing out to noc. ## Packet Header Routing Info Optimization Merged the mcast and unicast representation to match so I can uniformly process the packet to decide the following: - Does packet get sent to local device noc? - Does packet get forwarded through the fabric? The previous implementation was required to first check the fabric send type before being able to do further inspection to answer the above questions. Now the code is much simpler - no fabric type info checked - single code path to check both. Additionally the check logic is also streamlined. ## New packet command type Extra functionality: Added `NOC_UNICAST_INLINE_WRITE` eth packet command type to address a regression as a result of the above change (if the command type wasn't added) ## Optimized Eth send APIs - Migrate `eth_send_packet` calls to new version that takes size in bytes. - This version avoids a number of shift operations that were present in the previously used version. - Add and using new eth write remote reg (`eth_write_remote_reg_no_txq_check`) that doesn't have conditional context switch in body of function ## Flow Control Protocol Trimming - Enabled (by default) a less granular syncing mode between sender and receiver channels. Overall, in a theoretical sense, this is suboptimal. However, in a severely SW bound implementation like present, this will save on instruction count. We disable the following: - first level ack (i.e. when receiver gets the packet and notifies sender of packet received) - separate pointer management for write flush ptr and completion pointer send on receiver channel - Flush ptr now merged with completion pointer so we cut down on processing.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Ticket
#17686
Problem description
EDM fabric perf is 💩 , make it better. We're SW bound. Put in known/obvious optimizations to make analysis less noisy.
What's changed
High level changes:
Packet Header Size Field Optimization
Packet Header Routing Info Optimization
Merged the mcast and unicast representation to match so I can uniformly process the packet to decide the following:
The previous implementation was required to first check the fabric send type before being able to do further inspection to answer the above questions. Now the code is much simpler - no fabric type info checked - single code path to check both. Additionally the check logic is also streamlined.
New packet command type
Extra functionality: Added
NOC_UNICAST_INLINE_WRITE
eth packet command type to address a regression as a result of the above change (if the command type wasn't added)Optimized Eth send APIs
eth_send_packet
calls to new version that takes size in bytes.eth_write_remote_reg_no_txq_check
) that doesn't have conditional context switch in body of functionFlow Control Protocol Trimming
We disable the following:
Checklist