-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#0: Use posted writes for profiler. #17261
Conversation
Thanks for this! Could you please make sure the following CI also pass: https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-profiler-tests.yaml |
By posting writes and only waiting for the flush, we can save around .2 us between ops. The only reason there's a flush here is to ensure that later writes to the buffer don't clobber data that's in the process of being written.
45aed6f
to
baa26d6
Compare
#if defined(COMPILE_FOR_NCRISC) || defined(COMPILE_FOR_BRISC) || defined(COMPILE_FOR_ERISC) || \ | ||
defined(COMPILE_FOR_IDLE_ERISC) | ||
inline void __attribute__((always_inline)) noc_async_write_posted( | ||
std::uint32_t src_local_l1_addr, std::uint64_t dst_noc_addr, std::uint32_t size, uint8_t noc = noc_index) { | ||
WAYPOINT("NAWW"); | ||
DEBUG_SANITIZE_NOC_WRITE_TRANSACTION(noc, dst_noc_addr, src_local_l1_addr, size); | ||
ncrisc_noc_fast_write_any_len<proc_type, noc_mode>( | ||
noc, write_cmd_buf, src_local_l1_addr, dst_noc_addr, size, NOC_UNICAST_WRITE_VC, false, false, 1, true, true); | ||
WAYPOINT("NAWD"); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we expose the noc_async_posted_writes_flushed
fn in dataflow_api, but this fn we only define in profiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noc_async_posted_writes_flushed seems more useful as it's the one type of barrier that happens to be missing, and dataflow_api.h already contains other functions that can cause posted writes. noc_async_write_posted is just one specific write function with a specific set of args that makes sense here, but I'm not sure how useful it would be to other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point worth mentioning is, in order to make dataflow_api "profilable", the plan is to create a stripped down, profiler only noc api in order to remove the circular dependancy. Keeping this function separated is helpful towards that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Would be good to have this in a separate file when that is refactored. Not a blocker. I think there are cases where we might want to switch to posted writes as well for kernels (ex kernels where we mcast data followed by a semaphore can probably be posted since receiver waits on the sem), but can look at that later
Problem description
Currently we take a nontrivial amount of time in the profiler waiting for the writes to DRAM to be completed.
What's changed
By posting writes and only waiting for the flush, we can save around .2 us between ops. The only reason there's a flush here is to ensure that later writes to the buffer don't clobber data that's in the process of being written.
Checklist