#18332: Update BN Kernel #18278

VirdhatchaniKN · 2025-02-25T09:47:20Z

Ticket

Problem description

The current BN implementation splits tensor reads between NCRISC (input tensor) and BRISC (input stat tensors), while BRISC also handles writing. This stalls BRISC from writing until compute is done, and writes are also more expensive. The split also creates performance inconsistencies.

What's changed

Moved all input reads to the reader file.

Checklist

All post-commit tests
Blackhole Post commit
Full new models - Link to test
(Single-card) Demo tests
(Single-card) Device perf regressions - Same as in main
Single-card Model perf tests

sjameelTT

How is the functionality at the moment? Did it solve your problem with bigger dimensions?

sjameelTT · 2025-02-26T15:51:16Z

ttnn/cpp/ttnn/operations/normalization/batch_norm/device/kernels/dataflow/reader_batch_norm.cpp

    uint32_t num_tiles_read = 0;
    for (uint32_t n = start_n; n < N && num_tiles_read < num_tiles; ++n, start_c = 0) {
        for (uint32_t c = start_c; c < C && num_tiles_read < num_tiles; ++c, start_t = 0) {
+            // read a tile from batch_mean
+            cb_reserve_back(cb_id_batch_mean, onetile);


This works, but you could also consider issuing the reads for both the mean tile and the var tile, and then adding a single barrier for both of them:

cb_reserve_back(cb_id_batch_mean, onetile);
cb_reserve_back(cb_id_batch_var, onetile);
// fetch the L1 write addresses before issuing the reads that use them
uint32_t l1_write_addr = get_write_ptr(cb_id_batch_mean);
uint32_t l1_batch_var_write_addr = get_write_ptr(cb_id_batch_var);
// issue both reads, then wait on a single barrier
noc_async_read_tile(tile_offset_stat, batch_mean, l1_write_addr);
noc_async_read_tile(tile_offset_stat, batch_var, l1_batch_var_write_addr);
noc_async_read_barrier();
FILL_TILE_WITH_FIRST_ELEMENT(cb_id_batch_mean);
FILL_TILE_WITH_FIRST_ELEMENT(cb_id_batch_var);
cb_push_back(cb_id_batch_mean, onetile);
cb_push_back(cb_id_batch_var, onetile);

With both reads in flight before the barrier, you will be able to amortize them against each other. The way you have it currently should work too, so it's fine. Let's get this working completely before you make micro-optimizations.
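To make the amortization argument concrete, here is a minimal host-side sketch. It is not kernel code and not part of tt-metal: the function names and the latency numbers are hypothetical, and the model idealizes the NoC by assuming that reads issued back to back are fully in flight together.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical cost model (an assumption for illustration, not tt-metal code).
// Each element of `latencies` is the assumed latency, in cycles, of one async read.

// A barrier after every read: each read completes before the next one is issued,
// so the waits add up.
inline uint64_t cycles_with_barrier_per_read(const std::vector<uint64_t>& latencies) {
    uint64_t total = 0;
    for (uint64_t latency : latencies) {
        total += latency;
    }
    return total;
}

// All reads issued first, then one barrier: in this idealized model the reads
// overlap completely, so the wait is bounded by the slowest read.
inline uint64_t cycles_with_single_barrier(const std::vector<uint64_t>& latencies) {
    uint64_t longest = 0;
    for (uint64_t latency : latencies) {
        longest = std::max(longest, latency);
    }
    return longest;
}
```

Under these assumptions, two reads of 100 and 120 cycles cost 220 cycles with a barrier after each read and 120 cycles with a single shared barrier. Real hardware will land somewhere between the two, depending on how much the reads actually overlap.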

Sure, I will make this change. However, it does not solve the hang issue that we are working on; that needs more work.

Hi @sjameelTT, I've updated the files.

tt-aho · 2025-02-27T19:33:51Z

...p/ttnn/operations/normalization/batch_norm/device/kernels/compute/batch_norm_sfpu_kernel.cpp


-        tile_regs_wait();
+        tile_regs_commit();


Is there a reason this is a loop of one? Might as well remove the loop unless you expect this to actually change?

Overall, I think it is clearer for tile_regs_wait() to come after tile_regs_commit(), just for readability, even if your setup works for a single iteration. The loop, as it is set up with the tile_regs calls, also doesn't work if it runs more than once.
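The ordering concern can be illustrated with a small host-side checker. This is a sketch under stated assumptions, not the tt-metal implementation: `Call` and `follows_handshake` are hypothetical names, and the model assumes the usual acquire, commit, wait, release sequence per iteration. It is deliberately strict, so it flags any deviation from that sequence and does not try to reproduce the exact arrangement in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (an assumption for illustration): one entry per tile_regs_* call.
enum class Call { Acquire, Commit, Wait, Release };

// Returns true only if the calls repeat the cycle
// Acquire -> Commit -> Wait -> Release and end on a completed cycle.
inline bool follows_handshake(const std::vector<Call>& calls) {
    const Call order[4] = {Call::Acquire, Call::Commit, Call::Wait, Call::Release};
    std::size_t next = 0;  // index of the call expected next
    for (Call call : calls) {
        if (call != order[next]) {
            return false;  // out-of-order call
        }
        next = (next + 1) % 4;
    }
    return next == 0;  // every acquire was followed through to a release
}
```

With this checker, two back-to-back iterations in the order acquire, commit, wait, release are accepted, while a sequence that waits before committing is rejected. Keeping commit before wait inside the loop body is what lets the same body run for any iteration count.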

VirdhatchaniKN force-pushed the virdhatchani/update_bn branch from 180cc5c to 7b7bd5a Compare February 26, 2025 06:29

VirdhatchaniKN mentioned this pull request Feb 26, 2025

Test hangs while running for larger shapes for Batch Norm #18332

Open

4 tasks

VirdhatchaniKN changed the title ~~#0: Update~~ #18332: Update BN Kernel Feb 26, 2025

VirdhatchaniKN force-pushed the virdhatchani/update_bn branch 3 times, most recently from 96ea39e to 2afcad4 Compare February 26, 2025 14:41

VirdhatchaniKN requested a review from sjameelTT February 26, 2025 14:54

sjameelTT reviewed Feb 26, 2025

VirdhatchaniKN force-pushed the virdhatchani/update_bn branch from 2afcad4 to 1f73719 Compare February 26, 2025 17:24

VirdhatchaniKN requested a review from sjameelTT February 26, 2025 18:30

VirdhatchaniKN and others added 4 commits February 26, 2025 18:35

#18332: Update BN Kernel

92f6e24

#18332: Remove unused args in BN

1bda477

#18332: Move input stats to reader file (#18335)

cebf525

Continuation of another PR. Will be merged once CI passes. Used for testing.

#18332: Update

1c06eb2

VirdhatchaniKN force-pushed the virdhatchani/update_bn branch from e7d7e82 to 1c06eb2 Compare February 26, 2025 18:35

VirdhatchaniKN marked this pull request as ready for review February 26, 2025 23:55

VirdhatchaniKN requested review from yugaoTT, tt-aho, bbradelTT, vsureshTT and edwinleeTT as code owners February 26, 2025 23:55

sjameelTT approved these changes Feb 27, 2025

tt-aho reviewed Feb 27, 2025
