Skip to content

Commit

Permalink
Updating Allocator docs to explain first fit usage
Browse files Browse the repository at this point in the history
  • Loading branch information
abhullar-tt committed Dec 19, 2024
1 parent 2df6b54 commit 5d0a532
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 13 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,7 @@ Get started with [simple kernels](https://docs.tenstorrent.com/tt-metalium/lates
- [Data Formats](./tech_reports/data_formats/data_formats.md) (updated Sept 7th)
- [Reconfiguring Data Formats](./tech_reports/data_formats/reconfig_data_format.md) (updated Oct 17th)
- [Handling special floating-point numbers](./tech_reports/Handling_Special_Value/special_values.md) (updated Oct 5th)
- [Allocator](./tech_reports/memory/allocator.md) (Updated Oct 30th)
- [Allocator](./tech_reports/memory/allocator.md) (Updated Dec 19th)
- [Tensor Layouts](./tech_reports/tensor_layouts/tensor_layouts.md) (updated Sept 6th)
- [Saturating DRAM Bandwidth](./tech_reports/Saturating_DRAM_bandwidth/Saturating_DRAM_bandwidth.md) (updated Sept 6th)
- [Flash Attention on Wormhole](./tech_reports/FlashAttention/FlashAttention.md) (updated Sept 6th)
Expand Down
30 changes: 18 additions & 12 deletions tech_reports/memory/allocator.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,19 @@

The Allocator maintains a view of on device memory across DRAM and SRAM (i.e. L1). It assigns addresses for `Buffers` and maintains a record of what is free or in-use. The allocator does not interact with on-device memory and is not responsible for copying data to/from the buffers.

1. [Memory Banks](#1-memory-banks)
2. [Lockstep Allocation](#2-lockstep-allocation)
- [2.1 Alignment](#21-alignment)
3. [SRAM Banks and CircularBuffers](#3-sram-banks-and-circularbuffers)
4. [Memory Reports](#4-memory-reports)
1. [Architecture](#1-memory-banks)
2. [Memory Banks](#2-memory-banks)
3. [Lockstep Allocation](#3-lockstep-allocation)
- [3.1 Alignment](#31-alignment)
4. [SRAM Banks and CircularBuffers](#4-sram-banks-and-circularbuffers)
5. [Memory Reports](#5-memory-reports)

## 1. Memory Banks
## 1. Architecture
We use a first fit memory algorithm to allow both top-down and bottom-up allocations in the same memory region. Buffers are able to specify whether they should be alloated in lower address space or higher address space. This enables Buffers of the same class to be grouped together without being fragmented by other Buffer classes. For example, user data Buffers are alloated bottom-up in DRAM while Program binaries are allocated top-down to avoid binaries fragmenting user data space. A use case for this in SRAM for L1 Buffers and CircularBuffers is described [below](#4-sram-banks-and-circularbuffers). Based on this, while best fit algorithms reduce fragmentation for Buffers of the same class they don't account for fragmentation caused by mixing different classes.

Note: This hybrid of top-down and bottom-up alloations can also be enabled by dividing up available memory into different memory pools that are located at higher/lower address spaces and then allocating Buffers of a particular class into their associated memory pool. This approach has not been explored in detail and would first require collecting data to appropriately size these memory pools and then consider dynamic resizing.

## 2. Memory Banks

The number and size of DRAM and SRAM banks are architecture specific and derived from information specified in the SoC descriptor.

Expand All @@ -22,7 +28,7 @@ There is one SRAM bank per Tensix compute core and `N` SRAM banks per storage on

The SRAM bank size is the storage core bank size if specified otherwise it is `total SRAM size - reserved size`. Reserved region in SRAM is used to hold FW, Kernels, and related runtime metadata.

## 2. Lockstep Allocation
## 3. Lockstep Allocation

We follow a `lockstep allocation` scheme across the banks. This means that each bank reserves the same amount of space for a given buffer regardless of how many banks the data is in. Lockstep allocation is illustrated in *Figure 2*

Expand All @@ -36,25 +42,25 @@ Note: this example shows allocating interleaved buffers but a similar scheme is

Managing allocations and deallocations is currently done using a free list.

### 2.1 Alignment
### 3.1 Alignment

Allocations in DRAM and SRAM are done at `DRAM_ALIGNMENT` aligned addresses to ensure that reads from DRAM into SRAM do not violate NoC alignment constraints.

When sizes in the buffer config do not respect this alignment, allocator pads each page or shard. This leads to some internal fragmentation in the allocated region.

## 3. SRAM Banks and CircularBuffers
## 4. SRAM Banks and CircularBuffers

DRAM buffers are allocated from low address to high address (bottom up) whereas SRAM buffers are allocatd from high address to low address (top down). Data within the allocated region always grows from low to high address.
DRAM buffers are allocated from low address to high address (bottom-up) whereas SRAM buffers are allocatd from high address to low address (top-down). Data within the allocated region always grows from low to high address.

SRAM buffers are allocated top down to avoid growing into `CircularBuffers (CBs)` which are *typically* not managed by the allocator and are placed after the reserved region as shown in *Figure 3*. Typically a CBs lifetime is tied to its corresponding `Program` with the `Program` being responsible for managing all CB addresses. To reduce memory footprint, a CB can share the same address space as a SRAM buffer. In this case the `Allocator` rather than the `Program` manages this CB's address.
SRAM buffers are allocated top down to avoid growing into `CircularBuffers (CBs)` which are *typically* not managed by the allocator and are placed after the reserved region as shown in *Figure 3*. Typically a CBs lifetime is tied to its corresponding `Program` with the `Program` being responsible for managing all CB addresses since its lifetime is within an Op. To reduce memory footprint, a CB can share the same address space as a SRAM buffer. In this case the `Allocator` rather than the `Program` manages this CB's address and the lifetime of the CB extends across Ops.

<img src="images/sram-bank-cbs.png" style="width:300px;"/>

*Figure 3: SRAM buffers and CircularBuffers*

Note: before any `Program` is run on device there is a validation step to ensure that CBs and SRAM buffers do not overlap.

## 4. Memory Reports
## 5. Memory Reports

Debug reports can be generated to get a dump of the memory state at a particular point in time or set to log memory state during each `Program`compile.

Expand Down

0 comments on commit 5d0a532

Please sign in to comment.