# TT-NN Graph Trace
TT-NN provides a mechanism for tracing operations and memory activity during a neural network's execution.

Using this trace, you can analyze an operation even without executing it on the accelerator.
The output trace can then be processed into a single number such as peak memory load, printed as tabular data, or visualized as a call graph.

## 🪄 How to Use
Wrap any number of TT-NN calls between `GraphProcessor::begin_graph_capture` and `GraphProcessor::end_graph_capture`, or capture any callable directly.
In the example below, `ttnn::zeros` is not included in the trace, but `ttnn::add` is:
https://github.com/tenstorrent/tt-metal/blob/4ae4ac3c30cd24ea27cbac8cc5811c90d077e9c0/tests/ttnn/unit_tests/gtests/test_graph_add.cpp#L50-L58

You can then analyze the trace with some of the provided utility functions
https://github.com/tenstorrent/tt-metal/blob/4ae4ac3c30cd24ea27cbac8cc5811c90d077e9c0/tests/ttnn/unit_tests/gtests/test_graph_add.cpp#L64-L66
or process it manually to extract whatever data you need, in whatever format you need, such as this table:
```
    current_op          event                           total_cb  total_buffer  info
0   ttnn::add           begin_op                               0       9011200  {'inputs': '8', 'name': 'ttnn::add'}
1   ttnn::repeat        begin_op                               0       9011200  {'inputs': '2', 'name': 'ttnn::repeat'}
2   ttnn::repeat        buffer_allocate                        0      17203200  {'address': '753696', 'layout': 'INTERLEAVED', 'size': '8192000', 'type': 'DRAM'}
3   ttnn::repeat        buffer_allocate                        0      17209344  {'address': '1073735680', 'layout': 'INTERLEAVED', 'size': '6144', 'type': 'DRAM'}
4   ttnn::repeat        circular_buffer_allocate            4096      17209344  {'addr': '107360', 'core_range_set': '{[(x=0,y=0) - (x=7,y=7)]}', 'size': '4096'}
5   ttnn::repeat        buffer_deallocate                   4096      17203200  {'layout': 'INTERLEAVED', 'size': '0', 'type': 'DRAM'}
6   ttnn::repeat        circular_buffer_deallocate_all         0      17203200  {}
7   ttnn::prim::binary  begin_op                               0      17203200  {'inputs': '10', 'name': 'ttnn::prim::binary'}
8   ttnn::prim::binary  buffer_allocate                        0      25395200  {'address': '1437728', 'layout': 'INTERLEAVED', 'size': '8192000', 'type': 'DRAM'}
9   ttnn::prim::binary  buffer_allocate                        0      25409536  {'address': '1073735680', 'layout': 'INTERLEAVED', 'size': '14336', 'type': 'DRAM'}
10  ttnn::prim::binary  circular_buffer_allocate            4096      25409536  {'addr': '107360', 'core_range_set': '{[(x=0,y=0) - (x=7,y=7)]}', 'size': '4096'}
11  ttnn::prim::binary  circular_buffer_allocate            8192      25409536  {'addr': '111456', 'core_range_set': '{[(x=0,y=0) - (x=7,y=7)]}', 'size': '4096'}
12  ttnn::prim::binary  circular_buffer_allocate           12288      25409536  {'addr': '115552', 'core_range_set': '{[(x=0,y=0) - (x=7,y=7)]}', 'size': '4096'}
13  ttnn::prim::binary  buffer_deallocate                  12288      25395200  {'layout': 'INTERLEAVED', 'size': '0', 'type': 'DRAM'}
14  ttnn::prim::binary  circular_buffer_deallocate_all         0      25395200  {}
15  ttnn::add           buffer_deallocate                      0      17203200  {'layout': 'INTERLEAVED', 'size': '0', 'type': 'DRAM'}
16  ttnn::add           circular_buffer_deallocate_all         0      17203200  {}
```
or a graph:

![trace](https://github.com/user-attachments/assets/42501a1f-8354-4b3b-a5d9-707f30b23f4f)
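The same flow is available through the Python bindings (see the Python section below for the linked test). A minimal sketch, assuming the `ttnn.graph` module; treat exact signatures as approximate:

```py
import ttnn

device = ttnn.open_device(device_id=0)

# Created before capture begins, so these calls are not traced.
a = ttnn.zeros((1, 1, 32, 32), layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.zeros((4, 1, 32, 32), layout=ttnn.TILE_LAYOUT, device=device)

ttnn.graph.begin_graph_capture(ttnn.graph.RunMode.NORMAL)
output = ttnn.add(a, b)  # traced, including the internal ttnn::repeat broadcast
captured_graph = ttnn.graph.end_graph_capture()

ttnn.close_device(device)
```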
## Trace Format
The trace is captured as JSON; you can find the code producing it [here](https://github.com/tenstorrent/tt-metal/blob/main/ttnn/cpp/ttnn/graph/graph_processor.cpp).
Below is a detailed description of the schema.

### Node Types
The trace is represented as a directed graph, where each node corresponds to a specific operation or memory event. Below is an overview of the node types that can appear in the trace.

First, every node has these parameters:

* `counter`: The unique identifier of the node within the graph.
* `node_type`: The node type; the available types are listed below.
* `params`: A map of parameters (mapping a string property name to a string value).
* `connections`: An array of connections to subsequent nodes.

### Node Connections
Each node in the graph maintains a list of connections to other nodes. These connections represent the flow of data and control through the various operations and memory events during the execution of the network.

### 1. capture_start
Marks the beginning of the graph capture process. This node is the root of the graph and does not have any parent nodes.

#### Parameters
* Empty, as this is a marker node.

#### Connections
* The first element is the next operation call.
* The last element is the corresponding `capture_end`.

### 2. capture_end
Marks the end of the graph capture process.

#### Parameters
* Empty, as this is a marker node.

### 3. function_start
Represents the beginning of a function or operation within the graph. This node captures the function name and the number of input parameters it received.

#### Parameters
* `inputs`: The number of input parameters.
* `name`: The name of the function or operation.

#### Connections
* Another op call, a primitive op call, or the corresponding `function_end`

#### Function types
* TT-NN operation: for example `ttnn::add`, `ttnn::repeat`
* TT-NN primitive operation: for example `ttnn::prim::binary` (if it uses the TMP infra) or `ttnn::prim::old_infra_device_operation` (if not)
* TT-NN device operation: named "Device Operation" (should change in the future)
* Unregistered, but manually tracked functions (will change):
  `create_device_tensor`, `Tensor::to`, `Tensor::cpu`, `Tensor::cpu_sharded`, `Tensor::print`, `Tensor::pad`, `Tensor::unpad`, `Tensor::reshape`,
  `tt::tt_metal::detail::convert_python_tensor_to_tt_tensor`, `tt::tt_metal::detail::convert_tt_tensor_to_torch_tensor`

### 4. function_end
Marks the end of a function or operation. This node is paired with the corresponding `function_start` node and records the outputs or final state of the function.

#### Parameters
* `name`: The name of the function or operation.

#### Connections
* Output tensor(s)
* The next control flow node (potentially the `capture_end` node)

### 5. buffer
Represents a memory buffer. This node records the buffer's size, type (e.g., DRAM or L1), and layout.

#### Parameters
* `size`: The size of the buffer in bytes.
* `type`: The type of memory (e.g., "DRAM", "L1").
* `layout`: The memory layout (e.g., "INTERLEAVED", "SINGLE_BANK").

#### Connections
* A single element specifying the associated tensor ID

### 6. buffer_allocate
Denotes the allocation of a buffer in memory. This node captures the specifics of the memory allocation event, including the buffer's address and type.

#### Parameters
* `size`: The size of the buffer in bytes.
* `address`: The memory address of the buffer.
* `type`: The type of memory (e.g., "DRAM", "L1").
* `layout`: The memory layout.

#### Connections
* A single element specifying the allocated buffer ID

### 7. buffer_deallocate
Represents the deallocation of a buffer from memory. This node records the details of the buffer being deallocated.

#### Parameters
* `size`: The size of the buffer in bytes.
* `type`: The type of memory (e.g., "DRAM", "L1").
* `layout`: The memory layout.

#### Connections
* A single element specifying the deallocated buffer ID

### 8. circular_buffer_allocate
Represents the allocation of a circular buffer, typically used for streaming data or multi-buffering strategies. This node captures details such as the core range set involved and the buffer size.

#### Parameters
* `size`: The size of the circular buffer in bytes.
* `address`: The memory address associated with the buffer.
* `core_range_set`: The range of cores involved in the circular buffer.

#### Connections
Usually empty

### 9. circular_buffer_deallocate_all
Marks the deallocation of all circular buffers. This is a bulk operation and is connected to all circular buffer nodes that are being deallocated.

#### Parameters
Empty, as this operation deallocates all circular buffers.

#### Connections
Usually empty

### 10. tensor
Represents a tensor in the graph. This node captures the tensor's shape and is connected to the memory buffer it uses, if applicable.
`[#]` means that each tensor is indexed; instead of `#`, a real trace shows an id.

#### Parameters
* `tensor_id`: The identifier of the tensor.
* `shape`: The shape of the tensor.

#### Connections
Usually the `function_start` of a function where the given tensor is passed as an argument or otherwise used
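Given this schema, a small traversal over the raw JSON (loaded as a list of node dictionaries) is enough to reconstruct the nesting of operation calls. A minimal sketch; the `trace.json` filename is hypothetical:

```py
import json


def print_calltrace(trace):
    """Print function calls with nesting, using the schema above.

    Nodes appear in chronological `counter` order, so tracking depth
    across function_start/function_end pairs recovers the hierarchy.
    """
    depth = 0
    for node in trace:
        if node["node_type"] == "function_start":
            print("  " * depth + node["params"]["name"])
            depth += 1
        elif node["node_type"] == "function_end":
            depth -= 1


with open("trace.json") as f:  # a previously saved trace
    print_calltrace(json.load(f))
```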
## Operation Dispatching
When run in `NO_DISPATCH` mode, real allocations do not happen, so trace collection has no side effects on the allocator state.
You can pass unrealistically big tensors in this mode, and unless an operation validates an upper limit, you can still collect the trace.
In this mode trace collection is also faster, because ops are not dispatched to the device.

When run in `NORMAL` mode, memory can be fragmented, which can lead to a different trace, and you see the real addresses at which everything is allocated.
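The run mode is chosen when capture begins. A minimal sketch of a `NO_DISPATCH` capture with a deliberately oversized tensor, assuming the `ttnn.graph` Python bindings; the shape is arbitrary:

```py
import ttnn

device = ttnn.open_device(device_id=0)

ttnn.graph.begin_graph_capture(ttnn.graph.RunMode.NO_DISPATCH)
# Nothing is dispatched or really allocated in this mode, so an
# unrealistically large tensor can still be traced (unless an op
# validates an upper size limit).
huge = ttnn.zeros((1024, 1024, 32, 32), layout=ttnn.TILE_LAYOUT, device=device)
output = ttnn.add(huge, huge)
captured_graph = ttnn.graph.end_graph_capture()

ttnn.close_device(device)
```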
## Python
Tracing is available through Python too:
https://github.com/tenstorrent/tt-metal/blob/4ae4ac3c30cd24ea27cbac8cc5811c90d077e9c0/tests/ttnn/unit_tests/test_graph_capture.py#L21-L25

Here is sample code that prints the table shown at the beginning of this document:

```py
import pandas as pd


def process_allocations(graph):
    df = pd.DataFrame(columns=['current_op', 'event', 'total_cb', 'total_buffer', 'info'])

    cur_op = []
    total_cb = 0
    total_buffer = 0
    i = 1  # skip the initial capture_start node
    while i < len(graph):
        v = graph[i]
        params = v.params
        i += 1
        if v.node_type == 'function_start':
            if len(cur_op) == 0:
                # Entering the first op: account for all of its input tensors/buffers
                while i < len(graph):
                    if graph[i].node_type == 'buffer':
                        total_buffer += int(graph[i].params['size'])
                        i += 1
                    elif graph[i].node_type == 'tensor':
                        i += 1
                    else:
                        break
            name = v.params['name']
            if name == "ttnn::prim::old_infra_device_operation":
                name = "ttnn::prim::old_infra_op"
            cur_op.append(name)
        if v.node_type == 'circular_buffer_allocate':
            total_cb += int(v.params['size'])
        if v.node_type == 'circular_buffer_deallocate_all':
            total_cb = 0
        if v.node_type == 'buffer_allocate':
            total_buffer += int(v.params['size'])
        if v.node_type == 'function_end':
            cur_op.pop()
        if v.node_type == 'tensor':
            continue
        if v.node_type == 'buffer_deallocate':
            # The deallocate node's single connection points at the buffer
            # node, which holds the real size
            total_buffer -= int(graph[v.connections[0]].params['size'])
        if v.node_type == 'buffer':
            continue
        if len(cur_op) > 0:
            data = {'current_op': cur_op[-1], 'event': v.node_type, 'total_cb': total_cb, 'total_buffer': total_buffer, 'info': params}
            df.loc[len(df)] = data
    return df
```
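For example, after a capture like the ones sketched earlier:

```py
captured_graph = ttnn.graph.end_graph_capture()
df = process_allocations(captured_graph)
print(df.to_string())
```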
## Sample Trace

This is a sample trace of running `ttnn::add(Shape[1, 1, 32, 32], Shape[4, 1, 32, 32])`.
This setup requires broadcasting the first tensor, so the trace contains a call to `ttnn::repeat`.
The high-level call stack is:

```
ttnn::add
ttnn::repeat
ttnn::prim::old_infra_device_operation (calling a TT-NN primitive operation)
Device Operation (dispatching a device operation)
create_device_tensor (creates the intermediate output for ttnn::repeat)
ttnn::prim::binary (calling a TT-NN primitive operation)
Device Operation (dispatching a device operation)
create_device_tensor (creates the final output)
```

And you can see when each buffer and circular buffer is allocated and deallocated.

### PrettyPrint

```
Capture Start
Begin: tt::tt_metal::detail::convert_python_tensor_to_tt_tensor
End: tt::tt_metal::detail::convert_python_tensor_to_tt_tensor
Add Tensor: 0
Begin: ttnn::to_layout
    Begin: Tensor::reshape
    End: Tensor::reshape
    Add Tensor: 1
    Begin: Tensor::pad
    End: Tensor::pad
    Add Tensor: 2
    Begin: Tensor::to
    End: Tensor::to
    Add Tensor: 3
End: ttnn::to_layout
Begin: Tensor::to
    Add Device Buffer
    Allocate Device Buffer
End: Tensor::to
Add Tensor: 4
Begin: ttnn::add
    Begin: Tensor::to
        Add Tensor: -1
        Add Device Buffer
        Allocate Device Buffer
    End: Tensor::to
    Add Tensor: 5
    Begin: ttnn::prim::binary
        Begin: BinaryDeviceOperation
            Begin: tt::tt_metal::create_device_tensor
                Add Device Buffer
                Allocate Device Buffer
            End: tt::tt_metal::create_device_tensor
            Add Tensor: 6
            Add Tensor: 6
            Add Device Buffer
            Allocate Device Buffer
            Allocate Device Buffer
            Allocate Device Buffer
            Allocate Device Buffer
            Deallocate Device Buffer
        End: BinaryDeviceOperation
        Add Tensor: 7
    End: ttnn::prim::binary
    Deallocate Device Buffer
End: ttnn::add
Begin: Tensor::cpu
End: Tensor::cpu
Add Tensor: 8
Begin: Tensor::to
End: Tensor::to
Add Tensor: 9
Begin: tt::tt_metal::detail::convert_tt_tensor_to_torch_tensor
End: tt::tt_metal::detail::convert_tt_tensor_to_torch_tensor
Deallocate Device Buffer
Deallocate Device Buffer
```

### Visualizer

![visualizer](https://github.com/user-attachments/assets/03df00c6-4902-416d-a26a-6ffe874537a5)
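Because allocation events carry sizes and deallocation events point back at their buffer nodes, a peak-memory figure can be computed directly from the raw JSON below. A minimal sketch; the `trace.json` filename is hypothetical:

```py
import json


def peak_buffer_usage(trace):
    """Track total live buffer bytes across the trace and return the peak."""
    nodes = {node["counter"]: node for node in trace}
    total = peak = 0
    for node in trace:
        if node["node_type"] == "buffer_allocate":
            total += int(node["params"]["size"])
            peak = max(peak, total)
        elif node["node_type"] == "buffer_deallocate":
            # The single connection points at the buffer node being freed
            freed = nodes[node["connections"][0]]
            total -= int(freed["params"]["size"])
    return peak


with open("trace.json") as f:  # a previously saved trace
    print(peak_buffer_usage(json.load(f)))
```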
## Raw JSON
```
[
    {
        "connections": [
            1,
            32
        ],
        "counter": 0,
        "node_type": "capture_start",
        "params": {}
    },
    {
        "connections": [
            3,
            5,
            6,
            18,
            30,
            31
        ],
        "counter": 1,
        "node_type": "function_start",
        "params": {
            "inputs": "2",
            "name": "ttnn::add"
        }
    },
    {
        "connections": [
            1,
            18
        ],
        "counter": 2,
        "node_type": "tensor",
        "params": {
            "shape": "ttnn.Shape([4, 3, 32, 32])"
        }
    },
    {
        "connections": [
            2,
            2
        ],
        "counter": 3,
        "node_type": "buffer",
        "params": {
            "layout": "INTERLEAVED",
            "size": "24576",
            "type": "L1"
        }
    },
    {
        "connections": [
            1,
            6
        ],
        "counter": 4,
        "node_type": "tensor[1]",
        "params": {
            "shape": "ttnn.Shape([1, 3, 32, 32])"
        }
    },
    {
        "connections": [
            4,
            4
        ],
        "counter": 5,
        "node_type": "buffer",
        "params": {
            "layout": "INTERLEAVED",
            "size": "6144",
            "type": "L1"
        }
    },
    {
        "connections": [
            7,
            17
        ],
        "counter": 6,
        "node_type": "function_start",
        "params": {
            "inputs": "2",
            "name": "ttnn::repeat"
        }
    },
    {
        "connections": [
            8,
            16
        ],
        "counter": 7,
        "node_type": "function_start",
        "params": {
            "inputs": "5",
            "name": "ttnn::prim::old_infra_device_operation"
        }
    },
    {
        "connections": [
            9,
            14,
            15
        ],
        "counter": 8,
        "node_type": "function_start",
        "params": {
            "inputs": "2",
            "name": "Device Operation"
        }
    },
    {
        "connections": [
            10,
            11,
            12
        ],
        "counter": 9,
        "node_type": "function_start",
        "params": {
            "inputs": "5",
            "name": "create_device_tensor"
        }
    },
    {
        "connections": [
            13,
            13,
            13,
            13,
            13
        ],
        "counter": 10,
        "node_type": "buffer",
        "params": {
            "layout": "INTERLEAVED",
            "size": "24576",
            "type": "L1"
        }
    },
    {
        "connections": [
            10
        ],
        "counter": 11,
        "node_type": "buffer_allocate",
        "params": {
            "address": "1953396066",
            "layout": "INTERLEAVED",
            "size": "24576",
            "type": "L1"
        }
    },
    {
        "connections": [
            13
        ],
        "counter": 12,
        "node_type": "function_end",
        "params": {
            "name": "create_device_tensor"
        }
    },
    {
        "connections": [
            18
        ],
        "counter": 13,
        "node_type": "tensor[2]",
        "params": {
            "shape": "ttnn.Shape([4, 3, 32, 32])"
        }
    },
    {
        "connections": [],
        "counter": 14,
        "node_type": "circular_buffer_allocate",
        "params": {
            "address": "0",
            "core_range_set": "{[(x=0,y=0) - (x=0,y=7)], [(x=1,y=0) - (x=1,y=3)]}",
            "size": "4096"
        }
    },
    {
        "connections": [
            13
        ],
        "counter": 15,
        "node_type": "function_end",
        "params": {
            "name": "Device Operation"
        }
    },
    {
        "connections": [
            13
        ],
        "counter": 16,
        "node_type": "function_end",
        "params": {
            "name": "ttnn::prim::old_infra_device_operation"
        }
    },
    {
        "connections": [
            13,
            18
        ],
        "counter": 17,
        "node_type": "function_end",
        "params": {
            "name": "ttnn::repeat"
        }
    },
    {
        "connections": [
            19,
            29
        ],
        "counter": 18,
        "node_type": "function_start",
        "params": {
            "inputs": "10",
            "name": "ttnn::prim::binary"
        }
    },
    {
        "connections": [
            20,
            25,
            26,
            27,
            28
        ],
        "counter": 19,
        "node_type": "function_start",
        "params": {
            "inputs": "2",
            "name": "Device Operation"
        }
    },
    {
        "connections": [
            21,
            22,
            23
        ],
        "counter": 20,
        "node_type": "function_start",
        "params": {
            "inputs": "5",
            "name": "create_device_tensor"
        }
    },
    {
        "connections": [
            24,
            24,
            24,
            24
        ],
        "counter": 21,
        "node_type": "buffer",
        "params": {
            "layout": "INTERLEAVED",
            "size": "24576",
            "type": "L1"
        }
    },
    {
        "connections": [
            21
        ],
        "counter": 22,
        "node_type": "buffer_allocate",
        "params": {
            "address": "0",
            "layout": "INTERLEAVED",
            "size": "24576",
            "type": "L1"
        }
    },
    {
        "connections": [
            24
        ],
        "counter": 23,
        "node_type": "function_end",
        "params": {
            "name": "create_device_tensor"
        }
    },
    {
        "connections": [],
        "counter": 24,
        "node_type": "tensor[3]",
        "params": {
            "shape": "ttnn.Shape([4, 3, 32, 32])"
        }
    },
    {
        "connections": [],
        "counter": 25,
        "node_type": "circular_buffer_allocate",
        "params": {
            "address": "0",
            "core_range_set": "{[(x=0,y=0) - (x=7,y=7)]}",
            "size": "4096"
        }
    },
    {
        "connections": [],
        "counter": 26,
        "node_type": "circular_buffer_allocate",
        "params": {
            "address": "0",
            "core_range_set": "{[(x=0,y=0) - (x=7,y=7)]}",
            "size": "4096"
        }
    },
    {
        "connections": [],
        "counter": 27,
        "node_type": "circular_buffer_allocate",
        "params": {
            "address": "0",
            "core_range_set": "{[(x=0,y=0) - (x=7,y=7)]}",
            "size": "4096"
        }
    },
    {
        "connections": [
            24
        ],
        "counter": 28,
        "node_type": "function_end",
        "params": {
            "name": "Device Operation"
        }
    },
    {
        "connections": [
            24
        ],
        "counter": 29,
        "node_type": "function_end",
        "params": {
            "name": "ttnn::prim::binary"
        }
    },
    {
        "connections": [
            10
        ],
        "counter": 30,
        "node_type": "buffer_deallocate",
        "params": {
            "layout": "INTERLEAVED",
            "size": "0",
            "type": "L1"
        }
    },
    {
        "connections": [
            24,
            33
        ],
        "counter": 31,
        "node_type": "function_end",
        "params": {
            "name": "ttnn::add"
        }
    },
    {
        "connections": [
            21
        ],
        "counter": 32,
        "node_type": "buffer_deallocate",
        "params": {
            "layout": "INTERLEAVED",
            "size": "0",
            "type": "L1"
        }
    },
    {
        "connections": [],
        "counter": 33,
        "node_type": "capture_end",
        "params": {}
    }
]
```