[skip ci] Update perf and latest features for llm models (Oct 7) (#13648)
skhorasganiTT authored Oct 9, 2024
1 parent 53b60f4 commit 68b08ae
Showing 2 changed files with 15 additions and 8 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -24,17 +24,17 @@
| Model | Batch | Hardware | ttft (s) | t/s/u | Target t/s/u | Release |
|----------------------------------------------------------------------|-------|----------------------------------------------------------|------------|-------|--------------|---------------------------------------------------------------------------|
| [Falcon7B-decode](./models/demos/ttnn_falcon7b) | 32 | [e150](https://tenstorrent.com/hardware/grayskull) | | 4.2 | 4.4 | |
| [Falcon7B](./models/demos/wormhole/falcon7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 0.07 | 16.7 | 26 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [Falcon7B](./models/demos/wormhole/falcon7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 0.07 | 16.7 | 26 | [v0.53.0-rc9](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc9) |
| [Mistral-7B](./models/demos/wormhole/mistral7b) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | | 9.9 | 25 | [v0.51.0-rc28](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc28) |
| [Mamba-2.8B](./models/demos/wormhole/mamba) | 32 | [n150](https://tenstorrent.com/hardware/wormhole) | 0.04 | 12.3 | 41 | [v0.51.0-rc26](https://github.com/tenstorrent/tt-metal/tree/v0.51.0-rc26) |
| [LLaMA-3.1-8B](./models/demos/wormhole/llama31_8b) | 1 | [n150](https://tenstorrent.com/hardware/wormhole) | 0.20 | 21.4 | 23 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [Falcon7B (data parallel)](./models/demos/t3000/falcon7b) | 256 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.10 | 14.1 | 26 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [LLaMA-2-70B - (tensor parallel)](./models/demos/t3000/llama2_70b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.19 | 15.1 | 20 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [LLaMA-3.1-70B (tensor parallel)](./models/demos/t3000/llama3_70b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.19 | 15.1 | 20 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [Falcon40B (tensor parallel)](./models/demos/t3000/falcon40b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | | 5.3 | 36 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [Mixtral7Bx8 (tensor parallel)](./models/demos/t3000/mixtral8x7b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.23 | 14.2 | 33 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
| [Falcon7B (data parallel)](./models/demos/tg/falcon7b) |1024 | [Galaxy](https://tenstorrent.com/hardware/galaxy) | 0.24 | 4.3 | 26 | [v0.52.0-rc31](https://github.com/tenstorrent/tt-metal/tree/v0.52.0-rc31) |
> **Last Update:** September 23, 2024
| [Falcon7B (data parallel)](./models/demos/t3000/falcon7b) | 256 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.10 | 14.4 | 26 | [v0.53.0-rc9](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc9) |
| [LLaMA-2-70B - (tensor parallel)](./models/demos/t3000/llama2_70b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.19 | 15.1 | 20 | [v0.53.0-rc9](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc9) |
| [LLaMA-3.1-70B (tensor parallel)](./models/demos/t3000/llama3_70b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.19 | 15.1 | 20 | [v0.53.0-rc9](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc9) |
| [Falcon40B (tensor parallel)](./models/demos/t3000/falcon40b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | | 5.3 | 36 | [v0.53.0-rc2](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc2) |
| [Mixtral7Bx8 (tensor parallel)](./models/demos/t3000/mixtral8x7b) | 32 | [QuietBox](https://tenstorrent.com/hardware/tt-quietbox) | 0.23 | 14.2 | 33 | [v0.53.0-rc9](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc9) |
| [Falcon7B (data parallel)](./models/demos/tg/falcon7b) |1024 | [Galaxy](https://tenstorrent.com/hardware/galaxy) | 0.21 | 4.4 | 26 | [v0.53.0-rc9](https://github.com/tenstorrent/tt-metal/tree/v0.53.0-rc9) |
> **Last Update:** October 7, 2024
> **Notes:**
> - The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
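
To put the updated rows in context, the per-user decode rate can be converted into an aggregate rate and compared with its target. The sketch below is illustrative only: it assumes `t/s/u` means tokens per second per user and that every user in a batch decodes at the reported rate. The function names are hypothetical, and the numbers are copied from the new table rows in this commit.

```python
# Rows copied from the updated README table in this commit:
# (model, hardware, batch, t/s/u, target t/s/u)
ROWS = [
    ("Falcon7B", "n150", 32, 16.7, 26),
    ("LLaMA-3.1-8B", "n150", 1, 21.4, 23),
    ("Falcon7B (data parallel)", "QuietBox", 256, 14.4, 26),
    ("Falcon7B (data parallel)", "Galaxy", 1024, 4.4, 26),
]


def aggregate_tokens_per_second(batch, tsu):
    """Total decode rate, assuming each user in the batch gets `tsu` tokens/s."""
    return batch * tsu


def fraction_of_target(tsu, target):
    """Measured per-user rate as a fraction of the target per-user rate."""
    return tsu / target


if __name__ == "__main__":
    for model, hardware, batch, tsu, target in ROWS:
        total = aggregate_tokens_per_second(batch, tsu)
        share = fraction_of_target(tsu, target)
        print(f"{model} on {hardware}: ~{total:.0f} tokens/s total, {share:.0%} of target")
```

Read this way, a low per-user rate at a large batch size can still correspond to a high aggregate rate, which is why the batch column matters when comparing rows.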
7 changes: 7 additions & 0 deletions models/MODEL_UPDATES.md
@@ -4,6 +4,13 @@
>
> Please refer to the front-page [README](../README.md) for the latest verified release for each model.
## October 7, 2024

### [Llama 3.1 - 8B](demos/wormhole/llama31_8b)
- Added support for continuous batching
- Added paged caching support for PagedAttention
- Added a demo which runs with TT-NN tracing (23 t/s/u decode on main)
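
For readers unfamiliar with the terms above, the following is a minimal, generic sketch of the idea behind a paged KV cache. It is not the tt-metal or TT-NN implementation; the class and method names are hypothetical. Each sequence owns a list of fixed-size physical blocks, and blocks are returned to a free pool when the sequence finishes, which is what lets a new request join a running batch under continuous batching.

```python
class PagedKVCache:
    """Toy block table mapping each sequence to fixed-size physical blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of unused block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one token and return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            # The current block is full (or none exists yet): take a new one.
            if not self.free_blocks:
                raise RuntimeError("out of KV cache blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

A sequence that ends early frees its blocks immediately, so a queued request can reuse them without waiting for the rest of the batch to finish.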

## September 23, 2024

### [Llama 3/3.1 - 70B](demos/t3000/llama3_70b)