Tenstorrent Inference Server (tt-inference-server) is the repository of model APIs available for deployment on Tenstorrent hardware.
https://github.com/tenstorrent/tt-inference-server
Follow the setup instructions for the model you want to serve. Each Model Name in the tables below links to the corresponding implementation.
Note: models with Status [🔍 preview] are under active development. If you encounter setup or stability problems, please file an issue and our team will address it.

Model Name | Model URL | Hardware | Status | Minimum Release Version |
---|---|---|---|---|
Qwen2.5-72B-Instruct | HF Repo | TT-QuietBox & TT-LoudBox | 🔍 preview | v0.0.2 |
Qwen2.5-72B | HF Repo | TT-QuietBox & TT-LoudBox | 🔍 preview | v0.0.2 |
Qwen2.5-7B-Instruct | HF Repo | n150 | 🔍 preview | v0.0.2 |
Qwen2.5-7B | HF Repo | n150 | 🔍 preview | v0.0.2 |
Llama-3.3-70B-Instruct | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
Llama-3.3-70B | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
Llama-3.2-11B-Vision-Instruct | HF Repo | n300 | 🔍 preview | v0.0.1 |
Llama-3.2-11B-Vision | HF Repo | n300 | 🔍 preview | v0.0.1 |
Llama-3.2-3B-Instruct | HF Repo | n150 | 🔍 preview | v0.0.1 |
Llama-3.2-3B | HF Repo | n150 | 🔍 preview | v0.0.1 |
Llama-3.2-1B-Instruct | HF Repo | n150 | 🔍 preview | v0.0.1 |
Llama-3.2-1B | HF Repo | n150 | 🔍 preview | v0.0.1 |
Llama-3.1-70B-Instruct | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
Llama-3.1-70B | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
Llama-3.1-8B-Instruct | HF Repo | n150 | ✅ supported | v0.0.1 |
Llama-3.1-8B | HF Repo | n150 | ✅ supported | v0.0.1 |

Model Name | Model URL | Hardware | Status | Minimum Release Version |
---|---|---|---|---|
YOLOv4 | GH Repo | n150 | 🔍 preview | v0.0.1 |
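
One way to read the tables above is as a lookup: given a device and a release of this repo, which models can be served? The sketch below is only an illustration of that reading and is not part of tt-inference-server. The `ModelEntry` type and the `parse_release` and `available_models` helpers are hypothetical names, only a few rows are copied from the tables, and it assumes that a model runs only on the hardware listed in its row and that release tags compare as dotted integers.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    """One row of the support tables (hypothetical helper type)."""
    name: str
    hardware: tuple[str, ...]
    status: str        # "supported" or "preview"
    min_release: str   # e.g. "v0.0.1"


# A few rows copied from the tables above; this is not the full list.
MODELS = [
    ModelEntry("Qwen2.5-72B-Instruct", ("TT-QuietBox", "TT-LoudBox"), "preview", "v0.0.2"),
    ModelEntry("Qwen2.5-7B-Instruct", ("n150",), "preview", "v0.0.2"),
    ModelEntry("Llama-3.3-70B-Instruct", ("TT-QuietBox", "TT-LoudBox"), "supported", "v0.0.1"),
    ModelEntry("Llama-3.2-11B-Vision-Instruct", ("n300",), "preview", "v0.0.1"),
    ModelEntry("Llama-3.1-8B-Instruct", ("n150",), "supported", "v0.0.1"),
    ModelEntry("YOLOv4", ("n150",), "preview", "v0.0.1"),
]


def parse_release(tag: str) -> tuple[int, ...]:
    """Turn a tag such as "v0.0.2" into (0, 0, 2) so tags compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def available_models(hardware: str, release: str, include_preview: bool = True) -> list[str]:
    """Names of models listed for `hardware` whose minimum release is <= `release`."""
    current = parse_release(release)
    return [
        m.name
        for m in MODELS
        if hardware in m.hardware
        and parse_release(m.min_release) <= current
        and (include_preview or m.status == "supported")
    ]


if __name__ == "__main__":
    print(available_models("n150", "v0.0.1"))
    print(available_models("TT-LoudBox", "v0.0.2", include_preview=False))
```

Passing `include_preview=False` restricts the result to rows marked ✅ supported, which may be preferable for deployments that cannot tolerate the setup or stability problems mentioned in the note above.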