
# TT-Inference-Server

Tenstorrent Inference Server (`tt-inference-server`) is the repository of available model APIs for deployment on Tenstorrent hardware.

## Official Repository

https://github.com/tenstorrent/tt-inference-server

## Getting Started

Please follow the setup instructions for the model you want to serve; each Model Name in the tables below links to the corresponding implementation.

**Note:** models with Status 🔍 preview are under active development. If you encounter setup or stability problems, please file an issue and our team will address it.
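Once a model is up, it can typically be queried over HTTP. As a minimal sketch, assuming the server exposes an OpenAI-compatible completions endpoint (the URL, port, and model name below are illustrative assumptions; substitute the values from your model's setup instructions):

```python
import json
from urllib import request

# Assumed endpoint: adjust host, port, and path to match your deployment.
BASE_URL = "http://localhost:7000/v1/completions"

# Example payload for a completions-style API; the model name is one of
# the entries from the LLMs table below.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "prompt": "What is Tenstorrent hardware?",
    "max_tokens": 64,
    "temperature": 0.7,
}

# Build the request without sending it; uncomment the last two lines to
# actually POST against a running server.
req = request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url, req.get_method())
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```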

## LLMs

| Model Name | Model URL | Hardware | Status | Minimum Release Version |
|---|---|---|---|---|
| Qwen2.5-72B-Instruct | HF Repo | TT-QuietBox & TT-LoudBox | 🔍 preview | v0.0.2 |
| Qwen2.5-72B | HF Repo | TT-QuietBox & TT-LoudBox | 🔍 preview | v0.0.2 |
| Qwen2.5-7B-Instruct | HF Repo | n150 | 🔍 preview | v0.0.2 |
| Qwen2.5-7B | HF Repo | n150 | 🔍 preview | v0.0.2 |
| Llama-3.3-70B-Instruct | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
| Llama-3.3-70B | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
| Llama-3.2-11B-Vision-Instruct | HF Repo | n300 | 🔍 preview | v0.0.1 |
| Llama-3.2-11B-Vision | HF Repo | n300 | 🔍 preview | v0.0.1 |
| Llama-3.2-3B-Instruct | HF Repo | n150 | 🔍 preview | v0.0.1 |
| Llama-3.2-3B | HF Repo | n150 | 🔍 preview | v0.0.1 |
| Llama-3.2-1B-Instruct | HF Repo | n150 | 🔍 preview | v0.0.1 |
| Llama-3.2-1B | HF Repo | n150 | 🔍 preview | v0.0.1 |
| Llama-3.1-70B-Instruct | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
| Llama-3.1-70B | HF Repo | TT-QuietBox & TT-LoudBox | ✅ supported | v0.0.1 |
| Llama-3.1-8B-Instruct | HF Repo | n150 | ✅ supported | v0.0.1 |
| Llama-3.1-8B | HF Repo | n150 | ✅ supported | v0.0.1 |

## CNNs

| Model Name | Model URL | Hardware | Status | Minimum Release Version |
|---|---|---|---|---|
| YOLOv4 | GH Repo | n150 | 🔍 preview | v0.0.1 |