This repository serves as both:
- Tinfoil's default inference enclave, which runs CPU-only models with the Ollama inference server on AWS Nitro Enclaves
- A template for running your choice of CPU-only models on Nitro Enclaves
The default enclave runs the following models:
- `llama3.2:1b`
- `llama-guard3:1b`
- `qwen2.5-coder:0.5b`
- `nomic-embed-text`
It exposes the following endpoints for inference:
- `/api/chat`
- `/v1/chat/completions`
- `/api/generate`
- `/api/embed`
Both the models and the endpoints are specified in `config.json`.
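For reference, a minimal sketch of what that configuration might look like is below, using the default models and endpoints listed above. The exact schema is an assumption based on the `models` and `paths` keys described in the customization steps, so treat the repository's actual `config.json` as authoritative.

```json
{
  "models": [
    "llama3.2:1b",
    "llama-guard3:1b",
    "qwen2.5-coder:0.5b",
    "nomic-embed-text"
  ],
  "paths": [
    "/api/chat",
    "/v1/chat/completions",
    "/api/generate",
    "/api/embed"
  ]
}
```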
If you want to run a different set of models and/or expose a different set of endpoints:
- Click "Use this template" to create a new repository
- Edit `config.json` to customize:
  - `models`: any model from Ollama's library
  - `paths`: the API endpoints from Ollama's API documentation you want to expose
- Create a release tag (e.g. `v0.0.1`) to trigger the build workflow (see the example commands below)