This recipe demonstrates how to use OpenAI models in Spice.ai.
- Ensure you have the Spice CLI installed. Follow the Getting Started guide if you haven't done so yet.
Populate .env
with the following:
GITHUB_TOKEN
: A personal access token.OPENAI_API_KEY
: A valid OpenAI API key.
Verify that the spicepod.yaml
is configured as follows:
datasets:
- from: github:github.com/spiceai/spiceai/files/trunk
name: spiceai.docs
description: Spice.ai project documentation (github.com/spiceai/spiceai)
params:
github_token: ${secrets:GITHUB_TOKEN}
include: 'docs/**/*.md'
acceleration:
enabled: true
columns:
- name: content
embeddings:
- from: embeddings-model
row_id:
- path
chunking:
enabled: false
target_chunk_size: 256
overlap_size: 64
file_format: md
embeddings:
- from: openai:text-embedding-3-small
name: embeddings-model
params:
openai_api_key: ${secrets:OPENAI_API_KEY}
models:
- from: openai:gpt-4o
name: chat-model
params:
openai_api_key: ${secrets:OPENAI_API_KEY}
tools: auto
system_prompt: |
You are a helpful Spice.ai Docs assistant.
spice run
Result:
2025/01/21 01:19:43 INFO Checking for latest Spice runtime release...
2025/01/21 01:19:44 INFO Spice.ai runtime starting...
2025-01-20T16:19:45.056778Z INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2025-01-20T16:19:45.057495Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-01-20T16:19:45.057562Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-01-20T16:19:45.061178Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-01-20T16:19:45.544466Z INFO runtime::init::embedding: Embedding [embeddings-model] ready to embed
2025-01-20T16:19:45.544649Z INFO runtime::init::dataset: Initializing dataset spiceai.docs
2025-01-20T16:19:45.544669Z INFO runtime::init::results_cache: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-01-20T16:19:45.544761Z INFO runtime::init::model: Loading model [chat-model] from openai:gpt-4o...
2025-01-20T16:19:46.164600Z INFO runtime::init::dataset: Dataset spiceai.docs registered (github:github.com/spiceai/spiceai/files/trunk), acceleration (arrow), results cache enabled.
2025-01-20T16:19:46.165929Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset spiceai.docs
2025-01-20T16:19:46.534044Z INFO runtime::init::model: Model [chat-model] deployed, ready for inferencing
2025-01-20T16:19:49.394003Z INFO runtime::accelerated_table::refresh_task: Loaded 93 rows (1.28 MiB) for dataset spiceai.docs in 3s 228ms.
- Execute a Basic SQL Query to perform keyword searches within the dataset:
spice sql
Then:
SELECT path
FROM spiceai.docs
WHERE
LOWER(content) LIKE '%errors%'
AND NOT contains(path, 'docs/release_notes');
Result:
+------------------------------+
| path |
+------------------------------+
| docs/criteria/definitions.md |
| docs/dev/error_handling.md |
| docs/dev/metrics.md |
| docs/dev/style_guide.md |
+------------------------------+
Time: 0.006798 seconds. 4 rows.
curl -XPOST http://localhost:8090/v1/search \
-H "Content-Type: application/json" \
-d "{
\"datasets\": [\"spiceai.docs\"],
\"text\": \"TEL metrics naming\",
\"where\": \"not contains(path, 'docs/release_notes')\",
\"additional_columns\": [\"download_url\"],
\"limit\": 2
}"
Result
{
"matches": [
{
"value": "# Metrics Naming\n\n## TL;DR\n\n**Metric Naming Guide**: Prioritize Developer Experience (DX) with intuitive, ...",
"score": 0.7941223368131454,
"dataset": "spiceai.docs",
"metadata": {
"download_url": "https://raw.githubusercontent.com/spiceai/spiceai/trunk/docs/dev/metrics.md"
}
},
{
"value": "# Criteria Definitions\n\n## RC\n\nAcronym for \"Release Candidate\". Identifies a version that is eligible for ...",
"score": 0.7145749783070606,
"dataset": "spiceai.docs",
"metadata": {
"download_url": "https://raw.githubusercontent.com/spiceai/spiceai/trunk/docs/criteria/definitions.md"
}
}
],
"duration_ms": 745
}
Use spice chat
CLI command to query information using natural language
spice chat
Using model: chat-model
Perform test queries:
chat> what datasets you have access to
I have access to the following dataset:
- **Dataset Name:** spice.spiceai.docs
- **Description:** Spice.ai project documentation (github.com/spiceai/spiceai)
- **Can Search Documents:** Yes
This dataset contains documentation related to the Spice.ai project.
chat> What are release criterias?
The release criteria for Spice.ai components, such as models, data accelerators, and catalog connectors, are divided into stages, including Release Candidate (RC) and Stable release criteria. Here are the details for RC Criteria:
### RC Release Criteria
- **Beta Criteria**: All beta release criteria must pass.
- **Performance and Latency**: The model or component must handle consistent requests from several clients without adverse impacts on latency.
- Example: 8 clients sending consistent requests for 60 minutes.
...