Skip to content

Commit

Permalink
Add deepseek cookbook recipe (#64)
Browse files Browse the repository at this point in the history
* Add deepseek cookbook recipe

* update line
  • Loading branch information
Sevenannn authored Jan 22, 2025
1 parent cd8532b commit 631a54e
Show file tree
Hide file tree
Showing 4 changed files with 125 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ The Spice.ai OSS Cookbook is a collection of recipes for building and deploying
- [Nvidia NIM on AWS EC2](./nvidia-nim/ec2/README.md) - Deploy Nvidia NIM on AWS GPU-optimized EC2 instances connected to Spice.
- [Searching GitHub Files](./search_github_files/README.md) - Search GitHub files with embeddings and vector similarity search.
- [xAI Models](./models/xai/README.md) - Use xAI models such as Grok.
- [DeepSeek Model](./deepseek/README.md) - Use DeepSeek model through Spice.

### Data Acceleration - Materializing & accelerating data locally with Data Accelerators

Expand All @@ -51,7 +52,8 @@ The Spice.ai OSS Cookbook is a collection of recipes for building and deploying
- [AWS RDS Aurora (MySQL Compatible)](./mysql/rds-aurora/README.md)
- [PlanetScale](./mysql/planetscale/README.md)
- [Clickhouse Data Connector](./clickhouse/README.md)
- [Databricks Connector](./databricks/README.md) - Delta Lake and Spark Connect
- [Databricks Connector](./databricks/README.md) - Delta Lake and Spark Connect.
- [Delta Lake Connector](./delta-lake/README.md) - Query data from Delta Lake tables.
- [Debezium Change Data Capture (CDC) Data Connector from Postgres](./cdc-debezium/README.md) - Stream changes from a Postgres database to Spice.
- [Debezium CDC SASL/SCRAM Authentication from MySQL](./cdc-debezium/sasl-scram/README.md) - Stream changes from a MySQL database to Spice using SASL/SCRAM authentication.
- [Dremio Data Connector](./dremio/README.md)
Expand Down
1 change: 1 addition & 0 deletions deepseek/.env
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
DEEPSEEK_API_KEY=
101 changes: 101 additions & 0 deletions deepseek/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
# DeepSeek Model

This recipe demonstrates how to use DeepSeek model in Spice.ai.

## Prerequisites

- Ensure you have the Spice CLI installed. Follow the [Getting Started](https://docs.spiceai.org/getting-started) guide if you haven't done so yet.

## Populate `.env` and Configure Spicepod

Clone this cookbook repo locally:

```bash
git clone https://github.com/spiceai/cookbook.git
cd cookbook/deepseek
```

Populate `.env` with the following:

- `DEEPSEEK_API_KEY`: A valid DeepSeek API key.

Verify that the `spicepod.yaml` is configured as follows:

```yaml
datasets:
- from: s3://spiceai-demo-datasets/taxi_trips/2024/
name: taxi_trips
description: taxi trips in s3
params:
file_format: parquet
acceleration:
enabled: true

models:
- from: openai:deepseek-chat
name: deepseek
params:
tools: auto
endpoint: https://api.deepseek.com
openai_api_key: ${secrets:DEEPSEEK_API_KEY}
```
## Run Spice
```shell
spice run
```

Result:

```shell
2025/01/21 14:48:39 INFO Checking for latest Spice runtime release...
2025/01/21 14:48:40 INFO Spice.ai runtime starting...
2025-01-21T22:48:40.569250Z INFO runtime::init::dataset: Initializing dataset taxi_trips
2025-01-21T22:48:40.569580Z INFO runtime::init::model: Loading model [deepseek] from openai:deepseek-chat...
2025-01-21T22:48:40.569646Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2025-01-21T22:48:40.569701Z INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2025-01-21T22:48:40.570139Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2025-01-21T22:48:40.572365Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2025-01-21T22:48:40.769265Z INFO runtime::init::results_cache: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2025-01-21T22:48:41.380306Z INFO runtime::init::dataset: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (arrow), results cache enabled.
2025-01-21T22:48:41.381620Z INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2025-01-21T22:48:44.001483Z INFO runtime::init::model: Model [deepseek] deployed, ready for inferencing
```

## Utilizing a natural language query

Use `spice chat` CLI command to query information using natural language

```shell
>> spice chat
Using model: deepseek
```

Perform test queries:

```shell
chat> what datasets you have access to
Currently, I have access to the following dataset:

- **Dataset Name**: `spice.public.taxi_trips`
- **Description**: taxi trips in s3
- **Can Search Documents**: No

This dataset contains information about taxi trips stored in S3. If you need more details or want to perform specific queries on this dataset, feel free to ask!

Time: 5.58s (first token 1.09s). Tokens: 1532. Prompt: 1517. Completion: 15 (3.34/s).
```

```shell
chat> how many records in taxi trips dataset
The `taxi_trips` dataset contains **2,964,624** records.

Time: 9.13s (first token 0.93s). Tokens: 1545. Prompt: 1518. Completion: 27 (3.29/s).
```

```shell
The longest taxi trip distance recorded in the dataset is **312,722.3 miles**.

Time: 5.44s (first token 0.90s). Tokens: 1584. Prompt: 1548. Completion: 36 (7.93/s).
```
20 changes: 20 additions & 0 deletions deepseek/spicepod.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
version: v1
kind: Spicepod
name: deepseek

datasets:
- from: s3://spiceai-demo-datasets/taxi_trips/2024/
name: taxi_trips
description: taxi trips in s3
params:
file_format: parquet
acceleration:
enabled: true

models:
- from: openai:deepseek-chat
name: deepseek
params:
tools: auto
endpoint: https://api.deepseek.com
openai_api_key: ${secrets:DEEPSEEK_API_KEY}

0 comments on commit 631a54e

Please sign in to comment.