The Spice.ai Cloud Platform has many datasets that can be used within Spice. A valid login for the Spice.ai Cloud Platform is required to access the datasets. Before beginning this recipe, link your GitHub account to Spice.ai to get access to the platform.
Step 1. Initialize a Spice project:
spice init spiceai-demo
cd spiceai-demo
Step 2. Use spice login to store the Spice.ai Cloud Platform API key and token:
spice login
A browser window will open displaying an authorization code, which also appears in the terminal. Select Approve if the two codes match.
There will be a confirmation in the terminal that login was successful:
Successfully logged in to Spice.ai as your_user ([email protected])
Using app your_user/your_app
A .env file is created in the spiceai-demo directory with the following content:
SPICE_SPICEAI_API_KEY=<api_key>
SPICE_SPICEAI_TOKEN=<api_token>
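Before starting the runtime, it can help to confirm that both credentials were actually written. The following is a minimal sketch, not part of the Spice CLI: check_spice_env is a hypothetical helper name, and it only checks that each variable has a non-empty value.

```shell
# Hypothetical helper (not part of the Spice CLI): verify that a .env file
# defines both Spice.ai credentials with non-empty values.
check_spice_env() {
  env_file="${1:-.env}"
  for key in SPICE_SPICEAI_API_KEY SPICE_SPICEAI_TOKEN; do
    # Require a line of the form KEY=<non-empty value>
    if ! grep -Eq "^${key}=.+" "$env_file" 2>/dev/null; then
      echo "missing or empty: $key" >&2
      return 1
    fi
  done
  echo "credentials present in $env_file"
}
```

Run check_spice_env from the spiceai-demo directory; a non-zero exit status suggests repeating spice login.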
Step 3. Start the Spice runtime.
spice run
Step 4. Configure the dataset to connect to Spice.ai:
Open a new terminal window in the spiceai-demo directory and run:
spice dataset configure
Enter the name of the dataset:
dataset name: (spiceai-demo) taxi_trips
Enter the description of the dataset:
description: Taxi trips in New York City
Specify the location of the dataset:
from: spice.ai/spiceai/quickstart/datasets/taxi_trips
Select "n" when prompted whether to locally accelerate the dataset:
Locally accelerate (y/n)? n
The CLI will confirm the dataset has been configured with the following output:
Saved datasets/taxi_trips/dataset.yaml
The content of dataset.yaml is the following:
cat datasets/taxi_trips/dataset.yaml
from: spice.ai/spiceai/quickstart/datasets/taxi_trips
name: taxi_trips
description: Taxi trips in New York City
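To check the saved configuration from a script rather than by eye, a single field can be read out of dataset.yaml. This is a rough sketch only: dataset_field is a hypothetical helper, and it handles just the flat "key: value" lines shown above, not nested YAML.

```shell
# Hypothetical helper: print the value of a top-level scalar key
# from a flat "key: value" YAML file such as the dataset.yaml above.
# $1 = path to dataset.yaml, $2 = key name (for example: from, name)
dataset_field() {
  awk -v key="$2" -F': ' '
    # Match the key exactly, then print everything after "key: "
    $1 == key { print substr($0, length(key) + 3); exit }
  ' "$1"
}
```

For example, dataset_field datasets/taxi_trips/dataset.yaml from should print the location entered during spice dataset configure.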
The Spice runtime terminal will show that the dataset has been loaded:
2024-12-16T14:40:29.181034Z INFO runtime::init::dataset: Dataset taxi_trips registered (spice.ai/spiceai/quickstart/datasets/taxi_trips), results cache enabled.
Step 5. Run queries against the dataset using the Spice SQL REPL.
In a new terminal, start the Spice SQL REPL:
spice sql
You can now query taxi_trips in the runtime:
SELECT tpep_pickup_datetime, passenger_count, trip_distance FROM taxi_trips ORDER BY tpep_pickup_datetime LIMIT 10;
+----------------------+-----------------+---------------+
| tpep_pickup_datetime | passenger_count | trip_distance |
+----------------------+-----------------+---------------+
| 2002-12-31T22:59:39 | 1 | 0.63 |
| 2002-12-31T22:59:39 | 1 | 0.63 |
| 2009-01-01T00:24:09 | 2 | 10.88 |
| 2009-01-01T23:30:39 | 1 | 10.99 |
| 2009-01-01T23:58:40 | 1 | 0.46 |
| 2023-12-31T23:39:17 | 2 | 0.47 |
| 2023-12-31T23:41:02 | 1 | 0.4 |
| 2023-12-31T23:47:28 | 2 | 1.44 |
| 2023-12-31T23:49:12 | 1 | 3.14 |
| 2023-12-31T23:54:27 | 1 | 7.7 |
+----------------------+-----------------+---------------+
Time: 0.852775583 seconds. 10 rows.
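The same query can also be sent without the REPL. The sketch below assumes the runtime's HTTP API is listening on localhost:8090 and accepts plain-text SQL on POST /v1/sql; confirm both against the Spice.ai documentation for your version. spice_http_sql and SPICE_HTTP_URL are hypothetical names introduced for this example.

```shell
# Hypothetical helper: send a SQL statement to the Spice runtime over HTTP.
# Assumptions: the HTTP API is at http://localhost:8090 and accepts the SQL
# text as the body of POST /v1/sql. Override with SPICE_HTTP_URL if needed
# (a variable name made up for this sketch, not read by Spice itself).
spice_http_sql() {
  base_url="${SPICE_HTTP_URL:-http://localhost:8090}"
  # -sS: hide the progress meter but still report errors
  curl -sS -X POST "$base_url/v1/sql" --data "$1"
}
```

With the runtime from Step 3 still running, spice_http_sql "SELECT passenger_count, trip_distance FROM taxi_trips LIMIT 10" should return the rows in the response body.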
Next Steps
This recipe queries the Spice.ai Cloud Platform directly, without any acceleration. Experiment with different acceleration options using Spice Data Accelerators.
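As a starting point for that experiment, acceleration can be switched on for the dataset configured in Step 4. This is a sketch under assumptions: it assumes the runtime reads an acceleration section with an enabled flag from dataset.yaml, and enable_acceleration is a hypothetical helper. Check the Spice Data Accelerators documentation for the supported engines and options.

```shell
# Hypothetical helper: append an acceleration section to a dataset.yaml
# unless one is already present.
# Assumption: the runtime reads `acceleration.enabled` from dataset.yaml.
enable_acceleration() {
  dataset_file="${1:-datasets/taxi_trips/dataset.yaml}"
  if grep -q '^acceleration:' "$dataset_file"; then
    echo "acceleration already configured in $dataset_file"
    return 0
  fi
  # Make sure the file ends with a newline before appending
  if [ -n "$(tail -c 1 "$dataset_file")" ]; then
    echo >> "$dataset_file"
  fi
  printf 'acceleration:\n  enabled: true\n' >> "$dataset_file"
  echo "acceleration enabled in $dataset_file"
}
```

Run enable_acceleration from the spiceai-demo directory, then restart spice run if the change is not picked up, and compare the query time with the unaccelerated result above.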
See the Spice.ai documentation and search spicerack.org to explore retrieving and accelerating multiple datasets for use with Spice.