Skip to content

Latest commit

 

History

History
147 lines (116 loc) · 6.19 KB

README.md

File metadata and controls

147 lines (116 loc) · 6.19 KB

In-Memory Arrow Data Accelerator

Create a connector instance using sample data and accelerate it using In-Memory Arrow Data Accelerator.

Requirements

Follow these steps

Step 1. Initialize a new Spice app.

spice init arrow-acceleration-qs
cd arrow-acceleration-qs

Step 2. Configure s3 dataset: copy and paste the YAML below to spicepod.yaml in the Spice app.

version: v1
kind: Spicepod
name: arrow-acceleration-qs
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    description: taxi trips in s3
    params:
      file_format: parquet

Step 3. Start the Spice runtime.

spice run

Confirm in the terminal output the taxi_trips dataset has been loaded:

2024/10/22 12:27:07 INFO Checking for latest Spice runtime release...
2024/10/22 12:27:07 INFO Spice.ai runtime starting...
2024-10-22T19:27:08.372600Z  INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-10-22T19:27:08.372667Z  INFO runtime::metrics_server: Spice Runtime Metrics listening on 127.0.0.1:9090
2024-10-22T19:27:08.372790Z  INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:8090
2024-10-22T19:27:08.373723Z  INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
2024-10-22T19:27:08.572674Z  INFO runtime: Initialized results cache; max size: 128.00 MiB, item ttl: 1s
2024-10-22T19:27:08.582201Z  INFO runtime: Tool [document_similarity] ready to use
2024-10-22T19:27:08.582226Z  INFO runtime: Tool [table_schema] ready to use
2024-10-22T19:27:08.582231Z  INFO runtime: Tool [sql] ready to use
2024-10-22T19:27:08.582236Z  INFO runtime: Tool [list_datasets] ready to use
2024-10-22T19:27:08.582242Z  INFO runtime: Tool [random_sample] ready to use
2024-10-22T19:27:08.582245Z  INFO runtime: Tool [sample_distinct_columns] ready to use
2024-10-22T19:27:08.582251Z  INFO runtime: Tool [top_n_sample] ready to use
2024-10-22T19:27:09.991363Z  INFO runtime: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), results cache enabled.

Step 4. Run queries against the dataset using the Spice SQL REPL.

In a new terminal, start the Spice SQL REPL

spice sql

Query the taxi_trips dataset, observing the long query time.

select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count from taxi_trips limit 10;
+----------+----------------------+-----------------------+-----------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count |
+----------+----------------------+-----------------------+-----------------+
| 2        | 2024-01-29T19:28:41  | 2024-01-29T19:36:46   | 2               |
| 1        | 2024-01-29T19:22:21  | 2024-01-29T19:28:45   | 2               |
| 1        | 2024-01-29T19:50:24  | 2024-01-29T20:09:21   | 2               |
| 1        | 2024-01-29T19:43:52  | 2024-01-29T20:01:40   | 2               |
| 1        | 2024-01-29T19:09:57  | 2024-01-29T19:55:36   | 2               |
| 1        | 2024-01-29T19:51:28  | 2024-01-29T20:09:16   | 2               |
| 1        | 2024-01-29T19:23:46  | 2024-01-29T19:31:06   | 2               |
| 2        | 2024-01-29T19:01:27  | 2024-01-29T19:09:07   | 2               |
| 1        | 2024-01-29T19:13:53  | 2024-01-29T19:23:09   | 2               |
| 1        | 2024-01-29T19:53:55  | 2024-01-29T20:06:56   | 2               |
+----------+----------------------+-----------------------+-----------------+

Time: 4.291336125 seconds. 10 rows.

Step 5. Update the spicepod.yaml to enable In-Memory Arrow acceleration.

version: v1
kind: Spicepod
name: arrow-acceleration-qs
datasets:
  - from: s3://spiceai-demo-datasets/taxi_trips/2024/
    name: taxi_trips
    description: taxi trips in s3
    params:
      file_format: parquet
    acceleration:
      enabled: true

Step 6. Save the changes in Spice app and observe the dataset updating and accelerating.

2024-10-22T19:28:24.204608Z  INFO runtime: Updating accelerated dataset taxi_trips...
2024-10-22T19:28:25.202828Z  INFO runtime::accelerated_table::refresh_task: Loading data for dataset taxi_trips
2024-10-22T19:29:07.729346Z  INFO runtime::accelerated_table::refresh_task: Loaded 2,964,624 rows (398.86 MiB) for dataset taxi_trips in 42s 525ms.
2024-10-22T19:29:09.217425Z  INFO runtime: Dataset taxi_trips registered (s3://spiceai-demo-datasets/taxi_trips/2024/), acceleration (arrow), results cache enabled.

Step 7. Run a query against the taxi_trips dataset again, observing the fast query time.

select "VendorID", tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count from taxi_trips limit 10;
+----------+----------------------+-----------------------+-----------------+
| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count |
+----------+----------------------+-----------------------+-----------------+
| 1        | 2024-01-12T19:14:53  | 2024-01-12T19:28:34   | 2               |
| 2        | 2024-01-12T19:03:12  | 2024-01-12T19:17:19   | 2               |
| 2        | 2024-01-12T19:34:22  | 2024-01-12T19:38:11   | 2               |
| 2        | 2024-01-12T19:44:51  | 2024-01-12T19:49:40   | 2               |
| 1        | 2024-01-12T19:31:54  | 2024-01-12T19:38:43   | 2               |
| 2        | 2024-01-12T19:54:37  | 2024-01-12T19:59:55   | 2               |
| 1        | 2024-01-12T19:02:32  | 2024-01-12T19:12:25   | 2               |
| 1        | 2024-01-12T19:22:38  | 2024-01-12T19:37:30   | 2               |
| 2        | 2024-01-12T19:34:30  | 2024-01-12T20:31:18   | 2               |
| 2        | 2024-01-12T19:54:06  | 2024-01-12T20:07:29   | 2               |
+----------+----------------------+-----------------------+-----------------+

Time: 0.013083584 seconds. 10 rows.

Learn more