diff --git a/README.md b/README.md
index f1ca9f4..dab2ee2 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,7 @@ The Spice.ai OSS Cookbook is a collection of recipes for building and deploying
 - [xAI Models](./models/xai/README.md) - Use xAI models such as Grok.
 - [DeepSeek Model](./deepseek/README.md) - Use DeepSeek model through Spice.
 - [Filesystem Hosted Model](./models/filesystem/README.md) - Use models hosted directly on filesystems.
+- [Web Search Tools using Perplexity](./websearch/README.md) - Provide LLMs with web search access for more informed answers.

 ### Data Acceleration

 - Materializing & accelerating data locally with Data Accelerators
@@ -81,6 +82,8 @@ The Spice.ai OSS Cookbook is a collection of recipes for building and deploying
 - [Deploying to Kubernetes](./kubernetes/README.md)
 - [Running in Docker](./docker/README.md)
+- [Sidecar Deployment Architecture](./architectures/sidecar/README.md)
+- [Microservice Deployment Architecture](./architectures/microservice/README.md)

 ### Performance

diff --git a/architectures/microservice/README.md b/architectures/microservice/README.md
new file mode 100644
index 0000000..3a9dedd
--- /dev/null
+++ b/architectures/microservice/README.md
@@ -0,0 +1,188 @@
+# Microservice Deployment Architecture
+
+The microservice deployment pattern runs the Spice.ai Runtime as an independent service, optionally with multiple replicas behind a load balancer. This architecture provides scalability and flexibility in serving multiple applications while maintaining high availability.
+
+## Architecture Overview
+
+In a microservice deployment, one or more Spice Runtime instances operate independently from the applications they serve. Applications communicate with the runtime through HTTP/gRPC over the network, typically via a load balancer that distributes requests across available runtime instances.
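For example, an application can submit SQL to the runtime's HTTP query endpoint (`POST /v1/sql`). A minimal sketch, assuming a Kubernetes Service named `spice-runtime` in a `spice-system` namespace exposing the default HTTP port 8090 (the Service name and namespace are illustrative):

```python
import urllib.request

# Assumed in-cluster DNS name for the runtime Service; adjust to your deployment.
SPICE_SQL_URL = "http://spice-runtime.spice-system.svc.cluster.local:8090/v1/sql"

def build_sql_request(sql: str) -> urllib.request.Request:
    """Build a POST request that submits raw SQL text to the runtime."""
    return urllib.request.Request(
        SPICE_SQL_URL,
        data=sql.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

req = build_sql_request("SELECT name, status FROM products LIMIT 5")
# With a reachable runtime, urllib.request.urlopen(req) returns the result rows.
```

The same query can be issued from any pod with `curl -XPOST <service-url>/v1/sql --data "SELECT ..."`.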
+ +```mermaid +graph TD + subgraph cluster["Kubernetes Cluster"] + LB[Load Balancer] + subgraph "Application Pods" + A1[Application 1] + A2[Application 2] + A3[Application 3] + end + subgraph "Spice Runtime Pods" + S1[Spice Runtime 1] + S2[Spice Runtime 2] + S3[Spice Runtime 3] + end + A1 -->|HTTP/gRPC| LB + A2 -->|HTTP/gRPC| LB + A3 -->|HTTP/gRPC| LB + LB --> S1 + LB --> S2 + LB --> S3 + end + S1 -->|Pull| D[(External Data Sources)] + S2 -->|Pull| D + S3 -->|Pull| D + style A1 fill:#2d5a88,stroke:#c9def1,color:#c9def1 + style A2 fill:#2d5a88,stroke:#c9def1,color:#c9def1 + style A3 fill:#2d5a88,stroke:#c9def1,color:#c9def1 + style S1 fill:#4a769c,stroke:#c9def1,color:#c9def1 + style S2 fill:#4a769c,stroke:#c9def1,color:#c9def1 + style S3 fill:#4a769c,stroke:#c9def1,color:#c9def1 + style LB fill:#6b93b8,stroke:#c9def1,color:#c9def1 + style D fill:#6b93b8,stroke:#c9def1,color:#c9def1 + style cluster fill:#1e3f66,stroke:#c9def1,color:#c9def1 +``` + +## Key Benefits and Considerations + +The microservice architecture offers centralized management of data acceleration and caching while enabling independent scaling of both applications and the runtime. This approach efficiently serves multiple applications and teams, reducing duplication of data and resources across the organization. + +However, network communication introduces additional latency compared to sidecar deployments. The architecture requires careful consideration of service discovery, load balancing, and network security. Resource allocation must account for the combined needs of all consuming applications. + +## Configuration Examples + +> [!TIP] +> Start off with the simplest configuration (i.e. full refresh) and then move to more complex configurations (i.e. append mode, CDC) as the dataset size and refresh requirements increase. 
+ +### Simple Full Refresh + +This example demonstrates a basic configuration for a product catalog, suitable for smaller datasets that change periodically: + +```yaml +version: v1 +kind: Spicepod +name: product-catalog + +datasets: + - from: https://api.company.com/v1/products + name: products + description: Product catalog data for active electronics category + params: + http_username: api-user + http_password: ${secrets:API_KEY} + acceleration: + enabled: true + engine: duckdb + refresh_mode: full # Replace entire dataset on each refresh + refresh_sql: | # Accelerate specific product subset + SELECT * FROM products + WHERE category = 'electronics' + AND status = 'active' + refresh_check_interval: 1h # Refresh hourly or via API +``` + +### Time-Based Append Mode + +This example shows a configuration for customer interaction data, optimized for a dataset that only appends data or updates data with a timestamp column to indicate when the data was updated. + +```yaml +version: v1 +kind: Spicepod +name: customer-portal + +datasets: + - from: https://customer-events.company.com/v1/interactions + name: customer-interactions + description: Customer support interactions and engagement history + time_column: interaction_timestamp # Column used to track when data is updated + params: + http_username: customer-service + http_password: ${secrets:CUSTOMER_API_KEY} + client_timeout: 30s + acceleration: + enabled: true + engine: duckdb # Persist the accelerated data to a DuckDB file + mode: file + refresh_mode: append # Append only the data that has changed since the last refresh + refresh_sql: | # Configure the initial load of the dataset to only load data from the last 90 days + SELECT * FROM customer_interactions + WHERE interaction_timestamp >= NOW() - INTERVAL '90 days' + primary_key: interaction_id # Primary key is required if data is updated in place as opposed to only appending new data + on_conflict: + interaction_id: upsert # Tell the runtime how to handle conflicts 
+      refresh_check_interval: 30s # Refresh the data every 30 seconds
+      refresh_retry_enabled: true # Retry the refresh if it fails
+      refresh_retry_max_attempts: 3 # Retry the refresh up to 3 times
+      retention_check_enabled: true # Check if the data is older than the retention period
+      retention_period: 90d # Retain the data for 90 days
+      retention_check_interval: 24h # Run a cleanup of old data every 24 hours
+```
+
+### Kubernetes Deployment Configuration
+
+This example demonstrates how to configure the Spice Runtime deployment in Kubernetes with proper resource management and scaling:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: spice-runtime
+  namespace: spice-system
+spec:
+  replicas: 3 # Initial number of replicas, scale up/down as needed
+  template:
+    spec:
+      containers:
+        ...
+```
+
+By default the Spice runtime only listens on the localhost interface, meaning it is not accessible from outside the pod. The following PodSpec configuration exposes the runtime APIs on all interfaces and the default ports.
+
+```yaml
+containers:
+  - name: spiceai
+    image: spiceai/spiced:latest
+    imagePullPolicy: Always
+    workingDir: /app
+    command:
+      [
+        "/usr/local/bin/spiced",
+        "--http",
+        "0.0.0.0:8090",
+        "--metrics",
+        "0.0.0.0:9090",
+        "--flight",
+        "0.0.0.0:50051",
+        "--open_telemetry",
+        "0.0.0.0:50052"
+      ]
+```
+
+> [!WARNING]
+> The above configuration exposes the runtime on all interfaces and the default ports over insecure HTTP/gRPC and without authentication. Consider securing the runtime with [TLS](https://spiceai.org/docs/api/tls) and adding [API key authentication](https://spiceai.org/docs/api/auth) for production environments.
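For application pods to reach the runtime replicas, the Deployment is typically fronted by a Kubernetes Service acting as the load balancer. A minimal sketch, assuming the runtime pods carry an `app: spice-runtime` label (the label is an assumption, as the Deployment above is truncated):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spice-runtime
  namespace: spice-system
spec:
  selector:
    app: spice-runtime # Must match the labels on the Deployment's pod template
  ports:
    - name: http
      port: 8090
      targetPort: 8090
    - name: flight
      port: 50051
      targetPort: 50051
```

Applications then address the runtime by the Service's cluster DNS name rather than individual pod IPs.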
+ +## Operational Considerations + +Consider using an autoscaler such as the [Kubernetes Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) to automatically scale the number of runtime replicas based on CPU utilization, memory usage, request latency, and active concurrent connections. + +High availability is achieved by running multiple replicas in Kubernetes. Node selectors and taints can be used to ensure that the runtime pods are scheduled across specific nodes to improve fault tolerance. Consistent health checks and readiness probes verify that each replica contains the correct data and is ready to serve requests. Rolling updates combined with specific resource requests and limits help maintain uninterrupted service during maintenance and outages. + +Monitoring and alerting form a vital part of sustaining system stability. Spice provides several [metrics](https://spiceai.org/docs/features/observability#metrics) that can be used to monitor the runtime. + +Network security relies on secure communication channels and access controls. Transport Layer Security (TLS) secures data in transit, while authentication and network policies restrict access to sensitive APIs. In some environments, a service mesh provides further security measures. Regular audits and updates address emerging vulnerabilities. For more information about network security in Kubernetes, consult the [Kubernetes Network Policies documentation](https://kubernetes.io/docs/concepts/services-networking/network-policies/). 
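The autoscaling approach described above can be sketched with a standard HorizontalPodAutoscaler targeting the runtime Deployment; the thresholds here are illustrative starting points, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spice-runtime
  namespace: spice-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spice-runtime
  minReplicas: 3 # Keep at least three replicas for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # Scale out when average CPU exceeds 70%
```

Custom metrics such as request latency or active connections require a metrics adapter in addition to the resource metrics shown here.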
+ +## Use Case Evaluation + +The microservice pattern is ideal for: + +- Organizations with multiple teams or applications requiring data acceleration +- Scenarios requiring independent scaling of the runtime +- Cases where centralized management of data and resources is preferred +- Applications that can tolerate some network latency + +Consider alternative architectures when: + +- Ultra-low latency is required (consider sidecar pattern) +- Network bandwidth is constrained +- Applications have strict data isolation requirements +- The overhead of managing a distributed system outweighs the benefits + +For additional deployment patterns, refer to the [Deployment Architectures Overview](https://spiceai.org/docs/deployment/architectures). diff --git a/architectures/sidecar/README.md b/architectures/sidecar/README.md new file mode 100644 index 0000000..717dc7c --- /dev/null +++ b/architectures/sidecar/README.md @@ -0,0 +1,132 @@ +# Sidecar Deployment Architecture + +The sidecar deployment pattern runs the Spice.ai Runtime as a companion container or process alongside the main application on the same host. This architecture provides low-latency access to accelerated data through localhost communication. + +## Architecture Overview + +In a sidecar deployment, each application pod includes both the primary application container and a Spice Runtime container. The containers communicate over localhost. The Spice Runtime container manages data acceleration and caching, pulling data from external sources based on configured refresh strategies. 
+ +```mermaid +graph TD + subgraph cluster["Kubernetes Cluster"] + subgraph "Application Pod 1" + A1[Primary Application 1] -->|localhost| B1[Spice Runtime] + end + subgraph "Application Pod 2" + A2[Primary Application 2] -->|localhost| B2[Spice Runtime] + end + subgraph "Application Pod N" + A3[Primary Application N] -->|localhost| B3[Spice Runtime] + end + end + B1 -->|Pull| D[(External Data Sources)] + B2 -->|Pull| D + B3 -->|Pull| D + style A1 fill:#2d5a88,stroke:#c9def1,color:#c9def1 + style A2 fill:#2d5a88,stroke:#c9def1,color:#c9def1 + style A3 fill:#2d5a88,stroke:#c9def1,color:#c9def1 + style B1 fill:#4a769c,stroke:#c9def1,color:#c9def1 + style B2 fill:#4a769c,stroke:#c9def1,color:#c9def1 + style B3 fill:#4a769c,stroke:#c9def1,color:#c9def1 + style D fill:#6b93b8,stroke:#c9def1,color:#c9def1 + style cluster fill:#1e3f66,stroke:#c9def1,color:#c9def1 +``` + +## Key Benefits and Considerations + +The sidecar architecture minimizes latency between the application and runtime through direct localhost communication. Co-located containers share lifecycle management, eliminating the need for additional service discovery or complex networking. Data remains close to the application that needs it, and each application instance scales independently with its own data acceleration. + +However, this approach requires dedicated runtime resources for each application instance, leading to data replication across instances. Applications must handle runtime initialization, and runtime updates necessitate updating all application pods and vice versa. The sidecar pattern is best suited for low-latency applications with small to moderate scaling requirements. + +## Configuration Examples + +> [!TIP] +> Start off with the simplest configuration (i.e. full refresh) and then move to more complex configurations (i.e. append mode, CDC) as the dataset size and refresh requirements increase. 
+ +### Simple Full Refresh + +This example demonstrates a basic configuration for a product catalog, suitable for smaller datasets that change periodically: + +```yaml +version: v1 +kind: Spicepod +name: product-catalog + +datasets: + - from: https://api.company.com/v1/products + name: products + description: Product catalog data for active electronics category + params: + http_username: api-user + http_password: ${secrets:API_KEY} + acceleration: + enabled: true + engine: duckdb + refresh_mode: full # Replace entire dataset on each refresh + refresh_sql: | # Accelerate specific product subset + SELECT * FROM products + WHERE category = 'electronics' + AND status = 'active' + refresh_check_interval: 1h # Refresh hourly or via API +``` + +### Time-Based Append Mode + +This example shows a configuration for customer interaction data, optimized for a dataset that only appends data or updates data with a timestamp column to indicate when the data was updated. + +```yaml +version: v1 +kind: Spicepod +name: customer-portal + +datasets: + - from: https://customer-events.company.com/v1/interactions + name: customer-interactions + description: Customer support interactions and engagement history + time_column: interaction_timestamp # Column used to track when data is updated + params: + http_username: customer-service + http_password: ${secrets:CUSTOMER_API_KEY} + client_timeout: 30s + acceleration: + enabled: true + engine: duckdb # Persist the accelerated data to a DuckDB file + mode: file + refresh_mode: append # Append only the data that has changed since the last refresh + refresh_sql: | # Configure the initial load of the dataset to only load data from the last 90 days + SELECT * FROM customer_interactions + WHERE interaction_timestamp >= NOW() - INTERVAL '90 days' + primary_key: interaction_id # Primary key is required if data is updated in place as opposed to only appending new data + on_conflict: + interaction_id: upsert # Tell the runtime how to handle conflicts 
+      refresh_check_interval: 30s # Refresh the data every 30 seconds
+      refresh_retry_enabled: true # Retry the refresh if it fails
+      refresh_retry_max_attempts: 3 # Retry the refresh up to 3 times
+      retention_check_enabled: true # Check if the data is older than the retention period
+      retention_period: 90d # Retain the data for 90 days
+      retention_check_interval: 24h # Run a cleanup of old data every 24 hours
+```
+
+## Operational Considerations
+
+### Dataset Size Management
+
+Small datasets perform well with in-memory acceleration using Arrow/DuckDB. Medium-sized datasets benefit from file-mode DuckDB for persistence between restarts and improved startup times. Large datasets may require investigating an alternative architecture if performance or startup times are not acceptable.
+
+The choice between these approaches depends on host machine performance and network speed between the runtime and data source. Starting with a simple full refresh configuration and progressing to more complex configurations (e.g. append mode) as requirements evolve is recommended.
+
+### Refresh Strategy Selection
+
+The full refresh strategy works effectively for small datasets with periodic updates and cases requiring strict data consistency. Append mode suits time-series data and continuous data streams with reliable timestamp columns.
+
+### Resource Management and Monitoring
+
+Effective resource management requires monitoring both memory and CPU usage patterns. Key metrics to track include runtime memory utilization, refresh operation duration, query response times, and cache effectiveness. Setting appropriate resource limits and requests helps prevent resource contention.
+
+Common operational challenges include slow startup times, memory pressure, and data freshness concerns. These can be addressed through dataset optimization, appropriate resource allocation, and monitoring of refresh operations.
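The resource-management guidance above can be made concrete with a sketch of the two-container pod this pattern implies. The application image name is hypothetical, and the resource values are illustrative starting points to tune against observed usage:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:latest # Hypothetical application image
        - name: spiceai
          image: spiceai/spiced:latest
          # The default localhost binding is correct here: only the app
          # container in the same pod needs to reach the runtime.
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
```

Because both containers share the pod's network namespace, the application reaches the runtime at `localhost` without any Service or ingress configuration.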
+ +## Use Case Evaluation + +The sidecar pattern is most effective for applications requiring minimal data access latency, with small to medium dataset sizes and limited deployment instances. Alternative architectures should be considered when dataset sizes grow too large, deployments require many instances, or complex data sharing patterns exist. + +For additional deployment patterns, refer to the [Deployment Architectures Overview](https://spiceai.org/docs/deployment/architectures). diff --git a/websearch/.env b/websearch/.env new file mode 100644 index 0000000..f280538 --- /dev/null +++ b/websearch/.env @@ -0,0 +1,2 @@ +SPICE_PERPLEXITY_AUTH_TOKEN="" +SPICE_OPENAI_API_KEY="" diff --git a/websearch/README.md b/websearch/README.md new file mode 100644 index 0000000..42fbdd8 --- /dev/null +++ b/websearch/README.md @@ -0,0 +1,160 @@ +# Web Search Using Perplexity +This recipe demonstrates how to configure and use Perplexity web search within Spice AI. + +## Prerequisites + +- Ensure you have the Spice CLI installed. Follow the [Getting Started](https://docs.spiceai.org/getting-started) guide if you haven't done so. +- Populate `.env` in your working directory with the required secrets: + - `SPICE_PERPLEXITY_AUTH_TOKEN`: A valid authentication token for the Perplexity API. Obtain it from [Perplexity's Getting Started guide](https://docs.perplexity.ai/guides/getting-started). + - `SPICE_OPENAI_API_KEY`: A valid OpenAI API key (or equivalent). + +## Using Perplexity for internet-informed conversations. +1. Start the Spice runtime: + +```shell +spice run +``` + +2. In a separate terminal, start a chat session: + +```shell +spice chat +``` + +3. Select `perp` +```shell +Use the arrow keys to navigate: ↓ ↑ → ← +? Select model: + openai-w-internet + ▸ perp +``` + +4. Ask a question + +```shell +spice chat +``` +```shell +Using model: perp +chat> What's the weather in Korea this week? 
+Here's a summary of the weather in Seoul, South Korea, for this week: + +- **Current Weather**: Light snow with mostly cloudy conditions. The temperature is around 0°C (32°F) with a feels-like temperature of -2°C (28°F)[1]. + +- **Forecast for the Week**: + | Day | High Temperature | Low Temperature | Conditions | + |------------|------------------|-----------------|-----------------------------| + | Fri, Feb 7 | -6°C (21°F) | -13°C (9°F) | Light snow early, morning clouds[2] | + | Sat, Feb 8 | -4°C (25°F) | -11°C (12°F) | Sunny | + | Sun, Feb 9 | -1°C (30°F) | -9°C (16°F) | Sunny | + | Mon, Feb 10| 4°C (39°F) | -3°C (27°F) | Scattered clouds | + | Tue, Feb 11| 1°C (34°F) | -5°C (23°F) | Afternoon clouds | + | Wed, Feb 12| 4°C (39°F) | -2°C (28°F) | Snow changing to rain | + +The weather will transition from cold and snowy to clearer conditions by the weekend, with temperatures gradually rising towards the end of the week[2][3][5]. + +Time: 5.02s (first token 1.50s). Tokens: 345. Prompt: 35. Completion: 310 (88.28/s). +``` + +5. Check the citations used by the model +```shell +spice sql +``` +```sql +SELECT captured_output +FROM runtime.task_history +WHERE task='citations' +ORDER BY end_time DESC +LIMIT 1; +``` +```json +[ + "https://www.timeanddate.com/weather/south-korea/seoul", + "https://www.timeanddate.com/weather/south-korea/seoul/ext", + "https://www.accuweather.com/en/kr/seoul/226081/weather-forecast/226081", + "https://www.theweathernetwork.com/en/city/kr/seoul/seoul/14-days", + "https://www.accuweather.com/en/kr/seoul/226081/daily-weather-forecast/226081" +] +``` + + +## Using the Web search tool +Web search engines, like Perplexity, can be used by other models to enable internet-informed conversations. + +1. Start the Spice runtime: + +```shell +spice run +``` + +2. Call the web search tool directly (named `the_internet` in the `spicepod.yaml`). 
+```shell +curl -XPOST http://127.0.0.1:8090/v1/tool/the_internet \ + --data '{"query": "What is the surf like today?"}' +``` + +## Let the LLM use the web search tool + +1. Start the Spice runtime: + +```shell +spice run +``` + +2. In a separate terminal, start a chat session: + +```shell +spice chat +``` +3. Select `openai-w-internet` +```shell +Use the arrow keys to navigate: ↓ ↑ → ← +? Select model: + ▸ openai-w-internet + perp +``` + +4. Ask a question + +```shell +>>> spice chat +Using model: openai-w-internet +chat> Where should I surf in California today? +``` +```shell +Today's surf conditions in California appear to be quite minimal. Here's a summary from a few key locations: + +### Ventura County +- **Surf Height**: Ankle high or less +- **Conditions**: Semi glassy in the morning; light sideshore texture in the afternoon +- **Tides**: Low at 5:43 AM, High at 12:18 PM, Low at 4:47 PM, High at 10:49 PM +- **Sunrise/Sunset**: 5:52 AM / 8:12 PM + +### Trestles +- **Surf Height**: Currently reported as 1 ft (some sources indicate it could be 2-4 ft) +- **Conditions**: Clean with nearby winds at NNE +- **Tides**: High at 12:05 PM, Low at 3:51 AM + +Overall, surf conditions are generally flat across these regions, making it less than ideal for surfing today. You may wish to check more specific spots or potentially consider inland activities or relaxing on the beach. + +For more detailed local conditions, you can check: +- [Ventura Surf Forecast](https://www.swellinfo.com/surf-forecast/ventura-california) +- [Orange County Surf Forecast](https://www.swellinfo.com/surf-forecast/orange-county-california-south) +- [Trestles Forecast](https://surfcaptain.com/forecast/trestles-california) + +Time: 18.28s (first token 10.02s). Tokens: 183. Prompt: 160. Completion: 23 (2.78/s). +``` + +5. 
Check that the LLM did use the internet +```shell +spice trace ai_chat +``` +```shell +[0c239b47bcb03b01] (18276.11ms) ai_chat + ├── [1f4e0a007afdbbd7] (18274.71ms) ai_completion + ├── [f21d61726384a63f] ( 0.33ms) tool_use::websearch + ├── [cce69b405aeb931b] (17364.89ms) ai_completion + ├── [a47dfed093941255] ( 6299.55ms) tool_use::websearch + │ └── [4632753af5cbb6be] ( 6299.06ms) citations + └── [5020a8af033160e0] (10066.56ms) ai_completion +``` diff --git a/websearch/spicepod.yaml b/websearch/spicepod.yaml new file mode 100644 index 0000000..561fbbf --- /dev/null +++ b/websearch/spicepod.yaml @@ -0,0 +1,30 @@ +version: v1 +kind: Spicepod +name: websearch + +runtime: + task_history: + captured_output: truncated + +models: + - name: openai-w-internet + from: openai:gpt-4o-mini + params: + openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY } + tools: the_internet + system_prompt: | + You have access to the internet. You don't know what day it is today. + + - name: perp + from: perplexity:sonar + params: + perplexity_auth_token: ${ secrets:SPICE_PERPLEXITY_AUTH_TOKEN } + system_prompt: | + When asked a question with dates, locations or numbers, use an ASCII table when appropriate. Don't overuse it. +tools: + - name: the_internet + from: websearch + description: "Search the web for information" + params: + engine: perplexity + perplexity_auth_token: ${ secrets:SPICE_PERPLEXITY_AUTH_TOKEN }