The Databricks Unity Catalog Connector makes querying Databricks Unity Catalog tables in Spice simple.
- A Databricks account with a Unity Catalog configured with one or more tables (see the Databricks documentation for more information).
- Spice is installed (see the Getting Started documentation).
Create a Databricks personal access token by following the Databricks documentation.
Initialize a new Spice app:
mkdir databricks-catalog-demo
cd databricks-catalog-demo
spice init
Add the following configuration to your `spicepod.yaml`:
catalogs:
- from: databricks:<CATALOG_NAME>
name: db_uc
params:
mode: spark_connect # or delta_lake
databricks_token: ${env:DATABRICKS_TOKEN}
databricks_endpoint: <instance-id>.cloud.databricks.com
databricks_cluster_id: <cluster-id>
For `mode`, you can choose between `spark_connect` or `delta_lake`. `spark_connect` is the default mode and requires an All-Purpose Compute Cluster to be available. `delta_lake` mode queries directly against Delta Lake tables in object storage, and requires Spice to have the necessary permissions to access the object storage directly.
Set the `DATABRICKS_TOKEN` environment variable to the Databricks personal access token created in Step 1. A `.env` file created in the same directory as `spicepod.yaml` can be used to set the variable, e.g.:
echo "DATABRICKS_TOKEN=<token>" > .env
Visit the documentation for more information on configuring the Databricks Unity Catalog Connector.
When using the `delta_lake` mode, object storage credentials must be set for Spice to access the data. For AWS S3:
params:
mode: delta_lake
databricks_endpoint: <instance-id>.cloud.databricks.com
databricks_token: ${env:DATABRICKS_TOKEN}
databricks_aws_access_key_id: ${env:AWS_ACCESS_KEY_ID}
databricks_aws_secret_access_key: ${env:AWS_SECRET_ACCESS_KEY}
databricks_aws_region: <region> # E.g. us-east-1, us-west-2
databricks_aws_endpoint: <endpoint> # If using an S3-compatible service, like Minio
Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables to the AWS access key and secret key, respectively.
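These can be appended to the same `.env` file used for the Databricks token, e.g.:
echo "AWS_ACCESS_KEY_ID=<access-key-id>" >> .env
echo "AWS_SECRET_ACCESS_KEY=<secret-access-key>" >> .env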
For Azure Blob Storage:
params:
mode: delta_lake
databricks_token: ${env:DATABRICKS_TOKEN}
databricks_azure_storage_account_name: ${env:AZURE_ACCOUNT_NAME}
databricks_azure_account_key: ${env:AZURE_ACCOUNT_KEY}
Set the `AZURE_ACCOUNT_NAME` and `AZURE_ACCOUNT_KEY` environment variables to the Azure storage account name and account key, respectively.
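Again, these can be appended to the `.env` file, e.g.:
echo "AZURE_ACCOUNT_NAME=<account-name>" >> .env
echo "AZURE_ACCOUNT_KEY=<account-key>" >> .env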
For Google Cloud Storage:
params:
mode: delta_lake
databricks_token: ${env:DATABRICKS_TOKEN}
databricks_google_service_account: </path/to/service-account.json>
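Set `databricks_google_service_account` to the path of a Google service account key file with read access to the object storage backing the Delta tables.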
Start the Spice runtime:
spice run
In another terminal, start the Spice SQL REPL:
spice sql
SELECT * FROM db_uc.<SCHEMA_NAME>.<TABLE_NAME> LIMIT 10;
Example:
sql> select trace_id, block_number from db_uc.default.traces limit 10;
+-------------------------------------------------------------------------------+--------------+
| trace_id | block_number |
+-------------------------------------------------------------------------------+--------------+
| call_0x0e981c555b68e4f7847155e348e75de70729d06a5f3f238dd4e7d4e062a62eed_ | 16876417 |
| call_0xf2e97e476aaba415ad6793e5d09e82d7ef52d7d595db956306c44dc4e08d1f72_ | 16876417 |
| call_0x79be0ec50306a78a79eed5c368a38d17ff7d3d51d0c8331e36914b95d8635ef3_5 | 16876417 |
| call_0x79be0ec50306a78a79eed5c368a38d17ff7d3d51d0c8331e36914b95d8635ef3_5_0 | 16876417 |
| call_0x7997d1a4ea8a7ece16d3a306c1e820de66744b7f340e7a51b78a35fad5d789d0_ | 16876417 |
| call_0x7997d1a4ea8a7ece16d3a306c1e820de66744b7f340e7a51b78a35fad5d789d0_0 | 16876417 |
| call_0x7997d1a4ea8a7ece16d3a306c1e820de66744b7f340e7a51b78a35fad5d789d0_0_0 | 16876417 |
| call_0x7997d1a4ea8a7ece16d3a306c1e820de66744b7f340e7a51b78a35fad5d789d0_0_0_0 | 16876417 |
| call_0x7997d1a4ea8a7ece16d3a306c1e820de66744b7f340e7a51b78a35fad5d789d0_0_1 | 16876417 |
| call_0x7997d1a4ea8a7ece16d3a306c1e820de66744b7f340e7a51b78a35fad5d789d0_0_2 | 16876417 |
+-------------------------------------------------------------------------------+--------------+
Time: 1.5179735 seconds. 10 rows.
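Queries can also be issued over the Spice runtime's HTTP API. A minimal sketch, assuming the default HTTP port of 8090 and the `/v1/sql` endpoint:
curl -X POST http://localhost:8090/v1/sql \
  --data "SELECT trace_id, block_number FROM db_uc.default.traces LIMIT 10"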
When datasets are dynamically added or removed in the Databricks catalog, the Spice runtime refreshes the catalog accordingly.
Create a new Delta Lake table: In the Databricks SQL Editor, run the following query (replace `<CATALOG_NAME>` and `<SCHEMA_NAME>` with your catalog and schema names):
CREATE TABLE <CATALOG_NAME>.<SCHEMA_NAME>.test_table_no_v2checkpoint
TBLPROPERTIES (
'delta.minReaderVersion' = '1',
'delta.minWriterVersion' = '2'
)
AS
SELECT 1 AS id;
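The `TBLPROPERTIES` pin the table to Delta reader version 1 and writer version 2, so it is created without a v2 checkpoint (hence the table name).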
Verify table addition in Spice: Check the Spice runtime log for the new table:
2025-01-18T00:58:48.747333Z INFO data_components::unity_catalog::provider: Refreshed schema <CATALOG_NAME>.<SCHEMA_NAME>. Tables added: test_table_no_v2checkpoint.
Query the new table: In another terminal, run `spice sql` and query the new table:
>> spice sql
Welcome to the Spice.ai SQL REPL! Type 'help' for help.
sql> select * from db_uc.<SCHEMA_NAME>.test_table_no_v2checkpoint;
+----+
| id |
+----+
| 1 |
+----+
Delete the table: Run the following command in Databricks to delete the table:
DROP TABLE <CATALOG_NAME>.<SCHEMA_NAME>.test_table_no_v2checkpoint;
Verify table removal in Spice: Observe that the table has been removed in the Spice runtime log:
2025-01-18T00:59:49.121835Z INFO data_components::unity_catalog::provider: Refreshed schema <CATALOG_NAME>.<SCHEMA_NAME>. Tables removed: test_table_no_v2checkpoint.