How to query Iceberg topics using Snowflake and Open Catalog #957
Draft: kbatuigas wants to merge 4 commits into main from DOC-898-Iceberg-Snowflake-Open-Catalog-doc
178 changes: 178 additions & 0 deletions
modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,178 @@ | ||
= Query Iceberg Topics using Snowflake and Open Catalog | ||
:description: | ||
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration | ||
:page-beta: true | ||
|
||
[NOTE] | ||
==== | ||
include::shared:partial$enterprise-license.adoc[] | ||
==== | ||
|
||
This guide walks you through querying Redpanda topics as Iceberg tables in https://docs.snowflake.com/en/user-guide/tables-iceberg[Snowflake^], with AWS S3 as object storage and a catalog integration using https://other-docs.snowflake.com/en/opencatalog/overview[Open Catalog^]. | ||
|
||
== Prerequisites | ||
|
||
* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables. | ||
** You'll need the S3 bucket's URI to configure it as external storage for Open Catalog. | ||
* A Snowflake account. | ||
* An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account], you need ORGADMIN access in Snowflake. | ||
* An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage. | ||
** Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog] with the S3 bucket configured as external storage. You'll need admin permissions to carry out these steps in AWS: | ||
. Create an IAM policy that grants Open Catalog read and write access to your S3 bucket. | ||
. Create an IAM role and attach the IAM policy to the role. | ||
. After you create a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket. | ||
* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume] set up using the Tiered Storage bucket. | ||
** Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3]. You can use the same IAM policy as the catalog for the external volume's IAM role and user. | ||
|
||
== Set up catalog integration using Open Catalog | ||
|
||
=== Create a new Open Catalog service connection for Redpanda | ||
|
||
In this step, you'll create a new service connection to integrate the Iceberg-enabled topics into Open Catalog. | ||
|
||
. In Open Catalog, select *Connections*, then *+ Connection*. | ||
. In *Configure Service Connection*, provide a name. Open Catalog will also create a new principal with this name. | ||
. Make sure *Create new principal role* is toggled on. | ||
. Enter a name for the principal role. Then, click *Create*. | ||
|
||
After you create the connection, Open Catalog displays the client ID and client secret. Save these credentials; you'll add them to your cluster configuration in a later step. | |
|
||
=== Create a catalog role | ||
|
||
Grant privileges to the principal created in the previous step: | ||
|
||
. In Open Catalog, select *Catalogs*, and select your catalog. | ||
. On the *Roles* tab of your catalog, click *+ Catalog Role*. | ||
. Give the catalog role a name. | ||
. Under *Privileges*, select `CATALOG_MANAGE_CONTENT`. This provides full management https://other-docs.snowflake.com/en/opencatalog/access-control#catalog-privileges[privileges] for the catalog. Then, click *Create*. | ||
. On the *Roles* tab of the catalog, click *Grant to Principal Role*. | ||
. Select the catalog role you just created. | ||
. Select the principal role you created earlier. Click *Grant*. | ||
|
||
=== Update cluster configuration | ||
|
||
In this step, you'll configure your Redpanda cluster to enable Iceberg on a topic, as well as set up the integration with Open Catalog. | ||
|
||
. Edit your cluster configuration to set `iceberg_enabled` to `true` and to set the catalog integration properties shown in the following example. If you change these properties on a running cluster, you must restart the cluster. To update the properties, run `rpk cluster config edit`: | |
+ | ||
[,yaml] | |
---- | ||
iceberg_enabled: true | ||
iceberg_catalog_type: rest | ||
iceberg_rest_catalog_endpoint: <open-catalog-uri> | ||
iceberg_rest_catalog_client_id: <open-catalog-connection-client-id> | ||
iceberg_rest_catalog_client_secret: <open-catalog-connection-client-secret> | ||
iceberg_rest_catalog_prefix: <open-catalog-name> | ||
|
||
# Optional | ||
iceberg_translation_interval_ms_default: 1000 | ||
iceberg_catalog_commit_interval_ms: 1000 | ||
---- | ||
+ | ||
Use your own values for the following placeholders: | ||
+ | ||
-- | ||
- `<open-catalog-uri>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `\https://<snowflake-orgname>-<account-name>.snowflakecomputing.com/polaris/api/catalog`. | ||
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step. | ||
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step. | ||
- `<open-catalog-name>`: The name of your catalog in Open Catalog. | ||
-- | ||
+ | ||
[,bash,role=no-copy] | ||
---- | ||
Successfully updated configuration. New configuration version is 2. | ||
---- | ||
|
||
. Enable the integration for a topic by setting the topic property `redpanda.iceberg.mode`. The following command sets the `key_value` Iceberg mode, which creates an Iceberg table for the topic with two columns: one for the record metadata, including the key, and a binary column for the record's value. See xref:manage:topic-iceberg-integration.adoc#enable-iceberg-integration[Enable Iceberg integration] for more details on Iceberg modes. | |
+ | ||
[,bash] | ||
---- | ||
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value | ||
---- | ||
|
||
. Produce to the topic. For example: | |
+ | ||
[,bash] | ||
---- | ||
printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce <topic-name> --format='%k %v\n' | |
---- | ||
|
||
The topic should now appear as a table in Open Catalog. To verify: | |
|
||
. In Open Catalog, select *Catalogs*, then open your catalog. | ||
. Under your catalog, you should see the `redpanda` namespace, and a table with the name of your topic. The `redpanda` namespace and the table are automatically created for you. | ||
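
If you prefer to script the cluster configuration step instead of using the interactive editor, the following is a minimal sketch. It only prints `rpk cluster config set` commands so that you can review them before running them. The helper names and the `OPEN_CATALOG_CLIENT_ID` and `OPEN_CATALOG_CLIENT_SECRET` environment variables are assumptions made for this example, not part of Redpanda or Open Catalog.

```shell
# Sketch only: prints rpk commands for review; it does not run them.

# Build the Open Catalog REST endpoint from the Snowflake organization name
# and the Open Catalog account name, following the URI pattern in this guide.
open_catalog_uri() {
  printf 'https://%s-%s.snowflakecomputing.com/polaris/api/catalog\n' "$1" "$2"
}

# Print one `rpk cluster config set` command per property.
# Arguments: <open-catalog-uri> <open-catalog-name>
# The credentials are printed as references to (hypothetical) environment
# variables, so the secret itself never appears in the output.
print_iceberg_config_commands() {
  printf 'rpk cluster config set iceberg_enabled true\n'
  printf 'rpk cluster config set iceberg_catalog_type rest\n'
  printf 'rpk cluster config set iceberg_rest_catalog_endpoint %s\n' "$1"
  printf 'rpk cluster config set iceberg_rest_catalog_client_id "$OPEN_CATALOG_CLIENT_ID"\n'
  printf 'rpk cluster config set iceberg_rest_catalog_client_secret "$OPEN_CATALOG_CLIENT_SECRET"\n'
  printf 'rpk cluster config set iceberg_rest_catalog_prefix %s\n' "$2"
}

# Example with made-up organization, account, and catalog names.
print_iceberg_config_commands "$(open_catalog_uri myorg myaccount)" my_catalog
```

Export the two credential variables in your shell before you run the printed commands. The restart requirement described in the configuration step still applies.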
|
||
== Query Iceberg table in Snowflake | ||
|
||
To query the topic in Snowflake, you'll need to create a https://docs.snowflake.com/en/user-guide/tables-iceberg#catalog-integration[catalog integration^] so that Snowflake has access to the table data and metadata. | ||
|
||
=== Configure catalog integration with Snowflake | ||
|
||
. Run the https://docs.snowflake.com/sql-reference/sql/create-catalog-integration-open-catalog[`CREATE CATALOG INTEGRATION`] command in Snowflake: | ||
+ | ||
[,sql] | ||
---- | ||
CREATE CATALOG INTEGRATION <catalog-integration-name> | ||
CATALOG_SOURCE = POLARIS | ||
TABLE_FORMAT = ICEBERG | ||
CATALOG_NAMESPACE = 'redpanda' | ||
REST_CONFIG = ( | ||
CATALOG_URI = '<open-catalog-uri>' | ||
WAREHOUSE = '<open-catalog-name>' | ||
) | ||
REST_AUTHENTICATION = ( | ||
TYPE = OAUTH | ||
OAUTH_CLIENT_ID = '<open-catalog-connection-client-id>' | ||
OAUTH_CLIENT_SECRET = '<open-catalog-connection-client-secret>' | ||
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL') | ||
) | ||
ENABLED = TRUE; | ||
---- | ||
+ | ||
Use your own values for the following placeholders: | ||
+ | ||
- `<catalog-integration-name>`: Provide a name for your Iceberg catalog integration in Snowflake. | ||
- `<open-catalog-uri>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `\https://<snowflake-orgname>-<account-name>.snowflakecomputing.com/polaris/api/catalog`. | |
- `<open-catalog-name>`: The name of your catalog in Open Catalog. | ||
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step. | ||
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step. | ||
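
To avoid pasting credentials into the statement by hand, you could render it from a template. The following is a minimal sketch that only prints the statement text and does not connect to Snowflake. The `render_catalog_integration` function and the `OPEN_CATALOG_CLIENT_ID` and `OPEN_CATALOG_CLIENT_SECRET` environment variables are hypothetical names chosen for this example.

```shell
# Sketch only: renders the statement as text; it does not connect to Snowflake.

# Arguments: <catalog-integration-name> <open-catalog-uri> <open-catalog-name>
# The client ID and secret are read from the (hypothetical) environment
# variables OPEN_CATALOG_CLIENT_ID and OPEN_CATALOG_CLIENT_SECRET. The
# ${VAR:?message} form stops with an error if either variable is unset.
render_catalog_integration() {
  cat <<EOF
CREATE CATALOG INTEGRATION $1
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'redpanda'
  REST_CONFIG = (
    CATALOG_URI = '$2'
    WAREHOUSE = '$3'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '${OPEN_CATALOG_CLIENT_ID:?set OPEN_CATALOG_CLIENT_ID}'
    OAUTH_CLIENT_SECRET = '${OPEN_CATALOG_CLIENT_SECRET:?set OPEN_CATALOG_CLIENT_SECRET}'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
EOF
}
```

After you set both environment variables, call the function with the integration name, the Open Catalog account URI, and the catalog name. The rendered output contains the client secret, so handle it as you would the secret itself.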
|
||
=== Create Iceberg table in Snowflake | ||
|
||
After creating the catalog integration, you must create an externally managed table in Snowflake. You'll run your Snowflake queries against this table. | |
|
||
. Run the https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-rest[`CREATE ICEBERG TABLE`] command in Snowflake. The following example also specifies that the table should automatically refresh metadata: | |
+ | ||
[,sql] | ||
---- | ||
CREATE ICEBERG TABLE <table-name> | ||
CATALOG = '<catalog-integration-name>' | ||
EXTERNAL_VOLUME = '<iceberg-external-volume-name>' | ||
CATALOG_TABLE_NAME = '<topic-name>' | ||
AUTO_REFRESH = TRUE; | |
---- | ||
+ | ||
Use your own values for the following placeholders: | ||
+ | ||
- `<table-name>`: Provide a name for your table in Snowflake. | ||
- `<catalog-integration-name>`: The name of the catalog integration you configured in an earlier step. | ||
- `<iceberg-external-volume-name>`: The name of the external volume you configured using the Tiered Storage bucket. | ||
- `<topic-name>`: The name of the table in your catalog, which is the same as your Redpanda topic name. | ||
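
If you have several Iceberg-enabled topics, you could generate one statement per topic. The following is a minimal sketch that only prints the statements for review. The `iceberg_table_ddl` function and the example topic, integration, and volume names are made up for illustration. Deriving the Snowflake table name from the topic name is a choice made in this sketch, not a Redpanda or Snowflake requirement.

```shell
# Sketch only: prints statements for review; it does not connect to Snowflake.

# Arguments: <topic-name> <catalog-integration-name> <iceberg-external-volume-name>
# The Snowflake table name is derived from the topic name by replacing '.' and
# '-' with '_'. This assumes the topic name starts with a letter or underscore,
# so that the result can be used as an unquoted identifier.
iceberg_table_ddl() {
  table_name=$(printf '%s' "$1" | tr '.-' '__')
  cat <<EOF
CREATE ICEBERG TABLE $table_name
  CATALOG = '$2'
  EXTERNAL_VOLUME = '$3'
  CATALOG_TABLE_NAME = '$1'
  AUTO_REFRESH = TRUE;
EOF
}

# Example with made-up topic, integration, and volume names.
for topic in orders payments.v1; do
  iceberg_table_ddl "$topic" my_catalog_int my_iceberg_volume
done
```

The `CATALOG_TABLE_NAME` value keeps the original topic name, because that is the name of the table in the catalog.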
|
||
=== Query table | ||
|
||
To verify that Snowflake has successfully created the table containing the topic data, run the following: | ||
|
||
[,sql] | ||
---- | ||
SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog-integration-name>'); | ||
|
||
SELECT * FROM <table-name>; | ||
SELECT * FROM <table-name> WHERE redpanda:offset > 100; | ||
---- | ||
|
||
// Query results example | ||
[,bash] | ||
---- | ||
|
||
---- |
Should "polaris" or "open" be part of the page URL? I'm wondering about this for SEO purposes.