How to query Iceberg topics using Snowflake and Open Catalog #957

Draft: wants to merge 4 commits into base: main
178 changes: 178 additions & 0 deletions modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc

Should "polaris" or "open" be part of the page URL? I guess wondering for SEO purposes

@@ -0,0 +1,178 @@
= Query Iceberg Topics using Snowflake and Open Catalog
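:description: Query Redpanda topics as Iceberg tables in Snowflake, using Open Catalog for the catalog integration.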
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
:page-beta: true

[NOTE]
====
include::shared:partial$enterprise-license.adoc[]
====

This guide walks you through querying Redpanda topics as Iceberg tables in https://docs.snowflake.com/en/user-guide/tables-iceberg[Snowflake^], with AWS S3 as object storage and a catalog integration using https://other-docs.snowflake.com/en/opencatalog/overview[Open Catalog^].

== Prerequisites

* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
** You'll need the S3 bucket's URI to configure it as external storage for Open Catalog.
* A Snowflake account.
* An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account], you need ORGADMIN access in Snowflake.
* An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.
** Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog] with the S3 bucket configured as external storage. You'll need admin permissions to carry out these steps in AWS:
. Create an IAM policy that grants Open Catalog read and write access to your S3 bucket.
. Create an IAM role and attach the IAM policy to the role.
. After you create a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket.
* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume] set up using the Tiered Storage bucket.
** Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3]. You can use the same IAM policy as the catalog for the external volume's IAM role and user.
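
The IAM policy from the catalog steps above can be sketched as a small shell script. This is a minimal sketch, not the authoritative policy: the bucket name `my-tiered-storage-bucket` and the file name `open-catalog-policy.json` are hypothetical, and the list of S3 actions is an assumption about what read and write access typically requires. Check the linked Snowflake guides for the exact permissions.

[,bash]
----
#!/usr/bin/env bash
set -eu

# Hypothetical values: replace with your Tiered Storage bucket and a file name of your choice.
BUCKET="${BUCKET:-my-tiered-storage-bucket}"
POLICY_FILE="${POLICY_FILE:-open-catalog-policy.json}"

# Write an IAM policy document that grants read and write access to the bucket.
# The action list is an assumption; confirm it against the Snowflake guides.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    }
  ]
}
EOF

echo "Wrote ${POLICY_FILE}"

# To create the policy with the AWS CLI (requires admin permissions), run:
#   aws iam create-policy --policy-name <policy-name> --policy-document "file://${POLICY_FILE}"
----

Generating the document from a variable keeps the bucket name in one place, so you can reuse the same policy for the external volume's IAM role.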

== Set up catalog integration using Open Catalog

=== Create a new Open Catalog service connection for Redpanda

In this step, you'll create a new service connection to integrate the Iceberg-enabled topics into Open Catalog.

. In Open Catalog, select *Connections*, then *+ Connection*.
. In *Configure Service Connection*, provide a name. Open Catalog will also create a new principal with this name.
. Make sure *Create new principal role* is toggled on.
. Enter a name for the principal role. Then, click *Create*.

After you create the connection, you'll be provided the client ID and client secret. Save these credentials to add to your cluster configuration in a later step.

=== Create a catalog role

Grant privileges to the principal created in the previous step:

. In Open Catalog, select *Catalogs*, and select your catalog.
. On the *Roles* tab of your catalog, click *+ Catalog Role*.
. Give the catalog role a name.
. Under *Privileges*, select `CATALOG_MANAGE_CONTENT`. This provides full management https://other-docs.snowflake.com/en/opencatalog/access-control#catalog-privileges[privileges] for the catalog. Then, click *Create*.
. On the *Roles* tab of the catalog, click *Grant to Principal Role*.
. Select the catalog role you just created.
. Select the principal role you created earlier. Click *Grant*.

=== Update cluster configuration

In this step, you'll configure your Redpanda cluster to enable Iceberg on a topic, as well as set up the integration with Open Catalog.

. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. You must restart your cluster if you change this configuration for a running cluster. You can run `rpk cluster config edit` to update these properties:
+
[,yaml]
----
iceberg_enabled: true
iceberg_catalog_type: rest
iceberg_rest_catalog_endpoint: <open-catalog-uri>
iceberg_rest_catalog_client_id: <open-catalog-connection-client-id>
iceberg_rest_catalog_client_secret: <open-catalog-connection-client-secret>
iceberg_rest_catalog_prefix: <open-catalog-name>

# Optional
iceberg_translation_interval_ms_default: 1000
iceberg_catalog_commit_interval_ms: 1000
----
+
Use your own values for the following placeholders:
+
--
- `<open-catalog-uri>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `\https://<snowflake-orgname>-<account-name>.snowflakecomputing.com/polaris/api/catalog`.
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step.
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step.
- `<open-catalog-name>`: The name of your catalog in Open Catalog.
--
+
If the update succeeds, rpk prints a message similar to the following:
+
[,bash,role=no-copy]
----
Successfully updated configuration. New configuration version is 2.
----

. Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. The following command sets the `key_value` Iceberg mode, which creates an Iceberg table for the topic with two columns: one for the record metadata, including the key, and a binary column for the record's value. See xref:manage:topic-iceberg-integration.adoc#enable-iceberg-integration[Enable Iceberg integration] for more details on Iceberg modes.
+
[,bash]
----
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value
----

. Produce records to the topic. For example:
+
[,bash]
----
printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce <topic-name> --format='%k %v\n'
----
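
The cluster properties from the first step can also be set non-interactively. The following is a minimal sketch that uses `rpk cluster config set` in place of `rpk cluster config edit`. The environment variable names, the function names, and the `DRY_RUN` switch are hypothetical conveniences and not part of rpk. By default, the script only prints the commands so that you can review them first.

[,bash]
----
#!/usr/bin/env bash
set -eu

# Hypothetical environment variables; the defaults are the placeholders from this guide.
CATALOG_URI="${CATALOG_URI:-<open-catalog-uri>}"
CLIENT_ID="${CLIENT_ID:-<open-catalog-connection-client-id>}"
CLIENT_SECRET="${CLIENT_SECRET:-<open-catalog-connection-client-secret>}"
CATALOG_NAME="${CATALOG_NAME:-<open-catalog-name>}"
DRY_RUN="${DRY_RUN:-1}"

# Set one cluster property, or only print the command when DRY_RUN=1.
# Note that the dry-run output includes the client secret, so avoid logging it.
set_prop() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'rpk cluster config set %s %s\n' "$1" "$2"
  else
    rpk cluster config set "$1" "$2"
  fi
}

# Apply the Iceberg and catalog integration properties one at a time.
apply_iceberg_config() {
  set_prop iceberg_enabled true
  set_prop iceberg_catalog_type rest
  set_prop iceberg_rest_catalog_endpoint "$CATALOG_URI"
  set_prop iceberg_rest_catalog_client_id "$CLIENT_ID"
  set_prop iceberg_rest_catalog_client_secret "$CLIENT_SECRET"
  set_prop iceberg_rest_catalog_prefix "$CATALOG_NAME"
}

apply_iceberg_config
----

Run the script with `DRY_RUN=0` to apply the properties. As with the interactive edit, a running cluster must be restarted after this configuration changes.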

After you produce to the topic, it should appear as a table in Open Catalog. To check:

. In Open Catalog, select *Catalogs*, then open your catalog.
. Under your catalog, you should see the `redpanda` namespace, and a table with the name of your topic. The `redpanda` namespace and the table are automatically created for you.
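
You can also check for the table from the command line by querying the catalog's REST API. The following is a sketch that assumes Open Catalog exposes the standard Iceberg REST catalog endpoints for OAuth tokens (`/v1/oauth/tokens`) and for listing tables (`/v1/<prefix>/namespaces/<namespace>/tables`), and that the prefix is the catalog name, as in the cluster configuration. The function and variable names are hypothetical, and `curl` is assumed to be installed.

[,bash]
----
#!/usr/bin/env bash
set -eu

# Hypothetical environment variables; the defaults are the placeholders from this guide.
CATALOG_URI="${CATALOG_URI:-<open-catalog-uri>}"
CATALOG_NAME="${CATALOG_NAME:-<open-catalog-name>}"

# Build the endpoint URLs (assumed to follow the Iceberg REST catalog layout).
# A trailing slash on CATALOG_URI is stripped before the path is appended.
token_url()  { printf '%s/v1/oauth/tokens\n' "${CATALOG_URI%/}"; }
tables_url() { printf '%s/v1/%s/namespaces/redpanda/tables\n' "${CATALOG_URI%/}" "$CATALOG_NAME"; }

# Request an access token with the service connection credentials.
# Arguments: client ID, client secret. Prints the raw JSON response.
fetch_token() {
  curl -sS -X POST "$(token_url)" \
    --data-urlencode 'grant_type=client_credentials' \
    --data-urlencode "client_id=$1" \
    --data-urlencode "client_secret=$2" \
    --data-urlencode 'scope=PRINCIPAL_ROLE:ALL'
}

# List the tables in the redpanda namespace.
# Argument: the access token taken from the fetch_token response.
list_tables() {
  curl -sS -H "Authorization: Bearer $1" "$(tables_url)"
}

# Example calls (not run automatically):
#   fetch_token "<open-catalog-connection-client-id>" "<open-catalog-connection-client-secret>"
#   list_tables "<access-token>"
----

If the integration is working, the response from `list_tables` should include an entry for your topic in the `redpanda` namespace.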

== Query Iceberg table in Snowflake

To query the topic in Snowflake, you'll need to create a https://docs.snowflake.com/en/user-guide/tables-iceberg#catalog-integration[catalog integration^] so that Snowflake has access to the table data and metadata.

=== Configure catalog integration with Snowflake

. Run the https://docs.snowflake.com/sql-reference/sql/create-catalog-integration-open-catalog[`CREATE CATALOG INTEGRATION`] command in Snowflake:
+
[,sql]
----
CREATE CATALOG INTEGRATION <catalog-integration-name>
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'redpanda'
  REST_CONFIG = (
    CATALOG_URI = '<open-catalog-uri>'
    WAREHOUSE = '<open-catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<open-catalog-connection-client-id>'
    OAUTH_CLIENT_SECRET = '<open-catalog-connection-client-secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
----
+
Use your own values for the following placeholders:
+
--
- `<catalog-integration-name>`: Provide a name for your Iceberg catalog integration in Snowflake.
- `<open-catalog-uri>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `\https://<snowflake-orgname>-<account-name>.snowflakecomputing.com/polaris/api/catalog`.
- `<open-catalog-name>`: The name of your catalog in Open Catalog.
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step.
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step.
--

=== Create Iceberg table in Snowflake

After creating the catalog integration, you must create an externally managed table in Snowflake. You'll run your Snowflake queries against this table.

. Run the https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-rest[`CREATE ICEBERG TABLE`] command in Snowflake. The following example also specifies that the table should automatically refresh its metadata:
+
[,sql]
----
CREATE ICEBERG TABLE <table-name>
  CATALOG = '<catalog-integration-name>'
  EXTERNAL_VOLUME = '<iceberg-external-volume-name>'
  CATALOG_TABLE_NAME = '<topic-name>'
  AUTO_REFRESH = TRUE;
----
+
Use your own values for the following placeholders:
+
--
- `<table-name>`: Provide a name for your table in Snowflake.
- `<catalog-integration-name>`: The name of the catalog integration you configured in an earlier step.
- `<iceberg-external-volume-name>`: The name of the external volume you configured using the Tiered Storage bucket.
- `<topic-name>`: The name of the table in your catalog, which is the same as your Redpanda topic name.
--

=== Query table

To verify that Snowflake can see the tables in the catalog and has created the table containing the topic data, list the catalog's tables and then query the table:

[,sql]
----
SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog-integration-name>');

SELECT * FROM <table-name>;
SELECT * FROM <table-name> WHERE redpanda:offset > 100;
----

// Query results example
[,bash]
----

----