From 26a59cc7b6cfd4b26e161c16deab9a5a921089f3 Mon Sep 17 00:00:00 2001
From: kbatuigas <36839689+kbatuigas@users.noreply.github.com>
Date: Fri, 17 Jan 2025 15:01:58 -0500
Subject: [PATCH 1/4] Initial draft

---
 ...anda-topics-iceberg-snowflake-catalog.adoc | 176 ++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc

diff --git a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
new file mode 100644
index 000000000..55b266ad4
--- /dev/null
+++ b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
@@ -0,0 +1,176 @@
+= Query Iceberg Topics using Snowflake and Open Catalog
+:description:
+:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
+:page-beta: true
+
+[NOTE]
+====
+include::shared:partial$enterprise-license.adoc[]
+====
+
+This guide walks you through querying Redpanda topics as Iceberg tables in https://docs.snowflake.com/en/user-guide/tables-iceberg[Snowflake^], with AWS S3 as object storage and a catalog integration using https://other-docs.snowflake.com/en/opencatalog/overview[Open Catalog^].
+
+== Prerequisites
+
+* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-object-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
+** You'll need the S3 bucket's URI to configure it as external storage for Open Catalog.
+* A Snowflake account.
+* An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account], you need ORGADMIN access in Snowflake.
+* An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.
+** Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog] with the S3 bucket configured as external storage. You need admin permissions in AWS to carry out these steps:
+. Create an IAM policy that grants Open Catalog read and write access to your S3 bucket.
+. Create an IAM role and attach the IAM policy to the role.
+. After you create a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket.
+* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume] set up using the Tiered Storage bucket.
+** Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3]. You can use the same IAM policy as the catalog for the external volume's IAM role and user.
+
+== Set up catalog integration using Open Catalog
+
+=== Create a new Open Catalog service connection for Redpanda
+
+In this step, you'll create a new service connection to integrate the Iceberg-enabled topics into Open Catalog.
+
+. In Open Catalog, select *Connections*, then *+ Connection*.
+. In *Configure Service Connection*, provide a name. Open Catalog also creates a new principal with this name.
+. Make sure *Create new principal role* is toggled on.
+. Enter a name for the principal role. Then, click *Create*.
+
+After you create the connection, you receive a client ID and client secret. Save these credentials; you'll add them to your cluster configuration in a later step.
+
+=== Create a catalog role
+
+Grant privileges to the principal created in the previous step:
+
+. In Open Catalog, select *Catalogs*, and select your catalog.
+. On the *Roles* tab of your catalog, click *+ Catalog Role*.
+. Give the catalog role a name.
+. Under *Privileges*, select `CATALOG_MANAGE_CONTENT`. This provides full management https://other-docs.snowflake.com/en/opencatalog/access-control#catalog-privileges[privileges] for the catalog. Then, click *Create*.
+. On the *Roles* tab of the catalog, click *Grant to Principal Role*.
+. Select the catalog role you just created.
+. Select the principal role you created earlier. Click *Grant*.
+
+=== Update cluster configuration
+
+In this step, you'll configure your Redpanda cluster to enable Iceberg on a topic, as well as set up the integration with Open Catalog.
+
+. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. You must restart your cluster if you change this configuration for a running cluster. For example, you can run `rpk cluster config edit` to update these properties
++
+[,bash]
+----
+iceberg_enabled: true
+iceberg_rest_catalog_type: rest
+iceberg_rest_catalog_endpoint:
+iceberg_rest_catalog_client_id:
+iceberg_rest_catalog_client_secret:
+iceberg_rest_catalog_prefix:
+
+# Optional
+iceberg_translation_interval_ms_default: 1000
+iceberg_catalog_commit_interval_ms: 1000
+----
++
+Use your own values for the following placeholders:
++
+- ``:
+- ``:
+- ``:
+- ``:
++
+[,bash,role=no-copy]
+----
+Successfully updated configuration. New configuration version is 2.
+----
+
+. Enable the integration for a topic by setting the topic property `redpanda.iceberg.mode`. The following command sets the `key_value` Iceberg mode, which creates an Iceberg table for the topic with two columns: one for the record metadata, including the key, and a binary column for the record's value. See xref:manage:topic-iceberg-integration.adoc#enable-iceberg-integration[Enable Iceberg integration] for more details on Iceberg modes.
++
+[,bash]
+----
+rpk topic alter-config --set redpanda.iceberg.mode=key_value
+----
+
+. Produce some records to the topic. For example:
++
+[,bash]
+----
+printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce --format='%k %v\n'
+----
+
+You should see the topic as a table in Open Catalog.
+
+. In Open Catalog, select *Catalogs*, then open your catalog.
+. Under your catalog, you should see the `redpanda` namespace and a table with the name of your topic. The `redpanda` namespace and the table are automatically created for you.
+
+== Query Iceberg table in Snowflake
+
+To query the topic in Snowflake, you'll need to create a https://docs.snowflake.com/en/user-guide/tables-iceberg#catalog-integration[catalog integration^] so that Snowflake has access to the table data and metadata.
+
+=== Configure catalog integration with Snowflake
+
+. Run the https://docs.snowflake.com/sql-reference/sql/create-catalog-integration-open-catalog[`CREATE CATALOG INTEGRATION`] command in Snowflake:
++
+[,sql]
+----
+CREATE CATALOG INTEGRATION
+  CATALOG_SOURCE = POLARIS
+  TABLE_FORMAT = ICEBERG
+  CATALOG_NAMESPACE = 'redpanda'
+  REST_CONFIG = (
+    CATALOG_URI = ''
+    WAREHOUSE = ''
+  )
+  REST_AUTHENTICATION = (
+    TYPE = OAUTH
+    OAUTH_CLIENT_ID = ''
+    OAUTH_CLIENT_SECRET = ''
+    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
+  )
+  ENABLED = TRUE;
+----
++
+Use your own values for the following placeholders:
++
+- ``:
+- ``:
+- ``:
+- ``:
+- ``:
+
+=== Create Iceberg table in Snowflake
+
+After creating the catalog integration, you must create an externally managed table in Snowflake. You'll run your Snowflake queries against this table.
+
+. Run the https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-rest[`CREATE ICEBERG TABLE`] command in Snowflake. The following example also sets `AUTO_REFRESH = TRUE` so that the table automatically refreshes its metadata:
++
+[,sql]
+----
+CREATE ICEBERG TABLE
+  CATALOG = ''
+  EXTERNAL_VOLUME = ''
+  CATALOG_TABLE_NAME = ''
+  AUTO_REFRESH = TRUE;
+----
++
+Use your own values for the following placeholders:
++
+- ``:
+- ``:
+- ``:
+- ``:
+
+=== Query table
+
+To verify that Snowflake has successfully created the table containing the topic data, run the following:
+
+[,sql]
+----
+SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('');
+
+SELECT * FROM ;
+SELECT * FROM WHERE redpanda:offset > 100;
+----
+
+// Query results example
+[,bash]
+----
+
+----

From 65c97b395e2c9b4872f93d8ac17dbc25d5dfdf99 Mon Sep 17 00:00:00 2001
From: kbatuigas <36839689+kbatuigas@users.noreply.github.com>
Date: Fri, 17 Jan 2025 15:56:02 -0500
Subject: [PATCH 2/4] Add parameter details

---
 ...anda-topics-iceberg-snowflake-catalog.adoc | 32 ++++++++++---------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
index 55b266ad4..f93c7a84b 100644
--- a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
+++ b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
@@ -12,7 +12,7 @@ This guide walks you through querying Redpanda topics as Iceberg tables in https
 
 == Prerequisites
 
-* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-object-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
+* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
 ** You'll need the S3 bucket's URI to configure it as external storage for Open Catalog.
 * A Snowflake account.
 * An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account], you need ORGADMIN access in Snowflake.
@@ -53,7 +53,7 @@ Grant privileges to the principal created in the previous step:
 
 In this step, you'll configure your Redpanda cluster to enable Iceberg on a topic, as well as set up the integration with Open Catalog.
 
-. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. You must restart your cluster if you change this configuration for a running cluster. For example, you can run `rpk cluster config edit` to update these properties
+. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. You must restart your cluster if you change this configuration for a running cluster. You can run `rpk cluster config edit` to update these properties:
 +
 [,bash]
 ----
@@ -71,10 +71,12 @@ iceberg_catalog_commit_interval_ms: 1000
 +
 Use your own values for the following placeholders:
 +
-- ``:
-- ``:
-- ``:
-- ``:
+----
+- ``: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `https://-.snowflakecomputing.com/polaris/api/catalog`.
+- ``: The client ID of the service connection you created in an earlier step.
+- ``: The client secret of the service connection you created in an earlier step.
+- ``: The name of your catalog in Open Catalog.
+----
 +
 [,bash,role=no-copy]
 ----
@@ -129,11 +131,11 @@ CREATE CATALOG INTEGRATION
 +
 Use your own values for the following placeholders:
 +
-- ``:
-- ``:
-- ``:
-- ``:
-- ``:
+- ``: Provide a name for your Iceberg catalog integration in Snowflake.
+- ``: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `\https://-.snowflakecomputing.com/polaris/api/catalog`.
+- ``: The name of your catalog in Open Catalog.
+- ``: The client ID of the service connection you created in an earlier step.
+- ``: The client secret of the service connection you created in an earlier step.
 
 === Create Iceberg table in Snowflake
 
@@ -152,10 +154,10 @@ CREATE ICEBERG TABLE
 +
 Use your own values for the following placeholders:
 +
-- ``:
-- ``:
-- ``:
-- ``:
+- ``: Provide a name for your table in Snowflake.
+- ``: The name of the catalog integration you configured in an earlier step.
+- ``: The name of the external volume you configured using the Tiered Storage bucket.
+- ``: The name of the table in your catalog, which is the same as your Redpanda topic name.
 
 === Query table
 

From b420b79a8a96df626cf586e08bceac4f14bed7b1 Mon Sep 17 00:00:00 2001
From: kbatuigas <36839689+kbatuigas@users.noreply.github.com>
Date: Fri, 17 Jan 2025 17:14:00 -0500
Subject: [PATCH 3/4] Minor formatting fix

---
 .../pages/redpanda-topics-iceberg-snowflake-catalog.adoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
index f93c7a84b..bbcaa6cbe 100644
--- a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
+++ b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
@@ -71,12 +71,12 @@ iceberg_catalog_commit_interval_ms: 1000
 +
 Use your own values for the following placeholders:
 +
-----
-- ``: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `https://-.snowflakecomputing.com/polaris/api/catalog`.
+--
+- ``: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI], for example `\https://-.snowflakecomputing.com/polaris/api/catalog`.
 - ``: The client ID of the service connection you created in an earlier step.
 - ``: The client secret of the service connection you created in an earlier step.
 - ``: The name of your catalog in Open Catalog.
-----
+--
 +
 [,bash,role=no-copy]
 ----

From e952ae249557b58db40c30fc5b02dcef0bbc10ad Mon Sep 17 00:00:00 2001
From: Kat Batuigas <36839689+kbatuigas@users.noreply.github.com>
Date: Mon, 27 Jan 2025 12:26:42 -0500
Subject: [PATCH 4/4] Update modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc

---
 .../manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
index bbcaa6cbe..0ef3c6d88 100644
--- a/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
+++ b/modules/manage/pages/redpanda-topics-iceberg-snowflake-catalog.adoc
@@ -58,7 +58,7 @@ In this step, you'll configure your Redpanda cluster to enable Iceberg on a topi
 [,bash]
 ----
 iceberg_enabled: true
-iceberg_rest_catalog_type: rest
+iceberg_catalog_type: rest
 iceberg_rest_catalog_endpoint:
 iceberg_rest_catalog_client_id:
 iceberg_rest_catalog_client_secret: