From 32df91fa3925d0269cf7b29cc7d6fa6dc29dcd62 Mon Sep 17 00:00:00 2001 From: FANNG Date: Tue, 7 Jan 2025 14:17:30 +0800 Subject: [PATCH] [#6070][#5649] docs(core): add credential vending document (#6071) ### What changes were proposed in this pull request? move credential vending related document from iceberg-rest-server part to a separate file, then fileset could refer to it. ### Why are the changes needed? Fix: #6070 Fix: #5649 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? just document --- docs/hadoop-catalog.md | 27 ++++- docs/iceberg-rest-service.md | 55 +++------ docs/security/credential-vending.md | 178 ++++++++++++++++++++++++++++ 3 files changed, 215 insertions(+), 45 deletions(-) create mode 100644 docs/security/credential-vending.md diff --git a/docs/hadoop-catalog.md b/docs/hadoop-catalog.md index 9048556ffa5..99e1dd7854e 100644 --- a/docs/hadoop-catalog.md +++ b/docs/hadoop-catalog.md @@ -23,9 +23,12 @@ Hadoop 3. If there's any compatibility issue, please create an [issue](https://g Besides the [common catalog properties](./gravitino-server-config.md#apache-gravitino-catalog-properties-configuration), the Hadoop catalog has the following properties: -| Property Name | Description | Default Value | Required | Since Version | -|---------------|-------------------------------------------------|---------------|----------|---------------| -| `location` | The storage location managed by Hadoop catalog. | (none) | No | 0.5.0 | +| Property Name | Description | Default Value | Required | Since Version | +|------------------------|----------------------------------------------------|---------------|----------|------------------| +| `location` | The storage location managed by Hadoop catalog. | (none) | No | 0.5.0 | +| `credential-providers` | The credential provider types, separated by comma. 
| (none) | No | 0.8.0-incubating | + +Please refer to [Credential vending](./security/credential-vending.md) for more details about credential vending. Apart from the above properties, to access fileset like HDFS, S3, GCS, OSS or custom fileset, you need to configure the following extra properties. @@ -50,6 +53,8 @@ Apart from the above properties, to access fileset like HDFS, S3, GCS, OSS or cu | `s3-access-key-id` | The access key of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating | | `s3-secret-access-key` | The secret key of the AWS S3. | (none) | Yes if it's a S3 fileset. | 0.7.0-incubating | +Please refer to [S3 credentials](./security/credential-vending.md#s3-credentials) for credential related configurations. + At the same time, you need to place the corresponding bundle jar [`gravitino-aws-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-aws-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`. #### GCS fileset @@ -60,6 +65,8 @@ At the same time, you need to place the corresponding bundle jar [`gravitino-aws | `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for GCS, if we set this value, we can omit the prefix 'gs://' in the location. | `builtin-local` | No | 0.7.0-incubating | | `gcs-service-account-file` | The path of GCS service account JSON file. | (none) | Yes if it's a GCS fileset. | 0.7.0-incubating | +Please refer to [GCS credentials](./security/credential-vending.md#gcs-credentials) for credential related configurations. + In the meantime, you need to place the corresponding bundle jar [`gravitino-gcp-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-gcp-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`. 
#### OSS fileset @@ -72,6 +79,8 @@ In the meantime, you need to place the corresponding bundle jar [`gravitino-gcp- | `oss-access-key-id` | The access key of the Aliyun OSS. | (none) | Yes if it's a OSS fileset. | 0.7.0-incubating | | `oss-secret-access-key` | The secret key of the Aliyun OSS. | (none) | Yes if it's a OSS fileset. | 0.7.0-incubating | +Please refer to [OSS credentials](./security/credential-vending.md#oss-credentials) for credential related configurations. + In the meantime, you need to place the corresponding bundle jar [`gravitino-aliyun-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-aliyun-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`. @@ -84,6 +93,8 @@ In the meantime, you need to place the corresponding bundle jar [`gravitino-aliy | `azure-storage-account-name ` | The account name of Azure Blob Storage. | (none) | Yes if it's a Azure Blob Storage fileset. | 0.8.0-incubating | | `azure-storage-account-key` | The account key of Azure Blob Storage. | (none) | Yes if it's a Azure Blob Storage fileset. | 0.8.0-incubating | +Please refer to [ADLS credentials](./security/credential-vending.md#adls-credentials) for credential related configurations. + Similar to the above, you need to place the corresponding bundle jar [`gravitino-azure-bundle-${version}.jar`](https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-azure-bundle/) in the directory `${GRAVITINO_HOME}/catalogs/hadoop/libs`. :::note @@ -146,7 +157,8 @@ The Hadoop catalog supports creating, updating, deleting, and listing schema. | `authentication.impersonation-enable` | Whether to enable impersonation for this schema of the Hadoop catalog. | The parent(catalog) value | No | 0.6.0-incubating | | `authentication.type` | The type of authentication for this schema of Hadoop catalog , currently we only support `kerberos`, `simple`. 
| The parent(catalog) value | No | 0.6.0-incubating | | `authentication.kerberos.principal` | The principal of the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating | -| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication for this scheam. | The parent(catalog) value | No | 0.6.0-incubating | +| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication for this schema. | The parent(catalog) value | No | 0.6.0-incubating | +| `credential-providers` | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating | ### Schema operations @@ -166,6 +178,13 @@ Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema- | `authentication.type` | The type of authentication for Hadoop catalog fileset, currently we only support `kerberos`, `simple`. | The parent(schema) value | No | 0.6.0-incubating | | `authentication.kerberos.principal` | The principal of the Kerberos authentication for the fileset. | The parent(schema) value | No | 0.6.0-incubating | | `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication for the fileset. | The parent(schema) value | No | 0.6.0-incubating | +| `credential-providers` | The credential provider types, separated by comma. | (none) | No | 0.8.0-incubating | + +Credential providers can be specified in several places, as listed below. Gravitino checks the `credential-providers` setting in the following order of precedence: + +1. Fileset properties +2. Schema properties +3. 
Catalog properties ### Fileset operations diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md index 5adc75ad835..d42fc98b4dd 100644 --- a/docs/iceberg-rest-service.md +++ b/docs/iceberg-rest-service.md @@ -27,9 +27,9 @@ The Apache Gravitino Iceberg REST Server follows the [Apache Iceberg REST API sp ## Server management There are three deployment scenarios for Gravitino Iceberg REST server: -- A standalone server in a standalone Gravitino Iceberg REST server package. -- A standalone server in the Gravitino server package. -- An auxiliary service embedded in the Gravitino server. +- A standalone server in a standalone Gravitino Iceberg REST server package, the classpath is `libs`. +- A standalone server in the Gravitino server package, the classpath is `iceberg-rest-server/libs`. +- An auxiliary service embedded in the Gravitino server, the classpath is `iceberg-rest-server/libs`. For detailed instructions on how to build and install the Gravitino server package, please refer to [How to build](./how-to-build.md) and [How to install](./how-to-install.md). To build the Gravitino Iceberg REST server package, use the command `./gradlew compileIcebergRESTServer -x test`. Alternatively, to create the corresponding compressed package in the distribution directory, use `./gradlew assembleIcebergRESTServer -x test`. The Gravitino Iceberg REST server package includes the following files: @@ -100,29 +100,23 @@ The detailed configuration items are as follows: | `gravitino.iceberg-rest.authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.7.0-incubating | +### Credential vending + +Please refer to [Credential vending](./security/credential-vending.md) for more details. + ### Storage #### S3 configuration -Gravitino Iceberg REST service supports using static S3 secret key or generating temporary token to access S3 data. 
- | Configuration item | Description | Default value | Required | Since Version | |----------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|------------------------------------------------|------------------| | `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg, use `org.apache.iceberg.aws.s3.S3FileIO` for S3. | (none) | No | 0.6.0-incubating | -| `gravitino.iceberg-rest.credential-provider-type` | Deprecated, please use `gravitino.iceberg-rest.credential-providers` instead. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.credential-providers` | Supports `s3-token` and `s3-secret-key` for S3. `s3-token` generates a temporary token according to the query data path while `s3-secret-key` using the s3 secret access key to access S3 data. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.s3-access-key-id` | The static access key ID used to access S3 data. | (none) | No | 0.6.0-incubating | -| `gravitino.iceberg-rest.s3-secret-access-key` | The static secret access key used to access S3 data. | (none) | No | 0.6.0-incubating | | `gravitino.iceberg-rest.s3-endpoint` | An alternative endpoint of the S3 service, This could be used for S3FileIO with any s3-compatible object storage service that has a different endpoint, or access a private S3 endpoint in a virtual private cloud. | (none) | No | 0.6.0-incubating | | `gravitino.iceberg-rest.s3-region` | The region of the S3 service, like `us-west-2`. | (none) | No | 0.6.0-incubating | -| `gravitino.iceberg-rest.s3-role-arn` | The ARN of the role to access the S3 data. 
| (none) | Yes, when `credential-providers` is `s3-token` | 0.7.0-incubating | -| `gravitino.iceberg-rest.s3-external-id` | The S3 external id to generate token, only used when `credential-providers` is `s3-token`. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.s3-token-expire-in-secs` | The S3 session token expire time in secs, it couldn't exceed the max session time of the assumed role, only used when `credential-providers` is `s3-token`. | 3600 | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.s3-token-service-endpoint` | An alternative endpoint of the S3 token service, This could be used with s3-compatible object storage service like MINIO that has a different STS endpoint. | (none) | No | 0.8.0-incubating | For other Iceberg s3 properties not managed by Gravitino like `s3.sse.type`, you could config it directly by `gravitino.iceberg-rest.s3.sse.type`. -If you set `credential-providers` explicitly, please downloading [Gravitino AWS bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/aws-bundle), and place it to the classpath of Iceberg REST server. +Please refer to [S3 credentials](./security/credential-vending.md#s3-credentials) for credential related configurations. :::info To configure the JDBC catalog backend, set the `gravitino.iceberg-rest.warehouse` parameter to `s3://{bucket_name}/${prefix_name}`. For the Hive catalog backend, set `gravitino.iceberg-rest.warehouse` to `s3a://{bucket_name}/${prefix_name}`. Additionally, download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) and place it in the classpath of Iceberg REST server. @@ -130,24 +124,15 @@ To configure the JDBC catalog backend, set the `gravitino.iceberg-rest.warehouse #### OSS configuration -Gravitino Iceberg REST service supports using static access-key-id and secret-access-key or generating temporary token to access OSS data. 
- | Configuration item | Description | Default value | Required | Since Version | |---------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|------------------------------------------------------|------------------| | `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg, use `org.apache.iceberg.aliyun.oss.OSSFileIO` for OSS. | (none) | No | 0.6.0-incubating | -| `gravitino.iceberg-rest.credential-provider-type` | Deprecated, please use `gravitino.iceberg-rest.credential-providers` instead. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.credential-providers` | Supports `oss-token` and `oss-secret-key` for OSS. `oss-token` generates a temporary token according to the query data path while `oss-secret-key` using the oss secret access key to access S3 data. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.oss-access-key-id` | The static access key ID used to access OSS data. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.oss-secret-access-key` | The static secret access key used to access OSS data. | (none) | No | 0.7.0-incubating | | `gravitino.iceberg-rest.oss-endpoint` | The endpoint of Aliyun OSS service. | (none) | No | 0.7.0-incubating | | `gravitino.iceberg-rest.oss-region` | The region of the OSS service, like `oss-cn-hangzhou`, only used when `credential-providers` is `oss-token`. | (none) | No | 0.8.0-incubating | -| `gravitino.iceberg-rest.oss-role-arn` | The ARN of the role to access the OSS data, only used when `credential-providers` is `oss-token`. | (none) | Yes, when `credential-provider-type` is `oss-token`. | 0.8.0-incubating | -| `gravitino.iceberg-rest.oss-external-id` | The OSS external id to generate token, only used when `credential-providers` is `oss-token`. 
| (none) | No | 0.8.0-incubating | -| `gravitino.iceberg-rest.oss-token-expire-in-secs` | The OSS security token expire time in secs, only used when `credential-providers` is `oss-token`. | 3600 | No | 0.8.0-incubating | For other Iceberg OSS properties not managed by Gravitino like `client.security-token`, you could config it directly by `gravitino.iceberg-rest.client.security-token`. -If you set `credential-providers` explicitly, please downloading [Gravitino Aliyun bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/aliyun-bundle), and place it to the classpath of Iceberg REST server. +Please refer to [OSS credentials](./security/credential-vending.md#oss-credentials) for credential related configurations. :::info Please set the `gravitino.iceberg-rest.warehouse` parameter to `oss://{bucket_name}/${prefix_name}`. Additionally, download the [Aliyun OSS SDK](https://gosspublic.alicdn.com/sdks/java/aliyun_java_sdk_3.10.2.zip) and copy `aliyun-sdk-oss-3.10.2.jar`, `hamcrest-core-1.1.jar`, `jdom2-2.0.6.jar` in the classpath of Iceberg REST server, `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server. @@ -160,16 +145,14 @@ Supports using static GCS credential file or generating GCS token to access GCS | Configuration item | Description | Default value | Required | Since Version | |---------------------------------------------------|----------------------------------------------------------------------------------------------------|---------------|----------|------------------| | `gravitino.iceberg-rest.io-impl` | The io implementation for `FileIO` in Iceberg, use `org.apache.iceberg.gcp.gcs.GCSFileIO` for GCS. | (none) | No | 0.6.0-incubating | -| `gravitino.iceberg-rest.credential-provider-type` | Deprecated, please use `gravitino.iceberg-rest.credential-providers` instead. 
| (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.credential-providers` | Supports `gcs-token`, generates a temporary token according to the query data path. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.gcs-credential-file-path` | Deprecated, please use `gravitino.iceberg-rest.gcs-service-account-file` instead. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.gcs-service-account-file` | The location of GCS credential file, only used when `credential-provider-type` is `gcs-token`. | (none) | No | 0.8.0-incubating | For other Iceberg GCS properties not managed by Gravitino like `gcs.project-id`, you could config it directly by `gravitino.iceberg-rest.gcs.project-id`. -If you set `credential-providers` explicitly, please downloading [Gravitino GCP bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/gcp-bundle), and place it to the classpath of Iceberg REST server. +Please refer to [GCS credentials](./security/credential-vending.md#gcs-credentials) for credential related configurations. -Please make sure the credential file is accessible by Gravitino, like using `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json` before Gravitino Iceberg REST server is started. +:::note +Please ensure that the credential file can be accessed by the Gravitino server. For example, if the server is running on a GCE machine, or you can set the environment variable as `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json`, even when the `gcs-service-account-file` has already been configured. +::: :::info Please set `gravitino.iceberg-rest.warehouse` to `gs://{bucket_name}/${prefix_name}`, and download [Iceberg gcp bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle) and place it to the classpath of Gravitino Iceberg REST server, `iceberg-rest-server/libs` for the auxiliary server, `libs` for the standalone server. 
@@ -177,23 +160,13 @@ Please set `gravitino.iceberg-rest.warehouse` to `gs://{bucket_name}/${prefix_na #### ADLS -Gravitino Iceberg REST service supports generating SAS token to access ADLS data. - | Configuration item | Description | Default value | Required | Since Version | |-----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------| | `gravitino.iceberg-rest.io-impl` | The IO implementation for `FileIO` in Iceberg, use `org.apache.iceberg.azure.adlsv2.ADLSFileIO` for ADLS. | (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.credential-provider-type` | Deprecated, please use `gravitino.iceberg-rest.credential-providers` instead. | (none) | No | 0.7.0-incubating | -| `gravitino.iceberg-rest.credential-providers` | Supports `adls-token` and `azure-account-key`. `adls-token` generates a temporary token according to the query data path while `azure-account-key` uses a storage account key to access ADLS data. | (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.azure-storage-account-name` | The static storage account name used to access ADLS data. | (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.azure-storage-account-key` | The static storage account key used to access ADLS data. | (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.azure-tenant-id` | Azure Active Directory (AAD) tenant ID, only used when `credential-providers` is `adls-token`. | (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.azure-client-id` | Azure Active Directory (AAD) client ID used for authentication, only used when `credential-providers` is `adls-token`. 
| (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.azure-client-secret` | Azure Active Directory (AAD) client secret used for authentication, only used when `credential-providers` is `adls-token`. | (none) | Yes | 0.8.0-incubating | -| `gravitino.iceberg-rest.adls-token-expire-in-secs` | The ADLS SAS token expire time in secs, only used when `credential-providers` is `adls-token`. | 3600 | No | 0.8.0-incubating | For other Iceberg ADLS properties not managed by Gravitino like `adls.read.block-size-bytes`, you could config it directly by `gravitino.iceberg-rest.adls.read.block-size-bytes`. -If you set `credential-providers` explicitly, please downloading [Gravitino Azure bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/azure-bundle), and place it to the classpath of Iceberg REST server. +Please refer to [ADLS credentials](./security/credential-vending.md#adls-credentials) for credential related configurations. :::info Please set `gravitino.iceberg-rest.warehouse` to `abfs[s]://{container-name}@{storage-account-name}.dfs.core.windows.net/{path}`, and download the [Iceberg Azure bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-azure-bundle) and place it in the classpath of Iceberg REST server. diff --git a/docs/security/credential-vending.md b/docs/security/credential-vending.md new file mode 100644 index 00000000000..92370f4315d --- /dev/null +++ b/docs/security/credential-vending.md @@ -0,0 +1,178 @@ +--- +title: "Gravitino credential vending" +slug: /security/credential-vending +keyword: security credential vending +license: "This software is licensed under the Apache License version 2." +--- + +## Background + +Gravitino credential vending is used to generate temporary or static credentials for accessing data. With credential vending, Gravitino provides a unified way to control access to diverse data sources on different platforms. + +### Capabilities + +- Supports Gravitino Iceberg REST server. 
+- Supports Gravitino server, currently only for the Hadoop catalog. +- Supports pluggable credentials with built-in credentials: + - S3: `S3TokenCredential`, `S3SecretKeyCredential` + - GCS: `GCSTokenCredential` + - ADLS: `ADLSTokenCredential`, `AzureAccountKeyCredential` + - OSS: `OSSTokenCredential`, `OSSSecretKeyCredential` +- No support for Spark/Trino/Flink connector yet. + +## General configurations + +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|--------------------------------------------------------|--------------------------------------------------------------------------------------------|---------------|----------|------------------| +| `credential-provider-type` | `gravitino.iceberg-rest.credential-provider-type` | Deprecated, please use `credential-providers` instead. | (none) | Yes | 0.7.0-incubating | +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | The credential provider types, separated by comma. | (none) | Yes | 0.8.0-incubating | +| `credential-cache-expire-ratio` | `gravitino.iceberg-rest.credential-cache-expire-ratio` | Ratio of the credential's expiration time at which Gravitino removes the credential from the cache. | 0.15 | No | 0.8.0-incubating | +| `credential-cache-max-size` | `gravitino.iceberg-rest.cache-max-size` | Max size for the credential cache. | 10000 | No | 0.8.0-incubating | + +## Built-in credential configurations + +### S3 credentials + +#### S3 secret key credential + +A credential with static S3 access key id and secret access key. 
+ +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|---------------------------------------------------|--------------------------------------------------------|---------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `s3-secret-key` for S3 secret key credential provider. | (none) | Yes | 0.8.0-incubating | +| `s3-access-key-id` | `gravitino.iceberg-rest.s3-access-key-id` | The static access key ID used to access S3 data. | (none) | Yes | 0.6.0-incubating | +| `s3-secret-access-key` | `gravitino.iceberg-rest.s3-secret-access-key` | The static secret access key used to access S3 data. | (none) | Yes | 0.6.0-incubating | + +#### S3 token credential + +An S3 token is a token credential with scoped privileges, by leveraging STS [Assume Role](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html). To use an S3 token credential, you should create a role and grant it proper privileges. + +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `s3-token` for S3 token credential provider. | (none) | Yes | 0.8.0-incubating | +| `s3-access-key-id` | `gravitino.iceberg-rest.s3-access-key-id` | The static access key ID used to access S3 data. | (none) | Yes | 0.6.0-incubating | +| `s3-secret-access-key` | `gravitino.iceberg-rest.s3-secret-access-key` | The static secret access key used to access S3 data. 
| (none) | Yes | 0.6.0-incubating | +| `s3-role-arn` | `gravitino.iceberg-rest.s3-role-arn` | The ARN of the role to access the S3 data. | (none) | Yes | 0.7.0-incubating | +| `s3-external-id` | `gravitino.iceberg-rest.s3-external-id` | The S3 external id to generate token. | (none) | No | 0.7.0-incubating | +| `s3-token-expire-in-secs` | `gravitino.iceberg-rest.s3-token-expire-in-secs` | The S3 session token expire time in secs, it couldn't exceed the max session time of the assumed role. | 3600 | No | 0.7.0-incubating | +| `s3-token-service-endpoint` | `gravitino.iceberg-rest.s3-token-service-endpoint` | An alternative endpoint of the S3 token service, This could be used with s3-compatible object storage service like MINIO that has a different STS endpoint. | (none) | No | 0.8.0-incubating | + +### OSS credentials + +#### OSS secret key credential + +A credential with static OSS access key id and secret access key. + +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|---------------------------------------------------|-------------------------------------------------------------------------------|---------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `oss-secret-key` for OSS secret credential. | (none) | Yes | 0.8.0-incubating | +| `oss-access-key-id` | `gravitino.iceberg-rest.oss-access-key-id` | The static access key ID used to access OSS data. | (none) | Yes | 0.7.0-incubating | +| `oss-secret-access-key` | `gravitino.iceberg-rest.oss-secret-access-key` | The static secret access key used to access OSS data. 
| (none) | Yes | 0.7.0-incubating | + +#### OSS token credential + +An OSS token is a token credential with scoped privileges, obtained by leveraging STS [Assume Role](https://www.alibabacloud.com/help/en/oss/developer-reference/use-temporary-access-credentials-provided-by-sts-to-access-oss). To use an OSS token credential, you should create a role and grant it proper privileges. + +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|---------------------------------------------------|-------------------------------------------------------------------------------|---------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `oss-token` for OSS token credential. | (none) | Yes | 0.8.0-incubating | +| `oss-access-key-id` | `gravitino.iceberg-rest.oss-access-key-id` | The static access key ID used to access OSS data. | (none) | Yes | 0.7.0-incubating | +| `oss-secret-access-key` | `gravitino.iceberg-rest.oss-secret-access-key` | The static secret access key used to access OSS data. | (none) | Yes | 0.7.0-incubating | +| `oss-role-arn` | `gravitino.iceberg-rest.oss-role-arn` | The ARN of the role to access the OSS data. | (none) | Yes | 0.8.0-incubating | +| `oss-external-id` | `gravitino.iceberg-rest.oss-external-id` | The OSS external id to generate token. | (none) | No | 0.8.0-incubating | +| `oss-token-expire-in-secs` | `gravitino.iceberg-rest.oss-token-expire-in-secs` | The OSS security token expire time in secs. | 3600 | No | 0.8.0-incubating | + +### ADLS credentials + +#### Azure account key credential + +A credential with static Azure storage account name and key. 
+ +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|-----------------------------------------------------|-----------------------------------------------------------|---------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `azure-account-key` for Azure account key credential. | (none) | Yes | 0.8.0-incubating | +| `azure-storage-account-name` | `gravitino.iceberg-rest.azure-storage-account-name` | The static storage account name used to access ADLS data. | (none) | Yes | 0.8.0-incubating | +| `azure-storage-account-key` | `gravitino.iceberg-rest.azure-storage-account-key` | The static storage account key used to access ADLS data. | (none) | Yes | 0.8.0-incubating | + +#### ADLS token credential + +An ADLS token is a token credential with scoped privileges, by leveraging Azure [User Delegation Sas](https://learn.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas). To use an ADLS token credential, you should create a Microsoft Entra ID service principal and grant it proper privileges. + +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|-----------------------------------------------------|---------------------------------------------------------------------|---------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `adls-token` for ADLS token credential. | (none) | Yes | 0.8.0-incubating | +| `azure-storage-account-name` | `gravitino.iceberg-rest.azure-storage-account-name` | The static storage account name used to access ADLS data. 
| (none) | Yes | 0.8.0-incubating | +| `azure-storage-account-key` | `gravitino.iceberg-rest.azure-storage-account-key` | The static storage account key used to access ADLS data. | (none) | Yes | 0.8.0-incubating | +| `azure-tenant-id` | `gravitino.iceberg-rest.azure-tenant-id` | Azure Active Directory (AAD) tenant ID. | (none) | Yes | 0.8.0-incubating | +| `azure-client-id` | `gravitino.iceberg-rest.azure-client-id` | Azure Active Directory (AAD) client ID used for authentication. | (none) | Yes | 0.8.0-incubating | +| `azure-client-secret` | `gravitino.iceberg-rest.azure-client-secret` | Azure Active Directory (AAD) client secret used for authentication. | (none) | Yes | 0.8.0-incubating | +| `adls-token-expire-in-secs` | `gravitino.iceberg-rest.adls-token-expire-in-secs` | The ADLS SAS token expire time in secs. | 3600 | No | 0.8.0-incubating | + +### GCS credentials + +#### GCS token credential + +A GCS token is a token credential with scoped privileges, obtained by leveraging GCS [Credential Access Boundaries](https://cloud.google.com/iam/docs/downscoping-short-lived-credentials). To use a GCS token credential, you should create a GCS service account and grant it proper privileges. + +| Gravitino server catalog properties | Gravitino Iceberg REST server configurations | Description | Default value | Required | Since Version | +|-------------------------------------|---------------------------------------------------|------------------------------------------------------------|-------------------------------------|----------|------------------| +| `credential-providers` | `gravitino.iceberg-rest.credential-providers` | `gcs-token` for GCS token credential. | (none) | Yes | 0.8.0-incubating | +| `gcs-credential-file-path` | `gravitino.iceberg-rest.gcs-credential-file-path` | Deprecated, please use `gcs-service-account-file` instead. | GCS Application default credential. 
| No | 0.7.0-incubating | +| `gcs-service-account-file` | `gravitino.iceberg-rest.gcs-service-account-file` | The location of GCS credential file. | GCS Application default credential. | No | 0.8.0-incubating | + +:::note +For Gravitino Iceberg REST server, please ensure that the credential file can be accessed by the server. For example, if the server is running on a GCE machine, or you can set the environment variable as `export GOOGLE_APPLICATION_CREDENTIALS=/xx/application_default_credentials.json`, even when the `gcs-service-account-file` has already been configured. +::: + +## Custom credentials + +Gravitino supports custom credentials, you can implement the `org.apache.gravitino.credential.CredentialProvider` interface to support custom credentials, and place the corresponding jar to the classpath of Iceberg catalog server or Hadoop catalog. + +## Deployment + +Besides setting credentials related configuration, please download Gravitino cloud bundle jar and place it in the classpath of Iceberg REST server or Hadoop catalog. + +Gravitino cloud bundle jar: + +- [Gravitino AWS bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aws-bundle) +- [Gravitino Aliyun bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aliyun-bundle) +- [Gravitino GCP bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-gcp-bundle) +- [Gravitino Azure bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-azure-bundle) + +The classpath of the server: + +- Iceberg REST server: the classpath differs in different deploy mode, please refer to [Server management](../iceberg-rest-service.md#server-management) part. +- Hadoop catalog: `catalogs/hadoop/libs/` + +## Usage example + +### Credential vending for Iceberg REST server + +Suppose the Iceberg table data is stored in S3, follow the steps below: + +1. 
Download the [Gravitino AWS bundle jar](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aws-bundle), and place it to the classpath of Iceberg REST server. + +2. Add s3 token credential configurations. + +``` +gravitino.iceberg-rest.warehouse = s3://{bucket_name}/{warehouse_path} +gravitino.iceberg-rest.io-impl= org.apache.iceberg.aws.s3.S3FileIO +gravitino.iceberg-rest.credential-providers = s3-token +gravitino.iceberg-rest.s3-access-key-id = xxx +gravitino.iceberg-rest.s3-secret-access-key = xxx +gravitino.iceberg-rest.s3-region = {region_name} +gravitino.iceberg-rest.s3-role-arn = {role_arn} +``` + +3. Exploring the Iceberg table with Spark client with credential vending enabled. + +```shell +./bin/spark-sql -v \ +--packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1 \ +--conf spark.jars={path}/iceberg-aws-bundle-1.5.2.jar \ +--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ +--conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \ +--conf spark.sql.catalog.rest.type=rest \ +--conf spark.sql.catalog.rest.uri=http://127.0.0.1:9001/iceberg/ \ +--conf spark.sql.catalog.rest.header.X-Iceberg-Access-Delegation=vended-credentials +```
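+
+For table data stored in GCS, the configuration in step 2 might instead look like the sketch below. It assumes the `gcs-token` provider and the `gcs-service-account-file` property from the GCS token credential table, Iceberg's `org.apache.iceberg.gcp.gcs.GCSFileIO` as the IO implementation, and the Gravitino GCP bundle jar on the server classpath. `{service_account_file_path}` is an illustrative placeholder, not a value defined in this document.
+
+```
+# Sketch only: the property names come from the GCS token credential table,
+# while the warehouse URI, the GCSFileIO class, and the placeholder values are assumptions.
+gravitino.iceberg-rest.warehouse = gs://{bucket_name}/{warehouse_path}
+gravitino.iceberg-rest.io-impl = org.apache.iceberg.gcp.gcs.GCSFileIO
+gravitino.iceberg-rest.credential-providers = gcs-token
+gravitino.iceberg-rest.gcs-service-account-file = {service_account_file_path}
+```
+
+With this setup, the Spark client in step 3 would presumably need an Iceberg GCP bundle jar in place of the AWS one, while the `X-Iceberg-Access-Delegation` header stays the same.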