Hadoop Catalog: Hadoop warehouse path fails on s3 bucket #12124

guykhazma · 2025-01-28T20:40:09Z

Apache Iceberg version

None

Query engine

Spark

Please describe the bug 🐞

When using Hadoop catalog and s3 bucket as the warehouse location, for example:

spark.sql.catalog.local.warehouse=s3a://bucket (also with /)
spark.sql.catalog.local.type=hadoop

Then operations against the catalog fail because the path is not set correctly.
For example, when using:

USE database

The following exception appears:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3a://bucketdatabase
	at org.apache.hadoop.fs.Path.initialize(Path.java:263)
	at org.apache.hadoop.fs.Path.<init>(Path.java:161)
	at org.apache.hadoop.fs.Path.<init>(Path.java:119)
	at org.apache.iceberg.hadoop.HadoopCatalog.loadNamespaceMetadata(HadoopCatalog.java:367)
	at org.apache.iceberg.spark.SparkCatalog.loadNamespaceMetadata(SparkCatalog.java:462)
	at org.apache.spark.sql.connector.catalog.SupportsNamespaces.namespaceExists(SupportsNamespaces.java:98)
	at org.apache.spark.sql.connector.catalog.CatalogManager.setCurrentNamespace(CatalogManager.scala:112)
	at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2(SetCatalogAndNamespaceExec.scala:36)
	at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.$anonfun$run$2$adapted(SetCatalogAndNamespaceExec.scala:36)
	at scala.Option.foreach(Option.scala:274)
	at org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec.run(SetCatalogAndNamespaceExec.scala:36)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:218)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:95)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:640)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:671)
	at com.ibm.spark.Runner$.main(Runner.scala:69)
	at com.ibm.spark.Runner.main(Runner.scala)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: s3a://bucketdatabase
	at java.base/java.net.URI.checkPath(URI.java:1940)
	at java.base/java.net.URI.<init>(URI.java:757)
	at org.apache.hadoop.fs.Path.initialize(Path.java:260)
	... 43 more

I can contribute a fix if needed.

Willingness to contribute

I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time

The text was updated successfully, but these errors were encountered:

guykhazma added the bug Something isn't working label Jan 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hadoop Catalog: Hadoop warehouse path fails on s3 bucket #12124

Hadoop Catalog: Hadoop warehouse path fails on s3 bucket #12124

guykhazma commented Jan 28, 2025

Hadoop Catalog: Hadoop warehouse path fails on s3 bucket #12124

Hadoop Catalog: Hadoop warehouse path fails on s3 bucket #12124

Comments

guykhazma commented Jan 28, 2025

Apache Iceberg version

Query engine

Please describe the bug 🐞

Willingness to contribute