
Drop table failed when metadata.json file is missing #12062

Open

SGITLOGIN opened this issue Jan 23, 2025 · 3 comments
Labels
improvement PR that improves existing functionality

Comments


SGITLOGIN commented Jan 23, 2025

Feature Request / Improvement

Spark configuration

spark.sql.catalog.spark_catalog = org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type = hive
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

Question:

Drop table fails when the metadata.json file is missing.

When the metadata.json file is lost, I want to delete the table's record from the Hive metastore. How can this situation be resolved?

Query engine

Spark

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time
SGITLOGIN added the improvement label on Jan 23, 2025
RussellSpitzer (Member) commented:

The difficulty in Spark is that Spark will always call "load" before "drop".

So essentially DROP in Spark looks like:

Load Catalog
Load Table
Call Table.Drop

This makes it difficult to implement a "force drop", because the call to Table.Drop happens after we hit the error in LoadTable. We've had several threads and issues about this previously; maybe the best thing to do is add a Java utility method which just calls the underlying catalog's drop function on the Spark table instance without going through the load pathway?
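Such a utility could be sketched as below. This is only an illustration of the pattern, not Iceberg's actual API: the `Catalog` interface and map-backed metastore here are hypothetical stand-ins (the real interface is `org.apache.iceberg.catalog.Catalog#dropTable(TableIdentifier, boolean)`). The point is simply that the drop call never reads the broken metadata.json, so it cannot fail the way load does.

```java
import java.util.HashMap;
import java.util.Map;

public class ForceDropSketch {
    // Hypothetical stand-in for the catalog interface; the real Iceberg API
    // takes a TableIdentifier and a purge flag.
    interface Catalog {
        boolean dropTable(String db, String table, boolean purge);
    }

    // Builds a catalog over a fake in-memory metastore. Dropping removes the
    // entry directly, without ever reading the metadata.json it points to.
    static Catalog inMemoryCatalog(Map<String, String> metastore) {
        return (db, table, purge) -> metastore.remove(db + "." + table) != null;
    }

    public static void main(String[] args) {
        Map<String, String> metastore = new HashMap<>();
        // Entry whose metadata file no longer exists on storage: a normal
        // load-then-drop would fail at the load step.
        metastore.put("db.orders", "s3://bucket/db/orders/metadata/00001.metadata.json");

        Catalog catalog = inMemoryCatalog(metastore);

        // "Force drop": call the catalog's drop directly, skipping load.
        boolean dropped = catalog.dropTable("db", "orders", false);
        System.out.println(dropped);             // true: entry removed
        System.out.println(metastore.isEmpty()); // true: metastore is clean
    }
}
```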

For Hive catalogs in Spark we can already work around the Spark interface by calling drop directly on the Hive catalog. You can do this with:

spark.sharedState.externalCatalog.dropTable("db", "table", false, false)

which uses Spark's Hive client to call drop directly on the catalog without going through the Spark machinery.

If you have a better solution for fixing SparkCatalog.java, please let me know.

ebyhr (Contributor) commented Jan 24, 2025:

How about adding a new procedure that drops a table without "load" and without deleting files?
The Trino Iceberg connector has an unregister_table procedure, as you may already know:

CALL example.system.unregister_table(schema_name => 'testdb', table_name => 'customer_orders')

SGITLOGIN (Author) commented:

OK, I will test this method.
