
Set TBLPROPERTIES, Spark action expireSnapshots does not work #12078

Open

cosen-wu opened this issue Jan 24, 2025 · 2 comments
Labels: hive, question (Further information is requested)

Comments


cosen-wu commented Jan 24, 2025

Query engine

Hive, Flink, Spark

Question

Create an Iceberg table in Hive:
create table test.iceberg_v1(
a int,
b string,
c string,
d string
)
partitioned by (par_dt string)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES (
'format-version'='1'
);

Add a property:
alter table test.iceberg_v1 SET TBLPROPERTIES('history.expire.max-snapshot-age-ms'='7200000');

Check TBLPROPERTIES:
show TBLPROPERTIES test.iceberg_v1;
+-------------------------------------+----------------------------------------------------+
| prpt_name | prpt_value |
+-------------------------------------+----------------------------------------------------+
| current-schema | {"type":"struct","schema-id":0,"fields":[{"id":1,"name":"a","required":false,"type":"int"},{"id":2,"name":"b","required":false,"type":"string"},{"id":3,"name":"c","required":false,"type":"string"},{"id":4,"name":"d","required":false,"type":"string"},{"id":5,"name":"par_dt","required":false,"type":"string"}]} |
| default-partition-spec | {"spec-id":0,"fields":[{"name":"par_dt","transform":"identity","source-id":5,"field-id":1000}]} |
| engine.hive.enabled | true |
| external.table.purge | TRUE |
| format-version | 1 |
| history.expire.max-snapshot-age-ms | 7200000 |
| last_modified_by | impala |
| last_modified_time | 1737685889 |
| metadata_location | hdfs://xxxxxx/usr/hive/warehouse/test.db/iceberg_v1/metadata/00000-e634e661-3978-4259-b737-fbb81c35d5a3.metadata.json |
| snapshot-count | 0 |
| storage_handler | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler |
| table_type | ICEBERG |
| transient_lastDdlTime | 1737685889 |
| uuid | 9cec09d8-3b72-4568-b70b-2aeed8db96cd |
+-------------------------------------+----------------------------------------------------+
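A way to cross-check what the Iceberg table metadata itself contains, as opposed to what HMS reports, is to read the properties through an Iceberg catalog; a minimal sketch, assuming a Spark SQL session with an Iceberg catalog named my_catalog (a placeholder name, not from my setup):

SHOW TBLPROPERTIES my_catalog.test.iceberg_v1;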

1. After writing new data and creating a new metadata.json with Flink, the metadata file still does not contain the 'history.expire.max-snapshot-age-ms' property.
2. When I execute the Spark expireSnapshots action, 'history.expire.max-snapshot-age-ms' does not take effect. I found that the action reads the table properties from metadata.json (see the sketch below). Is that expected behavior? Why does it not read them from HMS?
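For reference, one way to invoke snapshot expiration from Spark SQL; a minimal sketch, assuming an Iceberg catalog named my_catalog (a placeholder name) and a placeholder timestamp, where passing older_than explicitly sidesteps the table property entirely:

CALL my_catalog.system.expire_snapshots(
  table => 'test.iceberg_v1',
  older_than => TIMESTAMP '2025-01-24 00:00:00'
);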

Iceberg version: 1.2.1
Flink version: 1.14.5
Spark version: 3.3.2
Hive version: 3.1.3

@cosen-wu cosen-wu added the question Further information is requested label Jan 24, 2025
@Fokko Fokko added the hive label Jan 24, 2025
@RussellSpitzer
Member

I'm not sure what you are asking here.

Are you saying that the table properties are not being set correctly in metadata json when set in flink?
Or are you saying the Spark Expire Snapshots action is not respecting history.expire.max-snapshot-age-ms?

@cosen-wu
Author

Sorry, I didn't express myself clearly.

What I'm confused about is why the new property was not written into the metadata file after it was added in Hive.
Because of that, when the Spark Expire Snapshots action runs, the newly added property cannot be used to control which snapshots are expired.
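As a point of comparison, a minimal sketch of setting the same property through Spark SQL instead of Hive, assuming an Iceberg catalog named my_catalog (a placeholder name); a commit made through an Iceberg-aware engine should write the property into a new metadata.json, where the expireSnapshots action can read it:

ALTER TABLE my_catalog.test.iceberg_v1
SET TBLPROPERTIES ('history.expire.max-snapshot-age-ms' = '7200000');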
