Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Parquet field_id writing #6381

Merged

Conversation

devinrsmith
Copy link
Member

This allows for the writing of Parquet column field_ids.

This is an extraction from #6156 (which will need to expose deeper hooks to allow Iceberg to fully control field_id resolution during reading).

This is to ensure we can correctly write down Iceberg tables which must write field_ids which is necessary for #5989

In addition, it was noticed that Parquet ColumnMetaData encoding was written down in a non-deterministic order due to the use of a HashSet; it has been updated to an EnumSet to support a more consistent Parquet serialization. This was necessary to test out field_id writing.

This allows for the writing of Parquet column field_ids.

This is an extraction from deephaven#6156 (which will need to expose deeper hooks to allow Iceberg to fully control field_id resolution during reading).

This is to ensure we can correctly write down Iceberg tables which must write field_ids which is necessary for deephaven#5989

In addition, it was noticed that Parquet ColumnMetaData encoding was written down in a non-deterministic order due to the use of a HashSet; it has been updated to an EnumSet to support a more consistent Parquet serialization. This was necessary to test out field_id writing.
@devinrsmith devinrsmith added parquet Related to the Parquet integration NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed. labels Nov 15, 2024
@devinrsmith devinrsmith added this to the 0.37.0 milestone Nov 15, 2024
@devinrsmith devinrsmith self-assigned this Nov 15, 2024
Copy link
Contributor

@malhotrashivam malhotrashivam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.

malhotrashivam
malhotrashivam previously approved these changes Nov 18, 2024
@devinrsmith devinrsmith merged commit e096c1f into deephaven:main Nov 19, 2024
17 checks passed
@devinrsmith devinrsmith deleted the nightly/parquet-field-id-writing branch November 19, 2024 16:31
@github-actions github-actions bot locked and limited conversation to collaborators Nov 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed. parquet Related to the Parquet integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants