Core: add variant type support #11831

Open
wants to merge 7 commits into
base: main

aihuaxu
Contributor

@aihuaxu aihuaxu commented Dec 19, 2024

This PR adds the API and core module changes required for Variant support, including:

  • Add an isVariantType() method for the variant type.
  • Add variant support to schema visitors such as AssignFreshIds, ReassignIds, PruneColumns, etc., and to TypeUtil.
  • Add variant support to Avro projection.
  • Add test coverage for the schema visitors, TypeUtil, and Avro projection.
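
As a rough illustration of the first bullet, here is a minimal sketch of a type check that visitors can branch on. The `Type` interface, `IntegerType`, `VariantType`, and `describe` below are hypothetical stand-ins written for this example, not Iceberg's actual classes, and treating variant as a leaf in `describe` is an assumption of the sketch rather than something stated in this PR.

```java
import java.util.List;

public class VariantTypeSketch {
  // Hypothetical stand-in for a type interface; not Iceberg's actual API.
  interface Type {
    String name();

    // Checks default to false so that existing types need no changes.
    default boolean isPrimitiveType() {
      return false;
    }

    default boolean isVariantType() {
      return false;
    }
  }

  static final class IntegerType implements Type {
    @Override
    public String name() {
      return "int";
    }

    @Override
    public boolean isPrimitiveType() {
      return true;
    }
  }

  static final class VariantType implements Type {
    @Override
    public String name() {
      return "variant";
    }

    @Override
    public boolean isVariantType() {
      return true;
    }
  }

  // Visitor-style helper that branches on the new check.
  // Assumption: variant has no child fields to descend into in this sketch.
  static String describe(Type type) {
    if (type.isVariantType()) {
      return "variant leaf";
    }
    if (type.isPrimitiveType()) {
      return "primitive " + type.name();
    }
    return "nested " + type.name();
  }

  public static void main(String[] args) {
    for (Type type : List.of(new IntegerType(), new VariantType())) {
      System.out.println(describe(type));
    }
  }
}
```

Defaulting the check to false on the interface means only the variant type has to override it, which keeps the change to other types minimal.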

Part of: #10392

@aihuaxu aihuaxu marked this pull request as ready for review December 19, 2024 23:00
@aihuaxu
Contributor Author

aihuaxu commented Dec 19, 2024

cc @rdblue, @RussellSpitzer, @flyrain and @JonasJ-ap. This adds the changes in core needed to support the variant type.

@aihuaxu aihuaxu requested a review from rdblue December 21, 2024 04:45
@aihuaxu aihuaxu force-pushed the variant-type-core branch 2 times, most recently from b276d3f to fe6038a Compare December 21, 2024 16:42
@shohamyamin

@aihuaxu This is a very important feature that will open up many more options for Iceberg. Thank you for your contribution.

@@ -56,6 +56,10 @@ class BuildAvroProjection extends AvroCustomOrderSchemaVisitor<Schema, Schema.Fi
@Override
@SuppressWarnings("checkstyle:CyclomaticComplexity")
public Schema record(Schema record, List<String> names, Iterable<Schema.Field> schemaIterable) {
if (current.isVariantType()) {
Contributor

I think that the corresponding visit method should be updated to call a new visitor variant method. That will be cleaner.

The visitor should look for the variant logical type, so we will need to implement the logical type.
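
A minimal sketch of what this suggestion could look like, assuming a simplified schema model: the `Node`, `Kind`, `Visitor`, and `KIND_NAMES` names and the `"variant"` logical type name are all hypothetical and stand in for Avro's `Schema` and the not-yet-implemented logical type. The point is only that the static `visit` method, not the `record` callback, decides when to call the variant method.

```java
public class VariantVisitSketch {
  // Hypothetical, simplified schema node; real Avro schemas are richer.
  enum Kind { RECORD, STRING, INT }

  static final class Node {
    final Kind kind;
    final String logicalType; // null when no logical type is attached

    Node(Kind kind, String logicalType) {
      this.kind = kind;
      this.logicalType = logicalType;
    }
  }

  // Visitor with a dedicated variant callback next to record and primitive.
  interface Visitor<T> {
    T record(Node record);

    T variant(Node variant);

    T primitive(Node primitive);
  }

  // Assumed logical type name, used here for illustration only.
  static final String VARIANT = "variant";

  // Dispatch happens here, so record() never needs a variant special case.
  static <T> T visit(Node node, Visitor<T> visitor) {
    switch (node.kind) {
      case RECORD:
        if (VARIANT.equals(node.logicalType)) {
          return visitor.variant(node);
        }
        return visitor.record(node);
      default:
        return visitor.primitive(node);
    }
  }

  // Example visitor that reports which callback was chosen.
  static final Visitor<String> KIND_NAMES =
      new Visitor<String>() {
        @Override
        public String record(Node record) {
          return "record";
        }

        @Override
        public String variant(Node variant) {
          return "variant";
        }

        @Override
        public String primitive(Node primitive) {
          return "primitive";
        }
      };

  public static void main(String[] args) {
    System.out.println(visit(new Node(Kind.RECORD, VARIANT), KIND_NAMES));
    System.out.println(visit(new Node(Kind.RECORD, null), KIND_NAMES));
    System.out.println(visit(new Node(Kind.INT, null), KIND_NAMES));
  }
}
```

With this shape, each visitor implementation gets an explicit `variant` method to override instead of checking for variant inside `record`.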

Contributor Author

You are right. This is a workaround that uses the Iceberg variant type until the logical type is added in Avro. Let me add a comment for this.

Contributor

If we are updating the visitor implementations, I think we should also update the visit method. We can also delay making these changes.

Contributor Author

Do you mean handling variant in the following visit method? As I mentioned, Avro's Schema.getType() has no variant logical type, so we can't handle that for now. Let me know if I understand correctly. I think this is needed for the metadata change, and I can defer it until then.

abstract class AvroCustomOrderSchemaVisitor<T, F> {
public static <T, F> T visit(Schema schema, AvroCustomOrderSchemaVisitor<T, F> visitor) {
switch (schema.getType()) {

@aihuaxu aihuaxu force-pushed the variant-type-core branch 3 times, most recently from e0a430f to 1b56bf5 Compare January 30, 2025 05:12
@github-actions github-actions bot added the data label Jan 30, 2025
@aihuaxu aihuaxu requested a review from rdblue January 30, 2025 05:20
@aihuaxu aihuaxu force-pushed the variant-type-core branch 3 times, most recently from 760ed7d to eb11b53 Compare January 30, 2025 17:20
@aihuaxu aihuaxu force-pushed the variant-type-core branch 3 times, most recently from 6427c06 to fc10750 Compare February 1, 2025 00:49
final org.apache.avro.Schema actual = testSubject.record(expected, List.of(), null);
assertThat(actual)
.as("Variant projection produced undesired variant schema")
.isEqualTo(expected);
Contributor

I don't think that this is going to exercise the path that we want once the visitor and visit method are updated. The test should create a record schema and validate that visiting a file schema gives the correct projection to request, by name.

Contributor Author

I will delay the change in BuildAvroProjection as mentioned above, so I removed the test. I will add it back when making the code change.

@aihuaxu aihuaxu requested a review from rdblue February 10, 2025 17:36
4 participants