Deterministic behavior for elements that are a union of type T and list of type T #67

Open
rudolfix opened this issue Sep 23, 2022 · 0 comments
@rudolfix
Collaborator

JSON documents often contain inconsistently typed elements that may occur one or more times. When such an element occurs once, it is inserted directly as a value of type T; when it occurs more than once, it is inserted as a list of type T.

For example, `first_name` may be `"Juan"` or `["Juan", "Pablo"]`.

In that case, DLT builds both a flattened representation and a table representation for T, and inserts data into both places: single values go into the flattened column, lists go into the child table.
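To make the problem concrete, here is a toy model of the current behavior. `naive_normalize` is hypothetical and is not dlt code; it only mimics the rule described above (scalars are flattened into the parent row, lists become rows of a child table named `{table}__{key}`, the naming used in the TODO comment in `relational.py`). Storing each list element under a `"value"` column is an assumption made for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def naive_normalize(table: str, doc: Row) -> Tuple[Row, Dict[str, List[Row]]]:
    """Hypothetical toy normalizer: scalars become columns of the parent row,
    lists become rows of a child table named {table}__{key}."""
    row: Row = {}
    child_tables: Dict[str, List[Row]] = {}
    for key, value in doc.items():
        if isinstance(value, list):
            child_name = f"{table}__{key}"
            # assumption: each list element is stored under a "value" column
            child_tables.setdefault(child_name, []).extend({"value": v} for v in value)
        else:
            row[key] = value  # single value: flattened into the parent row
    return row, child_tables


docs = [{"id": 1, "first_name": "Juan"}, {"id": 2, "first_name": ["Juan", "Pablo"]}]
parent_rows: List[Row] = []
children: Dict[str, List[Row]] = {}
for doc in docs:
    row, child_tables = naive_normalize("people", doc)
    parent_rows.append(row)
    for name, rows in child_tables.items():
        children.setdefault(name, []).extend(rows)

print(parent_rows)  # [{'id': 1, 'first_name': 'Juan'}, {'id': 2}]
print(children)     # {'people__first_name': [{'value': 'Juan'}, {'value': 'Pablo'}]}
```

The same field, `first_name`, ends up as a column of `people` for the first document and as rows of `people__first_name` for the second, so queries have to look in both places.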

Fix the JSON normalizer so that, if the table is present in the schema, that table is always used. The relevant code is in `relational.py`:

```python
if isinstance(v, (dict, list)):
    if not _is_complex_type(schema, table, child_name, __r_lvl):
        # TODO: if schema contains table {table}__{child_name} then convert v into single element list
        if isinstance(v, dict):
            # flatten the dict more
            norm_row_dicts(v, __r_lvl + 1, parent_name=child_name)
        else:
            # pass the list to out_rec_list
            out_rec_list[child_name] = v
```
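One possible way to resolve the TODO is sketched below. The helper `coerce_to_child_table` is hypothetical, and a plain set of table names stands in for the real `schema` object; `_is_complex_type` and the rest of the normalizer are not modeled. The idea is to call it on `v` just before the `isinstance(v, (dict, list))` check, so that once `{table}__{child_name}` exists in the schema, both `"Juan"` and `["Juan", "Pablo"]` take the list branch.

```python
from typing import Any, Set


def coerce_to_child_table(
    schema_tables: Set[str], table: str, child_name: str, value: Any
) -> Any:
    """Hypothetical helper (not part of dlt): if the schema already contains the
    child table {table}__{child_name}, wrap a single non-list value (a scalar or a
    dict) into a one-element list so it is routed to that table, not flattened."""
    if f"{table}__{child_name}" in schema_tables and not isinstance(value, list):
        return [value]
    return value


# Usage: "people__first_name" is assumed to be declared in the schema.
tables = {"people", "people__first_name"}
single = coerce_to_child_table(tables, "people", "first_name", "Juan")  # ['Juan']
many = coerce_to_child_table(tables, "people", "first_name", ["Juan", "Pablo"])  # unchanged
nested = coerce_to_child_table(tables, "people", "first_name", {"given": "Juan"})  # [{'given': 'Juan'}]
other = coerce_to_child_table(tables, "people", "last_name", "Perez")  # 'Perez' (no child table)
```

Wrapping instead of flattening keeps the decision purely schema-driven: the shape of an individual document no longer decides where its data lands.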

Note: with automatic schema inference, all values are added flattened until the first list of T is detected. To ensure consistency, such tables should be added to the schema before the first data item is normalized.
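The ordering dependency in the note can be reproduced with a toy model of inference. `normalize_with_inference` is hypothetical and is not dlt's implementation: it registers a child table the first time it sees a list and wraps later single values for that field, as the fix above proposes. The `"value"` column for list elements is again an assumption, and parent/child linkage (row ids) is omitted for brevity.

```python
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


def normalize_with_inference(
    table: str, docs: List[Row], schema_tables: Set[str]
) -> Tuple[List[Row], Dict[str, List[Row]]]:
    """Toy model of schema inference (hypothetical, not dlt's implementation).

    A child table {table}__{key} is registered the first time a list is seen
    for `key`; after that, single values for `key` are wrapped in a list and
    routed to the child table instead of being flattened into the parent row.
    """
    schema = set(schema_tables)  # copy so the caller's set is not mutated
    parent_rows: List[Row] = []
    children: Dict[str, List[Row]] = {}
    for doc in docs:
        row: Row = {}
        for key, value in doc.items():
            child_table = f"{table}__{key}"
            if isinstance(value, list):
                schema.add(child_table)  # infer the child table from the first list
            elif child_table in schema:
                value = [value]  # table already known: treat single value as a list
            if isinstance(value, list):
                children.setdefault(child_table, []).extend({"value": v} for v in value)
            else:
                row[key] = value  # no child table yet: flatten into the parent row
        parent_rows.append(row)
    return parent_rows, children


docs = [
    {"id": 1, "first_name": "Juan"},
    {"id": 2, "first_name": ["Juan", "Pablo"]},
    {"id": 3, "first_name": "Ana"},
]

# Pure inference: the first document is flattened before the table is known.
inferred_rows, inferred_children = normalize_with_inference("people", docs, {"people"})
# Child table declared up front: every first_name goes to the child table.
declared_rows, declared_children = normalize_with_inference(
    "people", docs, {"people", "people__first_name"}
)
print(inferred_rows)  # [{'id': 1, 'first_name': 'Juan'}, {'id': 2}, {'id': 3}]
print(declared_rows)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

With inference alone, `"Juan"` from the first document stays flattened in `people` while later values go to `people__first_name`; declaring the child table up front routes all three documents consistently.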

Projects
Status: Todo