implement include and exclude filters in Schema as data item transformations #64

Open
rudolfix opened this issue Sep 14, 2022 · 1 comment
Labels
good first issue Good for newcomers

Comments

@rudolfix
Collaborator

rudolfix commented Sep 14, 2022

Row filtering can be done in an item transform and added optionally to any resource. The implementation in Schema is slow and only solves problems it has created :)

Here's a GPT-4 prompt that writes a correct function:

Write me a function in Python that takes a nested dictionary as input. The dictionary can contain dictionaries, lists and basic types as values. The keys are strings. The function takes two more arguments: a list of exclude regexes and a list of include regexes. The regexes match paths in the dictionary. The paths are similar to JSON path, but the separator is `__`. If a given element has a path matching an exclude regex, it is removed from the dictionary. However, if any of the nested (child) elements of that element matches an include regex, the element should stay, but its other child elements should be removed.
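For reference, a minimal sketch of what such a function could look like. This is not the dlt implementation or the GPT-4 output; the names `filter_item` and `make_filter_transform` are hypothetical. It assumes that list elements share the path of the list that holds them (no index in the path), that regexes are applied with `re.search` (so anchor them with `^`/`$` if needed), and that an include match wins over an exclude match on the same path.

```python
import re
from typing import Any, Callable, Dict, Sequence

SEP = "__"        # path separator named in the prompt
_DROP = object()  # sentinel meaning "remove this element"


def _matches(path: str, patterns: Sequence["re.Pattern[str]"]) -> bool:
    return any(p.search(path) for p in patterns)


def _filter(value: Any, path: str, excluded: bool, excludes, includes) -> Any:
    """Return the filtered value, or _DROP if it should be removed.

    `excluded` is inherited from the parent: children of an excluded
    element are dropped unless an include regex rescues them.
    """
    if path:  # the root (empty path) is never matched
        if _matches(path, includes):
            excluded = False  # assumption: include wins over exclude
        elif _matches(path, excludes):
            excluded = True
    if isinstance(value, dict):
        kept = {}
        for key, child in value.items():
            child_path = f"{path}{SEP}{key}" if path else key
            result = _filter(child, child_path, excluded, excludes, includes)
            if result is not _DROP:
                kept[key] = result
        # an excluded dict survives only if a descendant was rescued
        return _DROP if excluded and not kept else kept
    if isinstance(value, list):
        # assumption: list items share the path of the list itself
        results = (_filter(v, path, excluded, excludes, includes) for v in value)
        kept_items = [r for r in results if r is not _DROP]
        return _DROP if excluded and not kept_items else kept_items
    return _DROP if excluded else value


def filter_item(item: Dict[str, Any], excludes: Sequence[str],
                includes: Sequence[str]) -> Dict[str, Any]:
    """Return a filtered copy of `item`; the input is not mutated."""
    exc = [re.compile(r) for r in excludes]
    inc = [re.compile(r) for r in includes]
    return _filter(item, "", False, exc, inc)


def make_filter_transform(excludes: Sequence[str],
                          includes: Sequence[str]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Bind the regexes so the result can be used as a per-item transform."""
    return lambda item: filter_item(item, excludes, includes)


doc = {
    "id": 1,
    "meta": {"secret": "x", "keep": {"a": 1, "b": 2}, "other": 3},
    "items": [{"name": "n1", "price": 1}, {"name": "n2", "price": 2}],
}
transform = make_filter_transform([r"^meta$", r"^items__price$"], [r"^meta__keep$"])
filtered = transform(doc)
# filtered == {"id": 1, "meta": {"keep": {"a": 1, "b": 2}},
#              "items": [{"name": "n1"}, {"name": "n2"}]}
```

The callable returned by `make_filter_transform` takes one item and returns one item, so it could be applied with a plain `map` over any iterable of data items.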

Testing:
The normalize tests use various advanced modes of filtering. They must pass with the new function.

rudolfix changed the title from "[core] optimize filter_row of 'Schema`" to "implement include and exclude filters in Schema as data item transformations" on Apr 9, 2023
rudolfix added the "good first issue" (Good for newcomers) label on Apr 9, 2023
@amentee

amentee commented Dec 3, 2023

@rudolfix I am a complete beginner in Python but want to try implementing this function. Can you tell me which file I need to refer to? I am assuming it is https://github.com/dlt-hub/dlt/blob/master/tests/common/schema/test_filtering.py, but please correct me if I am wrong. Which function in that file should I work on to implement the requirement?
