Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inherit tags from nodes for modular pipelines #1878
Inherit tags from nodes for modular pipelines #1878
Changes from 17 commits
26b8900
15db91b
2e15a47
6e8dadc
218cfc8
1d8486c
aab6d36
2288268
70e18e0
0e9d51f
42d466c
7a205b0
c8f4983
b510696
398929d
89d467a
4c78489
f032b47
79e5c52
76f9ea6
d575a3f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going through the modular pipelines code and found that we update the modular_pipeline
pipelines
set withnode.pipelines
. May be we can update the tags at that point too.Like at line 239 in this file inside
extract_from_node
function having something like -modular_pipeline.tags.update(node.tags)
. In that case we do not need this function and also do not need a call at line 203 inmanagers.py
. What do you think ? @rashidakanchwalaThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how I believe @jitu5 approached it previously.
He had some code in the
extract_from_node
function, which I found very confusing because the function was originally defined to extract namespaces, but now it's getting tags.I think it's important to have clean functions that handle one task each, rather than one function doing too many things.
I'm also personally confused by the
extract_from_node
function being called three times here - link to GitHub code.This is the exactly code refactor/simplification that Ivan mentioned to us the other day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with having clean functions. If you see our
add_pipeline
function, it does all the below tasks -Now if we add updating tags of the created modularPipeline, the function
add_pipeline
is already complex and we are adding this bit which only belongs to modularPipeline.Yes,
extract_from_node
also does more than just extracting namespace, it does the below tasks but everything here belongs to modularPipeline -We should probably rename this to something like
add_modular_pipeline
. Why I feel tags of the modular pipeline should be updated at this place, because this is where we create the modular pipeline using the node namespace and update the required attributes/properties. It is fine to haveadd_tags
as a separate function which updates the tags set (not recursively though, how @jitu5 did initially) and call that fromextract_from_node
.Also, I feel having clean functions is also about grouping tasks which belong to a particular section (in this case modularPipelines). We did this when adding
resolve_dataset_factory_patterns
which does the task of resolving dataset patterns in the catalog. But since this belongs to catalog we added it insideadd_catalog
. Againadd_catalog
is doing more than just adding catalog, it also adds tracking datasets (another function to refactor 🗡️ ).Thank you