-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inherit tags from nodes for modular pipelines #1878
Conversation
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: <>
Signed-off-by: <>
Signed-off-by: <>
Signed-off-by: <>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Did a slightly different approach here main...inherit-tags .let me know what you think |
I think this updates the tags at the same time we add inputs/outputs to the immediate parent. We are anyway updating the grandparents here - |
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. thanks!!
Signed-off-by: Jitendra Gundaniya <[email protected]>
@@ -161,6 +162,27 @@ def add_output(self, modular_pipeline_id: str, output_node: GraphNode): | |||
else: | |||
self.tree[modular_pipeline_id].external_outputs.add(output_node.id) | |||
|
|||
def add_tags(self, modular_pipeline_id: str, node_tags: set): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going through the modular pipelines code and found that we update the modular_pipeline pipelines
set with node.pipelines
. May be we can update the tags at that point too.
Like at line 239 in this file inside extract_from_node
function having something like - modular_pipeline.tags.update(node.tags)
. In that case we do not need this function and also do not need a call at line 203 in managers.py
. What do you think ? @rashidakanchwala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how I believe @jitu5 approached it previously.
He had some code in the extract_from_node
function, which I found very confusing because the function was originally defined to extract namespaces, but now it's getting tags.
I think it's important to have clean functions that handle one task each, rather than one function doing too many things.
I'm also personally confused by the extract_from_node
function being called three times here - link to GitHub code.
This is the exactly code refactor/simplification that Ivan mentioned to us the other day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with having clean functions. If you see our add_pipeline
function, it does all the below tasks -
- Populates the RegisteredPipelinesRepository
- Populates the ModularPipelinesRepository
- Populates the GraphNodesRepository
- Apart from the above tasks it also creates the taskNode, DataNode, assigns inputs/outputs to the created nodes
Now if we add updating tags of the created modularPipeline, the function add_pipeline
is already complex and we are adding this bit which only belongs to modularPipeline.
Yes, extract_from_node
also does more than just extracting namespace, it does the below tasks but everything here belongs to modularPipeline -
- Creates the modularPipeline
- Updates the pipelines set with the node's pipelines
- Adds the node as a child of the createdModularPipeline
We should probably rename this to something like add_modular_pipeline
. Why I feel tags of the modular pipeline should be updated at this place, because this is where we create the modular pipeline using the node namespace and update the required attributes/properties. It is fine to have add_tags
as a separate function which updates the tags set (not recursively though, how @jitu5 did initially) and call that from extract_from_node
.
Also, I feel having clean functions is also about grouping tasks which belong to a particular section (in this case modularPipelines). We did this when adding resolve_dataset_factory_patterns
which does the task of resolving dataset patterns in the catalog. But since this belongs to catalog we added it inside add_catalog
. Again add_catalog
is doing more than just adding catalog, it also adds tracking datasets (another function to refactor 🗡️ ).
Thank you
Signed-off-by: Jitendra Gundaniya <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 !!
Description
Related to #1822
Incase of expandModularPipelines = false and the filtered nodes happened to be inside a modular pipeline, it should show the modular pipeline node (collapsed view) which seems to be missing on refresh. This PR aims to resolve this issue.
Development notes
Screen.Recording.2024-04-26.at.10.27.21.a.m.mov
QA notes
Checklist
RELEASE.md
file