Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inherit tags from nodes for modular pipelines #1878

Merged
merged 21 commits into from
May 7, 2024
Merged

Conversation

jitu5
Copy link
Contributor

@jitu5 jitu5 commented Apr 25, 2024

Description

Related to #1822

Incase of expandModularPipelines = false and the filtered nodes happened to be inside a modular pipeline, it should show the modular pipeline node (collapsed view) which seems to be missing on refresh. This PR aims to resolve this issue.

Development notes

Screen.Recording.2024-04-26.at.10.27.21.a.m.mov

QA notes

  • Added tests to cover changes.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@jitu5 jitu5 requested a review from SajidAlamQB April 25, 2024 13:37
jitu5 and others added 4 commits April 25, 2024 14:37
@jitu5 jitu5 self-assigned this Apr 25, 2024
@jitu5 jitu5 marked this pull request as ready for review April 26, 2024 15:15
@jitu5 jitu5 requested a review from rashidakanchwala as a code owner April 26, 2024 15:15
jitu5 and others added 3 commits April 26, 2024 16:47
@rashidakanchwala
Copy link
Contributor

rashidakanchwala commented Apr 30, 2024

Did a slightly different approach here main...inherit-tags .let me know what you think

@ravi-kumar-pilla
Copy link
Contributor

Did a slightly different approach here main...inherit-tags .let me know what you think

I think this updates the tags at the same time we add inputs/outputs to the immediate parent. We are anyway updating the grandparents here - expanded_tree[parent_id].tags.update(modular_pipeline_node.tags). Looks good to me. Thank you

jitu5 added 6 commits May 1, 2024 16:09
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Signed-off-by: Jitendra Gundaniya <[email protected]>
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks!!

Signed-off-by: Jitendra Gundaniya <[email protected]>
@@ -161,6 +162,27 @@ def add_output(self, modular_pipeline_id: str, output_node: GraphNode):
else:
self.tree[modular_pipeline_id].external_outputs.add(output_node.id)

def add_tags(self, modular_pipeline_id: str, node_tags: set):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going through the modular pipelines code and found that we update the modular_pipeline pipelines set with node.pipelines. May be we can update the tags at that point too.

Like at line 239 in this file inside extract_from_node function having something like - modular_pipeline.tags.update(node.tags). In that case we do not need this function and also do not need a call at line 203 in managers.py. What do you think ? @rashidakanchwala

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how I believe @jitu5 approached it previously.
He had some code in the extract_from_node function, which I found very confusing because the function was originally defined to extract namespaces, but now it's getting tags.
I think it's important to have clean functions that handle one task each, rather than one function doing too many things.

I'm also personally confused by the extract_from_node function being called three times here - link to GitHub code.
This is the exactly code refactor/simplification that Ivan mentioned to us the other day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with having clean functions. If you see our add_pipeline function, it does all the below tasks -

  1. Populates the RegisteredPipelinesRepository
  2. Populates the ModularPipelinesRepository
  3. Populates the GraphNodesRepository
  4. Apart from the above tasks it also creates the taskNode, DataNode, assigns inputs/outputs to the created nodes

Now if we add updating tags of the created modularPipeline, the function add_pipeline is already complex and we are adding this bit which only belongs to modularPipeline.

Yes, extract_from_node also does more than just extracting namespace, it does the below tasks but everything here belongs to modularPipeline -

  1. Creates the modularPipeline
  2. Updates the pipelines set with the node's pipelines
  3. Adds the node as a child of the createdModularPipeline

We should probably rename this to something like add_modular_pipeline. Why I feel tags of the modular pipeline should be updated at this place, because this is where we create the modular pipeline using the node namespace and update the required attributes/properties. It is fine to have add_tags as a separate function which updates the tags set (not recursively though, how @jitu5 did initially) and call that from extract_from_node.

Also, I feel having clean functions is also about grouping tasks which belong to a particular section (in this case modularPipelines). We did this when adding resolve_dataset_factory_patterns which does the task of resolving dataset patterns in the catalog. But since this belongs to catalog we added it inside add_catalog. Again add_catalog is doing more than just adding catalog, it also adds tracking datasets (another function to refactor 🗡️ ).

Thank you

Signed-off-by: Jitendra Gundaniya <[email protected]>
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 !!

@jitu5 jitu5 merged commit 9e661f6 into main May 7, 2024
44 checks passed
@jitu5 jitu5 deleted the feature/modular_pipeline_tag branch May 7, 2024 09:19
@rashidakanchwala rashidakanchwala mentioned this pull request May 29, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants