
Fix Flowchart bug with datasets within a nested modular pipeline. #1863

Closed
wants to merge 6 commits into from


rashidakanchwala (Contributor) commented Apr 16, 2024

Description

Resolves #1814

Development notes

There was a bug in the modular pipeline logic. Datasets that serve as inputs and outputs to nested modular pipelines (that is, internal_inputs/internal_outputs of the parent modular pipeline) were mistakenly treated as external_inputs/external_outputs of that modular pipeline. This happened because we only checked whether a dataset was internal by comparing it against the nested modular pipeline, without also checking the parent modular pipeline. Now we check against both the nested modular pipeline and the parent modular pipeline to determine whether a dataset is an internal input/output of the modular pipeline or an external one.
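A minimal sketch of the before/after check described above, assuming each node is identified by a dotted namespace string such as `main_pipeline.sub_a`. The helper names (`parent_namespace`, `is_internal_before_fix`, `is_internal_after_fix`) are hypothetical and are not the kedro-viz internals; they only model "compare against the nested modular pipeline" versus "compare against the nested and the parent modular pipeline".

```python
from typing import Iterable, Optional


def parent_namespace(namespace: str) -> Optional[str]:
    """Return the enclosing namespace, e.g. 'a.b' -> 'a', or None at the top level."""
    head, sep, _ = namespace.rpartition(".")
    return head if sep else None


def is_internal_before_fix(modular_pipeline_id: str, node_namespaces: Iterable[str]) -> bool:
    # Hypothetical model of the bug: only the node's own (innermost) modular
    # pipeline is compared, so nodes in a nested pipeline never match the parent.
    return all(ns == modular_pipeline_id for ns in node_namespaces)


def is_internal_after_fix(modular_pipeline_id: str, node_namespaces: Iterable[str]) -> bool:
    # Hypothetical model of the fix: the node's own modular pipeline OR its
    # parent modular pipeline may match.
    return all(
        ns == modular_pipeline_id or parent_namespace(ns) == modular_pipeline_id
        for ns in node_namespaces
    )


# A dataset produced in one nested pipeline and consumed in a sibling one.
nodes = ["main_pipeline.sub_a", "main_pipeline.sub_b"]
print(is_internal_before_fix("main_pipeline", nodes))  # False: leaks out as external
print(is_internal_after_fix("main_pipeline", nodes))   # True: stays inside main_pipeline
```

With the one-level parent check, a dataset shared between two sibling sub-pipelines is classified as internal to their common parent, which is what lets it collapse inside that parent's node.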

QA notes

You can verify that the issue is resolved by comparing it to the pipeline example shared in issue #1814.

Additionally, if you examine example number 2 in issue #1651, which includes a nested pipeline, you'll notice that the problem with 'main_pipeline.dataset_1' is fixed. Now, it's hidden inside the 'main_pipeline' node when viewed in collapsed mode.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@@ -14,6 +14,28 @@
)


def _check_is_internal_input_output(
    modular_pipeline_id: str, input_node_modular_pipeline: List
) -> Optional[str]:
Contributor

The definition is a bit confusing to me. Can we rename it to:

def _is_internal_node(
    modular_pipeline_id: str, internal_modular_pipelines: List
) -> Optional[str]:

rashidakanchwala marked this pull request as draft April 17, 2024 08:07
yury-fedotov (Contributor)

Thanks for addressing this issue! I hope the example I provided helped localize it. I think this is a great enabler for Kedro-Viz adoption in large projects.

rashidakanchwala (Contributor, Author)

@yuryfedotov-mck: We've started addressing the issue you raised, but unfortunately the solution we implemented doesn't work with deeply nested pipelines. We are investigating further to find a fix. Since modular pipelines are quite complex, this may take some time. Please bear with us as we work through it.
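One plausible way a parent-only check breaks down with deeper nesting: a node two or more levels below a modular pipeline has a parent namespace that is itself nested, so it never matches the outer pipeline. The sketch below assumes dotted namespace strings and uses hypothetical helper names; it illustrates that failure mode and one prefix-based alternative, not a fix from this PR or the kedro-viz codebase.

```python
from typing import Iterable, Optional


def parent_namespace(namespace: str) -> Optional[str]:
    """Return the enclosing namespace, e.g. 'a.b' -> 'a', or None at the top level."""
    head, sep, _ = namespace.rpartition(".")
    return head if sep else None


def is_internal_one_level(modular_pipeline_id: str, node_namespaces: Iterable[str]) -> bool:
    # Accepts a node only if it sits in the modular pipeline or exactly one level below it.
    return all(
        ns == modular_pipeline_id or parent_namespace(ns) == modular_pipeline_id
        for ns in node_namespaces
    )


def is_within(modular_pipeline_id: str, namespace: str) -> bool:
    # True if `namespace` is the modular pipeline itself or nested under it at any depth.
    # The trailing "." stops "main" from matching "main_pipeline".
    return namespace == modular_pipeline_id or namespace.startswith(modular_pipeline_id + ".")


def is_internal_any_depth(modular_pipeline_id: str, node_namespaces: Iterable[str]) -> bool:
    return all(is_within(modular_pipeline_id, ns) for ns in node_namespaces)


# A dataset produced two levels down and consumed directly in main_pipeline.
nodes = ["main_pipeline.sub_a.leaf", "main_pipeline"]
print(is_internal_one_level("main_pipeline", nodes))  # False: parent is main_pipeline.sub_a
print(is_internal_any_depth("main_pipeline", nodes))  # True
```

The prefix test treats "inside M" as an ancestor relationship rather than a fixed number of levels, which is why it does not depend on how deep the nesting goes.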

rashidakanchwala deleted the fix-nested-mod-pipelines-issue branch May 30, 2024 10:43
Development

Successfully merging this pull request may close these issues.

While nesting namespace pipelines, intermediary datasets get exposed to top level of the viz
3 participants