While nesting namespace pipelines, intermediary datasets get exposed to top level of the viz #1814

Closed
1 task
yury-fedotov opened this issue Mar 19, 2024 · 1 comment

@yury-fedotov
Contributor

Description

I found what may be a bug in how Kedro-Viz visualizes nested namespace pipelines. In short, if an outer namespace (I will use processing in my example below) has a single input and a single free output, Kedro-Viz should visually collapse everything between that input and output into the namespace. Instead, it also exposes the datasets shared by the inner namespaced pipelines at the top level of the viz.

Context

I encountered this issue in my project and, as discussed with @rashidakanchwala, I am opening an issue so the team can take a more detailed look. While writing it, I also put together a compact example that reproduces the situation, as you'll see below.

Steps to Reproduce

Create a new Kedro project with Kedro-Viz installed, and define the following pipeline:

from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline


def _get_generic_pipe() -> Pipeline:
    return Pipeline([
        node(
            func=lambda x: x,
            inputs="input_df",
            outputs="output_df",
        ),
    ])


def create_pipeline(**kwargs) -> Pipeline:
    pipe = Pipeline([
        pipeline(
            pipe=_get_generic_pipe(),
            inputs={"input_df": "input_to_processing"},
            outputs={"output_df": "post_first_pipe"},
            namespace="first_processing_step",
        ),
        pipeline(
            pipe=_get_generic_pipe(),
            inputs={"input_df": "post_first_pipe"},
            outputs={"output_df": "output_from_processing"},
            namespace="second_processing_step",
        ),
    ])
    return pipeline(
        pipe=pipe,
        inputs="input_to_processing",
        outputs="output_from_processing",
        namespace="processing",
    )

Then run kedro viz run and observe that the post_first_pipe dataset, which should be fully encapsulated within the processing namespace, is exposed at the top level of the viz.

Expected Result

Since the post_first_pipe dataset is fully internal to the processing namespace, it should be visually encapsulated there and not exposed at the top level of the viz.
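
To make "fully internal" concrete, here is a minimal sketch in plain Python. It is not Kedro's or Kedro-Viz's actual implementation: the `Node` type, the helper functions and the node names are hypothetical, and only the dataset names follow the prefixing described in this issue. It models the two namespaced nodes as input/output records and classifies each dataset as a free input, a free output, or internal.

```python
from typing import List, NamedTuple, Set, Tuple


class Node(NamedTuple):
    """Hypothetical stand-in for a pipeline node: a name plus dataset names."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def consumed(nodes: List[Node]) -> Set[str]:
    """All datasets read by at least one node."""
    return {ds for n in nodes for ds in n.inputs}


def produced(nodes: List[Node]) -> Set[str]:
    """All datasets written by at least one node."""
    return {ds for n in nodes for ds in n.outputs}


def free_inputs(nodes: List[Node]) -> Set[str]:
    """Datasets read inside the group but produced outside it."""
    return consumed(nodes) - produced(nodes)


def free_outputs(nodes: List[Node]) -> Set[str]:
    """Datasets produced inside the group and not read inside it."""
    return produced(nodes) - consumed(nodes)


def internal_datasets(nodes: List[Node]) -> Set[str]:
    """Datasets both produced and read inside the group."""
    return produced(nodes) & consumed(nodes)


# Assumed shape of the "processing" namespace after prefixing.
# The node names are illustrative only.
PROCESSING_NODES = [
    Node(
        name="processing.first_processing_step.identity",
        inputs=("input_to_processing",),
        outputs=("processing.post_first_pipe",),
    ),
    Node(
        name="processing.second_processing_step.identity",
        inputs=("processing.post_first_pipe",),
        outputs=("output_from_processing",),
    ),
]

print("free inputs: ", free_inputs(PROCESSING_NODES))
print("free outputs:", free_outputs(PROCESSING_NODES))
print("internal:    ", internal_datasets(PROCESSING_NODES))
```

Under this simplified model, processing.post_first_pipe is the only internal dataset, so a collapsed processing namespace would be expected to show just its one free input and one free output.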

Actual Result

What I actually see in the viz is this:

Screenshot 2024-03-19 at 6 08 48 PM

Let me highlight a few things here:

  • The problem is that Post First Pipe is visualized as if it were a "free output" of the processing namespace, which it is not. It should be fully encapsulated.
  • One argument in favor of this is on the left: it is not listed under the "Search" bar.
  • Another argument: when I hover over Post First Pipe, the tooltip reads processing.post_first_pipe, so the logic that generates this name is aware that the dataset is internal to processing.
  • Another argument: when I do kedro run and check the logs, the nested namespaces are captured accurately, meaning that prefixes are added where needed and omitted where not needed.
  • A final argument: this page of the docs mentions that namespaces can be nested an arbitrary number of times. That is definitely true from the kedro run perspective, but for the viz to support it, internal datasets should probably not be exposed like this.
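
The tooltip argument above can be sketched as a name-based check. This is only an illustration, assuming that a dotted prefix on a dataset name always denotes a namespace; both helper functions are hypothetical and are not part of Kedro or Kedro-Viz.

```python
from typing import List


def enclosing_namespaces(dataset_name: str) -> List[str]:
    """Return every namespace implied by the dotted prefix of a dataset name.

    Assumption: each dot-separated part before the last one is a namespace level.
    """
    parts = dataset_name.split(".")[:-1]
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def hidden_when_collapsed(dataset_name: str, collapsed_namespace: str) -> bool:
    """True if the dataset sits inside the collapsed namespace (by name alone)."""
    return collapsed_namespace in enclosing_namespaces(dataset_name)


for name in (
    "input_to_processing",
    "processing.post_first_pipe",
    "output_from_processing",
):
    print(name, "->", hidden_when_collapsed(name, "processing"))
```

By this heuristic only processing.post_first_pipe would be hidden when processing is collapsed, which matches the expected result described above. A real fix would more likely rely on the pipeline graph than on name parsing, since dataset names may contain dots for other reasons.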

Your Environment

Include as many relevant details as possible about the environment you experienced the bug in:

  • Web browser system and version: Google Chrome
  • Operating system and version: Tried on both Win and Mac
  • NodeJS version used (if relevant): No idea
  • Kedro version used (if relevant): ~=0.19.2
  • Kedro viz version used (if relevant): Tried both 7 and 8
  • Python version used (if relevant): ~=3.10.0

Checklist

  • Include labels so that we can categorise your issue
rashidakanchwala moved this to Inbox in Kedro-Viz Mar 25, 2024
NeroOkwa moved this from Inbox to Backlog in Kedro-Viz Mar 25, 2024
rashidakanchwala moved this from Backlog to Todo in Kedro-Viz Apr 15, 2024
rashidakanchwala self-assigned this Apr 15, 2024
rashidakanchwala moved this from Todo to In Review in Kedro-Viz Apr 16, 2024
rashidakanchwala moved this from In Review to In Progress in Kedro-Viz Apr 18, 2024
ravi-kumar-pilla moved this from In Progress to In Review in Kedro-Viz Jun 21, 2024
rashidakanchwala added a commit that referenced this issue Jul 2, 2024
Major refactoring of namespace pipelines. Resolves #1899, #1814
ravi-kumar-pilla moved this from In Review to Done in Kedro-Viz Jul 2, 2024
@rashidakanchwala
Contributor

This is done as well.

Projects
Status: Done

3 participants