While nesting namespace pipelines, intermediary datasets get exposed to top level of the `viz` #1814

yury-fedotov · 2024-03-19T23:14:34Z

Description

I found what might be a bug in how Kedro-Viz visualizes nested namespace pipelines. In short, suppose an outer namespace (I will use `processing` in my example going forward) has a single input and a single free output. Kedro-Viz should visually collapse everything between that input and output into the namespace. Instead, it also exposes datasets shared by the inner namespaced pipelines at the top level of the viz.

Context

I encountered this issue in my project and, as discussed with @rashidakanchwala, I'm opening an issue so the team can take a more detailed look. While writing this issue, I also created a very compact example that reproduces the situation (see below).

Steps to Reproduce

Create a new Kedro project with Kedro-Viz installed, and add the following pipeline:

```python
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline


def _get_generic_pipe() -> Pipeline:
    # A single pass-through node: input_df -> output_df
    return Pipeline([
        node(
            func=lambda x: x,
            inputs="input_df",
            outputs="output_df",
        ),
    ])


def create_pipeline(**kwargs) -> Pipeline:
    # Two copies of the generic pipe, chained through post_first_pipe
    pipe = Pipeline([
        pipeline(
            pipe=_get_generic_pipe(),
            inputs={"input_df": "input_to_processing"},
            outputs={"output_df": "post_first_pipe"},
            namespace="first_processing_step",
        ),
        pipeline(
            pipe=_get_generic_pipe(),
            inputs={"input_df": "post_first_pipe"},
            outputs={"output_df": "output_from_processing"},
            namespace="second_processing_step",
        ),
    ])
    # Only input_to_processing and output_from_processing are mapped here,
    # so post_first_pipe is internal to the outer "processing" namespace
    return pipeline(
        pipe=pipe,
        inputs="input_to_processing",
        outputs="output_from_processing",
        namespace="processing",
    )
```
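
To make the resulting dataset names concrete, here is a simplified, hypothetical model of the renaming that wrapping nodes in a namespace performs. Datasets listed in the `inputs`/`outputs` mappings keep their mapped names, every other dataset gets the namespace prefix, and nested node namespaces compose as `outer.inner`. `Node`, `apply_namespace` and `_rename` are illustrative names, not Kedro's API, and the model ignores parameters and node names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Kedro node: only its datasets and namespace."""
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    namespace: Optional[str] = None


def _rename(name: str, mapping: Dict[str, str], namespace: str) -> str:
    # Mapped datasets keep their mapped name; all others get the prefix.
    return mapping.get(name, f"{namespace}.{name}")


def apply_namespace(nodes, namespace, inputs=None, outputs=None):
    """Simplified model of wrapping `nodes` in a namespace (not Kedro's code)."""
    mapping = {**(inputs or {}), **(outputs or {})}
    wrapped = []
    for n in nodes:
        # Nested namespaces compose as "outer.inner".
        ns = f"{namespace}.{n.namespace}" if n.namespace else namespace
        wrapped.append(
            Node(
                inputs=tuple(_rename(i, mapping, namespace) for i in n.inputs),
                outputs=tuple(_rename(o, mapping, namespace) for o in n.outputs),
                namespace=ns,
            )
        )
    return wrapped


# Rebuild the reproduction example with the model.
generic = [Node(inputs=("input_df",), outputs=("output_df",))]
first = apply_namespace(
    generic, "first_processing_step",
    inputs={"input_df": "input_to_processing"},
    outputs={"output_df": "post_first_pipe"},
)
second = apply_namespace(
    generic, "second_processing_step",
    inputs={"input_df": "post_first_pipe"},
    outputs={"output_df": "output_from_processing"},
)
processing = apply_namespace(
    first + second, "processing",
    inputs={"input_to_processing": "input_to_processing"},
    outputs={"output_from_processing": "output_from_processing"},
)

for n in processing:
    print(n.namespace, n.inputs, "->", n.outputs)
# processing.first_processing_step ('input_to_processing',) -> ('processing.post_first_pipe',)
# processing.second_processing_step ('processing.post_first_pipe',) -> ('output_from_processing',)
```

Under this model, the intermediate dataset is named `processing.post_first_pipe`, while the two mapped boundary datasets keep their unprefixed names.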

Then run `kedro viz run` and notice that the `post_first_pipe` dataset, which should be fully encapsulated within the `processing` namespace, is exposed at the top level of the viz.

Expected Result

Since the `post_first_pipe` dataset is fully internal to the `processing` namespace, it should be visually encapsulated there and not exposed at the top level of the viz.
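
One way to state the expected behaviour precisely: relative to a namespace, a dataset is internal if every node that produces or consumes it lives inside that namespace, and only free inputs and free outputs should sit on the namespace boundary. The function below is a hypothetical sketch of that classification, not Kedro-Viz's implementation. Nodes are plain `(namespace, inputs, outputs)` tuples, assuming the dataset names the reproduction pipeline is expected to produce.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

NodeSpec = Tuple[Optional[str], List[str], List[str]]  # (namespace, inputs, outputs)


def classify_datasets(nodes: Iterable[NodeSpec], namespace: str) -> Dict[str, Set[str]]:
    """Hypothetical classification of datasets relative to `namespace`."""

    def inside(ns: Optional[str]) -> bool:
        # A node is inside if its namespace equals or is nested under `namespace`.
        return ns is not None and (ns == namespace or ns.startswith(namespace + "."))

    produced_in: Set[str] = set()
    consumed_in: Set[str] = set()
    consumed_out: Set[str] = set()
    for ns, inputs, outputs in nodes:
        if inside(ns):
            produced_in.update(outputs)
            consumed_in.update(inputs)
        else:
            consumed_out.update(inputs)

    return {
        # Consumed inside but not produced inside: must enter from outside.
        "free_inputs": consumed_in - produced_in,
        # Produced inside and either used outside or not used at all.
        "free_outputs": {d for d in produced_in if d in consumed_out or d not in consumed_in},
        # Produced and consumed inside, never used outside: should stay hidden.
        "internal": (produced_in & consumed_in) - consumed_out,
    }


nodes = [
    ("processing.first_processing_step", ["input_to_processing"], ["processing.post_first_pipe"]),
    ("processing.second_processing_step", ["processing.post_first_pipe"], ["output_from_processing"]),
]

print(classify_datasets(nodes, "processing"))
print(classify_datasets(nodes, "processing.first_processing_step"))
```

With `processing` collapsed, only `input_to_processing` and `output_from_processing` are boundary datasets. `processing.post_first_pipe` is a boundary dataset only of the inner `first_processing_step` namespace.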

Actual Result

What I actually see in the viz is this:

Let me highlight a few things here:

- The problem is that Post First Pipe is visualized as if it were a "free output" of the `processing` namespace, while it is not. It should be fully encapsulated.
- One argument in favor of that is on the left: it is not listed under the "Search" bar.
- Another argument: if I hover over Post First Pipe, the tooltip shows `processing.post_first_pipe`, so the logic that generates this name is aware that the dataset is internal to `processing`.
- Another argument: if I do `kedro run` and check the logs, the nested namespaces are captured accurately, meaning that prefixes are added where needed and not added where not needed.
- Finally, the Kedro docs mention that namespaces can be nested an arbitrary number of times. That is definitely true from the `kedro run` perspective, but to support it in the viz, internal datasets should not be exposed like this.
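
The tooltip argument suggests the dataset name alone already pins down where the dataset belongs. A hypothetical heuristic, for illustration only and not how Kedro-Viz decides placement, is to attach each dataset to the deepest known namespace that prefixes its name, falling back to the top level. `placement_from_name` is an illustrative name.

```python
from typing import Iterable


def placement_from_name(dataset: str, namespaces: Iterable[str]) -> str:
    """Deepest namespace whose "<namespace>." prefix matches `dataset`; "" = top level."""
    matches = [ns for ns in namespaces if dataset.startswith(ns + ".")]
    # The longest matching prefix is the most deeply nested namespace.
    return max(matches, key=len, default="")


namespaces = [
    "processing",
    "processing.first_processing_step",
    "processing.second_processing_step",
]
for name in ["input_to_processing", "processing.post_first_pipe", "output_from_processing"]:
    print(repr(name), "->", repr(placement_from_name(name, namespaces)))
# 'input_to_processing' -> ''
# 'processing.post_first_pipe' -> 'processing'
# 'output_from_processing' -> ''
```

A name-only rule is fragile: a dataset mapped to an arbitrary dotted name would be misplaced. A real implementation would more likely rely on which nodes produce and consume each dataset.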

Your Environment

Include as many relevant details as possible about the environment you experienced the bug in:

- Web browser system and version: Google Chrome
- Operating system and version: Tried on both Windows and Mac
- NodeJS version used (if relevant): No idea
- Kedro version used (if relevant): ~=0.19.2
- Kedro-Viz version used (if relevant): Tried both 7 and 8
- Python version used (if relevant): ~=3.10.0



rashidakanchwala · 2024-11-04T13:17:54Z

This is done as well.

yury-fedotov added the Issue: Bug Report label Mar 19, 2024

rashidakanchwala added this to Kedro-Viz Mar 25, 2024

rashidakanchwala moved this to Inbox in Kedro-Viz Mar 25, 2024

NeroOkwa moved this from Inbox to Backlog in Kedro-Viz Mar 25, 2024

rashidakanchwala moved this from Backlog to Todo in Kedro-Viz Apr 15, 2024

rashidakanchwala self-assigned this Apr 15, 2024

rashidakanchwala mentioned this issue Apr 16, 2024

Fix Flowchart bug with datasets within a nested modular pipeline. #1863 (Closed)

rashidakanchwala moved this from Todo to In Review in Kedro-Viz Apr 16, 2024

rashidakanchwala moved this from In Review to In Progress in Kedro-Viz Apr 18, 2024

rashidakanchwala assigned ravi-kumar-pilla and unassigned rashidakanchwala Apr 29, 2024

yury-fedotov mentioned this issue May 24, 2024

Refactor Modular Pipelines in the Kedro-viz backend #1899 (Closed)

This was referenced Jun 10, 2024

Refactor data access manager for modular pipelines change #1939 (Closed)

Refactor api tests for modular pipelines change #1940 (Closed)

Refactor modular pipelines #1941 (Closed)

ravi-kumar-pilla moved this from In Progress to In Review in Kedro-Viz Jun 21, 2024

yury-fedotov mentioned this issue Jun 30, 2024

Revise modular pipelines docs kedro-org/kedro#3948 (Merged)

rashidakanchwala mentioned this issue Jul 2, 2024

Refactor Namespace Pipelines #1897 (Merged)

rashidakanchwala added a commit that referenced this issue Jul 2, 2024

Refactor Namespace Pipelines (#1897)

35d351f

Major refactoring of namespace pipelines. Resolves #1899 , #1814

ravi-kumar-pilla moved this from In Review to Done in Kedro-Viz Jul 2, 2024

rashidakanchwala closed this as completed Nov 4, 2024
