Revise modular pipelines docs #3948

Merged on Jul 2, 2024 (33 commits)
Commits:

- `18a46ee` Docs update (DimedS, Jun 11, 2024)
- `8a7d370` Docs update (DimedS, Jun 11, 2024)
- `9fbf6b3` Merge branch 'main' into 1998-revise-the-modular-pipelines-docs (DimedS, Jun 11, 2024)
- `653dcde` Fix header (DimedS, Jun 11, 2024)
- `53f94fe` Change filename (DimedS, Jun 11, 2024)
- `8678eb7` Fix external link (DimedS, Jun 11, 2024)
- `23b0989` Fix external link (DimedS, Jun 11, 2024)
- `682efb1` Fix external links (DimedS, Jun 11, 2024)
- `c8e75a1` Apply suggestions from code review (DimedS, Jun 17, 2024)
- `9ba96fc` Merge branch 'main' into 1998-revise-the-modular-pipelines-docs (DimedS, Jun 17, 2024)
- `7c93b58` Address review comments (DimedS, Jun 17, 2024)
- `bdae1ec` Address review comments (DimedS, Jun 18, 2024)
- `3d37b5c` Apply suggestions from code review (DimedS, Jun 20, 2024)
- `c907c28` Address review comments (DimedS, Jun 20, 2024)
- `5999491` Merge branch 'main' into 1998-revise-the-modular-pipelines-docs (DimedS, Jun 20, 2024)
- `e235a24` Fix old links (DimedS, Jun 20, 2024)
- `7b06d9c` Apply suggestions from code review (DimedS, Jun 21, 2024)
- `952c4fa` Address review comments (DimedS, Jun 21, 2024)
- `8fcc3c4` Address review comments (DimedS, Jun 24, 2024)
- `a58d31e` Apply suggestions from code review (DimedS, Jun 26, 2024)
- `77ed3aa` Revise namespaces (DimedS, Jun 28, 2024)
- `4e2c8ac` Revise namespaces examples (DimedS, Jun 28, 2024)
- `faa08b2` Apply suggestions from code review (DimedS, Jun 28, 2024)
- `84f444f` Address review comments (DimedS, Jun 28, 2024)
- `aa4a3c2` Apply suggestions from code review (DimedS, Jul 1, 2024)
- `0ac37be` Address review comments (DimedS, Jul 1, 2024)
- `2fbf15e` Address review comments (DimedS, Jul 1, 2024)
- `0c14ce2` Address review comments (DimedS, Jul 1, 2024)
- `6b36ab9` Address review comments (DimedS, Jul 1, 2024)
- `bc04d86` Remove cooking example (DimedS, Jul 1, 2024)
- `c46dab3` Apply suggestions from code review (DimedS, Jul 2, 2024)
- `1cfc4d7` Add comments to create_pipeline func (DimedS, Jul 2, 2024)
- `cf63f67` Merge branch 'main' into 1998-revise-the-modular-pipelines-docs (DimedS, Jul 2, 2024)
4 changes: 2 additions & 2 deletions docs/source/faq/faq.md
@@ -52,8 +52,8 @@

## Nodes and pipelines

* [How do I create a modular pipeline](../nodes_and_pipelines/modular_pipelines.md#how-do-i-create-a-modular-pipeline)?

* [How can I create a new blank pipeline](../nodes_and_pipelines/modular_pipelines.md#how-to-create-a-new-blank-pipeline-using-the-kedro-pipeline-create-command)?

* [How can I reuse my pipelines](../nodes_and_pipelines/namespaces.md)?

* [Can I use generator functions in a node](../nodes_and_pipelines/nodes.md#how-to-use-generator-functions-in-a-node)?

## What is data engineering convention?
Binary file modified docs/source/meta/images/cook_disjointed.png
Binary file modified docs/source/meta/images/cook_joined.png
Binary file modified docs/source/meta/images/cook_no_namespace.png
Binary file modified docs/source/meta/images/cook_params.png
1 change: 1 addition & 0 deletions docs/source/nodes_and_pipelines/index.md
@@ -6,6 +6,7 @@
nodes
pipeline_introduction
modular_pipelines
namespaces
pipeline_registry
micro_packaging
run_a_pipeline
335 changes: 84 additions & 251 deletions docs/source/nodes_and_pipelines/modular_pipelines.md

Large diffs are not rendered by default.

166 changes: 166 additions & 0 deletions docs/source/nodes_and_pipelines/namespaces.md
@@ -0,0 +1,166 @@
# Reuse pipelines with namespaces

## How to reuse your pipelines

If you want to create a new pipeline that performs similar tasks to your `existing_pipeline` but with different inputs, outputs, or parameters, you can use the same `pipeline()` creation function described in [How to structure your pipeline creation](modular_pipelines.md#how-to-structure-your-pipeline-creation). This function lets you overwrite inputs, outputs, and parameters. Your new pipeline creation code should look like this:


```python
from kedro.pipeline import Pipeline, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        existing_pipeline,  # The existing Pipeline object to reuse
        inputs={"old_input_df_name": "new_input_df_name"},  # Map an existing pipeline input to a new input
        outputs={"old_output_df_name": "new_output_df_name"},  # Map an existing pipeline output to a new output
        parameters={"params:model_options": "params:new_model_options"},  # Map an existing parameter to a new one
    )
```
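To make the mapping behaviour concrete, here is a minimal pure-Python sketch that models a pipeline as a list of node records and applies one merged `inputs`/`outputs`/`parameters` mapping to it. It is not Kedro's implementation: `NodeSpec` and `remap` are hypothetical helpers, and the error for unknown keys only approximates the kind of validation `pipeline()` performs. Note that node names are left unchanged; that detail matters later on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in for a Kedro node: a name plus the names it reads and writes."""

    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def remap(nodes: list[NodeSpec], mapping: dict[str, str]) -> list[NodeSpec]:
    """Copy `nodes`, replacing dataset and parameter names according to `mapping`.

    `mapping` plays the role of the merged inputs/outputs/parameters dictionaries.
    Node names are deliberately left untouched.
    """
    used = {n for node in nodes for n in (*node.inputs, *node.outputs)}
    unknown = set(mapping) - used
    if unknown:
        # Assumption for this sketch: reject mappings for names that no node uses
        raise ValueError(f"Cannot map names the pipeline does not use: {sorted(unknown)}")

    def rename(name: str) -> str:
        return mapping.get(name, name)

    return [
        replace(
            node,
            inputs=tuple(rename(n) for n in node.inputs),
            outputs=tuple(rename(n) for n in node.outputs),
        )
        for node in nodes
    ]


# A toy two-node pipeline using the placeholder names from the snippet above
existing_pipeline = [
    NodeSpec("split_data_node", ("old_input_df_name", "params:model_options"), ("X",)),
    NodeSpec("train_model_node", ("X",), ("old_output_df_name",)),
]

new_pipeline = remap(
    existing_pipeline,
    {
        "old_input_df_name": "new_input_df_name",
        "old_output_df_name": "new_output_df_name",
        "params:model_options": "params:new_model_options",
    },
)
print(new_pipeline[0].inputs)  # ('new_input_df_name', 'params:new_model_options')
print([node.name for node in new_pipeline])  # ['split_data_node', 'train_model_node']
```

The intermediate dataset `X` is not in the mapping, so it keeps its name; only the names you list are rewired.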

This means you can create several pipelines based on `existing_pipeline` to test different approaches with various input datasets and model training parameters. For example, take the `data_science` pipeline from our [Spaceflights tutorial](../tutorial/add_another_pipeline.md#data-science-pipeline). You can restructure the `src/project_name/pipelines/data_science/pipeline.py` file by moving the node definitions into a separate `base_data_science` pipeline object and then reusing it inside the `create_pipeline()` function:


```python
# src/project_name/pipelines/data_science/pipeline.py

from kedro.pipeline import Pipeline, node, pipeline

from .nodes import evaluate_model, split_data, train_model

base_data_science = pipeline(
    [
        node(
            func=split_data,
            inputs=["model_input_table", "params:model_options"],
            outputs=["X_train", "X_test", "y_train", "y_test"],
            name="split_data_node",
        ),
        node(
            func=train_model,
            inputs=["X_train", "y_train"],
            outputs="regressor",
            name="train_model_node",
        ),
        node(
            func=evaluate_model,
            inputs=["regressor", "X_test", "y_test"],
            outputs=None,
            name="evaluate_model_node",
        ),
    ]
)  # Creating a base data science pipeline that will be reused with different model training parameters


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [base_data_science],  # Creating a new data_science pipeline based on base_data_science pipeline
        parameters={"params:model_options": "params:model_options_1"},  # Using a new set of parameters to train model
    )
```

Review comment (Contributor) on `name="split_data_node",`: And similarly in other nodes, because in all the places this name is used, it is clear that it's a node.

Suggested change: `name="split_data_node",` → `name="split_data",`

Reply (Member Author): I would prefer to leave this as it is because it's currently how it's written in our starters, and I don't want to confuse users with different naming since our example is based on that starter.

Review comment (Contributor) on lines +49 to +53: Why is it not using namespace? Isn't this page supposed to be about that?

Reply (Member Author): In my opinion, the point of the first section titled "How to reuse your pipelines" is to show that you can reuse your pipelines without needing namespaces. However, by adding namespaces, you can reuse your pipelines multiple times, with namespaces providing the necessary support. So my point is: reuse is the primary idea, and namespaces are an enhancement.

To use a new set of parameters, add them to a parameters file alongside the existing ones in `conf/base/parameters.yml`. For example, create a file `conf/base/parameters_data_science.yml` and add a parameter called `model_options_1`, which the pipeline then uses in place of `model_options`:


```yaml
# conf/base/parameters_data_science.yml
model_options_1:
  test_size: 0.15
  random_state: 3
  features:
    - passenger_capacity
    - crew
    - d_check_complete
    - moon_clearance_complete
    - company_rating
```

> In Kedro, you cannot run pipelines that contain nodes with the same names. In this example, both pipelines have nodes with the same names, so you cannot execute them together. However, `base_data_science` is not registered, so `kedro run` does not execute it. The `data_science` pipeline does run during `kedro run`, because Kedro autodiscovers the pipeline returned by the `create_pipeline()` function.

Review comment (Member): I find this confusing because in the code snippet above:

    def create_pipeline(**kwargs) -> Pipeline:
        return pipeline(
            [base_data_science],  # Creating a new data_science pipeline based on base_data_science pipeline
            parameters={"params:model_options": "params:model_options_1"},  # Using a new set of parameters to train model
        )

`base_data_science` is created inside `create_pipeline()`...

Review comment (Member): Other than in the explanation, there's no actual reference to `data_science` anywhere in the code examples on this page.

Reply (Member Author): I fixed it by adding the comment `# data_science pipeline creation function` before the `create_pipeline()` function. However, as I understand it, I have to use the `create_pipeline()` name to ensure the pipeline will be autodiscovered. Do you have other ideas on how to highlight the `data_science` name inside the code?


If you want to execute the `base_data_science` and `data_science` pipelines together, or reuse `base_data_science` more than once, you need to give their nodes unique names. The easiest way to do this is to use namespaces.
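The name clash can be pictured with a small pure-Python check, assuming each pipeline is reduced to the list of its node names. `duplicate_node_names` is a hypothetical helper, not a Kedro API (Kedro runs its own validation when pipelines are combined); the sketch only shows why a prefix resolves the conflict.

```python
from collections import Counter


def duplicate_node_names(*pipelines: list[str]) -> set[str]:
    """Return node names that appear more than once across the given pipelines."""
    counts = Counter(name for pipe in pipelines for name in pipe)
    return {name for name, count in counts.items() if count > 1}


base_data_science = ["split_data_node", "train_model_node", "evaluate_model_node"]
# Reusing the base pipeline with only a parameter mapping keeps the node names as they are
data_science = list(base_data_science)
# Reusing it under a namespace puts "ds_2." in front of every node name
data_science_2 = [f"ds_2.{name}" for name in base_data_science]

print(sorted(duplicate_node_names(base_data_science, data_science)))
# ['evaluate_model_node', 'split_data_node', 'train_model_node']
print(duplicate_node_names(base_data_science, data_science_2))
# set()
```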


## What is a namespace

A namespace is a way to isolate nodes, inputs, outputs, and parameters inside your pipeline. If you pass `namespace="namespace_name"` to the `pipeline()` creation function, it adds the `namespace_name.` prefix to all nodes, inputs, outputs, and parameters in the new pipeline. For parameters, the prefix goes after `params:`, so `params:model_options` becomes `params:namespace_name.model_options`.

> If you don't want the `namespace_name.` prefix added to some of your inputs, outputs, or parameters, list them in the corresponding argument of the `pipeline()` creation function, for example:
> `inputs={"input_that_should_not_be_prefixed"}`
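The prefixing rule can be summarised in a few lines of plain Python. This is a simplified sketch of the behaviour described above, not Kedro's source: `namespaced_name` and `keep` are hypothetical helpers, the sketch ignores transcoded datasets and nested namespaces, and it assumes that explicitly mapped names (including names listed in a set, which map to themselves) take priority over the prefix. Node names follow the plain dataset rule, so `split_data_node` becomes `ds_2.split_data_node`.

```python
from typing import Optional


def namespaced_name(name: str, namespace: str, mapping: Optional[dict[str, str]] = None) -> str:
    """Return the name a dataset or parameter gets inside a namespaced pipeline."""
    mapping = mapping or {}
    if name in mapping:
        # Explicitly mapped (or listed to be kept): use the mapped name, no prefix
        return mapping[name]
    if name == "parameters":
        # The special "parameters" input (all parameters at once) is left as is
        return name
    if name.startswith("params:"):
        # Single parameters keep the "params:" marker; the namespace goes after it
        return f"params:{namespace}.{name[len('params:'):]}"
    # Any other dataset name simply gets the "<namespace>." prefix
    return f"{namespace}.{name}"


def keep(*names: str) -> dict[str, str]:
    """Model a set such as inputs={"model_input_table"}: each name maps to itself."""
    return {name: name for name in names}


print(namespaced_name("X_train", "ds_2"))  # ds_2.X_train
print(namespaced_name("params:model_options", "ds_2"))  # params:ds_2.model_options
print(namespaced_name("model_input_table", "ds_2", keep("model_input_table")))  # model_input_table
```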

Let's extend our previous example and try to reuse the `base_data_science` pipeline one more time by creating another pipeline based on it. First, we should use the `kedro pipeline create` command to create a new blank pipeline named `data_science_2`:
Review comment (Contributor): Ok this is very very unclear. Why, to reuse an existing pipeline that's designed to be modular, are you running `kedro pipeline create`? Isn't a more intuitive flow to define this abstract, unregistered pipeline once and then initialize its actual instances to be registered within the same project/pipelines/pipeline folder?

Reply (Member Author): Thank you for that proposal, @yury-fedotov. When I wrote that manual, I also thought about how I can reuse a pipeline created with the `kedro pipeline create` command since the pipeline object is created inside the function. Should I create an additional pipeline with the `kedro pipeline create` command to reuse the first one? I believe it's important to provide our users with best practices, and I would be happy to hear more opinions on that. Maybe @astrojuanlu and @merelcht have some insights.

Reply (Member, @merelcht, Jul 2, 2024): From my point of view it doesn't matter how you create the pipelines, what's important is how you re-use them in the end. It's up to the user if they want the pipelines to be in the same folder or separated out (which is what happens when you do `kedro pipeline create`).

```bash
kedro pipeline create data_science_2
```

Then, we update the `src/project_name/pipelines/data_science_2/pipeline.py` file to create a pipeline in the same way as in the example above. We import `base_data_science` from the `data_science` pipeline module and use a namespace to isolate our nodes:


```python
# src/project_name/pipelines/data_science_2/pipeline.py
from kedro.pipeline import Pipeline, pipeline

from ..data_science.pipeline import base_data_science  # Import the base pipeline to build a new one from it


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        base_data_science,  # Creating a new data_science_2 pipeline based on base_data_science pipeline
        namespace="ds_2",  # With this namespace, the "ds_2." prefix is added to inputs, outputs, params, and node names
        parameters={"params:model_options": "params:model_options_2"},  # Using a new set of parameters to train the model
        inputs={"model_input_table"},  # Inputs remain the same, without the namespace prefix
    )
```

To use a new set of parameters, copy `model_options` from `conf/base/parameters_data_science.yml` to `conf/base/parameters_data_science_2.yml`, rename it to `model_options_2`, and change some values to try different model training settings, such as the test size and the feature set:


```yaml
# conf/base/parameters_data_science_2.yml
model_options_2:
  test_size: 0.3
  random_state: 3
  features:
    - d_check_complete
    - moon_clearance_complete
    - iata_approved
    - company_rating
```

In this example, all nodes inside the `data_science_2` pipeline are prefixed with `ds_2.`: `ds_2.split_data_node`, `ds_2.train_model_node`, and `ds_2.evaluate_model_node`. The parameters come from `model_options_2`, because the `parameters` mapping replaces `model_options` with them. The input of the pipeline stays `model_input_table`, as in the `data_science` pipeline, because we listed it in the `inputs` argument. Without that, the input would become `ds_2.model_input_table`, and no table with that name exists in the project.
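Why the `inputs` override matters can be checked mechanically: every free input of a pipeline has to be produced by another pipeline or defined in the Data Catalog. The sketch below is hypothetical (`free_inputs`, `missing_inputs`, and `without_inputs_override` are not Kedro APIs); it models each `ds_2` node as a pair of input and output names and assumes that `model_input_table` is the only relevant dataset available, produced by the `data_processing` pipeline.

```python
# Each node is modelled as (input names, output names); parameters start with "params:".
NodeIO = tuple[list[str], list[str]]


def free_inputs(nodes: list[NodeIO]) -> set[str]:
    """Datasets that some node reads but no node in the pipeline writes."""
    consumed = {name for inputs, _ in nodes for name in inputs if not name.startswith("params:")}
    produced = {name for _, outputs in nodes for name in outputs}
    return consumed - produced


def missing_inputs(nodes: list[NodeIO], available: set[str]) -> set[str]:
    """Free inputs that neither another pipeline nor the catalog provides."""
    return free_inputs(nodes) - available


# ds_2 as configured above: model_input_table is kept via inputs={"model_input_table"}
ds_2_kept: list[NodeIO] = [
    (["model_input_table", "params:model_options_2"], ["ds_2.X_train", "ds_2.X_test", "ds_2.y_train", "ds_2.y_test"]),
    (["ds_2.X_train", "ds_2.y_train"], ["ds_2.regressor"]),
    (["ds_2.regressor", "ds_2.X_test", "ds_2.y_test"], []),
]


def without_inputs_override(nodes: list[NodeIO]) -> list[NodeIO]:
    """Simulate dropping inputs={...}: model_input_table gets the ds_2. prefix as well."""
    return [
        (["ds_2.model_input_table" if n == "model_input_table" else n for n in inputs], outputs)
        for inputs, outputs in nodes
    ]


# Assumption: model_input_table is produced upstream by the data_processing pipeline
available = {"model_input_table"}
print(missing_inputs(ds_2_kept, available))  # set()
print(missing_inputs(without_inputs_override(ds_2_kept), available))  # {'ds_2.model_input_table'}
```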


Since the node names are unique now, we can run the project with:

```bash
kedro run
```

The logs show that the `data_science` and `data_science_2` pipelines ran successfully with different R2 scores. Now, we can see how Kedro-Viz renders namespaced pipelines in collapsible "super nodes":


```bash
kedro viz run
```

After running Kedro-Viz, we can see two pipelines with the same structure: `data_science` and `data_science_2`:

![namespaces uncollapsed](../meta/images/namespaces_uncollapsed.png)


We can collapse all namespaced pipelines (in our case, it's only `data_science_2`) with a special button and see that the `data_science_2` pipeline was collapsed into one super node called `Ds 2`:


![namespaces collapsed](../meta/images/namespaces_collapsed.png)

> Tip: You can use `kedro run --namespace=namespace_name` to run only the nodes in a specific namespace.
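Conceptually, running a single namespace narrows the run to the nodes whose names sit under that namespace. A minimal sketch of that selection on plain node names follows; `nodes_in_namespace` is a hypothetical helper, while Kedro's own `Pipeline` class works on node objects (it exposes `only_nodes_with_namespace()` for this kind of filtering). Matching on the full `"<namespace>."` prefix avoids accidentally selecting a sibling namespace such as `ds_20`.

```python
def nodes_in_namespace(node_names: list[str], namespace: str) -> list[str]:
    """Keep only nodes that belong to `namespace` or to one of its nested namespaces."""
    prefix = f"{namespace}."
    return [name for name in node_names if name.startswith(prefix)]


all_nodes = [
    "split_data_node",
    "train_model_node",
    "evaluate_model_node",
    "ds_2.split_data_node",
    "ds_2.train_model_node",
    "ds_2.evaluate_model_node",
]
print(nodes_in_namespace(all_nodes, "ds_2"))
# ['ds_2.split_data_node', 'ds_2.train_model_node', 'ds_2.evaluate_model_node']
```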



### How to namespace all pipelines in a project

If we want to make all pipelines in this example fully namespaced, we should:

Modify the `data_processing` pipeline by adding the following arguments to the `pipeline()` call in `src/project_name/pipelines/data_processing/pipeline.py`:

```python
namespace="data_processing",
inputs={"companies", "shuttles", "reviews"}, # Inputs remain the same, without namespace prefix
outputs={"model_input_table"}, # Outputs remain the same, without namespace prefix
```
Modify the `data_science` pipeline by adding `namespace` and `inputs` in the same way as in the `data_science_2` pipeline:


```python
def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        base_data_science,
        namespace="ds_1",
        parameters={"params:model_options": "params:model_options_1"},
        inputs={"model_input_table"},
    )
```

After running the project with `kedro run` and opening it with `kedro viz run`, the visualisation with all namespaces collapsed looks like this:

![namespaces collapsed all](../meta/images/namespaces_collapsed_all.png)