Improve error messages for modular pipelines #2633

astrojuanlu · 2023-06-02T13:41:39Z

Description

Some errors that happen when using modular pipelines could be more helpful.

For example, if a node has a non-existent dataset as input, the ModularPipelineError reports the name of the dataset that does exist in the catalog, not the missing one. You can reproduce this with the code from our tutorial:

kedro/docs/source/tutorial/add_another_pipeline.md, line 149 in f8230cd:

inputs=["model_input_table", "params:model_options"],

Changing one of those inputs to model_input_table_NOT_FOUND raises this error:

ModularPipelineError: Failed to map datasets and/or parameters: model_input_table

(source: https://www.linen.dev/s/kedro/t/12314014/hi-everyone-i-am-trying-to-use-the-modular-pipeline-module-b#27abe516-0718-43d4-8924-2bd965f64d22)

Another message that recently confused an internal user is "Inputs should be free inputs to the pipeline". A free input is one that is "not an output from another node, thus unbound or free" (@idanov).
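To make the definition concrete, a free input can be computed as the set of all node inputs minus the set of all node outputs. The sketch below assumes a simplified, hypothetical representation in which each node is an (inputs, outputs) pair of name lists; it is not Kedro's implementation (in Kedro itself, Pipeline.inputs() returns the free inputs of a pipeline).

```python
# Hypothetical simplified model: each node is an (inputs, outputs) pair.
Node = tuple[list[str], list[str]]


def free_inputs(nodes: list[Node]) -> set[str]:
    """Return the inputs that no node in the pipeline produces."""
    all_inputs = {name for inputs, _ in nodes for name in inputs}
    all_outputs = {name for _, outputs in nodes for name in outputs}
    return all_inputs - all_outputs


# The spaceflights data_science nodes, reduced to their dataset names.
nodes = [
    (["model_input_table", "params:model_options"],
     ["X_train", "X_test", "y_train", "y_test"]),
    (["X_train", "y_train"], ["regressor"]),
    (["regressor", "X_test", "y_test"], []),
]
print(sorted(free_inputs(nodes)))  # → ['model_input_table', 'params:model_options']
```

Intermediate datasets such as X_train are excluded because another node produces them, which is why they cannot be remapped as pipeline inputs.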

https://github.com/kedro-org/kedro/blob/f8230cdbc653f4c66194c34b91fd74b919ae7183/kedro/pipeline/modular_pipeline.py#L54C1-L55

Possible Implementation

In the first case, maybe the error-checking code should check the nodes' inputs first, to give a more helpful error message.
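A minimal sketch of that idea: compare the requested input mapping against the pipeline's free inputs before mapping anything, and report both the names that failed and the free inputs that are actually available. The function check_input_mapping, the stand-in exception class, and the message wording are hypothetical, not Kedro's API.

```python
class ModularPipelineError(Exception):
    """Hypothetical stand-in for Kedro's exception of the same name."""


def check_input_mapping(requested: set[str], free: set[str]) -> None:
    """Raise a descriptive error if any requested input is not a free input."""
    missing = requested - free
    if missing:
        raise ModularPipelineError(
            f"Failed to map datasets and/or parameters: {', '.join(sorted(missing))}. "
            f"These names are not free inputs of the pipeline; "
            f"its free inputs are: {', '.join(sorted(free))}."
        )


# Free inputs after renaming the first input to model_input_table_NOT_FOUND.
free = {"model_input_table_NOT_FOUND", "params:model_options"}
try:
    check_input_mapping({"model_input_table"}, free)
except ModularPipelineError as err:
    print(err)
```

Listing the free inputs next to the failed name puts model_input_table_NOT_FOUND in front of the user, which is the hint the current message lacks.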

In the second case, the text could say for example "Inputs must not be outputs from another node".

Possible Alternatives



merelcht · 2023-07-17T13:22:20Z

@astrojuanlu Can you add some steps to reproduce these confusing error message(s)?

astrojuanlu · 2023-07-19T15:05:00Z

I haven't worked on a reproducer yet, but here's another user puzzled by the error message: https://www.linen.dev/s/kedro/t/13226590/hi-everyone-i-m-having-a-bit-of-hard-time-understanding-what#055add79-fc39-450f-98eb-b8a8746cd2e7

Could someone kindly unpack / explain it?

datajoely · 2023-08-31T11:02:21Z

I have first-hand experience teaching people to use this feature, and it is not straightforward. A fuzzy-suggestion workflow would go a long way, I think.

astrojuanlu · 2023-11-25T10:16:53Z

To reproduce this error, start from the spaceflights tutorial and create data_science/pipeline.py as follows:

from kedro.pipeline import Pipeline, node, pipeline

from .nodes import evaluate_model, split_data, train_model


def create_pipeline(**kwargs) -> Pipeline:
    pipeline_instance = pipeline(
        [
            node(
                func=split_data,
                # Note the wrong input name: the namespaced pipelines below
                # still map "model_input_table", which no node consumes now
                inputs=["model_input_table_NOT_FOUND", "params:model_options"],
                outputs=["X_train", "X_test", "y_train", "y_test"],
                name="split_data_node",
            ),
            node(
                func=train_model,
                inputs=["X_train", "y_train"],
                outputs="regressor",
                name="train_model_node",
            ),
            node(
                func=evaluate_model,
                inputs=["regressor", "X_test", "y_test"],
                outputs=None,
                name="evaluate_model_node",
            ),
        ]
    )

    ds_pipeline_1 = pipeline(
        pipe=pipeline_instance,
        inputs="model_input_table",
        namespace="active_modelling_pipeline",
    )
    ds_pipeline_2 = pipeline(
        pipe=pipeline_instance,
        inputs="model_input_table",
        namespace="candidate_modelling_pipeline",
    )

    return ds_pipeline_1 + ds_pipeline_2

The resulting error is:

ModularPipelineError: Failed to map datasets and/or parameters: model_input_table

Why is this confusing? Because model_input_table is a well-defined dataset in the catalog. What the error actually means is that there is a mismatch between the input declared in the namespaced pipeline and the one originally declared in the pipeline instance.
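The mismatch becomes visible once the free inputs of pipeline_instance are computed: after the typo, no node consumes model_input_table, so the mapping inputs="model_input_table" has nothing to attach to. The sketch below uses a hypothetical simplified (inputs, outputs) model of the reproducer's nodes, not Kedro's internals, and checks the mismatch in both directions.

```python
# Hypothetical simplified model of the reproducer: (inputs, outputs) per node.
nodes = [
    (["model_input_table_NOT_FOUND", "params:model_options"],
     ["X_train", "X_test", "y_train", "y_test"]),
    (["X_train", "y_train"], ["regressor"]),
    (["regressor", "X_test", "y_test"], []),
]
mapped_inputs = {"model_input_table"}  # what both namespaced pipelines pass

produced = {name for _, outputs in nodes for name in outputs}
free = {name for inputs, _ in nodes for name in inputs} - produced

# Mapped names that are not free inputs: what the current error reports.
unmatched = mapped_inputs - free
# Free dataset inputs (parameters excluded) that were not mapped: the typo.
unmapped = {n for n in free - mapped_inputs if not n.startswith("params:")}

print(unmatched)  # → {'model_input_table'}
print(unmapped)   # → {'model_input_table_NOT_FOUND'}
```

The error message currently shows only the first set; the second set is where the actual mistake lives.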

datajoely · 2023-11-26T21:11:00Z

I think it would be very helpful to suggest:

- partial matches, which help solve the namespace mismatch issue
- fuzzy matches for typos
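Both kinds of suggestion can be sketched with the standard library: difflib.get_close_matches for typos, plus a substring check for partial matches such as a name that differs only by a namespace prefix or suffix. The suggest function and its matching rules are hypothetical, a starting point rather than a proposed final design.

```python
import difflib


def suggest(name: str, candidates: list[str]) -> list[str]:
    """Suggest candidates for an unknown dataset or parameter name."""
    # Partial matches: one name contains the other, e.g. when a namespace
    # prefix or an extra suffix is present on only one side.
    partial = [c for c in candidates if c != name and (name in c or c in name)]
    # Fuzzy matches catch typos; n=3 and cutoff=0.6 are difflib's defaults.
    fuzzy = difflib.get_close_matches(name, candidates, n=3, cutoff=0.6)
    # Merge both lists, keeping order and dropping duplicates.
    return list(dict.fromkeys(partial + fuzzy))


free = ["model_input_table_NOT_FOUND", "params:model_options"]
print(suggest("model_input_table", free))
print(suggest("X_train", ["active_modelling_pipeline.X_train"]))
```

Partial matches come first because they point at namespace mismatches, which the thread identifies as the more confusing case; fuzzy matches then cover plain typos.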

astrojuanlu added the "Issue: Feature Request" label (New feature or improvement to existing feature) on Jun 2, 2023

astrojuanlu mentioned this issue in "Improve documentation about namespaces" #2825 (closed) on Jul 20, 2023

merelcht added this to the Improve Developer Experience milestone Jan 12, 2024

AhdraMeraliQB self-assigned this Mar 12, 2024

AhdraMeraliQB mentioned this issue in "Improve error messages for modular pipelines" #3716 (merged) on Mar 14, 2024

AhdraMeraliQB closed this as completed in #3716 Mar 18, 2024
