Improve error messages for modular pipelines #2633
@astrojuanlu Can you add some steps to reproduce these confusing error message(s)?
I haven't worked on a reproducer yet, but here's another user puzzled by the error message: https://www.linen.dev/s/kedro/t/13226590/hi-everyone-i-m-having-a-bit-of-hard-time-understanding-what#055add79-fc39-450f-98eb-b8a8746cd2e7
I have first-hand experience teaching people to use this feature, and it is not straightforward. A fuzzy suggestion workflow would go a long way, I think.
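A fuzzy suggestion could be as simple as ranking the pipeline's known input names by string similarity. The sketch below uses Python's standard `difflib.get_close_matches`; the function names and the message wording are hypothetical illustrations, not part of Kedro.

```python
import difflib


def suggest_dataset_names(missing: str, known: list[str], n: int = 3) -> list[str]:
    """Return up to ``n`` names from ``known`` that look similar to ``missing``."""
    # get_close_matches ranks candidates by SequenceMatcher similarity and
    # drops anything scoring below the cutoff (0.0 to 1.0).
    return difflib.get_close_matches(missing, known, n=n, cutoff=0.6)


def format_missing_input_error(missing: str, known: list[str]) -> str:
    """Build an error message with "did you mean" hints (hypothetical wording)."""
    message = f"'{missing}' is not an input of this pipeline."
    suggestions = suggest_dataset_names(missing, known)
    if suggestions:
        quoted = ", ".join(f"'{name}'" for name in suggestions)
        message += f" Did you mean {quoted}?"
    return message


# Free inputs of an inner pipeline whose node input was misspelled.
known_inputs = ["model_input_table_NOT_FOUND", "params:model_options"]
print(format_missing_input_error("model_input_table", known_inputs))
# prints: 'model_input_table' is not an input of this pipeline. Did you mean 'model_input_table_NOT_FOUND'?
```

The 0.6 cutoff is `difflib`'s default; it is loose enough to catch suffixes like `_NOT_FOUND` while filtering out unrelated names such as `params:model_options`.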
Starting from the spaceflights tutorial, you can reproduce this error with the following pipeline definition:

```python
from kedro.pipeline import Pipeline, node, pipeline

from .nodes import evaluate_model, split_data, train_model


def create_pipeline(**kwargs) -> Pipeline:
    pipeline_instance = pipeline(
        [
            node(
                func=split_data,
                # Note wrong input name
                inputs=["model_input_table_NOT_FOUND", "params:model_options"],
                outputs=["X_train", "X_test", "y_train", "y_test"],
                name="split_data_node",
            ),
            node(
                func=train_model,
                inputs=["X_train", "y_train"],
                outputs="regressor",
                name="train_model_node",
            ),
            node(
                func=evaluate_model,
                inputs=["regressor", "X_test", "y_test"],
                outputs=None,
                name="evaluate_model_node",
            ),
        ]
    )
    # "model_input_table" is no longer consumed by any node above,
    # so mapping it as an input here triggers the error
    ds_pipeline_1 = pipeline(
        pipe=pipeline_instance,
        inputs="model_input_table",
        namespace="active_modelling_pipeline",
    )
    ds_pipeline_2 = pipeline(
        pipe=pipeline_instance,
        inputs="model_input_table",
        namespace="candidate_modelling_pipeline",
    )
    return ds_pipeline_1 + ds_pipeline_2
```

And then the error will be:
Why is this confusing? Because
I think it would be very helpful to suggest:
Description
Some errors that happen when using modular pipelines could be more helpful.
For example, if a node has a non-existent dataset as input, the `ModularPipelineError` will actually include the name of the dataset that does exist in the catalog (the one passed as `inputs`), not the misspelled one. You can reproduce that by taking the code of our tutorial, kedro/docs/source/tutorial/add_another_pipeline.md (line 149 in f8230cd), and changing one of the inputs to `model_input_table_NOT_FOUND`. This will raise the following error (source: https://www.linen.dev/s/kedro/t/12314014/hi-everyone-i-am-trying-to-use-the-modular-pipeline-module-b#27abe516-0718-43d4-8924-2bd965f64d22):
Another one that recently confused an internal user is the "Inputs should be free inputs to the pipeline" message. A free input is "not an output from another node, thus unbound or free" (@idanov).
https://github.com/kedro-org/kedro/blob/f8230cdbc653f4c66194c34b91fd74b919ae7183/kedro/pipeline/modular_pipeline.py#L54C1-L55
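To make that definition concrete, here is a minimal sketch that computes free inputs from a list of nodes: everything some node consumes minus everything any node produces. `SimpleNode` is a hypothetical stand-in for a Kedro node; on a real pipeline, `Pipeline.inputs()` reports the free inputs.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleNode:
    """Hypothetical stand-in for a Kedro node: just a name, inputs and outputs."""

    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def free_inputs(nodes: list[SimpleNode]) -> set[str]:
    """Datasets consumed by some node but produced by none (i.e. unbound)."""
    consumed = {ds for node in nodes for ds in node.inputs}
    produced = {ds for node in nodes for ds in node.outputs}
    return consumed - produced


spaceflights_nodes = [
    SimpleNode(
        "split_data_node",
        ["model_input_table", "params:model_options"],
        ["X_train", "X_test", "y_train", "y_test"],
    ),
    SimpleNode("train_model_node", ["X_train", "y_train"], ["regressor"]),
    SimpleNode("evaluate_model_node", ["regressor", "X_test", "y_test"], []),
]
print(sorted(free_inputs(spaceflights_nodes)))
# ['model_input_table', 'params:model_options']
```

`X_train` is consumed by `train_model_node`, but because `split_data_node` produces it, it is bound and therefore not a free input.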
Possible Implementation
In the first case, maybe the error-checking code should first check the nodes' inputs, to give a more helpful error message.
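A rough sketch of that idea, assuming the check has access to each node's inputs: it first verifies that every mapped input is consumed by some node and, if not, names the closest match and the node that uses it. The function, the dict-based pipeline view, and the local `ModularPipelineError` class are hypothetical illustrations, not Kedro's actual validation code.

```python
import difflib


class ModularPipelineError(Exception):
    """Local stand-in so this sketch runs without Kedro installed."""


def check_inputs_used_by_nodes(requested: list[str], node_inputs: dict[str, list[str]]) -> None:
    """Fail early if a mapped input is not consumed by any node.

    ``node_inputs`` maps each node name to the dataset names it consumes
    (a hypothetical, simplified view of a pipeline).
    """
    consumed = sorted({ds for inputs in node_inputs.values() for ds in inputs})
    for name in requested:
        if name in consumed:
            continue
        message = f"'{name}' was passed as an input, but no node in the pipeline consumes it."
        close = difflib.get_close_matches(name, consumed, n=1)
        if close:
            # Point the user at the node(s) that use the likely intended name.
            users = [node for node, inputs in node_inputs.items() if close[0] in inputs]
            message += f" Did you mean '{close[0]}' (used by {', '.join(users)})?"
        raise ModularPipelineError(message)


inputs_by_node = {
    "split_data_node": ["model_input_table_NOT_FOUND", "params:model_options"],
    "train_model_node": ["X_train", "y_train"],
    "evaluate_model_node": ["regressor", "X_test", "y_test"],
}
try:
    check_inputs_used_by_nodes(["model_input_table"], inputs_by_node)
except ModularPipelineError as err:
    print(err)
```

Pointing at the offending node (here `split_data_node`) is the part the current message lacks: it tells the user where the typo actually lives.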
In the second case, the text could say, for example, "Inputs must not be outputs from another node".
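Similarly, a sketch of the second check, assuming the validator knows each node's outputs: when a mapped input is produced by another node, the error says which node, using the wording proposed above. As before, the function name, the dict-based view, and the local error class are hypothetical.

```python
class ModularPipelineError(Exception):
    """Local stand-in so this sketch runs without Kedro installed."""


def check_inputs_not_produced(requested: list[str], node_outputs: dict[str, list[str]]) -> None:
    """Reject mapped inputs that another node already produces.

    ``node_outputs`` maps each node name to the datasets it outputs (a
    hypothetical, simplified pipeline view). A dataset produced by a node is
    bound, so it cannot be a free input of the pipeline.
    """
    for name in requested:
        producers = [node for node, outputs in node_outputs.items() if name in outputs]
        if producers:
            raise ModularPipelineError(
                "Inputs must not be outputs from another node: "
                f"'{name}' is produced by {', '.join(producers)}."
            )


outputs_by_node = {
    "split_data_node": ["X_train", "X_test", "y_train", "y_test"],
    "train_model_node": ["regressor"],
    "evaluate_model_node": [],
}
check_inputs_not_produced(["model_input_table"], outputs_by_node)  # a free input: passes
try:
    check_inputs_not_produced(["X_train"], outputs_by_node)
except ModularPipelineError as err:
    print(err)
```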
Possible Alternatives