Weird error messaging during pipeline execution #3678

Closed
nejox opened this issue Mar 4, 2024 · 9 comments

@nejox

nejox commented Mar 4, 2024

Description

During pipeline execution, when a node raises an exception, the real error is hidden behind a second exception raised by Kedro. This seems to happen only when the pipeline contains in-memory datasets.
It started after upgrading the project from Kedro 0.18.14 to 0.19.3.

Context

Our monitoring of production pipelines shows only the weird "KeyError: " that appeared last, not the real error that caused it at runtime. Manual analysis is needed to find the real cause.
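
For monitoring, one possible workaround is to walk Python's exception chain and report the first exception instead of the last one. The following is a minimal sketch in plain Python and is not part of Kedro. `root_cause` and `failing_run` are hypothetical names, and the two errors only imitate the messages quoted in this issue.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__/__context__ links back to the first exception raised."""
    seen = set()
    while id(exc) not in seen:  # guard against accidental cycles
        seen.add(id(exc))
        earlier = exc.__cause__ or exc.__context__
        if earlier is None:
            break
        exc = earlier
    return exc


def failing_run() -> None:
    """Imitate a node error that is masked by a second error in the handler."""
    registry = {}  # stand-in for a dataset registry (assumption)
    try:
        raise KeyError("['is_holiday'] not in index")  # the real node error
    except KeyError:
        # A lookup that fails while the first error is still being handled.
        registry["07_ProductRecommendationsUnaligned"]


try:
    failing_run()
except Exception as exc:
    last = exc               # what a naive monitor would report
    first = root_cause(exc)  # the error that actually started the failure
```

Here `last` carries the dataset name and `first` carries the missing-column message, so a monitor that logs `first` would show the real cause.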

Steps to Reproduce

  1. Create a pipeline with at least two nodes, the last one having a temporary dataset.
  2. Cause an exception in the first node.
  3. See the weird KeyError first; the real cause appears only after scrolling.

Example Error messages from our pipeline:

 File ".../pandas/core/indexes/base.py", line 5859, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['is_holiday'] not in index"

=> This KeyError causes our pipeline to fail because of a missing column. But the error log is bloated with this:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../kedro/framework/cli/cli.py", line 127, in main
    super().main(
  File ".../click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File ".../click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File ".../click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File ".../click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File ".../kedro/framework/cli/project.py", line 225, in run
    session.run(
  File ".../kedro/framework/session/session.py", line 392, in run
    run_result = runner.run(
  File ".../kedro/runner/runner.py", line 117, in run
    self._run(pipeline, catalog, hook_or_null_manager, session_id)  # type: ignore[arg-type]
  File ".../kedro/runner/sequential_runner.py", line 78, in _run
    self._suggest_resume_scenario(pipeline, done_nodes, catalog)
  File ".../kedro/runner/runner.py", line 206, in _suggest_resume_scenario
    start_p_persistent_ancestors = _find_persistent_ancestors(
  File ".../kedro/runner/runner.py", line 249, in _find_persistent_ancestors
    if _has_persistent_inputs(current_node, catalog):
  File ".../kedro/runner/runner.py", line 290, in _has_persistent_inputs
    if isinstance(catalog._datasets[node_input], MemoryDataset):
KeyError: '07_ProductRecommendationsUnaligned'
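
The line "During handling of the above exception, another exception occurred" is Python's standard marker for implicit exception chaining: a second exception was raised while the first was still being handled. The sketch below reproduces that pattern in plain Python, without Kedro. The names `datasets`, `run`, and both `has_persistent_inputs_*` functions are illustrative stand-ins and not Kedro's implementation; the lenient variant only shows, under these assumptions, that a `.get()` lookup avoids the secondary KeyError.

```python
import traceback

# Stand-in for a catalog that only knows explicitly registered datasets.
datasets = {"companies": "persistent"}


def has_persistent_inputs_strict(inputs):
    # Indexing raises KeyError for any name missing from the mapping.
    return all(datasets[name] != "memory" for name in inputs)


def has_persistent_inputs_lenient(inputs):
    # .get() treats an unregistered name as in-memory instead of raising.
    return all(datasets.get(name, "memory") != "memory" for name in inputs)


def run(check):
    try:
        raise ValueError("This is a test error")  # the node's real error
    except ValueError:
        check(["tmp_inmemory_data"])  # extra work done inside the handler
        raise  # re-raise the original error if the check did not fail


def formatted_failure(check):
    try:
        run(check)
    except Exception as exc:
        text = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        return type(exc).__name__, text


strict_name, strict_text = formatted_failure(has_persistent_inputs_strict)
lenient_name, lenient_text = formatted_failure(has_persistent_inputs_lenient)
print(strict_name)   # KeyError
print(lenient_name)  # ValueError
```

With the strict lookup the final exception is the KeyError and the ValueError appears only earlier in the chained traceback. With the lenient lookup the original ValueError is the one that propagates.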

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.19.3
  • Python version used (python -V): 3.10
  • Operating system and version: macOS 14.2.1 (23C71)
@noklam
Contributor

noklam commented Mar 4, 2024

Thank you for reporting this, we will look into it shortly. Cc @ankatiyar

@ankatiyar ankatiyar added the Community Issue/PR opened by the open-source community label Mar 4, 2024
@merelcht merelcht added this to the Improve Developer Experience milestone Mar 14, 2024
@noklam
Contributor

noklam commented Mar 20, 2024

@nejox Any chance that you can create a repository with a minimal example that we can reproduce? I tried to do this myself today quickly but couldn't get the same result.

@nejox
Author

nejox commented Mar 21, 2024

I'll try to build one in the next few days.

@nejox
Author

nejox commented Apr 2, 2024

Okay, I managed to reproduce it with this repository: https://github.com/nejox/kedro-issue-replication
When a ValueError is raised in a node that has an in-memory dataset as output, the output log first shows a KeyError for that in-memory dataset.

py310/lib/python3.10/site-packages/kedro/runner/runner.py:290 in _has_persistent_inputs

    287
    288     """
    289     for node_input in node.inputs:
  ❱ 290         if isinstance(catalog._datasets[node_input], MemoryDataset):
    291             return False
    292     return True
    293

KeyError: 'tmp_inmemory_data'

@merelcht merelcht self-assigned this May 24, 2024
@merelcht
Member

Hi @nejox, I have cloned your example repo and run the pipeline, but I'm not quite sure I understand the issue. In the example you provide this node:

def test(x: pd.DataFrame) -> pd.DataFrame:

    # x["notexisting_col"] += 1
    raise ValueError("This is a test error")

    return x.copy()

When I execute the run I see the ValueError. If I remove the comment and then execute I first see the KeyError. To me this makes sense because x["notexisting_col"] += 1 causing the KeyError comes before raise ValueError("This is a test error"). In your explanation it sounds like you'd expect something else?

@nejox
Author

nejox commented Jun 14, 2024

Hi @merelcht, thanks for taking the time!
Sorry for the confusion about the ValueError versus the missing-column error; both lead to the same problem.
The problem was that the original error was hidden behind a subsequent exception from Kedro about the missing in-memory dataset that is the output of that node.
See, for example, this error message, which was originally caused by the ValueError/NotExistingCol error raised in the node:

py310/lib/python3.10/site-packages/kedro/runner/runner.py:290 in _has_persistent_inputs

    287
    288     """
    289     for node_input in node.inputs:
  ❱ 290         if isinstance(catalog._datasets[node_input], MemoryDataset):
    291             return False
    292     return True
    293

KeyError: 'tmp_inmemory_data'

where 'tmp_inmemory_data' is the output of:

            node(
                func=test,
                inputs="companies",
                outputs="tmp_inmemory_data",
                name="tmp_inmemory_data_node",
            ),

@nejox
Author

nejox commented Jun 14, 2024

I also set up the repository and the environment again locally, and now I get the same error message as you do, @merelcht, which resolves my issue with the error messaging. This is strange. I'll need to check whether this still happens in our production setup, and if so, with which package versions.

@merelcht
Member

Okay, at least we're getting the same result on the same project which is already a step in the right direction. Let me know what you find in your production setup and hopefully we can resolve the problem.

@nejox
Author

nejox commented Jun 26, 2024

Hi, thanks for your efforts. I couldn't reproduce the problem anymore, but on the other hand I can't really understand why it is solved. So I'm closing this.

@nejox nejox closed this as completed Jun 26, 2024