Workchain restart tests are very brittle #214

oschuett · 2024-03-23T20:06:05Z

The workchain tests concerned with restarting have been failing on the CP2K Dashboard for a while.

I believe the problem is that the workload, e.g. the number of MD steps, and the wall-time have to be precisely tuned so that the work finishes after exactly one restart. If the work finishes during the initial run, then the test fails because no restart was triggered. If the work needs more than one restart, then the Cp2kBaseWorkChain bails out:

Cp2kCalculation<25> failed and error was not handled for the second consecutive time, aborting

Since the time required for a workload depends on the hardware, these tests are inherently brittle. I think the best solution would be to allow for multiple restarts. See also #174.
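To make the tuning problem concrete, here is a minimal sketch of the reasoning above as a toy model. It is not part of aiida-cp2k or CP2K: the function names and the assumption that a machine performs a fixed number of steps per second are hypothetical and only serve the illustration.

```python
import math


def restarts_needed(total_steps: int, steps_per_second: float, walltime_s: float) -> int:
    """Toy model: restarts required to finish `total_steps` under a wall-time limit.

    Assumes (hypothetically) a constant speed and that each run resumes
    exactly where the previous one stopped.
    """
    steps_per_run = math.floor(steps_per_second * walltime_s)
    if steps_per_run < 1:
        raise ValueError("no progress is possible within the wall-time")
    runs = math.ceil(total_steps / steps_per_run)
    return runs - 1


def classify_outcome(restarts: int) -> str:
    """Map the number of restarts to the test result described in this issue."""
    if restarts == 0:
        return "fail: no restart triggered"
    if restarts == 1:
        return "pass"
    return "fail: more than one restart"


# Same workload and wall-time, three machines of different speed.
for speed in (20.0, 6.0, 3.0):
    n = restarts_needed(total_steps=100, steps_per_second=speed, walltime_s=10.0)
    print(speed, n, classify_outcome(n))
```

In this model only the machine with the intermediate speed passes. A faster machine finishes in the initial run, and a slower one needs several restarts, so the test outcome depends on the hardware rather than on the code under test.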

yakutovicha · 2024-03-24T20:13:33Z

> Since the time required for a workload depends on the hardware, these tests are inherently brittle. I think the best solution would be to allow for multiple restarts. See also #174.

It is a bug in aiida-core, which should be fixed in the next release.

oschuett · 2024-05-10T11:20:39Z

Since #170 the dashboard test has been failing even more often. Do you know when the next aiida-core release is planned?

yakutovicha · 2024-05-14T23:48:23Z

> Since #170 the dashboard test has been failing even more often. Do you know when the next aiida-core release is planned?

No idea, sorry 🤷‍♂️

yakutovicha · 2025-01-15T13:40:46Z

Hey @oschuett and @mkrack!

I now understand the problem. The cause is the time limit that I set when running the tests.

The tests failed in this repo because we still rely on aiida-core==2.5.3. The GitHub runner is sometimes slow, so the calculation required more than one restart, which did not work because of the aiida-core bug mentioned above.

The tests on the CP2K dashboard fail because the machine is too fast: the calculation finishes within the time limit, so no restart is triggered and the tests fail again. See here for example: just search for "work chain completed after".

To fix them I need a reliable way to stop CP2K before completing the optimisation. Can you think of another approach that doesn't involve time? @mkrack mentioned that he had implemented a way to stop CP2K if the convergence criteria were not met. Please let me know.
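For comparison, here is a minimal sketch of what a time-independent stopping criterion would buy. It is a toy model only: the function name and the idea of capping each run at a fixed number of steps are assumptions made for illustration, and the sketch says nothing about which CP2K option would provide such a cap or how the work chain would react to it.

```python
import math


def restarts_with_step_budget(total_steps: int, max_steps_per_run: int) -> int:
    """Toy model: restarts needed when every run stops after a fixed step budget.

    The result depends only on the workload and the budget, not on how fast
    the machine is (hypothetical model, not an aiida-cp2k or CP2K API).
    """
    if max_steps_per_run < 1:
        raise ValueError("the step budget must be at least 1")
    return math.ceil(total_steps / max_steps_per_run) - 1


# A workload of 10 steps with a budget of 6 steps per run:
# the first run does 6 steps, the second run does the remaining 4.
print(restarts_with_step_budget(total_steps=10, max_steps_per_run=6))
```

With such a criterion the number of restarts would be the same on the GitHub runner and on the dashboard machine, which is the property the restart tests need.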

This was referenced Nov 5, 2024

Many tests are failing with latest version of CP2K #220 (Closed)

Get AiiDA tests working again (except two tests with restart) cp2k/cp2k#3762 (Merged)

yakutovicha mentioned this issue Jan 15, 2025

Make restart tests work #222 (Open)
