Make restart tests work #222

yakutovicha · 2025-01-15T11:43:56Z

yakutovicha · 2025-01-15T14:30:52Z

@oschuett, this temporarily fixes the issues in this repo, but probably won't fix things for the CP2K one. As mentioned in #214 (comment), the time-based restarts are tricky and we might need a better solution. Please let me know if you have a better idea.

oschuett · 2025-01-15T15:22:09Z

Thanks a lot for looking into this issue!

Since aiidateam/aiida-core#6307 is now fixed, couldn't we lower the walltime to a value that works reliably everywhere?

yakutovicha · 2025-01-15T15:26:21Z

> Thanks a lot for looking into this issue!

Sure!

> Since aiidateam/aiida-core#6307 is now fixed, couldn't we lower the walltime to a value that works reliably everywhere?

I think we can try to play around with things, but there will always be a case where things are breaking if the machine is fast. That's why I would prefer a more predictable test.

oschuett · 2025-01-15T15:38:40Z

> but there will always be a case where things are breaking if the machine is fast.

Couldn't we tune it such that it has to restart 3 times on a fast machine today? That should keep working for at least the next 10 years. On a slow machine it will obviously need more restarts, but overall the test should still finish within a reasonable time.
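As a rough illustration of the tuning idea above, the following sketch picks a walltime so that a fast machine needs a chosen number of restarts, then estimates how many restarts a slower machine would need with the same walltime. The function names and the runtimes are hypothetical, and the model assumes every run makes `walltime_s` seconds of useful progress, ignoring restart overhead and any work lost at the cutoff.

```python
import math


def restarts_needed(total_runtime_s: float, walltime_s: float) -> int:
    """Estimate restarts when each run contributes walltime_s seconds of progress.

    Simplified model (an assumption): no restart overhead, no lost work.
    """
    if total_runtime_s <= 0 or walltime_s <= 0:
        raise ValueError("runtimes must be positive")
    runs = math.ceil(total_runtime_s / walltime_s)
    return max(0, runs - 1)


def walltime_for_restarts(fast_runtime_s: float, target_restarts: int) -> float:
    """Pick a walltime in the middle of the window that yields target_restarts.

    target_restarts restarts means target_restarts + 1 runs, so the walltime
    must lie in [T / (r + 1), T / r); dividing by r + 0.5 lands inside it.
    """
    if target_restarts < 1:
        raise ValueError("target_restarts must be at least 1")
    return fast_runtime_s / (target_restarts + 0.5)


# Hypothetical numbers: 40 s total on a fast runner, 100 s on a slow one.
walltime = walltime_for_restarts(40.0, target_restarts=3)
fast_restarts = restarts_needed(40.0, walltime)
slow_restarts = restarts_needed(100.0, walltime)
print(f"walltime={walltime:.1f}s fast={fast_restarts} slow={slow_restarts}")
```

Under this model a future machine would have to be several times faster than today's fast runner before the restart count drops to zero, which is the safety margin the suggestion relies on.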

yakutovicha · 2025-01-15T15:53:13Z

> > but there will always be a case where things are breaking if the machine is fast.

> Couldn't we tune it such that it has to restart 3 times on a fast machine today? That should keep working for at least the next 10 years. On a slow machine it will obviously need more restarts, but overall the test should still finish within a reasonable time.

Ok, trying here: d72faf2. However, I believe there is a significant dispersion among different runners on GitHub, so that might not be a super reliable solution either.

yakutovicha · 2025-01-15T19:13:42Z

Ok, seems that things have improved a little bit. The remaining test failure will most likely be fixed by #218. Shall we merge this one?

mkrack · 2025-01-16T09:52:26Z

> Ok, trying here: d72faf2. However, I believe there is a significant dispersion among different runners on GitHub, so that might not be a super reliable solution either.

Maybe it is better to use MAX_SCF in the SCF section or MAX_ITER in the GEO_OPT section to enforce a restart. That approach would not be affected by runtime fluctuations due to different hardware or the availability of cloud resources. We could, for instance, run two GEO_OPT steps (MAX_ITER 2) and then restart once with an increased MAX_ITER of 4 to run two more steps. We only have to make sure that the GEO_OPT run needs 4 or more steps to converge for the selected system.
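A minimal sketch of what the two inputs in this proposal could look like, written as nested Python dictionaries that mirror the CP2K input sections. The helper name `geo_opt_parameters` is hypothetical, the FORCE_EVAL section is omitted for brevity, and wrapping the dictionary for an actual AiiDA calculation is not shown.

```python
def geo_opt_parameters(max_iter: int) -> dict:
    """Build a partial CP2K input (as a nested dict) capping GEO_OPT steps.

    Hypothetical helper for illustration; a real input also needs FORCE_EVAL.
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    return {
        "GLOBAL": {"RUN_TYPE": "GEO_OPT"},
        "MOTION": {"GEO_OPT": {"MAX_ITER": max_iter}},
    }


# First run stops after two optimization steps ...
first_run = geo_opt_parameters(max_iter=2)
# ... and the restarted run raises the cap to four steps.
restarted_run = geo_opt_parameters(max_iter=4)

print(first_run["MOTION"]["GEO_OPT"]["MAX_ITER"])
print(restarted_run["MOTION"]["GEO_OPT"]["MAX_ITER"])
```

Because the step limit rather than the clock ends the first run, the point at which the restart happens does not depend on how fast the runner is.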

oschuett · 2025-01-16T10:13:41Z

> We could run for instance two GEO_OPT steps (MAX_ITER 2) and then restart once with an increased MAX_ITER of 4 to run two more steps.

AFAIK, the restart mechanism in AiiDA does not allow for modifications of the input in between runs.

mkrack · 2025-01-16T10:19:04Z

> > We could run for instance two GEO_OPT steps (MAX_ITER 2) and then restart once with an increased MAX_ITER of 4 to run two more steps.

> AFAIK, the restart mechanism in AiiDA does not allow for modifications of the input in between runs.

If that is true, then we can try a sufficiently small MAX_SCF with OUTER_SCF disabled, so that the system needs 2 or 3 restarts to converge.
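A minimal sketch of this alternative, again as a nested Python dictionary mirroring the CP2K input sections. The helper name and the value of 5 are hypothetical, and the use of the `"_"` key to set the OUTER_SCF section parameter is an assumption about how the dictionary is translated into CP2K input.

```python
def scf_limited_parameters(max_scf: int) -> dict:
    """Build a partial CP2K input (as a nested dict) with a small SCF cap.

    Hypothetical helper; the "_" key for the section parameter is an
    assumed convention, and the rest of FORCE_EVAL is omitted.
    """
    if max_scf < 1:
        raise ValueError("max_scf must be at least 1")
    return {
        "FORCE_EVAL": {
            "DFT": {
                "SCF": {
                    # Few SCF iterations per run, so one run cannot converge.
                    "MAX_SCF": max_scf,
                    # Outer loop switched off so it does not add iterations.
                    "OUTER_SCF": {"_": "OFF"},
                }
            }
        }
    }


params = scf_limited_parameters(max_scf=5)  # 5 is an illustrative value only
print(params["FORCE_EVAL"]["DFT"]["SCF"])
```

Unlike the MAX_ITER variant, the same input can be reused unchanged for every restart, since each run simply continues the SCF from where the previous one stopped.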

yakutovicha · 2025-01-23T15:31:21Z

> > Ok, trying here: d72faf2. However, I believe there is a significant dispersion among different runners on GitHub, so that might not be a super reliable solution either.

> Maybe it is better to use MAX_SCF in the SCF section or MAX_ITER in the GEO_OPT section to enforce a restart. That approach would not be affected by runtime fluctuations due to different hardware or the availability of cloud resources. We could, for instance, run two GEO_OPT steps (MAX_ITER 2) and then restart once with an increased MAX_ITER of 4 to run two more steps. We only have to make sure that the GEO_OPT run needs 4 or more steps to converge for the selected system.

I tried to implement that in b79531c. Please let me know if you have better ideas.

yakutovicha added 4 commits January 15, 2025 12:41

Fixing restart tests due to aiida-core bug.

21d23e5

Fix pre-commit issues

e379fb6

example_base_geoopt_restart: revert the time change

a3c89df

Fixing the doc builds.

deb83b5

yakutovicha requested a review from oschuett January 15, 2025 14:28

Reduce runtimes for the restart workchains

d72faf2

Handle geometry optimization failure due to MAX_ITER limit.

b79531c
