Make restart tests work #222
base: main
@oschuett, this temporarily fixes the issues in this repo, but probably won't fix things for the CP2K one. As mentioned in #214 (comment), the time-based restarts are tricky and we might need a better solution. Please let me know if you have a better idea.
Thanks a lot for looking into this issue! Since aiidateam/aiida-core#6307 is now fixed, couldn't we lower the walltime to a value that works reliably everywhere?
Sure!
I think we can try to play around with the settings, but there will always be a case where the test breaks if the machine is fast enough. That's why I would prefer a more predictable test.
Couldn't we tune it such that it has to restart 3 times on a fast machine today? That should keep working for at least the next 10 years. On a slow machine it will obviously need more restarts, but overall the test should still finish within a reasonable time.
Ok, trying here: d72faf2. However, I believe there is significant variation among the different runners on GitHub, so that might not be a super reliable solution either.
Ok, it seems that things have improved a little bit. The remaining test failure will most likely be fixed by #218. Shall we merge this one?
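To make the trade-off in this exchange concrete, here is a rough sketch of how the number of restarts depends on machine speed when the walltime is fixed. The function name and all numbers are hypothetical, and the model is deliberately simplified: it ignores startup overhead and any work lost when a run is cut off.

```python
import math


def restarts_needed(total_runtime_s: float, walltime_s: float) -> int:
    """Estimate how many restarts a walltime-limited calculation needs.

    Simplified model (an assumption, not how AiiDA or CP2K account for time):
    every run does exactly `walltime_s` seconds of useful work and then stops.
    """
    if walltime_s <= 0:
        raise ValueError("walltime_s must be positive")
    runs = max(1, math.ceil(total_runtime_s / walltime_s))
    return runs - 1  # the first run is not a restart


# Hypothetical runtimes of the same calculation on different machines.
walltime = 10.0
for label, runtime in [
    ("fast runner today", 40.0),
    ("slow runner", 120.0),
    ("much faster future machine", 8.0),
]:
    print(f"{label}: {restarts_needed(runtime, walltime)} restart(s)")
```

In this toy model the fast runner restarts 3 times and the slow one 11 times, but once the whole calculation fits into a single walltime window there are no restarts at all, which is the failure mode a time-based test cannot rule out.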
Maybe it is better to use MAX_SCF in the SCF section or MAX_ITER in the GEO_OPT section to enforce a restart. That approach would not be affected by runtime fluctuations due to different hardware or the availability of cloud resources. We could run for instance two GEO_OPT steps (
AFAIK, the restart mechanism in AiiDA does not allow for modifications of the input in between runs.
If that is true, then we can try a sufficiently small |
I tried to implement that in b79531c. Please let me know if you have better ideas.
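For readers following along, the iteration-cap idea from the comments above could look roughly like the sketch below. It is not the content of b79531c: the helper name, the base parameters, and the cap value are hypothetical, and it only builds a nested dictionary that mirrors the CP2K input sections (MOTION/GEO_OPT/MAX_ITER and FORCE_EVAL/DFT/SCF/MAX_SCF). The same capped input would be reused unchanged on every run, in line with the remark that the input cannot be modified between restarts.

```python
import copy


def with_iteration_cap(parameters: dict, max_geo_opt_steps: int) -> dict:
    """Return a copy of the input parameters with MAX_ITER set under MOTION/GEO_OPT.

    Hypothetical helper for illustration; the original dictionary is left untouched.
    """
    capped = copy.deepcopy(parameters)
    geo_opt = capped.setdefault("MOTION", {}).setdefault("GEO_OPT", {})
    geo_opt["MAX_ITER"] = max_geo_opt_steps
    return capped


# Hypothetical base input; the values are placeholders, not the ones used in the test.
base_parameters = {
    "GLOBAL": {"RUN_TYPE": "GEO_OPT"},
    "FORCE_EVAL": {"DFT": {"SCF": {"MAX_SCF": 50}}},
}

# Each run stops after at most two geometry steps, regardless of how fast the machine is.
capped_parameters = with_iteration_cap(base_parameters, max_geo_opt_steps=2)
print(capped_parameters["MOTION"]["GEO_OPT"])
```

Because the stopping point is a step count instead of elapsed time, the number of restarts would no longer depend on the speed of the runner.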
fix #174