Limit number of auto_clone restarts #5397
Conversation
The problem I see with this is that apparently there is no clear communication to test reviewers that tests are restarted so many times. If the auto-cloning stopped, any test reviewer who stumbled across the scenario again would in the realistic case likely just hit the retrigger button anyway. So, how do we communicate the stop of auto-cloning to test reviewers?
I don't really understand.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##           master    #5397   +/-   ##
=======================================
  Coverage   98.37%   98.37%
=======================================
  Files         389      389
  Lines       37643    37708      +65
=======================================
+ Hits        37031    37096      +65
  Misses        612      612

☔ View full report in Codecov by Sentry.
Force-pushed from 4f21e50 to 83328f0
Stop the clonging! s/clonging/cloning/ ;)
Force-pushed from 83328f0 to ecac0df
There is no clear communication with or without this PR. We have accumulated over 50 pages of jobs in the Next & Previous tab in the relevant scenarios and apparently no reviewer took notice of it. If we now only have, say, 10 pages this will not change anything for reviewers (except that they might no longer wonder why openQA is endlessly restarting these jobs, if they cared about this scenario at all, which they apparently don't). So I don't see how this PR makes things worse.
Looks generally good. Maybe we could add a comment stating that the maximum number of retries is exhausted instead of just doing nothing. (I guess that wouldn't be too much work.)
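As a rough illustration of that suggestion, a hook along these lines could leave a note once the restart budget is used up. This is a Python sketch for illustration only; `on_retry_budget_exhausted`, `post_comment` and the message wording are assumptions, not openQA's actual API.

```python
# Illustrative only: post_comment is a hypothetical callable, not openQA's API.
def on_retry_budget_exhausted(job_id, limit, post_comment):
    """Record why openQA stopped cloning so reviewers are not left guessing."""
    post_comment(
        job_id,
        f"Not restarting job {job_id} again: the automatic restart limit of "
        f"{limit} has been reached. Please investigate or retrigger manually.",
    )


# Example usage with a stand-in for the real comment API:
on_retry_budget_exhausted(4242, 3, lambda job_id, text: print(f"[job {job_id}] {text}"))
```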
Force-pushed from ecac0df to cd77c89
I'm wondering how much work we should put into this, given we also have the investigation tools, which can also do retries and add comments.
Force-pushed from cd77c89 to 7a676a2
We should rethink the overall approach. Obviously it's wasteful to restart jobs over and over again if nobody cares about the results. However, the auto-cloning triggered by the worker was always intended for cases where the need for a retry arises that is not the responsibility of any test maintainer, like the bugs we have in the cache service or terminating worker processes.
Your question whether we should move the retries to scripts makes me think that we need to clarify the original goal, maybe by talking in person next week?
Instead of stopping the automatic retries I would rather think about preventing new incompletes from happening at all, maybe by not even starting jobs in continuously failing scenarios?
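A minimal sketch of that last idea, i.e. skipping the scheduling of scenarios that keep producing incompletes. Illustrative Python only; `FAIL_STREAK_LIMIT`, `should_schedule` and the newest-first result list are made-up names and assumptions, not anything existing in openQA.

```python
# Illustrative sketch; the result list is assumed to be ordered newest first.
FAIL_STREAK_LIMIT = 5  # assumption: stop scheduling after 5 incompletes in a row


def should_schedule(recent_results, limit=FAIL_STREAK_LIMIT) -> bool:
    """Return False once the newest results form an unbroken streak of
    incompletes at least `limit` long."""
    streak = 0
    for result in recent_results:  # e.g. ["incomplete", "incomplete", "passed", ...]
        if result != "incomplete":
            break
        streak += 1
    return streak < limit


# Example: five incompletes in a row -> do not schedule another job
print(should_schedule(["incomplete"] * 5))  # False
```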
Force-pushed from 7a676a2 to 2eb1a6d
Force-pushed from b80e856 to cb9732c
Force-pushed from 99045fb to 197c241
Force-pushed from 197c241 to d935c0a
Force-pushed from d935c0a to 9ffa730
It can happen that a job consistently fails with the same error. We want to prevent an endless cloning loop here.
Issue: https://progress.opensuse.org/issues/152569
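For illustration, the cap described in the commit message could conceptually look like the following. This is a Python sketch under the assumption that each job links to the job it was cloned from; `MAX_AUTO_CLONES`, `Job.origin`, `clone_chain_length` and `maybe_restart` are hypothetical names, not openQA's implementation.

```python
MAX_AUTO_CLONES = 3  # assumption: configurable upper bound on automatic restarts


class Job:
    """Toy stand-in for a job record; `origin` points to the job it was cloned from."""

    def __init__(self, origin=None):
        self.origin = origin


def clone_chain_length(job):
    """Count how many automatic clones precede this job."""
    length = 0
    while job.origin is not None:
        length += 1
        job = job.origin
    return length


def maybe_restart(job, restart):
    """Restart only while the chain of clones stays below the limit, so a
    consistently failing job cannot loop forever."""
    if clone_chain_length(job) < MAX_AUTO_CLONES:
        restart(job)
        return True
    return False


# Example: a job that is already the third clone is not restarted again
third_clone = Job(Job(Job(Job())))
print(maybe_restart(third_clone, lambda j: None))  # False
```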