Limit number of auto_clone restarts #5397

perlpunk · 2023-12-14T12:42:32Z

It can happen that a job consistently fails with the same error. We want to prevent an endless cloning loop here.

Issue: https://progress.opensuse.org/issues/152569
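
To make the intent concrete, here is a minimal Python sketch of bounding automatic restarts. It is not the Perl change made in lib/OpenQA/Schema/Result/Jobs.pm. The `Job` class, the `origin` link, the regex and the limit value are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only; not taken from this PR.
AUTO_CLONE_REGEX = re.compile(r"^(cache failure: |terminated prematurely: )")
AUTO_CLONE_LIMIT = 3


@dataclass
class Job:
    job_id: int
    reason: str = ""
    origin: Optional["Job"] = None  # the job this one was cloned from, if any


def restart_count(job: Job) -> int:
    """Count how many ancestors this job has in its clone chain."""
    count = 0
    while job.origin is not None:
        count += 1
        job = job.origin
    return count


def should_auto_clone(job: Job, limit: int = AUTO_CLONE_LIMIT) -> bool:
    """Clone only if the failure reason matches and the chain is below the limit."""
    if not AUTO_CLONE_REGEX.search(job.reason):
        return False
    return restart_count(job) < limit
```

With this sketch, a job whose clone chain has already reached the limit is left alone even if its failure reason still matches, which is what breaks the endless loop.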

okurz

The problem I see with this is that apparently there is no clear communication to test reviewers that tests are restarted so many times. If auto-cloning stopped, then realistically any test reviewer who stumbled across the scenario again would likely just hit the retrigger button anyway. So how do we communicate the stop of auto-cloning to test reviewers?

etc/openqa/openqa.ini

perlpunk · 2023-12-14T13:04:10Z

The problem I see with this is that apparently there is no clear communication to test reviewers that tests are restarted so many times

I don't really understand.
Is there currently any clear communication to test reviewers about when a job is auto-cloned at all? And "so many times": well, it's basically endless if the cloned jobs also fail.

codecov · 2023-12-14T13:05:23Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4d4e5b7) 98.37% compared to head (9ffa730) 98.37%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5397   +/-   ##
=======================================
  Coverage   98.37%   98.37%           
=======================================
  Files         389      389           
  Lines       37643    37708   +65     
=======================================
+ Hits        37031    37096   +65     
  Misses        612      612

asdil12 · 2023-12-14T13:19:10Z

Stop the clonging!

s/clonging/cloning/ ;)

Martchus · 2023-12-14T13:27:35Z

The problem I see with this is that apparently there is no clear communication to test reviewers that tests are restarted so many times

There is no clear communication with or without this PR. We have accumulated over 50 pages of jobs in the Next & Previous tab in relevant scenarios, and apparently no reviewers took notice of it. If we now only have, say, 10 pages, this will not change anything for reviewers (except that they might no longer wonder why the heck openQA is endlessly restarting these jobs, if they cared about this scenario at all, which they apparently don't). So I don't see how this PR makes things worse.

Martchus

Looks generally good. Maybe we could add a comment stating that the maximum number of retries is exhausted instead of just doing nothing. (I guess that wouldn't be too much work.)
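
The suggestion above could look roughly like the following Python sketch. It is an assumption about the shape of such a change, not the code in this PR; `handle_retry` and the plain list of comment strings are hypothetical stand-ins for whatever openQA uses to store job comments.

```python
from typing import List


def handle_retry(restarts: int, limit: int, comments: List[str]) -> bool:
    """Return True if the job should be cloned again; otherwise record why not."""
    if restarts < limit:
        return True
    # Leave a visible trace for reviewers instead of silently doing nothing.
    comments.append(
        f"Automatic restarts stopped: limit of {limit} reached after {restarts} restarts."
    )
    return False
```

The point of the design is that the decision to stop becomes visible on the job itself, so a reviewer who finds the job can see that the limit was hit.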

lib/OpenQA/Schema/Result/Jobs.pm

perlpunk · 2023-12-14T14:07:56Z

Maybe we could add a comment stating that the maximum number of retries is exhausted instead of just doing nothing. (I guess that wouldn't be too much work.)

I'm wondering how much work we should put into this, given that we also have the investigation tools, which can also do retries and add comments.
Maybe this feature should be moved there instead?

okurz

We should rethink the overall approach. Obviously it's wasteful to restart jobs over and over again if nobody cares about the results. However, the auto-cloning triggered by the worker was always intended for cases where the need for a retry arises from something outside the responsibility of any test maintainer, like the bugs we have in the cache service or terminating worker processes.

Your question of whether we should move the retries to scripts makes me think that we need to clarify the original goal. Maybe we can talk in person next week?

Instead of stopping the automatic retries, I would rather think about preventing new incompletes from happening at all, for example by not even starting jobs in continuously failing scenarios.

okurz requested changes Dec 14, 2023

etc/openqa/openqa.ini (outdated, resolved)

perlpunk force-pushed the limit-auto-clone branch from 4f21e50 to 83328f0 on December 14, 2023 13:13

perlpunk force-pushed the limit-auto-clone branch from 83328f0 to ecac0df on December 14, 2023 13:24

Martchus reviewed Dec 14, 2023

lib/OpenQA/Schema/Result/Jobs.pm (outdated, resolved)

perlpunk force-pushed the limit-auto-clone branch from ecac0df to cd77c89 on December 14, 2023 14:06

Martchus approved these changes Dec 14, 2023

perlpunk force-pushed the limit-auto-clone branch from cd77c89 to 7a676a2 on December 14, 2023 14:28

kalikiana approved these changes Dec 14, 2023

okurz requested changes Dec 16, 2023

etc/openqa/openqa.ini (outdated, resolved)

okurz requested changes Dec 19, 2023

lib/OpenQA/Schema/Result/Jobs.pm (outdated, resolved)

perlpunk marked this pull request as draft December 21, 2023 18:40

perlpunk force-pushed the limit-auto-clone branch from 7a676a2 to 2eb1a6d on December 21, 2023 18:52

okurz requested changes Dec 22, 2023

lib/OpenQA/Schema/Result/Jobs.pm (resolved)

etc/openqa/openqa.ini (resolved)

okurz requested changes Dec 22, 2023

t/api/04-jobs.t (outdated, resolved)

t/api/04-jobs.t (outdated, resolved)

t/api/04-jobs.t (resolved)

perlpunk force-pushed the limit-auto-clone branch 4 times, most recently from b80e856 to cb9732c on January 4, 2024 16:28

perlpunk marked this pull request as ready for review January 4, 2024 16:34

perlpunk force-pushed the limit-auto-clone branch 2 times, most recently from 99045fb to 197c241 on January 5, 2024 15:02

Martchus reviewed Jan 9, 2024

lib/OpenQA/Schema/Result/Jobs.pm (outdated, resolved)

lib/OpenQA/Schema/Result/Jobs.pm (outdated, resolved)

okurz reviewed Jan 10, 2024

lib/OpenQA/Schema/Result/Jobs.pm (outdated, resolved)

lib/OpenQA/Schema/Result/Jobs.pm (outdated, resolved)

perlpunk force-pushed the limit-auto-clone branch from 197c241 to d935c0a on January 10, 2024 11:18

Limit number of auto_clone restarts

9ffa730

It can happen that a job consistently fails with the same error. We want to prevent an endless cloning loop here. Issue: https://progress.opensuse.org/issues/152569

perlpunk force-pushed the limit-auto-clone branch from d935c0a to 9ffa730 on January 10, 2024 11:19

okurz approved these changes Jan 10, 2024

mergify bot merged commit b5e992e into os-autoinst:master Jan 10, 2024
36 checks passed

perlpunk deleted the limit-auto-clone branch January 11, 2024 10:13
