Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add job expiration dates #1983

Merged
merged 8 commits into from
Aug 8, 2024
Merged

Add job expiration dates #1983

merged 8 commits into from
Aug 8, 2024

Conversation

zmc
Copy link
Member

@zmc zmc commented Jul 31, 2024

This feature has two parts:

  • Specifying expiration dates when scheduling test runs
  • A global maximum age

Expiration dates are provided by passing --expire to teuthology-suite with
a relative value like 1d (one day), 1w (one week), or an absolute value like
1999-12-31_23:59:59.

A new configuration item, max_job_age, is specified in seconds. This defaults
to two weeks.

When the dispatcher checks the queue for the next job to run, it will first
compare the job's timestamp value - which reflects the time the job was
scheduled. If more than max_job_age seconds have passed, the job is skipped
and marked dead. It next checks for an expire value; if that value is in the
past, the job is skipped and marked dead. Otherwise, it will be run as usual.

@zmc zmc force-pushed the expiry branch 3 times, most recently from 2d08bf5 to 46c1903 Compare August 1, 2024 00:46
teuthology/test/test_misc.py Outdated Show resolved Hide resolved
@zmc
Copy link
Member Author

zmc commented Aug 1, 2024

@zmc zmc marked this pull request as ready for review August 1, 2024 20:50
@@ -306,6 +309,31 @@ def prep_job(job_config, log_file_path, archive_dir):
return job_config, teuth_bin_path


def check_job_expiration(job_config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I'd like to see inline docs for the method.
Second, since this method is used both in dispatcher and supervisor maybe it is the best to create a separate module for job, and move all related methods there, for example, some of them can be taken from schedule.

teuthology/util/time.py Outdated Show resolved Hide resolved
teuthology/util/time.py Outdated Show resolved Hide resolved
assert excinfo.value.returncode == 111
for record in caplog.records:
if record.levelname == 'ERROR':
assert ('replay full' in record.message or
'ABC\n' == record.message)

def test_sh_progress(caplog):
misc.sh("echo AB ; sleep 5 ; /bin/echo C", 2) == "ABC\n"
assert misc.sh("echo AB ; sleep 0.1 ; /bin/echo C", 2) == "AB\nC\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, that's probably improve test timing 5 seconds is overkill

scripts/suite.py Outdated Show resolved Hide resolved
@zmc zmc force-pushed the expiry branch 2 times, most recently from 683c097 to d6fcb84 Compare August 6, 2024 18:32
case 'w':
return timedelta(weeks=num)
case _:
raise ValueError(err_msg)
Copy link
Contributor

@kshtsk kshtsk Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting for the followup PR:

    m = re.match(r"^\s*"
                 r"(?:(?P<weeks>\d+)\s*w\s*)?"
                 r"(?:(?P<days>\d+)\s*d\s*)?"
                 r"(?:(?P<hours>\d+)\s*h\s*)?"
                 r"(?:(?P<minutes>\d+)\s*m\s*)?",
                 r"(?:(?P<seconds>\d+)\s*s\s*)?$",
                 offset.lower())
    if match is None:
        raise ValueError(err_msg)
    args = {k: int(v or "0") for k, v in m.groupdict().items()}
    return datetime.timedelta(**args)

["1x", ValueError],
["-1m", ValueError],
["0xde", ValueError],
["frog", ValueError],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How 'bout the case: "7dwarfs"?

kshtsk
kshtsk previously approved these changes Aug 7, 2024
@kshtsk
Copy link
Contributor

kshtsk commented Aug 7, 2024

rebase needed

zmc added 5 commits August 7, 2024 18:02
And move the format string to the time module.

Signed-off-by: Zack Cerza <[email protected]>
One test had a missing assert; another had a comparison that would never fire
because of an expected exception being raised during the call.

Signed-off-by: Zack Cerza <[email protected]>
zmc added 3 commits August 7, 2024 18:05
test_init.py was making modifications to the config object that persisted
between tests. When I fixed that, initially some tests in test_run_.py started
failing because of settings in my local ~/.teuthology.yaml. This change causes
all of the tests in suite.test to use default config values.

Signed-off-by: Zack Cerza <[email protected]>
This feature has two parts:
* Specifying expiration dates when scheduling test runs
* A global maximum age

Expiration dates are provided by passing `--expire` to `teuthology-suite` with
a relative value like `1d` (one day), `1w` (one week), or an absolute value like
`1999-12-31_23:59:59`.

A new configuration item, `max_job_age`, is specified in seconds. This defaults
to two weeks.

When the dispatcher checks the queue for the next job to run, it will first
compare the job's `timestamp` value - which reflects the time the job was
scheduled. If more than `max_job_age` seconds have passed, the job is skipped
and marked dead. It next checks for an `expire` value; if that value is in the
past, the job is skipped and marked dead. Otherwise, it will be run as usual.

Signed-off-by: Zack Cerza <[email protected]>
This commit isn't strictly necessary for the feature's implementation, but will
allow testing the feature on the production teuthology cluster before merging.

Signed-off-by: Zack Cerza <[email protected]>
@zmc zmc merged commit 43b8805 into main Aug 8, 2024
9 checks passed
@zmc zmc deleted the expiry branch August 8, 2024 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants