make replay testing more user-friendly #2953

Open
5 of 11 tasks

cgolubi1 opened this issue May 23, 2024 · 2 comments
cgolubi1 commented May 23, 2024

The goal of replay testing is to throw compute time rather than human time at finding logic bugs / breaking changes in new code, even if the tester doesn't know what kind of bug they're looking for. That would be easier/more reliable if i sanded down some current rough edges in the replay test rig. Specific things i have in mind:

  • Flags:
    • There are things i commonly want to do, such as restricting which buttons get used in novel games, which i currently do by editing replay_loop on the replay site. Those should be CLI flags.
    • replay_loop should have help/usage text describing all the CLI flags.
    • It'd be great to have a flag that could generate /buttonmen/test/src/api/responder99Test.php from whatever's in the output directory right now, and one that could execute phpunit based on that; there's no reason to copy-paste those lists of commands while iterating on a replay test. (A rough CLI-flag sketch follows this list.)
  • Automate a mix of behaviors:
    • There are behaviors that are not worth testing on every loop iteration, because either they'll fail or they won't, but that should be tested more than zero times. One example is replay-testing the novel games created by the current iteration of replay_loop; that would catch problems, like the introduction of unmodelled randomization, that are rare but important. Currently, those behaviors are tested only if a particular CLI flag is selected; instead, they should always be tested a small percentage of the time (see the weighted-selection sketch after this list).
    • Another example, related to both of the above: i'd like to be able to specify that a randomly-chosen button (the default behavior) be selected only a fraction of the time, so that i can mostly test buttons related to a particular PR but still look for regressions impacting new games with other buttons.
    • We should also have a default behavior of testing with CustomBM a fraction of the time.
    • The mix of behaviors that replay_loop tests by default should be documented for quick reference, so we can easily know what's been tested for a particular branch.
  • Replay site stuff outside of replay_loop that should work better:
    • When the replay site container gets replaced, the new container doesn't have the post-install setup, including the cron job that complains when recent tests haven't been run, so i have no way to get notified about the problem.
  • Changes to random_ai behavior:
    • Less manual work should be needed to turn a game created by random_ai into a responder test that can be committed to the codebase.
  • Handle current known bugs or infrastructure issues:
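
For reference, a minimal sketch of what the flag handling could look like, assuming replay_loop is (or is driven by) a Python script; the flag names here are illustrative suggestions, not existing options:

```python
# Illustrative only: these flags don't exist yet, and the assumption that
# replay_loop is a Python script is mine.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Replay-test games in a loop against the replay site.")
    parser.add_argument("--buttons", nargs="*", default=None,
                        help="restrict which buttons get used in novel games "
                             "(default: any button)")
    parser.add_argument("--generate-responder-test", action="store_true",
                        help="write /buttonmen/test/src/api/responder99Test.php "
                             "from the current contents of the output directory")
    parser.add_argument("--run-phpunit", action="store_true",
                        help="run phpunit against the generated responder test")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args)
```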
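And a minimal sketch of the "always test it a small percentage of the time" idea; the behavior names and weights are placeholders, not the actual default mix:

```python
# Illustrative only: weights are placeholders, and the behavior names are
# shorthand for the examples in the list above.
import random

# Per-iteration probability of exercising each optional behavior.
BEHAVIOR_WEIGHTS = {
    "replay_novel_games": 0.05,   # re-replay games created by this run
    "use_custombm": 0.10,         # create a game with CustomBM
    "any_random_button": 0.25,    # ignore --buttons and pick any button
}

def choose_behaviors(rng=random):
    """Return the set of optional behaviors to exercise on this iteration."""
    return {name for name, p in BEHAVIOR_WEIGHTS.items() if rng.random() < p}

# Over many iterations each behavior is exercised at roughly its configured
# rate, so rare-but-important checks are never skipped entirely.
for _ in range(3):
    print(choose_behaviors())
```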
cgolubi1 self-assigned this May 23, 2024
cgolubi1 commented:

Another thought to integrate into the above list: it would be great to have less manual work (ideally none) needed to turn random_ai-generated tests into responder tests.

cgolubi1 commented Nov 9, 2024

A silly bug: when a game runs to 200 rounds (e.g. Echo vs IIconfused) and is cancelled, RandomAI falls over.

I think what's happening is simply that the python process gets OOM-killed when it's trying to pull the entire game action log into memory and write the final game state.
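
If that's what's going on, one fix direction is to stream the action log to disk in chunks rather than holding the whole thing in memory. A minimal sketch, assuming a paginated fetch helper exists (the helper and its signature are hypothetical, not the current random_ai code):

```python
# Hypothetical sketch: fetch_log_page(game_id, offset, limit) is assumed to
# return a list of log entries, empty once offset is past the end of the log.
import json

CHUNK_SIZE = 500  # log entries per request; keeps peak memory bounded

def dump_action_log(fetch_log_page, game_id, out_path):
    """Append the game's action log to out_path one chunk at a time."""
    offset = 0
    with open(out_path, "w") as out:
        while True:
            entries = fetch_log_page(game_id, offset, CHUNK_SIZE)
            if not entries:
                break
            for entry in entries:
                out.write(json.dumps(entry) + "\n")
            offset += len(entries)
```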
