make replay testing more user-friendly #2953

Open
5 of 11 tasks

cgolubi1 opened this issue May 23, 2024 · 2 comments
cgolubi1 commented May 23, 2024

The goal of replay testing is to throw compute time rather than human time at finding logic bugs / breaking changes in new code, even if the tester doesn't know what kind of bug they're looking for. That would be easier/more reliable if i sanded down some current rough edges in the replay test rig. Specific things i have in mind:

  • Flags:
    • There are things i commonly want to do, such as restricting which buttons get used in novel games, which i currently do by editing replay_loop on the replay site. Those should be CLI flags.
    • replay_loop should have help/usage text describing all the CLI flags.
    • It'd be great to have a flag that could generate /buttonmen/test/src/api/responder99Test.php from whatever's in the output directory right now, and one that could execute phpunit based on that; there's no reason to copy-paste those lists of commands while iterating on a replay test. (A rough CLI-flag sketch follows this list.)
  • Automate a mix of behaviors:
    • There are behaviors that are not worth testing on every loop iteration, because either they'll fail or they won't, but that should be tested more than zero times. One example is replay-testing the novel games created by the current iteration of replay_loop; that would catch problems, like the introduction of unmodelled randomization, that are rare but important. Currently, those behaviors are tested only if a particular CLI flag is selected; instead, they should always be tested a small percentage of the time (see the weighted-selection sketch after this list).
    • Another example, related to both of the above: i'd like to be able to specify that a randomly-chosen button (the default behavior) be selected only a fraction of the time, so that i can mostly test buttons related to a particular PR but still look for regressions impacting new games with other buttons.
    • We should also have a default behavior of testing with CustomBM a fraction of the time.
    • The mix of behaviors that replay_loop tests by default should be documented for quick reference, so we can easily know what's been tested for a particular branch.
  • Replay site stuff outside of replay_loop that should work better:
    • When the replay site container gets replaced, the new container doesn't have the post-install setup, including the cron job that complains when recent tests haven't been run, so i have no way to get notified about the problem.
  • Changes to random_ai behavior:
    • Less manual work should be needed to turn a game created by random_ai into a responder test that can be committed to the codebase.
  • Handle current known bugs or infrastructure issues:
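
For reference, a minimal sketch of what the flag handling could look like, assuming replay_loop is (or is driven by) a Python script; the flag names here are illustrative suggestions, not existing options:

```python
# Illustrative only: these flags don't exist yet, and the assumption that
# replay_loop is a Python script is mine.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Replay-test games in a loop against the replay site.")
    parser.add_argument("--buttons", nargs="*", default=None,
                        help="restrict which buttons get used in novel games "
                             "(default: any button)")
    parser.add_argument("--generate-responder-test", action="store_true",
                        help="write /buttonmen/test/src/api/responder99Test.php "
                             "from the current contents of the output directory")
    parser.add_argument("--run-phpunit", action="store_true",
                        help="run phpunit against the generated responder test")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args)
```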
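And a minimal sketch of the "always test it a small percentage of the time" idea; the behavior names and weights are placeholders, not the actual default mix:

```python
# Illustrative only: weights are placeholders, and the behavior names are
# shorthand for the examples in the list above.
import random

# Per-iteration probability of exercising each optional behavior.
BEHAVIOR_WEIGHTS = {
    "replay_novel_games": 0.05,   # re-replay games created by this run
    "use_custombm": 0.10,         # create a game with CustomBM
    "any_random_button": 0.25,    # ignore --buttons and pick any button
}

def choose_behaviors(rng=random):
    """Return the set of optional behaviors to exercise on this iteration."""
    return {name for name, p in BEHAVIOR_WEIGHTS.items() if rng.random() < p}

# Over many iterations each behavior is exercised at roughly its configured
# rate, so rare-but-important checks are never skipped entirely.
for _ in range(3):
    print(choose_behaviors())
```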
cgolubi1 self-assigned this May 23, 2024
cgolubi1 commented:

Another thought to integrate into the above list: it would be great to have less manual work (ideally none) needed to turn random_ai-generated tests into responder tests.

cgolubi1 commented Nov 9, 2024

A silly bug: when a game runs to 200 rounds (e.g. Echo vs IIconfused) and is cancelled, RandomAI falls over.

I think what's happening is simply that the python process gets OOM-killed when it's trying to pull the entire game action log into memory and write the final game state.
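
If that's what's going on, one fix direction is to stream the action log to disk in chunks rather than holding the whole thing in memory. A minimal sketch, assuming a paginated fetch helper exists (the helper and its signature are hypothetical, not the current random_ai code):

```python
# Hypothetical sketch: fetch_log_page(game_id, offset, limit) is assumed to
# return a list of log entries, empty once offset is past the end of the log.
import json

CHUNK_SIZE = 500  # log entries per request; keeps peak memory bounded

def dump_action_log(fetch_log_page, game_id, out_path):
    """Append the game's action log to out_path one chunk at a time."""
    offset = 0
    with open(out_path, "w") as out:
        while True:
            entries = fetch_log_page(game_id, offset, CHUNK_SIZE)
            if not entries:
                break
            for entry in entries:
                out.write(json.dumps(entry) + "\n")
            offset += len(entries)
```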
