The goal of replay testing is to throw compute time, rather than human time, at finding logic bugs and breaking changes in new code, even if the tester doesn't know what kind of bug they're looking for. That would be easier and more reliable if I sanded down some current rough edges in the replay test rig. Specific things I have in mind:
Flags:
- There are things I commonly want to do, such as restricting which buttons get used in novel games, which I currently do by editing `replay_loop` on the replay site. Those should be CLI flags.
- `replay_loop` should have help/usage text describing all the CLI flags.
- It'd be great to have a flag that generates `/buttonmen/test/src/api/responder99Test.php` from whatever's in the output directory right now, and another that executes phpunit against it. There's no reason to copy-paste those lists of commands while iterating on a replay test. A sketch of what those flags might look like follows this list.
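A minimal sketch of the proposed interface, assuming `replay_loop` is a Python script using argparse; every flag name here is a hypothetical proposal, not an existing option:

```python
# Hypothetical CLI for replay_loop; all flag names are proposals.
import argparse
import subprocess

parser = argparse.ArgumentParser(
    prog="replay_loop",
    description="Replay-test games to find logic bugs in new code.")
parser.add_argument("--buttons", nargs="+", metavar="BUTTON",
                    help="restrict which buttons get used in novel games")
parser.add_argument("--generate-responder-test", action="store_true",
                    help="write responder99Test.php from the current output directory")
parser.add_argument("--run-phpunit", action="store_true",
                    help="run phpunit against the generated responder99Test.php")
args = parser.parse_args()

if args.run_phpunit:
    # Assumed invocation; the real phpunit path and bootstrap may differ.
    subprocess.run(
        ["phpunit", "/buttonmen/test/src/api/responder99Test.php"],
        check=True)
```

With something like this in place, `replay_loop --help` would double as the missing usage text.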
Automate a mix of behaviors:
- There are behaviors that are not worthwhile to test on every loop iteration, because either they'll fail or they won't, but that should be tested more than zero times. One example is replay-testing the novel games created by the current iteration of `replay_loop`; that would catch rare but important problems like the introduction of unmodelled randomization. Currently, those behaviors are tested only if a particular CLI flag is selected; instead, they should always be tested a small percentage of the time.
- Another example, related to both of the above: I'd like to be able to specify that a randomly chosen button (the default behavior) be selected only a fraction of the time, so that I can mostly test buttons related to a particular PR while still looking for regressions impacting new games with other buttons.
- We should also have a default behavior of testing with `CustomBM` a fraction of the time.
- The mix of behaviors that `replay_loop` tests by default should be documented for quick reference, so we can easily know what's been tested for a particular branch. One way to wire up such a mix is sketched after this list.
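A minimal sketch of a weighted behavior mix; the behavior names and probabilities are illustrative assumptions, not `replay_loop`'s current defaults:

```python
# Illustrative weighted mix of per-iteration behaviors; the names and
# probabilities are assumptions, not replay_loop's current defaults.
import random

BEHAVIOR_MIX = {
    "replay_novel_games": 0.05,  # catch unmodelled randomization
    "use_custom_bm": 0.10,       # exercise CustomBM a fraction of the time
    "random_button": 0.25,       # random button instead of the PR's button list
}

def choose_behaviors(rng=random):
    """Return the optional behaviors enabled for this loop iteration."""
    return {name for name, p in BEHAVIOR_MIX.items() if rng.random() < p}

for iteration in range(10):
    enabled = choose_behaviors()
    print(f"iteration {iteration}: {sorted(enabled) or 'baseline only'}")
```

Keeping the weights in one table would also satisfy the documentation item above: the dict itself is the quick reference for what a branch has been tested with.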
Replay site stuff outside of `replay_loop` that should work better:
- When the replay site container gets replaced, the new container doesn't have the post-install pieces, including the cron job that complains when recent tests haven't run, so I have no way to get notified about the problem.
Changes to `random_ai` behavior:
- Less manual work (ideally none) should be needed to turn a game created by `random_ai` into a responder test that can be committed to the codebase. One possible conversion step is sketched below.
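A minimal sketch of that conversion, assuming each `random_ai` game leaves a PHP test-method snippet in an output directory; the directory layout, file naming, and test-class boilerplate here are all assumptions:

```python
# Assemble a committable responder test from random_ai output snippets.
# The output layout, target path, and class skeleton are assumptions.
from pathlib import Path

OUTPUT_DIR = Path("output")
TARGET = Path("/buttonmen/test/src/api/responder99Test.php")

snippets = sorted(OUTPUT_DIR.glob("*.php"))
header = "<?php\n\nclass responder99Test extends responder99TestBase {\n"
footer = "}\n"
TARGET.write_text(header + "".join(p.read_text() for p in snippets) + footer)
print(f"Wrote {len(snippets)} test method(s) to {TARGET}")
```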
Handle current known bugs or infrastructure issues:
- A silly bug: when a game runs to 200 rounds (e.g., Echo vs. IIconfused) and is cancelled, RandomAI falls over.
- I think what's happening is simply that the Python process gets OOM-killed while trying to pull the entire game action log into memory to write the final game state. A paging sketch follows.
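A minimal sketch of the obvious mitigation, paging through the action log instead of loading it whole; `fetch_action_log_page` is a stub standing in for whatever call RandomAI actually makes, so its name and signature are assumptions:

```python
# Stream a very long game action log in pages to avoid OOM on 200-round games.

def fetch_action_log_page(game_id, offset, limit):
    """Stub for the real API call RandomAI makes; returns [] when exhausted."""
    return []

def iter_action_log(game_id, page_size=100):
    """Yield log entries one page at a time instead of all at once."""
    offset = 0
    while True:
        page = fetch_action_log_page(game_id, offset=offset, limit=page_size)
        if not page:
            return
        yield from page
        offset += len(page)

def write_final_game_state(game_id, out_path):
    with open(out_path, "w") as f:
        for entry in iter_action_log(game_id):
            f.write(f"{entry}\n")  # never holds the full log in memory
```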