New offensive gameplay architecture and Q-learning #3246
Conversation
- BallPlacementPlay::BallPlacementPlay(TbotsProto::AiConfig config)
-     : Play(config, true), fsm{BallPlacementPlayFSM{config}}, control_params{}
+ BallPlacementPlay::BallPlacementPlay(std::shared_ptr<Strategy> strategy)
+     : Play(true, strategy),
It may make more sense for `requires_goalie` to be `false` during ball placement. Otherwise, the goalie will still try to play goalie.
`ssh [email protected]` (for Nanos) OR `ssh [email protected]` (for Pis) OR `ssh robot@robot_name.local` | ||
e.g. `ssh [email protected]` (for Nanos) or `ssh robot@192.168.1.203` (for Pis) or `ssh robot@robert.local` | ||
for a robot called robert with robot id 3") |
these IP addresses should be updated
pickoff_wall_tactic->updateControlParams(
    {.dribble_destination = pickoff_destination,
     .final_dribble_orientation = pickoff_final_orientation,
     .excessive_dribbling_mode = TbotsProto::ExcessiveDribblingMode::LOSE_BALL,
Should this be `ExcessiveDribblingMode::LOSE_BALL`? Shouldn't we allow excessive dribbling?
  }
- else if (ball_pos.y() < field_lines.yMin())
+ else if (near_negative_y_boundary && near_negative_x_boundary) // bottom left corner
bottom right corner
- if (num_move_tactics <= 0)
+ if (num_move_skill_tactics <= 0)
nit: move this conditional check to the top of the function
double BACK_AWAY_FROM_WALL_M = ROBOT_MAX_RADIUS_METERS * 5.5;
double MINIMUM_DISTANCE_FROM_WALL_FOR_ALIGN_METERS = ROBOT_MAX_RADIUS_METERS * 4.0;
nit: some of these constants have a `_M` suffix and others have a `_METERS` suffix. We should make this consistent across all the constants here. I prefer the `_M` suffix.
// Always try assigning AttackerTactic
if (play_update.num_tactics > 0)
{
    const bool attacker_not_suspended =
- const bool attacker_not_suspended =
+ const bool attacker_is_running =

nit
GetBallControl_S,
LoseBall_S + Update_E / loseBall_A,
X + Update_E[!shouldExcessivelyDribble_G] / dribble_A,
- X + Update_E[!shouldExcessivelyDribble_G] / dribble_A,
+ X + Update_E[!shouldExcessivelyDribble_G] / loseBall_A,

lose ball right when we're trying NOT to excessively dribble?
Dribble_S +
    Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / dribble_A = X,
- Dribble_S +
-     Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / dribble_A = X,
+ Dribble_S +
+     Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / loseBall_A = X,

If we don't want to excessively dribble, should we have loseBall here?
// for it to be considered out of our control; otherwise we might consider
// kicked balls as out of our control
if (time_since_kick_start < lose_ball_control_time_threshold ||
    ball.velocity().length() > ball_is_kicked_m_per_s_threshold)
- ball.velocity().length() > ball_is_kicked_m_per_s_threshold)
+ ball.velocity().length() < ball_is_kicked_m_per_s_threshold)

Can you double check if the comparator should be flipped?
bazel_arguments += ["--hosts"]
bazel_arguments += [f"192.168.0.20{id}" for id in args.flash_robots]
if args.platform == "NANO":
outdated now
sorry about taking so long, but left some more feedback
Adds touching_ball_threshold to sensorfusion protobuf
* Adds the gl_max_dribble_layer.py file and the BUILD file entry.
* fixes some errors introduced in the last commit
* Adds the toggle to the widget setup functions.
* Adds changes from pr #3246 with regards to updateDribbleDisplacement. Adds touching_ball_threshold to sensorfusion protobuf
* Starts implementation for dribble layer
* Adds dribble displacement field to world, with a getter and setter, from pr #3246
* Adds dribble displacement field to python bindings
* CONTINUE THE PIPELINE
* Adds framework
* bug fixes world.h
* Undos the changes to pybindings
* Finished.
* changes the default setting
* Address comments
* Fixes mistake
* [pre-commit.ci lite] apply automatic fixes
* Changes default setting
* Adds sigmoid function to protobuffer
* [pre-commit.ci lite] apply automatic fixes
* fixes some colour scaling bug, and implements the protobuffed sigmoid function
* [pre-commit.ci lite] apply automatic fixes
* Implements abstract color from gradient function
* [pre-commit.ci lite] apply automatic fixes
* move the helper function to util.py
* Type annotation
* [pre-commit.ci lite] apply automatic fixes
* Adds line to BUILD file
* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
Succeeded by #3415
Description
This PR overhauls our offensive gameplay architecture. The goal of the architecture redesign is to make our AI more dynamic and adaptive to the enemy we play against.
I recommend taking a look at the Gameplay Architecture RFC. Although it is very outdated in terms of the planned implementation and architecture, the goals and user stories provide some good background on the project.
Gameplay overview
`DynamicPlay` is a base Play that selects `SupportTactic`s to assign. Support tactics play supporting roles on the field (e.g. going out to receiver positions, faking out enemy robots, etc.). Over time, `DynamicPlay` is supposed to learn the best support tactics to select given the state of the game. The implementation of `DynamicPlay` and its support tactic selection algorithm may change in the future, and we currently only have one support tactic (`ReceiverTactic`), so I would not focus on reviewing these changes.

`OffensePlay` is a `DynamicPlay` that is run when we have possession. It assigns defensive tactics selected by `DefensePlay`, support tactics that are selected by `DynamicPlay`, and an `AttackerTactic` that is the main ball handler during the play.

There is now a clearer separation between Skills and Tactics. A Skill is smaller in scope and completes a single action (e.g. kick, chip, pass, dribble), while a Tactic has a greater set of responsibilities and objectives to complete. Skills have a similar interface to Tactics and are also implemented using FSMs that yield primitives. A Tactic can "execute" a Skill by forwarding the Skill's `updatePrimitive` result in its own `updatePrimitive`.

Q-learning for attacker skill selection
The attacker uses a reinforcement learning algorithm called Q-learning to select and learn which skills (actions) to execute given the state of the `World`.

In Q-learning, the agent decides which action to take based on a Q-function $Q(s, a)$ that returns the expected reward for an action taken in a given state. After selecting some action $a$, we observe a reward $r$ and enter a new state $s'$, which are used to update the Q-function and adjust the Q-value given for taking action $a$ in state $s$. Since our state space is extremely large, we use linear Q-function approximation to estimate $Q(s, a)$ even if we have not previously applied action $a$ in state $s$.
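As a rough standalone sketch of the linear Q-function approximation described above (this is not the actual `AttackerTactic` implementation; the feature and action counts here are made up for illustration), each action keeps its own weight vector, $Q(s, a)$ is the dot product of those weights with the feature vector $\phi(s)$, and the weights are nudged by the temporal-difference error after each observed reward:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical sizes; the real AttackerMdpFeatureExtractor and
// AttackerMdpAction define more features and actions.
constexpr int kNumFeatures = 3;
constexpr int kNumActions  = 2;

struct LinearQFunction
{
    // weights[a][f]: weight of feature f for action a, initialized to zero
    double weights[kNumActions][kNumFeatures] = {};

    // Q(s, a) = dot product of action a's weights with the feature vector phi(s)
    double qValue(const std::array<double, kNumFeatures>& phi, int action) const
    {
        double q = 0;
        for (int f = 0; f < kNumFeatures; ++f)
        {
            q += weights[action][f] * phi[f];
        }
        return q;
    }

    // max over actions of Q(s', a'), used as the bootstrap target
    double maxQValue(const std::array<double, kNumFeatures>& phi) const
    {
        double best = qValue(phi, 0);
        for (int a = 1; a < kNumActions; ++a)
        {
            best = std::max(best, qValue(phi, a));
        }
        return best;
    }

    // Q-learning update: after taking `action` in state phi, observing
    // reward r and landing in state phi_next, move the action's weights
    // along the feature vector by the temporal-difference error.
    void update(const std::array<double, kNumFeatures>& phi, int action,
                double reward, const std::array<double, kNumFeatures>& phi_next,
                double alpha, double gamma)
    {
        const double td_error =
            reward + gamma * maxQValue(phi_next) - qValue(phi, action);
        for (int f = 0; f < kNumFeatures; ++f)
        {
            weights[action][f] += alpha * td_error * phi[f];
        }
    }
};
```

Because the approximation is linear, an action's Q-value generalizes to states it has never been tried in: any state with similar features gets a similar estimate, which is what makes the large state space tractable.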
The Q-function weights are logged as protobufs and displayed in a new Q-learning widget in Thunderscope. Each weight in the table is associated with a feature and an action. Columns represent the features in the order they are initialized in `AttackerMdpFeatureExtractor`, and rows represent the actions in the order they are defined in `AttackerMdpAction`.

We can save the weights to a CSV file (they are also automatically written to a CSV under `/tmp/tbots`), and we can load in an initial set of weights when starting the AI (`attacker_mdp_q_function_weights.csv`).

Other changes
- Tactics and Plays are now constructed with a `Strategy` class instead of `TbotsProto::AiConfig`. The `Strategy` contains shared gameplay calculations and has a `getAiConfig()` method that returns the latest `TbotsProto::AiConfig`.
- Updated `SensorFusion` to track the distance that the ball has been continuously dribbled by the friendly team. This dribble distance is output in the `World`s that `SensorFusion` produces. We use this information to limit how far `DribbleSkill` can dribble, so that even if multiple dribbling skills are executed sequentially, we will avoid going over the max dribble distance.
- Updated `PossessionTracker` to match the original CMDragons possession algorithm more closely. There are now 4 types of possession (FRIENDLY, ENEMY, IN_CONTEST, LOOSE). To make our gameplay more aggressive, `DefensePlay` is only run when we are in ENEMY possession; otherwise, `OffensePlay` is run.

Testing Done
Resolved Issues
Resolves #3080
Resolves #3079
Resolves #3078
Resolves #3077
Resolves #3076
Resolves #3074
Resolves #3073
Resolves #3071
Resolves #3070
Resolves #3069
Resolves #3065
Resolves #2514
Resolves #3083
Resolves #3081
Resolves #3072
Resolves #3219
Resolves #3203
Resolves #3216
Resolves #3156
Resolves #3233
Resolves #2930
Resolves #2868
Resolves #2643
Resolves #3098
Resolves #3097
Resolves #3096
Resolves #3082
Resolves #2134
Length Justification and Key Files to Review
software/ai/evaluation/q_learning
software/ai/hl/stp/skill
software/ai/hl/stp/play/dynamic_plays
software/ai/hl/stp/tactic/attacker
Review Checklist
It is the reviewer's responsibility to also make sure every item here has been covered.
- All function definitions (usually in the `.h` file) should have a javadoc style comment at the start of them. For examples, see the functions defined in `thunderbots/software/geom`. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
- `TODO` (or similar) statements should either be completed or associated with a github issue.
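For illustration, here is a hypothetical function (not taken from this PR or from `thunderbots/software/geom`) with the kind of javadoc-style comment the checklist asks for:

```cpp
#include <cassert>

/**
 * Clamps a value to the inclusive range [min, max].
 *
 * This is a made-up example showing the javadoc-style comment format the
 * review checklist requires on every function declared in a .h file.
 *
 * @param value the value to clamp
 * @param min the lower bound of the range
 * @param max the upper bound of the range
 *
 * @return value if it lies within [min, max], otherwise the nearest bound
 */
double clampToRange(double value, double min, double max)
{
    if (value < min)
    {
        return min;
    }
    if (value > max)
    {
        return max;
    }
    return value;
}
```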