
New offensive gameplay architecture and Q-learning #3246

Closed
wants to merge 516 commits

Conversation

@williamckha (Contributor) commented Jul 8, 2024

Description

This PR overhauls our offensive gameplay architecture. The goal of the architecture redesign is to make our AI more dynamic and adaptive to the enemy we play against.

I recommend taking a look at the Gameplay Architecture RFC. Although it is very outdated in terms of the planned implementation and architecture, the goals and user stories provide some good background on the project.


Gameplay overview

```mermaid
classDiagram
    direction TB

    namespace Plays {
        class DynamicPlay
        class OffensePlay
    }

    <<abstract>> DynamicPlay

    namespace Tactics {
        class AttackerTactic
        class SupportTactic
    }

    namespace Skills {
        class ShootSkill
        class KeepAwaySkill
        class KickPassSkill
        class ChipPassSkill
    }

    <<abstract>> SupportTactic

    DynamicPlay <|-- OffensePlay : inherits
    OffensePlay --> AttackerTactic
    OffensePlay --> "many" SupportTactic

    AttackerTactic --> ShootSkill
    AttackerTactic --> KeepAwaySkill
    AttackerTactic --> KickPassSkill
    AttackerTactic --> ChipPassSkill
```
  • DynamicPlay is a base Play that selects SupportTactics to assign. Support tactics play supporting roles on the field (e.g. going out to receiver positions, faking out enemy robots, etc.). Over time, DynamicPlay is supposed to learn the best support tactics to select given the state of the game.

    The implementation of DynamicPlay and its support tactic selection algorithm may change in the future, and we currently only have one support tactic (ReceiverTactic), so I would not focus on reviewing these changes.

  • OffensePlay is a DynamicPlay that is run when we have possession. It assigns defensive tactics selected by DefensePlay, support tactics that are selected by DynamicPlay, and an AttackerTactic that is the main ball handler during the play.

  • There is now a clearer separation between Skills and Tactics. A Skill is smaller in scope and completes a single action (e.g. kick, chip, pass, dribble), while a Tactic has a greater set of responsibilities and objectives to complete. Skills have a similar interface to Tactics and are also implemented using FSMs that yield primitives. A Tactic can "execute" a Skill by forwarding the Skill's updatePrimitive result in its own updatePrimitive.

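To illustrate the Skill/Tactic relationship described above, here is a minimal sketch of how a Tactic can "execute" a Skill by forwarding its `updatePrimitive` result. The class and member names here are simplified stand-ins, not the PR's actual interfaces (the real classes are FSM-based and richer):

```cpp
#include <memory>
#include <string>

// Simplified stand-ins for the real World and Primitive types.
struct World {};
struct Primitive { std::string name; };

// A Skill is small in scope: it yields one primitive per tick
// for a single action (kick, chip, pass, dribble, ...).
class Skill {
public:
    virtual ~Skill() = default;
    virtual Primitive updatePrimitive(const World& world) = 0;
};

class ShootSkill : public Skill {
public:
    Primitive updatePrimitive(const World&) override { return {"kick"}; }
};

// A Tactic has a broader set of responsibilities, but exposes a
// similar interface to a Skill.
class Tactic {
public:
    virtual ~Tactic() = default;
    virtual Primitive updatePrimitive(const World& world) = 0;
};

// A Tactic "executes" a Skill by forwarding the Skill's
// updatePrimitive result in its own updatePrimitive.
class AttackerTactic : public Tactic {
public:
    explicit AttackerTactic(std::unique_ptr<Skill> skill)
        : current_skill_(std::move(skill)) {}

    Primitive updatePrimitive(const World& world) override {
        return current_skill_->updatePrimitive(world);
    }

private:
    std::unique_ptr<Skill> current_skill_;
};
```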

Q-learning for attacker skill selection

The attacker uses a reinforcement learning algorithm called Q-learning to select and learn which skills (actions) to execute given the state of the World.

In Q-learning, the agent decides which action to take based on a Q-function $Q(s,a)$ that returns the expected reward for an action taken in a given state. After selecting some action $a$, we observe a reward $r$ and enter a new state $s'$ which are used to update the Q-function and adjust the Q-value given for taking action $a$ in state $s$. Since our state space is extremely large, we use linear Q-function approximation to estimate $Q(s,a)$ even if we have not previously applied action $a$ in state $s$.
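The update rule sketched below shows linear Q-function approximation as described: each action has a weight vector, $Q(s,a) = w_a \cdot \phi(s)$ for a feature vector $\phi(s)$, and after observing $(s, a, r, s')$ the weights are nudged toward the TD target $r + \gamma \max_{a'} Q(s', a')$. The class and method names are illustrative, not the PR's actual API:

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of Q-learning with linear function approximation.
// Q(s, a) = w_a . phi(s), with one weight vector per action
// (matching the weights table: rows = actions, columns = features).
class LinearQFunction {
public:
    LinearQFunction(int num_actions, int num_features,
                    double alpha = 0.1, double gamma = 0.9)
        : weights_(num_actions, std::vector<double>(num_features, 0.0)),
          alpha_(alpha), gamma_(gamma) {}

    double getQValue(const std::vector<double>& features, int action) const {
        double q = 0.0;
        for (size_t i = 0; i < features.size(); ++i)
            q += weights_[action][i] * features[i];
        return q;
    }

    int bestAction(const std::vector<double>& features) const {
        int best = 0;
        for (int a = 1; a < static_cast<int>(weights_.size()); ++a)
            if (getQValue(features, a) > getQValue(features, best)) best = a;
        return best;
    }

    // TD update: w_a += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s)
    void update(const std::vector<double>& state, int action, double reward,
                const std::vector<double>& next_state) {
        double target =
            reward + gamma_ * getQValue(next_state, bestAction(next_state));
        double td_error = target - getQValue(state, action);
        for (size_t i = 0; i < state.size(); ++i)
            weights_[action][i] += alpha_ * td_error * state[i];
    }

private:
    std::vector<std::vector<double>> weights_;  // rows: actions, cols: features
    double alpha_;                              // learning rate
    double gamma_;                              // discount factor
};
```

Because the weights generalize across states, this lets us estimate $Q(s,a)$ even for state-action pairs we have never visited.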


The Q-function weights are logged as protobufs and displayed in a new Q-learning widget in Thunderscope. Each weight in the table is associated with a feature and an action. Columns represent the features in the order they are initialized in AttackerMdpFeatureExtractor, and rows represent the actions in the order they are defined in AttackerMdpAction.

We can save the weights to a CSV file (they are also automatically written to a CSV under /tmp/tbots) and we can load in an initial set of weights when starting the AI (attacker_mdp_q_function_weights.csv).


Other changes

  • Changed most Tactics and all Plays to accept a shared instance of a Strategy class instead of TbotsProto::AiConfig. The Strategy contains shared gameplay calculations and has a getAiConfig() method that returns the latest TbotsProto::AiConfig.
  • Updated SensorFusion to track the distance that the ball has been continuously dribbled by the friendly team. This dribble distance is output in the Worlds that SensorFusion produces. We use this information to limit how far DribbleSkill can dribble, so that even if multiple dribbling skills are executed sequentially, we will avoid going over the max dribble distance.
  • Changed PossessionTracker to match original CMDragons possession algorithm more closely. There are now 4 types of possession (FRIENDLY, ENEMY, IN_CONTEST, LOOSE). To make our gameplay more aggressive, DefensePlay is only run when we are in ENEMY possession; otherwise, OffensePlay is run.
  • Probably more changes that I can't remember...
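The possession-based play selection described above can be sketched as follows. The enum values come from the description (FRIENDLY, ENEMY, IN_CONTEST, LOOSE); the function and `PlayChoice` names are hypothetical, for illustration only:

```cpp
// The four possession states tracked by PossessionTracker,
// per the PR description.
enum class TeamPossession { FRIENDLY, ENEMY, IN_CONTEST, LOOSE };

enum class PlayChoice { OFFENSE_PLAY, DEFENSE_PLAY };

// To make gameplay more aggressive, DefensePlay runs only under clear
// ENEMY possession; every other state defaults to OffensePlay.
// (Hypothetical helper name; the real selection logic lives elsewhere.)
PlayChoice selectPlay(TeamPossession possession) {
    return possession == TeamPossession::ENEMY ? PlayChoice::DEFENSE_PLAY
                                               : PlayChoice::OFFENSE_PLAY;
}
```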

Testing Done

  • Based on the eye test, everything works and looks OK. We ran the new gameplay for an extended period of time during the scrimmage, and I don't think there were any major hitches or crashes.
  • Existing simulated gameplay tests have been updated and all pass
  • Tactics converted to Skills have had their tests updated and all pass

Resolved Issues

Resolves #3080
Resolves #3079
Resolves #3078
Resolves #3077
Resolves #3076
Resolves #3074
Resolves #3073
Resolves #3071
Resolves #3070
Resolves #3069
Resolves #3065
Resolves #2514
Resolves #3083
Resolves #3081
Resolves #3072
Resolves #3219
Resolves #3203
Resolves #3216
Resolves #3156
Resolves #3233
Resolves #2930
Resolves #2868
Resolves #2643
Resolves #3098
Resolves #3097
Resolves #3096
Resolves #3082
Resolves #2134

Length Justification and Key Files to Review

  • software/ai/evaluation/q_learning
  • software/ai/hl/stp/skill
  • software/ai/hl/stp/play/dynamic_plays
  • software/ai/hl/stp/tactic/attacker

Review Checklist

It is the reviewer's responsibility to make sure every item here has been covered.

  • Function & Class comments: All function definitions (usually in the .h file) should have a javadoc style comment at the start of them. For examples, see the functions defined in thunderbots/software/geom. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
  • Remove all commented out code
  • Remove extra print statements: for example, those just used for testing
  • Resolve all TODOs: All TODO (or similar) statements should either be completed or associated with a GitHub issue

nimazareian and others added 30 commits June 25, 2024 15:30
@itsarune mentioned this pull request Sep 2, 2024

```diff
- BallPlacementPlay::BallPlacementPlay(TbotsProto::AiConfig config)
-     : Play(config, true), fsm{BallPlacementPlayFSM{config}}, control_params{}
+ BallPlacementPlay::BallPlacementPlay(std::shared_ptr<Strategy> strategy)
+     : Play(true, strategy),
```

It may make more sense for requires_goalie to be false during ball placement. Otherwise, the goalie will still try to play goalie.

Comment on lines +33 to +35
`ssh [email protected]` (for Nanos) OR `ssh [email protected]` (for Pis) OR `ssh robot@robot_name.local`
e.g. `ssh [email protected]` (for Nanos) or `ssh robot@192.168.1.203` (for Pis) or `ssh robot@robert.local`
for a robot called robert with robot id 3")

these IP addresses should be updated

```cpp
pickoff_wall_tactic->updateControlParams(
    {.dribble_destination = pickoff_destination,
     .final_dribble_orientation = pickoff_final_orientation,
     .excessive_dribbling_mode = TbotsProto::ExcessiveDribblingMode::LOSE_BALL,
```

should this be ExcessiveDribblingMode::LOSE_BALL? Shouldn't we allow excessive dribbling?

```diff
  }
- else if (ball_pos.y() < field_lines.yMin())
+ else if (near_negative_y_boundary && near_negative_x_boundary) // bottom left corner
```

bottom right corner


```diff
- if (num_move_tactics <= 0)
+ if (num_move_skill_tactics <= 0)
```

nit: move this conditional check to the top of the function

Comment on lines +16 to +17
```cpp
double BACK_AWAY_FROM_WALL_M = ROBOT_MAX_RADIUS_METERS * 5.5;
double MINIMUM_DISTANCE_FROM_WALL_FOR_ALIGN_METERS = ROBOT_MAX_RADIUS_METERS * 4.0;
```

nit: some of these constants have an _M suffix and others have a _METERS suffix. We should make this consistent across all the constants here. I prefer the _M suffix.

```cpp
// Always try assigning AttackerTactic
if (play_update.num_tactics > 0)
{
    const bool attacker_not_suspended =
```

Suggested change:
```diff
- const bool attacker_not_suspended =
+ const bool attacker_is_running =
```

nit

```cpp
GetBallControl_S,
LoseBall_S + Update_E / loseBall_A,

X + Update_E[!shouldExcessivelyDribble_G] / dribble_A,
```

Suggested change:
```diff
- X + Update_E[!shouldExcessivelyDribble_G] / dribble_A,
+ X + Update_E[!shouldExcessivelyDribble_G] / loseBall_A,
```

lose ball right when we're trying NOT to excessively dribble?

Comment on lines +206 to +207
```cpp
Dribble_S +
    Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / dribble_A = X,
```

Suggested change:
```diff
- Dribble_S +
-     Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / dribble_A = X,
+ Dribble_S +
+     Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / loseBall_A = X,
```

If we don't want to excessively dribble, should we have loseBall here?

```cpp
// for it to be considered out of our control; otherwise we might consider
// kicked balls as out of our control
if (time_since_kick_start < lose_ball_control_time_threshold ||
    ball.velocity().length() > ball_is_kicked_m_per_s_threshold)
```

Suggested change:
```diff
- ball.velocity().length() > ball_is_kicked_m_per_s_threshold)
+ ball.velocity().length() < ball_is_kicked_m_per_s_threshold)
```

Can you double check if the comparator should be flipped?

```python
bazel_arguments += ["--hosts"]
bazel_arguments += [f"192.168.0.20{id}" for id in args.flash_robots]
if args.platform == "NANO":
```

outdated now

@itsarune (Contributor) left a comment

Sorry about taking so long, but I left some more feedback.

GrayHoang added a commit that referenced this pull request Dec 1, 2024
Adds touching_ball_threshold to sensorfusion protobuf
@GrayHoang mentioned this pull request Dec 1, 2024
GrayHoang added a commit that referenced this pull request Jan 11, 2025
* Adds the gl_max_dribble_layer.py file and the BUILD file entry.

* fixes some errors introduced in the last commit

* Adds the toggle to the widget setup functions.

* Adds changes from pr #3246 with regards to updateDribbleDisplacement
Adds touching_ball_threshold to sensorfusion protobuf

* Starts implementation for dribble layer

* Adds dribble displacement field to world, with a getter and setter, from pr #3246

* Adds dribble displacement field to python bindings

* CONTINUE THE PIPELINE

* Adds framework

* bug fixes world.h

* Undos the changes to pybindings

* Finished.

* changes the default setting

* Address comments

* Fixes mistake

* [pre-commit.ci lite] apply automatic fixes

* Changes default setting

* Adds sigmoid function to protobuffer

* [pre-commit.ci lite] apply automatic fixes

* fixes some colour scaling bug, and implements the protobuffed sigmoid function

* [pre-commit.ci lite] apply automatic fixes

* Implements abstract color from gradient function

* [pre-commit.ci lite] apply automatic fixes

* move the helper function to util.py

* Type annotation

* [pre-commit.ci lite] apply automatic fixes

* Adds line to BUILD file

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
@williamckha (Contributor, Author)

Succeeded by #3415
