
New offensive gameplay architecture and Q-learning #3246

Closed
wants to merge 516 commits

Conversation

@williamckha (Contributor) commented Jul 8, 2024

Description

This PR overhauls our offensive gameplay architecture. The goal of the architecture redesign is to make our AI more dynamic and adaptive to the enemy we play against.

I recommend taking a look at the Gameplay Architecture RFC. Although it is very outdated in terms of the planned implementation and architecture, the goals and user stories provide some good background on the project.


Gameplay overview

```mermaid
classDiagram
    direction TB

    namespace Plays {
        class DynamicPlay
        class OffensePlay
    }

    <<abstract>> DynamicPlay

    namespace Tactics {
        class AttackerTactic
        class SupportTactic
    }

    namespace Skills {
        class ShootSkill
        class KeepAwaySkill
        class KickPassSkill
        class ChipPassSkill
    }

    <<abstract>> SupportTactic

    DynamicPlay <|-- OffensePlay : inherits
    OffensePlay --> AttackerTactic
    OffensePlay --> "many" SupportTactic

    AttackerTactic --> ShootSkill
    AttackerTactic --> KeepAwaySkill
    AttackerTactic --> KickPassSkill
    AttackerTactic --> ChipPassSkill
```
  • DynamicPlay is a base Play that selects SupportTactics to assign. Support tactics play supporting roles on the field (e.g. going out to receiver positions, faking out enemy robots, etc.). Over time, DynamicPlay is supposed to learn the best support tactics to select given the state of the game.

    The implementation of DynamicPlay and its support tactic selection algorithm may change in the future, and we currently only have one support tactic (ReceiverTactic), so I would not focus on reviewing these changes.

  • OffensePlay is a DynamicPlay that is run when we have possession. It assigns defensive tactics selected by DefensePlay, support tactics that are selected by DynamicPlay, and an AttackerTactic that is the main ball handler during the play.

  • There is now a clearer separation between Skills and Tactics. A Skill is smaller in scope and completes a single action (e.g. kick, chip, pass, dribble), while a Tactic has a greater set of responsibilities and objectives to complete. Skills have a similar interface to Tactics and are also implemented using FSMs that yield primitives. A Tactic can "execute" a Skill by forwarding the Skill's updatePrimitive result in its own updatePrimitive.

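To illustrate the Skill/Tactic relationship described above, here is a minimal sketch of how a Tactic can "execute" a Skill by forwarding its `updatePrimitive` result. The class and member names here are simplified stand-ins, not the PR's actual interfaces (the real classes are FSM-based and richer):

```cpp
#include <memory>
#include <string>

// Simplified stand-ins for the real World and Primitive types.
struct World {};
struct Primitive { std::string name; };

// A Skill is small in scope: it yields one primitive per tick
// for a single action (kick, chip, pass, dribble, ...).
class Skill {
public:
    virtual ~Skill() = default;
    virtual Primitive updatePrimitive(const World& world) = 0;
};

class ShootSkill : public Skill {
public:
    Primitive updatePrimitive(const World&) override { return {"kick"}; }
};

// A Tactic has a broader set of responsibilities, but exposes a
// similar interface to a Skill.
class Tactic {
public:
    virtual ~Tactic() = default;
    virtual Primitive updatePrimitive(const World& world) = 0;
};

// A Tactic "executes" a Skill by forwarding the Skill's
// updatePrimitive result in its own updatePrimitive.
class AttackerTactic : public Tactic {
public:
    explicit AttackerTactic(std::unique_ptr<Skill> skill)
        : current_skill_(std::move(skill)) {}

    Primitive updatePrimitive(const World& world) override {
        return current_skill_->updatePrimitive(world);
    }

private:
    std::unique_ptr<Skill> current_skill_;
};
```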

Q-learning for attacker skill selection

The attacker uses a reinforcement learning algorithm called Q-learning to select and learn which skills (actions) to execute given the state of the World.

In Q-learning, the agent decides which action to take based on a Q-function $Q(s,a)$ that returns the expected reward for an action taken in a given state. After selecting some action $a$, we observe a reward $r$ and enter a new state $s'$ which are used to update the Q-function and adjust the Q-value given for taking action $a$ in state $s$. Since our state space is extremely large, we use linear Q-function approximation to estimate $Q(s,a)$ even if we have not previously applied action $a$ in state $s$.
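The update rule sketched below shows linear Q-function approximation as described: each action has a weight vector, $Q(s,a) = w_a \cdot \phi(s)$ for a feature vector $\phi(s)$, and after observing $(s, a, r, s')$ the weights are nudged toward the TD target $r + \gamma \max_{a'} Q(s', a')$. The class and method names are illustrative, not the PR's actual API:

```cpp
#include <cmath>
#include <vector>

// Minimal sketch of Q-learning with linear function approximation.
// Q(s, a) = w_a . phi(s), with one weight vector per action
// (matching the weights table: rows = actions, columns = features).
class LinearQFunction {
public:
    LinearQFunction(int num_actions, int num_features,
                    double alpha = 0.1, double gamma = 0.9)
        : weights_(num_actions, std::vector<double>(num_features, 0.0)),
          alpha_(alpha), gamma_(gamma) {}

    double getQValue(const std::vector<double>& features, int action) const {
        double q = 0.0;
        for (size_t i = 0; i < features.size(); ++i)
            q += weights_[action][i] * features[i];
        return q;
    }

    int bestAction(const std::vector<double>& features) const {
        int best = 0;
        for (int a = 1; a < static_cast<int>(weights_.size()); ++a)
            if (getQValue(features, a) > getQValue(features, best)) best = a;
        return best;
    }

    // TD update: w_a += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s)
    void update(const std::vector<double>& state, int action, double reward,
                const std::vector<double>& next_state) {
        double target =
            reward + gamma_ * getQValue(next_state, bestAction(next_state));
        double td_error = target - getQValue(state, action);
        for (size_t i = 0; i < state.size(); ++i)
            weights_[action][i] += alpha_ * td_error * state[i];
    }

private:
    std::vector<std::vector<double>> weights_;  // rows: actions, cols: features
    double alpha_;                              // learning rate
    double gamma_;                              // discount factor
};
```

Because the weights generalize across states, this lets us estimate $Q(s,a)$ even for state-action pairs we have never visited.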


The Q-function weights are logged as protobufs and displayed in a new Q-learning widget in Thunderscope. Each weight in the table is associated with a feature and an action. Columns represent the features in the order they are initialized in AttackerMdpFeatureExtractor, and rows represent the actions in the order they are defined in AttackerMdpAction.

We can save the weights to a CSV file (they are also automatically written to a CSV under /tmp/tbots) and we can load in an initial set of weights when starting the AI (attacker_mdp_q_function_weights.csv).


Other changes

  • Changed most Tactics and all Plays to accept a shared instance of a Strategy class instead of TbotsProto::AiConfig. The Strategy contains shared gameplay calculations and has a getAiConfig() method that returns the latest TbotsProto::AiConfig.
  • Updated SensorFusion to track the distance that the ball has been continuously dribbled by the friendly team. This dribble distance is output in the Worlds that SensorFusion produces. We use this information to limit how far DribbleSkill can dribble, so that even if multiple dribbling skills are executed sequentially, we will avoid going over the max dribble distance.
  • Changed PossessionTracker to match original CMDragons possession algorithm more closely. There are now 4 types of possession (FRIENDLY, ENEMY, IN_CONTEST, LOOSE). To make our gameplay more aggressive, DefensePlay is only run when we are in ENEMY possession; otherwise, OffensePlay is run.
  • Probably more changes that I can't remember...
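The possession-based play selection described above can be sketched as follows. The enum values come from the description (FRIENDLY, ENEMY, IN_CONTEST, LOOSE); the function and `PlayChoice` names are hypothetical, for illustration only:

```cpp
// The four possession states tracked by PossessionTracker,
// per the PR description.
enum class TeamPossession { FRIENDLY, ENEMY, IN_CONTEST, LOOSE };

enum class PlayChoice { OFFENSE_PLAY, DEFENSE_PLAY };

// To make gameplay more aggressive, DefensePlay runs only under clear
// ENEMY possession; every other state defaults to OffensePlay.
// (Hypothetical helper name; the real selection logic lives elsewhere.)
PlayChoice selectPlay(TeamPossession possession) {
    return possession == TeamPossession::ENEMY ? PlayChoice::DEFENSE_PLAY
                                               : PlayChoice::OFFENSE_PLAY;
}
```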

Testing Done

  • Based on the eye test, everything works and looks OK. We ran the new gameplay for an extended period of time during the scrimmage, and I don't think there were any major hitches or crashes.
  • Existing simulated gameplay tests have been updated and all pass
  • Tactics converted to Skills have had their tests updated and all pass

Resolved Issues

Resolves #3080
Resolves #3079
Resolves #3078
Resolves #3077
Resolves #3076
Resolves #3074
Resolves #3073
Resolves #3071
Resolves #3070
Resolves #3069
Resolves #3065
Resolves #2514
Resolves #3083
Resolves #3081
Resolves #3072
Resolves #3219
Resolves #3203
Resolves #3216
Resolves #3156
Resolves #3233
Resolves #2930
Resolves #2868
Resolves #2643
Resolves #3098
Resolves #3097
Resolves #3096
Resolves #3082
Resolves #2134

Length Justification and Key Files to Review

  • software/ai/evaluation/q_learning
  • software/ai/hl/stp/skill
  • software/ai/hl/stp/play/dynamic_plays
  • software/ai/hl/stp/tactic/attacker

Review Checklist

It is the reviewer's responsibility to make sure every item here has been covered.

  • Function & Class comments: All function definitions (usually in the .h file) should have a javadoc style comment at the start of them. For examples, see the functions defined in thunderbots/software/geom. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
  • Remove all commented out code
  • Remove extra print statements: for example, those just used for testing
  • Resolve all TODOs: All TODO (or similar) statements should either be completed or associated with a GitHub issue

nimazareian and others added 30 commits June 25, 2024 15:30
@itsarune mentioned this pull request Sep 2, 2024

```diff
- BallPlacementPlay::BallPlacementPlay(TbotsProto::AiConfig config)
-     : Play(config, true), fsm{BallPlacementPlayFSM{config}}, control_params{}
+ BallPlacementPlay::BallPlacementPlay(std::shared_ptr<Strategy> strategy)
+     : Play(true, strategy),
```

It may make more sense for requires_goalie to be false during ball placement. Otherwise, the goalie will still try to play goalie.

Comment on lines +33 to +35
`ssh [email protected]` (for Nanos) OR `ssh [email protected]` (for Pis) OR `ssh robot@robot_name.local`
e.g. `ssh [email protected]` (for Nanos) or `ssh robot@192.168.1.203` (for Pis) or `ssh robot@robert.local`
for a robot called robert with robot id 3")

these IP addresses should be updated

```cpp
pickoff_wall_tactic->updateControlParams(
    {.dribble_destination = pickoff_destination,
     .final_dribble_orientation = pickoff_final_orientation,
     .excessive_dribbling_mode = TbotsProto::ExcessiveDribblingMode::LOSE_BALL,
```

should this be ExcessiveDribblingMode::LOSE_BALL? Shouldn't we allow excessive dribbling?

```diff
  }
- else if (ball_pos.y() < field_lines.yMin())
+ else if (near_negative_y_boundary && near_negative_x_boundary) // bottom left corner
```

bottom right corner


```diff
- if (num_move_tactics <= 0)
+ if (num_move_skill_tactics <= 0)
```

nit: move this conditional check to the top of the function

Comment on lines +16 to +17
```cpp
double BACK_AWAY_FROM_WALL_M = ROBOT_MAX_RADIUS_METERS * 5.5;
double MINIMUM_DISTANCE_FROM_WALL_FOR_ALIGN_METERS = ROBOT_MAX_RADIUS_METERS * 4.0;
```

nit: some of these constants have an _M suffix and others have a _METERS suffix. We should make this consistent across all the constants here. I prefer the _M suffix.

```cpp
// Always try assigning AttackerTactic
if (play_update.num_tactics > 0)
{
    const bool attacker_not_suspended =
```

Suggested change:
```diff
- const bool attacker_not_suspended =
+ const bool attacker_is_running =
```

nit

```cpp
GetBallControl_S,
LoseBall_S + Update_E / loseBall_A,

X + Update_E[!shouldExcessivelyDribble_G] / dribble_A,
```

Suggested change:
```diff
- X + Update_E[!shouldExcessivelyDribble_G] / dribble_A,
+ X + Update_E[!shouldExcessivelyDribble_G] / loseBall_A,
```

lose ball right when we're trying NOT to excessively dribble?

Comment on lines +206 to +207
```cpp
Dribble_S +
    Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / dribble_A = X,
```

Suggested change:
```diff
- Dribble_S +
-     Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / dribble_A = X,
+ Dribble_S +
+     Update_E[shouldLoseBall_G && !shouldExcessivelyDribble_G] / loseBall_A = X,
```

If we don't want to excessively dribble, should we have loseBall here?

```cpp
// for it to be considered out of our control; otherwise we might consider
// kicked balls as out of our control
if (time_since_kick_start < lose_ball_control_time_threshold ||
    ball.velocity().length() > ball_is_kicked_m_per_s_threshold)
```

Suggested change:
```diff
- ball.velocity().length() > ball_is_kicked_m_per_s_threshold)
+ ball.velocity().length() < ball_is_kicked_m_per_s_threshold)
```

Can you double check if the comparator should be flipped?

```python
bazel_arguments += ["--hosts"]
bazel_arguments += [f"192.168.0.20{id}" for id in args.flash_robots]
if args.platform == "NANO":
```

outdated now

@itsarune (Contributor) left a comment

Sorry about taking so long, but I left some more feedback.

GrayHoang added a commit that referenced this pull request Dec 1, 2024
Adds touching_ball_threshold to sensorfusion protobuf
@GrayHoang mentioned this pull request Dec 1, 2024
GrayHoang added a commit that referenced this pull request Jan 11, 2025
* Adds the gl_max_dribble_layer.py file and the BUILD file entry.

* fixes some errors introduced in the last commit

* Adds the toggle to the widget setup functions.

* Adds changes from pr #3246 with regards to updateDribbleDisplacement
Adds touching_ball_threshold to sensorfusion protobuf

* Starts implementation for dribble layer

* Adds dribble displacement field to world, with a getter and setter, from pr #3246

* Adds dribble displacement field to python bindings

* CONTINUE THE PIPELINE

* Adds framework

* bug fixes world.h

* Undos the changes to pybindings

* Finished.

* changes the default setting

* Address comments

* Fixes mistake

* [pre-commit.ci lite] apply automatic fixes

* Changes default setting

* Adds sigmoid function to protobuffer

* [pre-commit.ci lite] apply automatic fixes

* fixes some colour scaling bug, and implements the protobuffed sigmoid function

* [pre-commit.ci lite] apply automatic fixes

* Implements abstract color from gradient function

* [pre-commit.ci lite] apply automatic fixes

* move the helper function to util.py

* Type annotation

* [pre-commit.ci lite] apply automatic fixes

* Adds line to BUILD file

* [pre-commit.ci lite] apply automatic fixes

---------

Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
@williamckha (Contributor, Author)

Succeeded by #3415
