Skill selection with deep reinforcement learning (DQN) #3415

Draft

williamckha wants to merge 40 commits into UBC-Thunderbots:master from williamckha:william/rl

Contributor

williamckha commented Dec 7, 2024 (edited)

Description

WIP

Testing Done

Resolved Issues

Resolves #2514
Resolves #3083
Resolves #3219
Resolves #3156
Resolves #3233
Resolves #2930
Resolves #2643
Resolves #3082
Resolves #2134

Length Justification and Key Files to Review

Review Checklist

It is the reviewer's responsibility to also make sure every item here has been covered

Function & Class comments: All function definitions (usually in the .h file) should have a Javadoc-style comment at the start of them. For examples, see the functions defined in thunderbots/software/geom. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
Remove all commented out code
Remove extra print statements: for example, those just used for testing
Resolve all TODOs: All TODO (or similar) statements should either be completed or associated with a GitHub issue

mkhlb and others added 28 commits

June 4, 2024 18:41


          wip ball placement improvements

ce89e2c


          Made move tactic only terminate when dribbler is not on release, made ball placement go back to aligning when ball is lost

1b5ab3c


          Added new speed mode for ball placement retreats

d079f30


          Merge branch 'master' of github.com:UBC-Thunderbots/Software into field_testing_june_4

a19c461


          ball placement tweaks

bde271a


          Tuned motion planning, adjusted field test

091cedf


          Merge branch 'UBC-Thunderbots:master' into field_testing_june_4

971dcce


          Merge branch 'field_testing_june_4' of github.com:mkhlb/Software into field_testing_june_4

82cd14e


          Reverted some changes made for debugging

a1b00b1


          Wrote comments

a18b940


          Split dribble speed modes into 2

7d0a80e


          Addressed more arune notes

d37546c


          Removed a debug log


          Fixed issue where the robot wouldn't exist the wall align state where a wall pickoff is no longer needed

74dbdb3


          [pre-commit.ci lite] apply automatic fixes

ca99bc9


          Teehee

4d83ab8


          Merge remote-tracking branch 'origin/field_testing_june_4' into field_testing_june_4

90b1f62


          Merge branch 'master' of github.com:UBC-Thunderbots/Software into field_testing_june_4

4f305fc


          Addressed some more notes

d1a6d4e


          Added constants

4644b4d


          Changed constant names

45c98d5


          Fixed ball placement play test failing

9940e3e


          Merge upstream

7be4c25


          [pre-commit.ci lite] apply automatic fixes

3e462dd


          cleaned up fsm logic

7966dd1


          Merge remote-tracking branch 'origin/field_testing_june_4' into field_testing_june_4

2f620b0

# Conflicts:
#	src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.h


          Removed old constant


          [pre-commit.ci lite] apply automatic fixes

93141a1

itsarune reviewed

Contributor

itsarune left a comment

before we keep digging into this rabbit hole, have you considered other reinforcement learning algorithms? (https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_key_algorithms)

Contributor Author

williamckha commented Dec 12, 2024

before we keep digging into this rabbit hole, have you considered other reinforcement learning algorithms? (https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_key_algorithms)

DQN is the classic RL algorithm, and I'm implementing it mostly for pedagogical reasons. Most state-of-the-art research focuses on actor-critic methods (e.g. PPO, SAC), which we can look at later, but we can still use DQN as a baseline. There are also various enhancements to DQN that we can implement (see Rainbow DQN).
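
For readers new to DQN, the core of each update is a one-step temporal-difference (TD) target. The sketch below is a minimal illustration in Python, not the code in src/software/ai/rl/dqn.hpp: the names `dqn_target`, `td_error`, `toy_q` and the default discount of 0.99 are assumptions made for this example, and a plain callable stands in for the target network.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a Q-network: maps a state to one Q-value per action.
QFunction = Callable[[Sequence[float]], Sequence[float]]


def dqn_target(
    reward: float,
    next_state: Sequence[float],
    is_final: bool,
    target_q: QFunction,
    discount: float = 0.99,  # assumed value, for illustration only
) -> float:
    """One-step TD target: r + discount * max_a Q_target(s', a), or just r if s' is terminal."""
    if is_final:
        return reward
    return reward + discount * max(target_q(next_state))


def td_error(q_value: float, target: float) -> float:
    """Signed TD error between the target and the current estimate Q(s, a)."""
    return target - q_value


def toy_q(state: Sequence[float]) -> Sequence[float]:
    """Toy Q-function with two actions, used only to exercise the functions above."""
    return [sum(state), 2.0 * sum(state)]
```

The magnitude of the TD error is also the usual priority signal in proportional prioritized experience replay, which is one of the Rainbow DQN enhancements.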

williamckha added 2 commits

December 28, 2024 17:32


          Fix DQN update crash


          Implement prioritized replay buffer

ce7ab17

itsarune reviewed

src/software/ai/hl/stp/tactic/attacker/attacker_state.h (resolved)

itsarune reviewed

src/software/ai/play_selection_fsm.h Outdated

+                  bool enemyHasPossession(const Update& event);
+                  /**
+                   * Action to set up the OverridePlay, SetPlay, StopPlay, HaltPlay,

Contributor

itsarune Jan 3, 2025

OverridePlay doesn't exist

itsarune reviewed

src/software/ai/rl/dqn.hpp (resolved)

williamckha mentioned this pull request

New offensive gameplay architecture and Q-learning #3246

Closed

4 tasks

williamckha and others added 6 commits

January 25, 2025 16:57


          Run headless when --ci_mode option is enabled in thunderscope_main.py

2137b67


          Partial save, lock file

d6c361b


          Fix thunderscope_main.py

e1a0a98


          Terminate when ball is stationary

e1e712f


          Merge branch 'field_testing_june_4' of https://github.com/mkhlb/Software into william/rl

687de0f


          Add train.sh

d79a945

itsarune reviewed

src/shared/robot_constants.h

+                  // The max speed at which we will pick the ball off the wall
+                  float ball_placement_wall_max_speed_m_per_s;
+                  // The max speed at which we will retreat away from the ball after placing it [m/x]

Contributor

itsarune Feb 17, 2025

Suggested change

-                  // The max speed at which we will retreat away from the ball after placing it [m/x]
+                  // The max speed at which we will retreat away from the ball after placing it [m/s]

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.cpp

                               .normalize();
                       Angle setup_angle = alignment_vector.orientation();
                       setup_point       = event.common.world_ptr->ball().position() -
-* alignment_vector * ROBOT_MAX_RADIUS_METERS;
+2.5 * alignment_vector * ROBOT_MAX_RADIUS_METERS;

Contributor

itsarune Feb 17, 2025

2.5 * ROBOT_MAX_RADIUS_METERS is a magic number that could be a constant

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.cpp

+                  if (!placement_point.has_value())
+                  {
+                      return;

Contributor

itsarune Feb 17, 2025

we should explicitly return tactics in all code paths

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.cpp

-                  if (placement_point.has_value())
+                  if (!robot_placing_ball.has_value())
+                  {
+                      return;

Contributor

itsarune Feb 17, 2025

explicitly return tactics in all code paths

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.cpp

-                  Rectangle field_lines = event.common.world_ptr->field().fieldLines();
-                  return !contains(field_lines, ball_pos);
+                  Rectangle field_lines = event.common.world_ptr->field().fieldBoundary();
+                  double wiggle_room    = std::abs(signedDistance(ball_pos, field_lines));

Contributor

itsarune Feb 17, 2025

why not just use distance() here

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.cpp

Comment on lines +332 to +359

+                  if (near_positive_y_boundary && near_positive_x_boundary)  // top right corner
                   {
-                      if (ball_pos.y() > 0)
-                      {
-                          kick_angle = Angle::fromDegrees(45);
-                      }
-                      else
-                      {
-                          kick_angle = Angle::fromDegrees(-45);
-                      }
+                      facing_angle  = Angle::fromDegrees(45);
+                      backoff_point = field_boundary.posXPosYCorner() -
+                                      Vector::createFromAngle(facing_angle)
+                                          .normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
                   }
-                  else if (ball_pos.x() < field_lines.xMin())
+                  else if (near_positive_y_boundary && near_negative_x_boundary)  // top left corner
                   {
-                      if (ball_pos.y() > 0)
-                      {
-                          kick_angle = Angle::fromDegrees(135);
-                      }
-                      else
-                      {
-                          kick_angle = Angle::fromDegrees(-135);
-                      }
+                      facing_angle  = Angle::fromDegrees(135);
+                      backoff_point = field_boundary.negXPosYCorner() -
+                                      Vector::createFromAngle(facing_angle)
+                                          .normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
                   }
-                  else if (ball_pos.y() > field_lines.yMax())
+                  else if (near_negative_y_boundary && near_positive_x_boundary)  // bottom right corner
                   {
-                      if (ball_pos.x() > 0)
-                      {
-                          kick_angle = Angle::fromDegrees(135);
-                      }
-                      else
-                      {
-                          kick_angle = Angle::fromDegrees(45);
-                      }
+                      facing_angle  = Angle::fromDegrees(-45);
+                      backoff_point = field_boundary.posXNegYCorner() -
+                                      Vector::createFromAngle(facing_angle)
+                                          .normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
                   }
-                  else if (ball_pos.y() < field_lines.yMin())
+                  else if (near_negative_y_boundary && near_negative_x_boundary)  // bottom left corner
                   {
-                      if (ball_pos.x() > 0)
-                      {
-                          kick_angle = Angle::fromDegrees(-135);
-                      }
-                      else
-                      {
-                          kick_angle = Angle::fromDegrees(-45);
-                      }
+                      facing_angle  = Angle::fromDegrees(-135);
+                      backoff_point = field_boundary.negXNegYCorner() -
+                                      Vector::createFromAngle(facing_angle)
+                                          .normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
+                  }

Contributor

itsarune Feb 17, 2025

we don't have notions of top, bottom, left and right in our documentation. This shouldn't be a part of the comments.
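
One way to avoid "top"/"bottom"/"left"/"right" altogether is to identify each corner by the sign of its x and y coordinates, which also collapses the four branches into one computation. The Python sketch below only illustrates that idea and is not the project's C++ API: `corner_pick_off` is a hypothetical name, it assumes a field centred at the origin, and it takes the value 0.9 for `BACK_AWAY_FROM_CORNER_EXTRA_M` from the header diff in this PR.

```python
import math
from typing import Tuple

# Value taken from the diff to ball_placement_play_fsm.h in this PR.
BACK_AWAY_FROM_CORNER_EXTRA_M = 0.9


def corner_pick_off(corner_x: float, corner_y: float) -> Tuple[float, Tuple[float, float]]:
    """Return (facing angle in degrees, backoff point) for a field-boundary corner.

    The corner is identified only by the signs of its coordinates (for example
    positive x, negative y), assuming the field is centred at the origin.
    """
    sign_x = math.copysign(1.0, corner_x)
    sign_y = math.copysign(1.0, corner_y)
    # Face diagonally into the corner: 45, 135, -45 or -135 degrees.
    facing_rad = math.atan2(sign_y, sign_x)
    # Step back from the corner along the facing direction.
    backoff = (
        corner_x - BACK_AWAY_FROM_CORNER_EXTRA_M * math.cos(facing_rad),
        corner_y - BACK_AWAY_FROM_CORNER_EXTRA_M * math.sin(facing_rad),
    )
    return math.degrees(facing_rad), backoff
```

For the four sign combinations this reproduces the angles in the diff (45, 135, -45 and -135 degrees), so comments can refer to "positive x, positive y corner" and so on.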

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.h

Comment on lines +16 to +18

+                  double BACK_AWAY_FROM_CORNER_EXTRA_M               = 0.9;
+                  double BACK_AWAY_FROM_WALL_M                       = ROBOT_MAX_RADIUS_METERS * 5.5;
+                  double MINIMUM_DISTANCE_FROM_WALL_FOR_ALIGN_METERS = ROBOT_MAX_RADIUS_METERS * 4.0;

Contributor

itsarune Feb 17, 2025

make private

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.h

                    *
                    * @param ball_pos the ball position to use when calculating the kick angle
                    * @param field_lines the field lines of the playing area
                    *
                    * @return the kick angle
                    */
-                  Angle calculateWallKickoffAngle(const Point& ball_pos, const Rectangle& field_lines);
+                  std::pair<Angle, Point> calculateWallPickOffLocation(const Point& ball_pos,

Contributor

itsarune Feb 17, 2025

max_dist isn't documented

itsarune reviewed

src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm_test.cpp

		@@ -33,94 +33,3 @@ TEST(BallPlacementPlayFSMTest, test_transitions)

		EXPECT_TRUE(fsm.is(boost::sml::state<BallPlacementPlayFSM::AlignPlacementState>));
		}

Contributor

itsarune Feb 17, 2025

more thorough FSM tests would be nice

itsarune reviewed

src/software/ai/hl/stp/play/offense/offense_play_fsm.cpp

+                  // Note that getBestReceivingPositions may return fewer positions than requested
+                  // if there are not enough robots, so we will need to check the size of the vector.
+                  for (unsigned int i = 0;
+                       i < offensive_positioning_tactics_.size() && i < best_receiving_positions.size();

Contributor

itsarune Feb 17, 2025

Suggested change

-                       i < offensive_positioning_tactics_.size() && i < best_receiving_positions.size();
+                       i < best_receiving_positions.size();

itsarune reviewed

src/software/ai/hl/stp/tactic/attacker/attacker_tactic.h

+                   * @param new_world the current World
+                   * @param is_final whether this is the final World in the current episode
+                   */
+                  void updateDQN(const WorldPtr& new_world, bool is_final);

Contributor

itsarune Feb 17, 2025

in our code conventions, acronyms are treated as words

Contributor

itsarune Feb 17, 2025 (edited)

updateDqn

itsarune reviewed

src/software/ai/hl/stp/tactic/dribble/dribble_tactic.h

@@ -32,10 +32,15 @@ class DribbleTactic : public Tactic
                    * finishing dribbling
                    * @param allow_excessive_dribbling Whether to allow excessive dribbling, i.e. more
                    * than 1 metre at a time
+                   * @param max_speed_dribble The max speed attained while the ball is in possession

Contributor

itsarune Feb 17, 2025

max_speed_get_possession isn't documented

itsarune reviewed

src/software/ai/rl/dqn.hpp

+               * @tparam TAction type representing the action space of the environment
+               */
+              template <typename TState, typename TAction>
+              class DQN

Contributor

itsarune Feb 17, 2025

treat acronyms as words so Dqn

itsarune reviewed

src/software/field_tests/movement_robot_field_test.py

Contributor

itsarune Feb 17, 2025

accidental change?

itsarune reviewed

src/software/thunderscope/thunderscope_main.py

Comment on lines +544 to +547

+                                          ball_displacement = math.sqrt(
+                                              (ball_position.x_meters - last_ball_position.x_meters) ** 2 +
+                                              (ball_position.y_meters - last_ball_position.y_meters) ** 2
+                                          )

Contributor

itsarune Feb 17, 2025

can you use the cpp distance function?

itsarune reviewed

src/software/thunderscope/thunderscope_main.py

Comment on lines +549 to +557

+                                          if ball_displacement > 0.2:
+                                              last_ball_position = ball_position
+                                              last_time_ball_moved = current_time
+                                          if current_time - last_time_ball_moved > 30:
+                                              stop_event.set()
+                                              tscope.close()
+                                          else:
+                                              print(f"Ball stationary for: {current_time - last_time_ball_moved} seconds")

Contributor

itsarune Feb 17, 2025

magic numbers should be a constant
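
As a rough sketch of what that could look like, the stationary-ball check can be pulled out of the main loop with the two thresholds named. The constant names and the `BallStationaryMonitor` class below are hypothetical; only the values 0.2 and 30 come from the diff. `math.hypot` is used here as a standard-library stand-in for the displacement calculation, since this thread does not show whether the C++ distance function is exposed to Python.

```python
import math

# Hypothetical constant names; the values 0.2 and 30 come from the diff above.
BALL_MOVED_THRESHOLD_M = 0.2
BALL_STATIONARY_TIMEOUT_S = 30.0


class BallStationaryMonitor:
    """Tracks how long the ball has stayed near its last recorded position."""

    def __init__(self, x_m: float, y_m: float, start_time_s: float) -> None:
        self.last_x_m = x_m
        self.last_y_m = y_m
        self.last_time_moved_s = start_time_s

    def update(self, x_m: float, y_m: float, current_time_s: float) -> bool:
        """Record a new ball position; return True once the ball has been stationary too long."""
        displacement_m = math.hypot(x_m - self.last_x_m, y_m - self.last_y_m)
        if displacement_m > BALL_MOVED_THRESHOLD_M:
            # The ball moved far enough: reset the reference position and timer.
            self.last_x_m, self.last_y_m = x_m, y_m
            self.last_time_moved_s = current_time_s
        return current_time_s - self.last_time_moved_s > BALL_STATIONARY_TIMEOUT_S
```

The caller would then only need to decide what to do when `update` returns True (in the diff, setting `stop_event` and closing Thunderscope), which keeps the thresholds in one place.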

itsarune reviewed

src/train.sh

Comment on lines +6 to +7

		bazel run //software/thunderscope:thunderscope_main --copt=-O3 --jobs=4 \
		-- --training_mode --enable_autoref --ci_mode

Contributor

itsarune Feb 17, 2025

is limiting the number of jobs necessary?

itsarune reviewed

Contributor

itsarune left a comment

left some feedback