Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skill selection with deep reinforcement learning (DQN) #3415

Draft
wants to merge 40 commits into
base: master
Choose a base branch
from

Conversation

williamckha
Copy link
Contributor

@williamckha williamckha commented Dec 7, 2024

Description

WIP

Testing Done

Resolved Issues

Resolves #2514
Resolves #3083
Resolves #3219
Resolves #3156
Resolves #3233
Resolves #2930
Resolves #2643
Resolves #3082
Resolves #2134

Length Justification and Key Files to Review

Review Checklist

It is the reviewers responsibility to also make sure every item here has been covered

  • Function & Class comments: All function definitions (usually in the .h file) should have a javadoc style comment at the start of them. For examples, see the functions defined in thunderbots/software/geom. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.
  • Remove all commented out code
  • Remove extra print statements: for example, those just used for testing
  • Resolve all TODO's: All TODO (or similar) statements should either be completed or associated with a github issue

mkhlb and others added 28 commits June 4, 2024 18:41
… ball placement go back to aligning when ball is lost
…_testing_june_4

# Conflicts:
#	src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.h
Copy link
Contributor

@itsarune itsarune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we keep digging into this rabbit hole, have you considered other reinforcement learning algorithms? (https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_key_algorithms)

@williamckha
Copy link
Contributor Author

before we keep digging into this rabbit hole, have you considered other reinforcement learning algorithms? (https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_key_algorithms)

DQN is the classic RL algorithm that I'm implementing mostly for pedagogical reasons. Most state of the art research focuses on actor-critic methods (e.g PPO, SAC) which we can look at later, but we can still use DQN as a baseline. There are various enhancements to DQN that we can implement as well (see Rainbow DQN).

bool enemyHasPossession(const Update& event);

/**
* Action to set up the OverridePlay, SetPlay, StopPlay, HaltPlay,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OverridePlay doesn't exist

// The max speed at which we will pick the ball off the wall
float ball_placement_wall_max_speed_m_per_s;

// The max speed at which we will retreat away from the ball after placing it [m/x]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The max speed at which we will retreat away from the ball after placing it [m/x]
// The max speed at which we will retreat away from the ball after placing it [m/s]

@@ -53,12 +95,12 @@ void BallPlacementPlayFSM::alignPlacement(const Update &event)
.normalize();
Angle setup_angle = alignment_vector.orientation();
setup_point = event.common.world_ptr->ball().position() -
2 * alignment_vector * ROBOT_MAX_RADIUS_METERS;
2.5 * alignment_vector * ROBOT_MAX_RADIUS_METERS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.5 * ROBOT_MAX_RADIUS_METERS is a magic number that could be a constant


if (!placement_point.has_value())
{
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should explicitly return tactics in all code paths

if (placement_point.has_value())
if (!robot_placing_ball.has_value())
{
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicitly return tactics in all code paths

Rectangle field_lines = event.common.world_ptr->field().fieldLines();
return !contains(field_lines, ball_pos);
Rectangle field_lines = event.common.world_ptr->field().fieldBoundary();
double wiggle_room = std::abs(signedDistance(ball_pos, field_lines));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use distance() here

Comment on lines +332 to +359
if (near_positive_y_boundary && near_positive_x_boundary) // top right corner
{
if (ball_pos.y() > 0)
{
kick_angle = Angle::fromDegrees(45);
}
else
{
kick_angle = Angle::fromDegrees(-45);
}
facing_angle = Angle::fromDegrees(45);
backoff_point = field_boundary.posXPosYCorner() -
Vector::createFromAngle(facing_angle)
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
}
else if (ball_pos.x() < field_lines.xMin())
else if (near_positive_y_boundary && near_negative_x_boundary) // top left corner
{
if (ball_pos.y() > 0)
{
kick_angle = Angle::fromDegrees(135);
}
else
{
kick_angle = Angle::fromDegrees(-135);
}
facing_angle = Angle::fromDegrees(135);
backoff_point = field_boundary.negXPosYCorner() -
Vector::createFromAngle(facing_angle)
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
}
else if (ball_pos.y() > field_lines.yMax())
else if (near_negative_y_boundary && near_positive_x_boundary) // bottom right corner
{
if (ball_pos.x() > 0)
{
kick_angle = Angle::fromDegrees(135);
}
else
{
kick_angle = Angle::fromDegrees(45);
}
facing_angle = Angle::fromDegrees(-45);
backoff_point = field_boundary.posXNegYCorner() -
Vector::createFromAngle(facing_angle)
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
}
else if (ball_pos.y() < field_lines.yMin())
else if (near_negative_y_boundary && near_negative_x_boundary) // bottom left corner
{
if (ball_pos.x() > 0)
{
kick_angle = Angle::fromDegrees(-135);
}
else
{
kick_angle = Angle::fromDegrees(-45);
}
facing_angle = Angle::fromDegrees(-135);
backoff_point = field_boundary.negXNegYCorner() -
Vector::createFromAngle(facing_angle)
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have notions of top, bottom, left and right in our documentation. This shouldn't be a part of the comments.

Comment on lines +16 to +18
double BACK_AWAY_FROM_CORNER_EXTRA_M = 0.9;
double BACK_AWAY_FROM_WALL_M = ROBOT_MAX_RADIUS_METERS * 5.5;
double MINIMUM_DISTANCE_FROM_WALL_FOR_ALIGN_METERS = ROBOT_MAX_RADIUS_METERS * 4.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make private

*
* @param ball_pos the ball position to use when calculating the kick angle
* @param field_lines the field lines of the playing area
*
* @return the kick angle
*/
Angle calculateWallKickoffAngle(const Point& ball_pos, const Rectangle& field_lines);
std::pair<Angle, Point> calculateWallPickOffLocation(const Point& ball_pos,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_dist isn't documented

@@ -33,94 +33,3 @@ TEST(BallPlacementPlayFSMTest, test_transitions)

EXPECT_TRUE(fsm.is(boost::sml::state<BallPlacementPlayFSM::AlignPlacementState>));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thorougher FSM tests would be nice

// Note that getBestReceivingPositions may return fewer positions than requested
// if there are not enough robots, so we will need to check the size of the vector.
for (unsigned int i = 0;
i < offensive_positioning_tactics_.size() && i < best_receiving_positions.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
i < offensive_positioning_tactics_.size() && i < best_receiving_positions.size();
i < best_receiving_positions.size();

* @param new_world the current World
* @param is_final whether this is the final World in the current episode
*/
void updateDQN(const WorldPtr& new_world, bool is_final);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our code conventions, acronyms are treated as words

Copy link
Contributor

@itsarune itsarune Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateDqn

@@ -32,10 +32,15 @@ class DribbleTactic : public Tactic
* finishing dribbling
* @param allow_excessive_dribbling Whether to allow excessive dribbling, i.e. more
* than 1 metre at a time
* @param max_speed_dribble The max speed attained while the ball is in possession
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_speed_get_possession isn't documented

* @tparam TAction type representing the action space of the environment
*/
template <typename TState, typename TAction>
class DQN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treat acronyms as words so Dqn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidental change?

Comment on lines +544 to +547
ball_displacement = math.sqrt(
(ball_position.x_meters - last_ball_position.x_meters) ** 2 +
(ball_position.y_meters - last_ball_position.y_meters) ** 2
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the cpp distance function?

Comment on lines +549 to +557
if ball_displacement > 0.2:
last_ball_position = ball_position
last_time_ball_moved = current_time

if current_time - last_time_ball_moved > 30:
stop_event.set()
tscope.close()
else:
print(f"Ball stationary for: {current_time - last_time_ball_moved} seconds")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

magic numbers should be a constant

Comment on lines +6 to +7
bazel run //software/thunderscope:thunderscope_main --copt=-O3 --jobs=4 \
-- --training_mode --enable_autoref --ci_mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is limiting the number of jobs necessary?

Copy link
Contributor

@itsarune itsarune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants