-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skill selection with deep reinforcement learning (DQN) #3415
base: master
Are you sure you want to change the base?
Conversation
… ball placement go back to aligning when ball is lost
…ld_testing_june_4
… field_testing_june_4
… a wall pickoff is no longer needed
…ld_testing_june_4
…_testing_june_4 # Conflicts: # src/software/ai/hl/stp/play/ball_placement/ball_placement_play_fsm.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before we keep digging into this rabbit hole, have you considered other reinforcement learning algorithms? (https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_key_algorithms)
DQN is the classic RL algorithm that I'm implementing mostly for pedagogical reasons. Most state of the art research focuses on actor-critic methods (e.g PPO, SAC) which we can look at later, but we can still use DQN as a baseline. There are various enhancements to DQN that we can implement as well (see Rainbow DQN). |
src/software/ai/play_selection_fsm.h
Outdated
bool enemyHasPossession(const Update& event); | ||
|
||
/** | ||
* Action to set up the OverridePlay, SetPlay, StopPlay, HaltPlay, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OverridePlay
doesn't exist
// The max speed at which we will pick the ball off the wall | ||
float ball_placement_wall_max_speed_m_per_s; | ||
|
||
// The max speed at which we will retreat away from the ball after placing it [m/x] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The max speed at which we will retreat away from the ball after placing it [m/x] | |
// The max speed at which we will retreat away from the ball after placing it [m/s] |
@@ -53,12 +95,12 @@ void BallPlacementPlayFSM::alignPlacement(const Update &event) | |||
.normalize(); | |||
Angle setup_angle = alignment_vector.orientation(); | |||
setup_point = event.common.world_ptr->ball().position() - | |||
2 * alignment_vector * ROBOT_MAX_RADIUS_METERS; | |||
2.5 * alignment_vector * ROBOT_MAX_RADIUS_METERS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.5 * ROBOT_MAX_RADIUS_METERS
is a magic number that could be a constant
|
||
if (!placement_point.has_value()) | ||
{ | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should explicitly return tactics in all code paths
if (placement_point.has_value()) | ||
if (!robot_placing_ball.has_value()) | ||
{ | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicitly return tactics in all code paths
Rectangle field_lines = event.common.world_ptr->field().fieldLines(); | ||
return !contains(field_lines, ball_pos); | ||
Rectangle field_lines = event.common.world_ptr->field().fieldBoundary(); | ||
double wiggle_room = std::abs(signedDistance(ball_pos, field_lines)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just use distance()
here
if (near_positive_y_boundary && near_positive_x_boundary) // top right corner | ||
{ | ||
if (ball_pos.y() > 0) | ||
{ | ||
kick_angle = Angle::fromDegrees(45); | ||
} | ||
else | ||
{ | ||
kick_angle = Angle::fromDegrees(-45); | ||
} | ||
facing_angle = Angle::fromDegrees(45); | ||
backoff_point = field_boundary.posXPosYCorner() - | ||
Vector::createFromAngle(facing_angle) | ||
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M); | ||
} | ||
else if (ball_pos.x() < field_lines.xMin()) | ||
else if (near_positive_y_boundary && near_negative_x_boundary) // top left corner | ||
{ | ||
if (ball_pos.y() > 0) | ||
{ | ||
kick_angle = Angle::fromDegrees(135); | ||
} | ||
else | ||
{ | ||
kick_angle = Angle::fromDegrees(-135); | ||
} | ||
facing_angle = Angle::fromDegrees(135); | ||
backoff_point = field_boundary.negXPosYCorner() - | ||
Vector::createFromAngle(facing_angle) | ||
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M); | ||
} | ||
else if (ball_pos.y() > field_lines.yMax()) | ||
else if (near_negative_y_boundary && near_positive_x_boundary) // bottom right corner | ||
{ | ||
if (ball_pos.x() > 0) | ||
{ | ||
kick_angle = Angle::fromDegrees(135); | ||
} | ||
else | ||
{ | ||
kick_angle = Angle::fromDegrees(45); | ||
} | ||
facing_angle = Angle::fromDegrees(-45); | ||
backoff_point = field_boundary.posXNegYCorner() - | ||
Vector::createFromAngle(facing_angle) | ||
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M); | ||
} | ||
else if (ball_pos.y() < field_lines.yMin()) | ||
else if (near_negative_y_boundary && near_negative_x_boundary) // bottom left corner | ||
{ | ||
if (ball_pos.x() > 0) | ||
{ | ||
kick_angle = Angle::fromDegrees(-135); | ||
} | ||
else | ||
{ | ||
kick_angle = Angle::fromDegrees(-45); | ||
} | ||
facing_angle = Angle::fromDegrees(-135); | ||
backoff_point = field_boundary.negXNegYCorner() - | ||
Vector::createFromAngle(facing_angle) | ||
.normalize(BACK_AWAY_FROM_CORNER_EXTRA_M); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have notions of top, bottom, left and right in our documentation. This shouldn't be a part of the comments.
double BACK_AWAY_FROM_CORNER_EXTRA_M = 0.9; | ||
double BACK_AWAY_FROM_WALL_M = ROBOT_MAX_RADIUS_METERS * 5.5; | ||
double MINIMUM_DISTANCE_FROM_WALL_FOR_ALIGN_METERS = ROBOT_MAX_RADIUS_METERS * 4.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make private
* | ||
* @param ball_pos the ball position to use when calculating the kick angle | ||
* @param field_lines the field lines of the playing area | ||
* | ||
* @return the kick angle | ||
*/ | ||
Angle calculateWallKickoffAngle(const Point& ball_pos, const Rectangle& field_lines); | ||
std::pair<Angle, Point> calculateWallPickOffLocation(const Point& ball_pos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_dist
isn't documented
@@ -33,94 +33,3 @@ TEST(BallPlacementPlayFSMTest, test_transitions) | |||
|
|||
EXPECT_TRUE(fsm.is(boost::sml::state<BallPlacementPlayFSM::AlignPlacementState>)); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thorougher FSM tests would be nice
// Note that getBestReceivingPositions may return fewer positions than requested | ||
// if there are not enough robots, so we will need to check the size of the vector. | ||
for (unsigned int i = 0; | ||
i < offensive_positioning_tactics_.size() && i < best_receiving_positions.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i < offensive_positioning_tactics_.size() && i < best_receiving_positions.size(); | |
i < best_receiving_positions.size(); |
* @param new_world the current World | ||
* @param is_final whether this is the final World in the current episode | ||
*/ | ||
void updateDQN(const WorldPtr& new_world, bool is_final); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in our code conventions, acronyms are treated as words
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateDqn
@@ -32,10 +32,15 @@ class DribbleTactic : public Tactic | |||
* finishing dribbling | |||
* @param allow_excessive_dribbling Whether to allow excessive dribbling, i.e. more | |||
* than 1 metre at a time | |||
* @param max_speed_dribble The max speed attained while the ball is in possession |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_speed_get_possession
isn't documented
* @tparam TAction type representing the action space of the environment | ||
*/ | ||
template <typename TState, typename TAction> | ||
class DQN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
treat acronyms as words so Dqn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accidental change?
ball_displacement = math.sqrt( | ||
(ball_position.x_meters - last_ball_position.x_meters) ** 2 + | ||
(ball_position.y_meters - last_ball_position.y_meters) ** 2 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the cpp distance function?
if ball_displacement > 0.2: | ||
last_ball_position = ball_position | ||
last_time_ball_moved = current_time | ||
|
||
if current_time - last_time_ball_moved > 30: | ||
stop_event.set() | ||
tscope.close() | ||
else: | ||
print(f"Ball stationary for: {current_time - last_time_ball_moved} seconds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic numbers should be a constant
bazel run //software/thunderscope:thunderscope_main --copt=-O3 --jobs=4 \ | ||
-- --training_mode --enable_autoref --ci_mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is limiting the number of jobs necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some feedback
Description
WIP
Testing Done
Resolved Issues
Resolves #2514
Resolves #3083
Resolves #3219
Resolves #3156
Resolves #3233
Resolves #2930
Resolves #2643
Resolves #3082
Resolves #2134
Length Justification and Key Files to Review
Review Checklist
It is the reviewers responsibility to also make sure every item here has been covered
.h
file) should have a javadoc style comment at the start of them. For examples, see the functions defined inthunderbots/software/geom
. Similarly, all classes should have an associated Javadoc comment explaining the purpose of the class.TODO
(or similar) statements should either be completed or associated with a github issue