I've made a simple game in order to test three important algorithms:
- Monte-Carlo
- Q Learning
- Q(lambda) Learning
In this game, an agent begins at the starting position on a frozen lake. The agent can move freely on the lake (UP, DOWN, LEFT, RIGHT). The goal is to reach one of the terminal states (Goals). In order to make it harder, I have added some holes to the lake. If the agent falls into a hole, it loses the game.
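To make the setup concrete, here is a minimal sketch of such a grid environment. The grid layout, the `reset`/`step` method names, and the reward values (+1 at a goal, 0 otherwise) are illustrative and may differ from the actual scripts.

```python
# Minimal frozen-lake-style grid (assumed layout):
# S = start, F = frozen, H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
ACTIONS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

class FrozenLake:
    def __init__(self):
        self.n_rows, self.n_cols = len(GRID), len(GRID[0])
        self.reset()

    def reset(self):
        self.row, self.col = 0, 0          # starting position (assumed top-left)
        return self.row * self.n_cols + self.col

    def step(self, action):
        dr, dc = ACTIONS[action]
        self.row = min(max(self.row + dr, 0), self.n_rows - 1)
        self.col = min(max(self.col + dc, 0), self.n_cols - 1)
        tile = GRID[self.row][self.col]
        done = tile in "HG"                # both holes and goals end the episode
        reward = 1.0 if tile == "G" else 0.0
        state = self.row * self.n_cols + self.col   # flatten (row, col) to an index
        return state, reward, done
```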
The agent uses an exploration-exploitation strategy in order to learn as fast as possible: a modified epsilon-greedy strategy decides which action to take at each step of the game.
In each step, the agent picks a random number p in (0.0, 1.0).
If p < epsilon, then the agent chooses a random action:
action = random( actions )
else, it chooses the greedy action, i.e. the one with the highest Q value for the current state:
action = argmax( Q[state] )
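As a sketch, the selection rule could look like the following in Python; the function name `choose_action` and the use of NumPy are illustrative choices, while the Q table shape follows the description below.

```python
import numpy as np

ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

def choose_action(Q, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    p = np.random.rand()                        # random number in (0.0, 1.0)
    if p < epsilon:
        return np.random.randint(len(ACTIONS))  # random action index (explore)
    return int(np.argmax(Q[state]))             # best-known action index (exploit)
```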
Epsilon is a constant set to 0.1 by default. However, if the modified epsilon-greedy strategy is used, epsilon becomes a parameter that starts at a high value of 1.0, so that the agent explores the environment enough times to find the best path. At the end of every episode, epsilon is decayed according to

epsilon = 1 / i

where i is the index of the current episode.
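For example, this schedule gives epsilon = 1.0 at episode 1, 0.1 at episode 10, and 0.01 at episode 100, so exploration fades quickly after the first few episodes. A minimal helper (the function name is an illustrative choice):

```python
def decayed_epsilon(episode):
    """1/i decay: full exploration on the first episode, then a fast drop-off."""
    return 1.0 / episode        # episode is assumed to be 1-indexed

# Example values of the schedule
print([round(decayed_epsilon(i), 3) for i in (1, 2, 5, 10, 50, 100)])
# -> [1.0, 0.5, 0.2, 0.1, 0.02, 0.01]
```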
Q[state] holds the values of the 4 actions available in that state: Q[state][UP], Q[state][DOWN], Q[state][LEFT], Q[state][RIGHT], and argmax( Q[state] ) selects the action with the highest value in the current state.
The Q (Quality) function is a value function used by the agent to decide which action is best to take. So, the Q function is an array of shape (num_of_states, num_of_actions).
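As a rough sketch, such a table could be created and updated with the standard one-step Q-learning rule; the learning rate alpha, the discount gamma, and the function name are assumptions, not values taken from the scripts.

```python
import numpy as np

num_of_states, num_of_actions = 16, 4           # e.g. a 4x4 lake with 4 moves
Q = np.zeros((num_of_states, num_of_actions))   # Q table: one row per state

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard one-step Q-learning update (alpha and gamma are assumed values)."""
    td_target = reward + gamma * np.max(Q[next_state])   # best value of the next state
    Q[state, action] += alpha * (td_target - Q[state, action])
```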
Note: For the Monte-Carlo updates, I've used a simple state-value function instead of Q, i.e. an array with (num_of_states) cells, because in Monte-Carlo we are not interested in remembering the best action for each state.
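A hedged sketch of such a Monte-Carlo update, assuming every-visit returns are averaged into a state-value array V (the variable names, the visit-count averaging, and the discount gamma are illustrative assumptions):

```python
import numpy as np

num_of_states = 16
V = np.zeros(num_of_states)         # state-value function: one cell per state
visits = np.zeros(num_of_states)    # visit counts for incremental averaging

def monte_carlo_update(V, visits, episode, gamma=0.99):
    """Update V from one finished episode, given as a list of (state, reward) pairs."""
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + gamma * G                       # return from this state onward
        visits[state] += 1
        V[state] += (G - V[state]) / visits[state]   # running average of returns
```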
As you can see in the plots above, all three algorithms are quite stable and converge within the first 10-20 episodes.
- Add more Environments
- Add Sarsa algorithm
- Rewrite the scripts to make them clearer for readers.