For this project, we will work with the Tennis environment.
In Tennis, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball touch the ground or hits the ball out of the court, it receives a reward of -0.01. The goal of both agents is therefore to keep the ball in play. The gif above shows two agents trained using this notebook.
The observation for each agent consists of 24 continuous variables describing the position and velocity of the ball and racket. Each agent receives its own local observation. Each agent has two continuous actions available: moving toward or away from the net, and jumping. Both actions are bounded to the interval [-1, 1].
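To make the shapes concrete, here is a small illustrative snippet; the dimensions come from the description above and the variable names are only placeholders:

```python
import numpy as np

num_agents = 2    # two rackets
state_size = 24   # 24 continuous variables per agent
action_size = 2   # move toward/away from the net, and jump

# One observation per agent: an array of shape (2, 24).
states = np.zeros((num_agents, state_size))

# One action vector per agent; every entry must lie in [-1, 1],
# so actions are typically clipped before being sent to the environment.
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
```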
The task is episodic: an episode begins when the ball is dropped into the playing field and ends when it hits the ground or is knocked out of bounds. The task is considered solved when an average score of +0.5 is achieved over 100 consecutive episodes. More concretely, at the end of each episode each agent has its own accumulated reward, so there are two (potentially different) scores. You should take the maximum of those two scores as the episode score, and it is this value that is averaged over 100 consecutive episodes to check whether the task has been solved (see the sketch below).
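A minimal sketch of this bookkeeping, assuming the per-agent episode returns are already available (the names and the episode loop are placeholders, not taken from the notebook):

```python
import numpy as np
from collections import deque

n_episodes = 2000
scores_window = deque(maxlen=100)       # scores of the last 100 episodes

for episode in range(1, n_episodes + 1):
    # Placeholder: run one episode and accumulate each agent's undiscounted reward.
    agent_returns = np.zeros(2)

    episode_score = np.max(agent_returns)   # take the max over the two agents
    scores_window.append(episode_score)

    # The task counts as solved once the 100-episode average reaches +0.5.
    if len(scores_window) == 100 and np.mean(scores_window) >= 0.5:
        print(f"Solved in {episode} episodes, average score {np.mean(scores_window):.2f}")
        break
```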
First, reproduce my environment by following the instructions here.
Then, clone this repository.
Next, download the Unity environment from whichever link below matches your operating system:
Then, unzip the downloaded file into the same directory as the cloned repo.
- PyTorch 1.4.0
- NumPy 1.18.1
- Matplotlib 3.1.3
This repo contains a Training Notebook with all the code to train the agent, a Report explaining the approach used, and Checkpoints to load a trained agent.
After installing the dependencies and setting up the project, open the Training Notebook and:
- Run the 'Imports' cell;
- Run the 'Unity environment & info' cell to instantiate the environment and retrieve the necessary variables (a sketch of this step appears after this list);
- Run/edit the 'Hyperparameters' cell. The defaults work fine;
- Run every cell under 'DDPG agents' to define the networks, their architectures and methods, as well as the Replay Buffer;
- Run the 'Instantiate agent' and 'Define training function' cells;
- Finally, run the 'Train & save checkpoint to path' cell, replacing the default value of path with your chosen filename prefix for the model checkpoints.
- Watch a trained agent:
  - If you trained a new agent using the notebook, skip to the 'Watch the agent' cell;
  - Alternatively, run the 'Load model' cell, setting the variable appropriately to load the parameters of either my pre-trained model or your own, then run the 'Watch the agent' cell (see the second sketch after this list).
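For reference, the 'Unity environment & info' step boils down to something like the following sketch. It assumes the unityagents package used in Udacity's DRLND projects and a file name that depends on your operating system, so treat the specifics as assumptions rather than the notebook's exact code:

```python
from unityagents import UnityEnvironment
import numpy as np

# The file name depends on your OS and on where you unzipped the environment.
env = UnityEnvironment(file_name="Tennis.app")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset once in training mode to read off the sizes used by the agent.
env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)             # 2 agents
states = env_info.vector_observations         # array of shape (2, 24)
state_size = states.shape[1]                  # 24
action_size = brain.vector_action_space_size  # 2
```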
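Likewise, loading a checkpoint and watching the agent looks roughly like the sketch below, continuing from the `env` and `brain_name` handles above. The `agent` object, the `path` prefix, and the `act(states, add_noise=False)` signature are assumptions about the notebook's code, so adjust them to match what you actually defined:

```python
import numpy as np
import torch

# Assumed: `agent` is the DDPG agent instantiated in the notebook and `path`
# is the filename prefix used when the checkpoint was saved.
agent.actor_local.load_state_dict(torch.load(path + "_actor.pth"))
agent.critic_local.load_state_dict(torch.load(path + "_critic.pth"))

env_info = env.reset(train_mode=False)[brain_name]  # train_mode=False to watch
states = env_info.vector_observations
while True:
    actions = agent.act(states, add_noise=False)    # deterministic actions
    env_info = env.step(actions)[brain_name]
    states = env_info.vector_observations
    if np.any(env_info.local_done):                 # stop when the episode ends
        break
```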