# Playing Atari the right way!

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Breakout_ram_good.gif" alt="Cover" style="width: 150px;"/>
</center>
<br>
A simple implementation of an agent that plays Atari games using the game screen as input. The repository also contains
code to implement visual foresight using an adversarial action-conditioned video prediction model (still a work in progress).

**Paper**:
[Playing Atari the right way!, ECE6504 Project, 2017](https://drive.google.com/open?id=1s2jKSDQGXy4xC0-SCnFvydPdf8DaE-8g)


## Dependencies
Install virtualenv and create a new virtual environment:

pip install virtualenv
virtualenv -p /usr/bin/python3 atari
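
Activate the environment so the dependencies below are installed into it:

source atari/bin/activate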

Install the dependencies:

pip3 install -r requirements.txt

***Notes:***
* Training the agent to play Breakout at a reasonable level took about 80 hours on two P100 GPUs.
Don't even think about running this on a CPU. **I would highly appreciate a pull request that
makes training faster** (I know some of my methods are inefficient).

* The trained models can easily be used to test the performance of an agent on a CPU.

## Architecture graph from TensorBoard

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Architectur.jpg" alt="Cartpole agent">
</center>

## Training a DQN agent
### Playing Cartpole using the game states as input

python3 play_cartpole.py

To change the hyperparameters, modify `mission_control_cartpole.py`.
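
The mission control file collects the hyperparameters in one place; a minimal sketch of the kind of constants to expect (the names here are illustrative, not necessarily the actual variables in this repo):

```python
# mission_control_cartpole.py -- illustrative sketch only; the actual
# variable names in this repo may differ.
learning_rate = 1e-3     # optimizer step size
gamma = 0.99             # reward discount factor
epsilon_start = 1.0      # initial exploration rate for the epsilon-greedy policy
epsilon_end = 0.01       # final exploration rate
batch_size = 64          # replay-buffer batch size
n_episodes = 1000        # number of training episodes
```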

<center>
<img src="ontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/CartPole.gif" alt="Cartpole agent">
</center>

**Note:**
* This isn't as computationally demanding as training Breakout from frames.

### Playing Breakout using the game frames as input

python3 play_breakout.py

To change the hyperparameters, modify `mission_control_breakout.py`.
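
At its core the agent follows the DQN recipe from [1]: preprocess frames to 84x84 grayscale and regress Q-values toward bootstrapped targets from a periodically synced target network. A minimal sketch of those two pieces (plain NumPy/Pillow for clarity; function names are illustrative, not this repo's code):

```python
# Illustrative sketch of the two core DQN pieces from [1];
# not the exact code in this repository.
import numpy as np
from PIL import Image

def preprocess(frame):
    """Grayscale and downsample a 210x160x3 Atari frame to 84x84, as in [1]."""
    img = Image.fromarray(frame).convert("L").resize((84, 84))
    return np.asarray(img, dtype=np.uint8)

def td_targets(rewards, next_q, dones, gamma=0.99):
    """DQN target y = r + gamma * max_a' Q_target(s', a'), cut off at terminal states."""
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)
```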

## Results from training the Breakout agent
<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Breakout_ram_good.gif" alt="Breakout agent">
</center>



### Plot of the rewards obtained per episode during training

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Breakout_rewards.png" alt="Breakout Reward" width="500px">
</center>

### Q-value histogram after each episode

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/q_val_hist.jpg" alt="Breakout histo">
</center>


### Max Q-values after each episode

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/max_q_value.jpg" alt="Breakout max Q">
</center>

## Use the trained model to generate a dataset

python3 generate_dataset.py

**Note:**

* You might get some directory-not-found errors (will be fixed soon); creating the missing directories manually should work around them. A conceptual sketch of the generation loop is below.
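
Conceptually, the script rolls out a policy in the emulator and stores transitions for the video prediction model. A hedged sketch of that loop (gym 0.9.x step/reset API; the file name and the random stand-in policy are illustrative):

```python
# Illustrative sketch of dataset generation; not this repo's exact code.
import gym
import numpy as np

env = gym.make("Breakout-v0")
frames, actions, next_frames = [], [], []

obs = env.reset()
for _ in range(10000):
    action = env.action_space.sample()  # stand-in for the trained DQN policy
    next_obs, reward, done, info = env.step(action)
    frames.append(obs)
    actions.append(action)
    next_frames.append(next_obs)
    obs = env.reset() if done else next_obs

np.savez_compressed("breakout_dataset.npz",
                    frames=np.array(frames),
                    actions=np.array(actions),
                    next_frames=np.array(next_frames))
```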

## Training an action-conditioned video prediction model

python3 generate_model_skip.py

**Note:**

* This uses the adversarial action-conditioned video prediction model.
* Run `generate_model.py` to use the architecture from [2] instead; the core action-conditioning idea is sketched below.
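
In [2], the encoded frame and the action are projected into a shared space and combined multiplicatively before decoding the next frame. A minimal sketch of that fusion step (plain NumPy; shapes and names are illustrative, not this repo's TensorFlow code):

```python
# Sketch of the action-conditional transformation from [2]; illustrative only.
import numpy as np

def fuse(frame_code, action_onehot, W_enc, W_act, W_dec):
    """Multiplicative interaction: h = (x W_enc) * (a W_act), then decode."""
    h = (frame_code @ W_enc) * (action_onehot @ W_act)
    return h @ W_dec  # feature vector used to decode the predicted next frame
```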


## Results from the action-conditioned video prediction model

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/skip.jpg" alt="Breakout agent">
</center>
<br>

***Note:***
* Each run generates the required TensorBoard files under the `./Results/<model>/<time_stamp_and_parameters>/Tensorboard` directory.
* Use `tensorboard --logdir <tensorboard_dir>` to look at loss variations, rewards, and a whole lot more.
* Windows gives an error when `:` is used in a folder name (the time stamp in each run's folder name contains one). I
suggest removing the time stamp from the `folder_name` variable in the `form_results()` function, or reformatting it as sketched below. Or just dual-boot Linux!
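
Alternatively, keep the time stamp but format it without `:` (this assumes `folder_name` is built from a `datetime` time stamp; the names here are illustrative):

```python
import datetime

# ':' is not allowed in Windows folder names, so use '-' in the time component
folder_name = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
```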


## References
[1] [Human Level Control Through Deep Reinforcement Learning](https://deepmind.com/research/publications/human-level-control-through-deep-reinforcement-learning/)

[2] [Action-Conditional Video Prediction using Deep Networks in Atari Games](https://arxiv.org/abs/1507.08750)

Work in progress!

**requirements.txt**:

gym==0.9.4
matplotlib==2.1.0
numpy==1.13.3
Pillow==4.3.0
scipy==0.19.1
tensorflow==1.3.0
tensorflow-tensorboard==0.1.8
