# Playing Atari the right way!

Paper link: [pdf](https://drive.google.com/open?id=1s2jKSDQGXy4xC0-SCnFvydPdf8DaE-8g)

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Breakout_ram_good.gif" alt="Cover" style="width: 150px;"/>
</center>
<br>

The simplest implementation of playing Atari games using the game screen as input. It also contains code to implement visual foresight using an adversarial action-conditioned video prediction model (still a work in progress).

**Paper**:
[Playing Atari the right way!, ECE6504 Project, 2017](https://drive.google.com/open?id=1s2jKSDQGXy4xC0-SCnFvydPdf8DaE-8g)

## Dependencies

Install virtualenv, create a new virtual environment, and activate it:

    pip install virtualenv
    virtualenv -p /usr/bin/python3 atari
    source atari/bin/activate

Install the dependencies:

    pip3 install -r requirements.txt

***Notes:***
* Training the agent to play Breakout at a reasonable level took about 80 hours on two P100 GPUs. Don't even think about running this on a CPU. **I would highly appreciate a pull request that makes training faster** (I know some of my methods are slow).
* The trained models can easily be used to test the performance of an agent on a CPU.
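The play scripts presumably take care of loading checkpoints themselves; if you want to restore a saved model by hand and force CPU-only evaluation, a minimal TensorFlow 1.x sketch looks something like this (the checkpoint path is a placeholder, not the repository's actual layout):

    import tensorflow as tf

    # Hide all GPUs so the restored agent runs on the CPU only.
    config = tf.ConfigProto(device_count={'GPU': 0})

    with tf.Session(config=config) as sess:
        # Placeholder checkpoint path; point this at your own saved run.
        saver = tf.train.import_meta_graph('./Results/breakout/model.ckpt.meta')
        saver.restore(sess, './Results/breakout/model.ckpt')
        # ... run evaluation episodes with the restored Q-network ...
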
## Architecture graph from TensorBoard

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Architectur.jpg" alt="Architecture graph">
</center>

## Training a DQN agent

### Playing CartPole using the game states as input

    python3 play_cartpole.py

To change the hyperparameters, modify `mission_control_cartpole.py`.
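The exact contents of `mission_control_cartpole.py` are not listed here; a hyperparameter module of this kind usually just collects constants, along the lines of the sketch below (names and values are illustrative, not the file's actual settings):

    # Illustrative sketch of a hyperparameter module such as mission_control_cartpole.py.
    # Names and values are placeholders, not the repository's actual settings.

    learning_rate = 1e-4        # optimizer step size
    gamma = 0.99                # discount factor on future rewards
    initial_epsilon = 1.0       # starting exploration rate
    final_epsilon = 0.01        # exploration rate after annealing
    anneal_steps = 100000       # steps over which epsilon is annealed
    replay_memory_size = 50000  # transitions kept for experience replay
    batch_size = 32             # minibatch size sampled from replay memory
    target_update_freq = 1000   # steps between target-network updates
    n_episodes = 2000           # total training episodes
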
<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/CartPole.gif" alt="Cartpole agent">
</center>

**Note:**
* This isn't as computationally demanding as training Breakout from frames.

### Playing Breakout using the game frames as input

    python3 play_breakout.py

To change the hyperparameters, modify `mission_control_breakout.py`.
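Training from raw frames follows the usual DQN recipe from [1]: convert each frame to grayscale, downsample it, and stack the last few frames so the network can see motion. The repository's exact preprocessing may differ; a common version looks roughly like this (the 84x84 size and 4-frame stack are the standard choices, not necessarily the ones used here):

    import numpy as np
    from PIL import Image  # Pillow is already pinned in requirements.txt

    def preprocess(frame, size=(84, 84)):
        """Grayscale and resize one RGB game frame (standard DQN preprocessing)."""
        img = Image.fromarray(frame).convert('L').resize(size)
        return np.asarray(img, dtype=np.uint8)

    def stack_frames(history, new_frame, stack=4):
        """Keep the last `stack` preprocessed frames as the network input."""
        history.append(preprocess(new_frame))
        history = history[-stack:]
        return history, np.stack(history, axis=-1)  # shape (84, 84, stack)
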
## Results from training the Breakout agent

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Breakout_ram_good.gif" alt="Breakout agent">
</center>

### Plot of the rewards obtained per episode during training

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/Breakout_rewards.png" alt="Breakout reward" width="500px">
</center>

### Q-value histogram after each episode

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/q_val_hist.jpg" alt="Breakout Q-value histogram">
</center>

### Max Q-values after each episode

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/max_q_value.jpg" alt="Breakout max Q-values">
</center>

## Use the trained model to generate a dataset

    python3 generate_dataset.py

**Note:**

* You might run into some directory-not-found errors (I will fix this soon); until then, create the missing directories yourself, as in the sketch below.
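A quick workaround is to create the expected output directories up front. The exact paths depend on the script, so treat these as placeholders:

    import os

    # Placeholder paths: adjust them to whatever generate_dataset.py complains about.
    for directory in ['./Results', './Dataset/frames', './Dataset/actions']:
        os.makedirs(directory, exist_ok=True)  # no error if the directory already exists
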
## Training an action conditioned video prediction model

    python3 generate_model_skip.py

**Note:**

* This uses the adversarial action conditioned video prediction model.
* Run `generate_model.py` to use the architecture from [2].
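For reference, the core idea in [2] is an encode-transform-decode network: the recent frames are encoded, the encoding is combined multiplicatively with an embedding of the chosen action, and a decoder predicts the next frame. A very rough TensorFlow 1.x sketch of that interaction is below; layer sizes and shapes are placeholders, this is not the repository's actual graph, and the adversarial variant trains a discriminator on top of it.

    import tensorflow as tf  # requirements.txt pins tensorflow==1.3.0

    n_actions = 4  # placeholder: number of discrete actions in the game

    frames = tf.placeholder(tf.float32, [None, 84, 84, 4])   # last 4 preprocessed frames
    actions = tf.placeholder(tf.float32, [None, n_actions])  # one-hot chosen action

    # Encoder: one conv layer followed by a dense bottleneck.
    h = tf.layers.conv2d(frames, 64, 6, strides=2, activation=tf.nn.relu)  # -> 40x40x64
    h = tf.reshape(h, [-1, 40 * 40 * 64])
    encoding = tf.layers.dense(h, 1024, activation=tf.nn.relu)

    # Action-conditional transformation: multiplicative interaction between the
    # frame encoding and a learned action embedding (the key idea in [2]).
    action_embedding = tf.layers.dense(actions, 1024, use_bias=False)
    transformed = encoding * action_embedding

    # Decoder: predict the next (grayscale) frame.
    decoded = tf.layers.dense(transformed, 84 * 84)
    predicted_next_frame = tf.reshape(decoded, [-1, 84, 84, 1])
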
## Results from action conditioned video prediction model

<center>
<img src="https://raw.githubusercontent.com/Naresh1318/Playing_Atari_the_right_way/master/README/skip.jpg" alt="Action conditioned video prediction results">
</center>
<br>

***Note:***
* Each run generates the required TensorBoard files under the `./Results/<model>/<time_stamp_and_parameters>/Tensorboard` directory.
* Use `tensorboard --logdir <tensorboard_dir>` to look at loss variations, rewards and a whole lot more.
* Windows gives an error when `:` is used in a folder name (the time stamp produced for each run's folder contains it). I would suggest removing the time stamp from the `folder_name` variable in the `form_results()` function (or reformatting it, as in the sketch below), or just dual boot Linux!
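If you would rather keep the time stamp, one option is to format it without colons before it goes into `folder_name`; something along these lines inside `form_results()` (the path is illustrative):

    import datetime

    # Format the time stamp without ':' so the run folder is a valid name on Windows too.
    time_stamp = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    folder_name = './Results/breakout/{}'.format(time_stamp)  # illustrative path
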
## References

[1] [Human Level Control Through Deep Reinforcement Learning](https://deepmind.com/research/publications/human-level-control-through-deep-reinforcement-learning/)

[2] [Action-Conditional Video Prediction using Deep Networks in Atari Games](https://arxiv.org/abs/1507.08750)

Work in progress!
**requirements.txt**

    gym==0.9.4
    matplotlib==2.1.0
    numpy==1.13.3
    Pillow==4.3.0
    scipy==0.19.1
    tensorflow==1.3.0
    tensorflow-tensorboard==0.1.8