This is the TensorFlow implementation of the IEEE Symposium Series on Computational Intelligence (SSCI) 2022 paper: Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation.
We propose a simple method that combines the generation of intrinsic rewards with self-imitation learning techniques that rank previous episodes and replay them to reproduce good exploration behaviors through imitation (behavioral cloning). We build our method on top of RAPID and evaluate its performance on hard-exploration, procedurally-generated MiniGrid environments. The results show that using intrinsic motivation techniques together with self-imitation learning yields equal or better performance and sample efficiency than executing those methods in isolation.
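As a rough, minimal Python sketch of how the two ingredients interact (illustrative only, not the code in this repository; the function names are made up for exposition): each finished episode receives a RAPID-style ranking score, the best-ranked transitions are kept in a buffer for behavioral cloning, and the intrinsic bonus is added to the extrinsic reward optimized by PPO.
# Illustrative sketch only (not the repository code); names are hypothetical.
def episode_score(ext_return, local_score, global_score, w0, w1, w2):
    # Weighted RAPID-style ranking of a finished episode
    # (cf. the --w0/--w1/--w2 flags described below).
    return w0 * ext_return + w1 * local_score + w2 * global_score

def ppo_reward(ext_reward, intrinsic_reward, im_coef):
    # Reward optimized by PPO: extrinsic plus a scaled intrinsic bonus
    # (cf. the --im_coef flag described below).
    return ext_reward + im_coef * intrinsic_reward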
The implementation is based on the RAPID GitHub repository, which also comprises the original self-imitation-learning implementation. If you use this work, please cite it with one of the following entries:
@INPROCEEDINGS{10022199,
author={Andres, Alain and Villar-Rodriguez, Esther and Del Ser, Javier},
booktitle={2022 IEEE Symposium Series on Computational Intelligence (SSCI)},
title={Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation},
year={2022},
volume={},
number={},
pages={890-899},
keywords={Learning systems;Schedules;Accelerated aging;Diversity reception;Reinforcement learning;Manuals;Trajectory;Reinforcement Learning;Intrinsic Motivation;Self Imitation Learning;Sparse Rewards;Generalization},
doi={10.1109/SSCI51031.2022.10022199}}
or
@article{andres2022towards,
title={Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation},
author={Andres, Alain and Villar-Rodriguez, Esther and Del Ser, Javier},
journal={arXiv preprint arXiv:2211.16838},
year={2022}
}
Please make sure that you have Python 3.6 installed. First, clone the repo with
git clone https://github.com/aklein1995/exploration_sil_im.git
Then install the dependencies with pip:
pip3 install -r requirements.txt
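As an optional sanity check (not part of the original instructions), you can verify that TensorFlow was installed correctly with:
python3 -c "import tensorflow as tf; print(tf.__version__)"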
The entry point is main.py. Some important hyperparameters are as follows (an illustrative invocation follows the list).
--env: the environment to be used
--frames: the number of frames/timesteps to be run
--nsteps: the time horizon for gathering experiences before each PPO update
--log_dir: the directory to save logs
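For instance, a basic run (illustrative only; the --nsteps value and log directory below are arbitrary choices, not settings taken from the paper) could look like:
python3 main.py --env 'MiniGrid-MultiRoom-N7-S8-v0' --frames 20000000 --nsteps 128 --log_dir logs/example_run --seed 0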
More related to intrinsic motivation (see the example after this list):
--im_coef: the intrinsic reward coefficient (0 = no intrinsic motivation)
--im_type: the intrinsic module/approach used to compute the intrinsic rewards
--use_ep_counts: boolean used to scale the generated rewards by the episodic counts
--use_1st_counts: boolean used to reward only the first time a state is reached within an episode
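For instance, assuming the boolean flags are plain switches (check the argument parsing in main.py), a run that scales the intrinsic rewards with episodic counts could look like this (the values are illustrative, not tuned recommendations):
python3 main.py --log_dir MN7S8_im0005_bb_epcounts_0 --seed 0 --im_type bebold --im_coef 0.005 --use_ep_counts --frames 20000000 --env 'MiniGrid-MultiRoom-N7-S8-v0'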
And more specifically related to self-imitation learning (see the example after this list):
--w0: the weight of the extrinsic reward score
--w1: the weight of the local score
--w2: the weight of the global score
--buffer_size: the maximum number of experiences stored in the replay buffer
--sl_until: the timestep until which the RAPID (behavioral cloning) update is performed
--disable_rapid: use it to compare against the PPO baseline or PPO + intrinsic rewards
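As an illustration, the ranking weights and buffer size can be set explicitly; the values below are placeholders, not the settings used in the paper:
python3 main.py --log_dir MN7S8_ent001_rapid_custom_0 --seed 0 --ent_coef 0.01 --im_coef 0 --w0 1.0 --w1 0.1 --w2 0.001 --buffer_size 10000 --frames 20000000 --env 'MiniGrid-MultiRoom-N7-S8-v0'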
For example, if we just want to analyze the results with intrinsic motivation (i.e., BeBold) but without RAPID in MultiRoom with 7 rooms of size 8, run
python3 main.py --log_dir MN7S8_ent001_im0005_bb_0 --seed 0 --disable_rapid --ent_coef 0.01 --im_type bebold --im_coef 0.005 --frames 20000000 --env 'MiniGrid-MultiRoom-N7-S8-v0'
On the other hand, to run the same experiment with RAPID and no intrinsic rewards, use
python3 main.py --log_dir MN7S8_ent001_w0w1w2_0 --seed 0 --ent_coef 0.01 --im_coef 0 --frames 20000000 --env 'MiniGrid-MultiRoom-N7-S8-v0'
To evaluate on different environments, change --env and configure the rest of the flags based on your preferences.
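For instance, an analogous run on KeyCorridor (another hard-exploration MiniGrid task; the values below simply mirror the example above and are not tuned for this environment) would be:
python3 main.py --log_dir KCS3R3_ent001_im0005_bb_0 --seed 0 --ent_coef 0.01 --im_type bebold --im_coef 0.005 --frames 20000000 --env 'MiniGrid-KeyCorridorS3R3-v0'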