A JAX implementation of MaxInfoRL, a simple, flexible, and scalable class of reinforcement learning algorithms that enhance exploration in RL by automatically combining intrinsic and extrinsic rewards. For a PyTorch implementation, visit this PyTorch repository.
MaxInfoRL boosts exploration in RL by combining extrinsic rewards with intrinsic exploration bonuses derived from the information gain about the underlying MDP. MaxInfoRL naturally trades off maximization of the value function against maximization of the entropy over states, rewards, and actions. It is very general and can be combined with a variety of off-policy model-free RL methods for continuous state-action spaces. We provide implementations of MaxInfoSAC, MaxInfoREDQ, MaxInfoDrQ, and MaxInfoDrQv2.
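Schematically, MaxInfoRL optimizes an objective of the following form (notation simplified here; see the paper for the exact formulation):

$$\pi^\star = \arg\max_\pi \; \mathbb{E}_\pi \Big[ \sum_t \gamma^t \big( r(s_t, a_t) + \alpha \, r_{\mathrm{int}}(s_t, a_t) \big) \Big]$$

where r is the extrinsic task reward, r_int is an intrinsic bonus derived from the information gain about the underlying MDP, and alpha trades off the two terms.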
Installation:
pip install -e .
The above command does not install the GPU version of JAX. If needed, please install it manually, for instance with
pip install -U "jax[cuda12]"
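To check that the GPU build is active, you can run this quick sanity check (independent of this repo):

python -c "import jax; print(jax.devices())"

If the CUDA build is installed correctly, this lists GPU devices (e.g., CudaDevice(id=0)) instead of only a CPU device.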
To run HumanoidBench experiments, please install the benchmark dependencies following the instructions in the original repo.
Training script:
- State-based
python examples/state_based/experiment.py \
--project_name maxinforl \
--entity_name wandb_entity_name \
--alg_name maxinfosac \
--env_name cartpole-swingup_sparse \
--wandb_log 1
For the state-based experiments, you can run sac, redq, maxinfosac, or maxinforedq by specifying the --alg_name flag.
- Vision-based
python examples/vision_based/experiment.py \
--project_name maxinforl \
--entity_name wandb_entity_name \
--alg_name maxinfodrq \
--env_name cartpole-swingup_sparse \
--wandb_log 1
For the vision-based experiments, you can run drq, drqv2, maxinfodrq, or maxinfodrqv2 by specifying the --alg_name flag.
All hyperparameters are listed in the examples/state_based/configs.yaml and examples/vision_based/configs.yaml files. You can override them if needed.
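For example, to train with a different learning rate, you can edit the corresponding entry in the YAML file. The keys below are purely illustrative; use the names that actually appear in configs.yaml:

maxinfosac:
  actor_lr: 3e-4   # hypothetical key name; check configs.yaml for the real one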
This repo relies on jaxrl to load environments, natively supporting Gym and DM Control environments. If your environment is registered in Gym, you can directly use it (just adjust the configs.yaml file accordingly).
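As a sketch, a custom environment can be registered with the standard Gym API and then selected via --env_name (MyEnv, my_package, and the id below are placeholders):

import gym
from gym.envs.registration import register

# Register a custom environment under an id that can then be passed
# as env_name (and referenced in configs.yaml).
register(
    id="MyCustomEnv-v0",                    # hypothetical id
    entry_point="my_package.envs:MyEnv",    # hypothetical module:class path
    max_episode_steps=1000,
)

env = gym.make("MyCustomEnv-v0")

Depending on your jaxrl version, you may need the gymnasium package instead of classic gym.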
If you find MaxInfoRL useful for your research, please cite this work:
@article{sukhija2024maxinforl,
  title={MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization},
  author={Sukhija, Bhavya and Coros, Stelian and Krause, Andreas and Abbeel, Pieter and Sferrazza, Carmelo},
  journal={arXiv preprint arXiv:2412.12098},
  year={2024}
}
This codebase contains some files adapted from other sources:
- jaxrl (original repo): https://github.com/ikostrikov/jaxrl/tree/main
- jaxrl (fork): https://github.com/sukhijab/jaxrl