Meta-reinforcement learning (Meta-RL) exploits shared structure among tasks to enable rapid adaptation to new tasks from only a small amount of experience. However, most existing Meta-RL algorithms either lack theoretical generalization guarantees or offer such guarantees only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper, for the first time, conducts a theoretical analysis of the generalization performance of the Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because dependencies in the training data render the independent and identically distributed (i.i.d.) assumption invalid. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition. With the DGOD approach, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees that optimizes them. Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm effectively avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.
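As background only (this is not the bound derived in the paper), the classical i.i.d. PAC-Bayes bound below (McAllester-style) illustrates the type of guarantee that the DGOD decomposition makes applicable to Meta-RL data: for any data-independent prior P and any posterior Q over hypotheses, with probability at least 1 − δ over an i.i.d. sample of size n,

```latex
% Classical PAC-Bayes (McAllester-style) bound, shown only as background.
% L(h): expected loss, \hat{L}_n(h): empirical loss on n i.i.d. samples,
% P: data-independent prior, Q: posterior, \delta: confidence parameter.
\mathbb{E}_{h \sim Q}\!\left[L(h)\right]
  \;\le\;
\mathbb{E}_{h \sim Q}\!\left[\hat{L}_n(h)\right]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Because offline Meta-RL data violate the i.i.d. assumption behind such bounds, DGOD first decomposes the data into multiple offline i.i.d. datasets to which these tools can then be applied.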
This repository provides the implementation of PBOMAC (PAC-Bayesian offline Meta-RL); see the citation at the end of this README.
To run the code, you first need to install PyTorch and MuJoCo (version 1.50 or above). If you run into any problems, please follow the installation steps from FOCAL.
For the remaining software dependencies, create a conda environment with:
conda env create -f environment.yaml
and then activate it (the environment name is defined in environment.yaml).
PBOMAC requires a fixed batch of offline data for meta-training and meta-testing, generated by trained SAC behavior policies. The training samples required for the comparative experiments can be found in the `data` folder.
If you want to generate offline sample data for other environments, `cd` into the `src.algos` folder and generate trajectories with:
python agent.py --env_name cheetah-vel --total_timesteps 2000000 --n_tasks 10
Generated data will be saved in ./data/
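If you want to collect data for several environments in one go, a minimal sketch like the one below can drive `agent.py` with the flags documented above. The environment names other than `cheetah-vel` are assumptions; adjust them to whatever `agent.py` accepts, and run the script from the same folder as `agent.py`.

```python
# batch_collect.py -- a minimal sketch (not part of the repository) that invokes
# agent.py repeatedly using only its documented flags. Trajectories are written
# to ./data/ by agent.py itself.
import subprocess
import sys

ENV_NAMES = ["cheetah-vel", "ant-dir"]  # assumed task names; edit as needed

for env_name in ENV_NAMES:
    cmd = [
        sys.executable, "agent.py",
        "--env_name", env_name,
        "--total_timesteps", "2000000",
        "--n_tasks", "10",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```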
To run a meta-training experiment, e.g., on the ant-dir benchmark:
python main.py --env_config ./configs/ant-dir.json
Experiments are configured via JSON configuration files located in `./configs`. You can run the other experiments by changing the `--env_config` argument, e.g., to `cheetah-dir.json`. You can also manually adjust other parameters in `main.py`.
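If you prefer to derive a new experiment configuration programmatically instead of editing the JSON by hand, a minimal sketch along these lines may help. Only the config file paths come from this README; the keys you override depend on the contents of the shipped config files, so the key names below are placeholders.

```python
# make_config.py -- a minimal sketch (not part of the repository) for deriving a
# new experiment config from an existing one and running it via --env_config.
import json
from pathlib import Path

# Load an existing config (assumed to be a JSON object).
base = json.loads(Path("./configs/ant-dir.json").read_text())

# Override whichever entries you need (key names here are placeholders).
overrides = {}  # e.g., {"some_key": some_value}
base.update(overrides)

out_path = Path("./configs/ant-dir-custom.json")
out_path.write_text(json.dumps(base, indent=2))
print(f"Wrote {out_path}; run it with: python main.py --env_config {out_path}")
```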
Please open an issue on the issue tracker to report problems or ask questions, or send an email to me (outshine-J).
@article{PBOMAC,
title={{PBOMAC}: PAC-Bayesian offline Meta-reinforcement learning},
author={Zheng Sun and Chenheng Jing and Shangqi Guo and Lingling An},
journal={Applied Intelligence},
year={2023},
url={https://doi.org/10.1007/s10489-023-04911-y}
}