
PAC-Bayesian-Offline-Meta-Reinforcement-Learning

Meta-reinforcement learning (Meta-RL) exploits shared structure among tasks to enable rapid adaptation to new tasks from only a small amount of experience. However, most existing Meta-RL algorithms lack theoretical generalization guarantees, or offer such guarantees only under restrictive assumptions (e.g., strong assumptions on the data distribution). This paper presents, for the first time, a theoretical analysis that estimates the generalization performance of the Meta-RL learner using PAC-Bayesian theory. Applying PAC-Bayesian theory to Meta-RL is challenging because of dependencies in the training data, which render the independent and identically distributed (i.i.d.) assumption invalid. To address this challenge, we propose a dependency graph-based offline decomposition (DGOD) approach, which decomposes non-i.i.d. Meta-RL data into multiple offline i.i.d. datasets using offline sampling and graph decomposition techniques. With DGOD, we derive practical PAC-Bayesian offline Meta-RL generalization bounds and design an algorithm with generalization guarantees to optimize them. Experiments on several challenging Meta-RL benchmarks demonstrate that our algorithm avoids meta-overfitting and outperforms recent state-of-the-art Meta-RL algorithms that lack generalization bounds.

This repository provides the implementation of PBOMAC.
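As a rough, toy illustration of the graph-decomposition step behind DGOD (not the repository's actual implementation; the dependency construction and function name below are hypothetical, and networkx is assumed to be available), samples connected in a dependency graph can be split into independent sets via greedy coloring, so that each resulting group can be treated as an approximately i.i.d. sub-dataset:

# Illustrative sketch only, not the repo's API: decompose dependent offline
# samples into groups of mutually non-adjacent (independent) samples.
import networkx as nx

def decompose_by_dependency_graph(num_samples, dependent_pairs):
    # Build a graph whose edges connect statistically dependent samples
    # (e.g., samples drawn from the same trajectory).
    g = nx.Graph()
    g.add_nodes_from(range(num_samples))
    g.add_edges_from(dependent_pairs)
    # Greedy coloring assigns samples to color classes; samples sharing a
    # color are pairwise non-adjacent, i.e. independent in this toy model.
    coloring = nx.coloring.greedy_color(g, strategy="largest_first")
    groups = {}
    for node, color in coloring.items():
        groups.setdefault(color, []).append(node)
    return list(groups.values())  # each group ~ one i.i.d. sub-dataset

# Example: samples 0-1-2 and 3-4 come from the same trajectories.
print(decompose_by_dependency_graph(6, [(0, 1), (1, 2), (3, 4)]))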

Simulation Environment

To run the code, you first need to install PyTorch and MuJoCo (version 1.50 or above). If you run into any problems, please follow the FOCAL installation steps.

For the remaining software dependencies, create the conda environment with:

conda env create -f environment.yaml
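To quickly sanity-check the installation, a snippet like the following can be run inside the environment (this is just an illustration and assumes the standard mujoco_py binding used by FOCAL; it is not part of the repository):

# Verify that the core dependencies import correctly.
import torch
import mujoco_py  # Python binding for MuJoCo 1.50+
print(torch.__version__, torch.cuda.is_available())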

Data Generation

PBOMAC requires fixed offline data (batches) for meta-training and meta-testing, generated by trained SAC behavior policies. The training samples used in the comparative experiments can be found in the data folder.

If you want to generate offline sample data for other environments, please cd into the src.algos folder and generate trajectories with:

python agent.py --env_name cheetah-vel --total_timesteps 2000000 --n_tasks 10

Generated data will be saved in ./data/
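For example, assuming the other environment names match the configuration files in ./configs (ant-dir is an assumption here, not verified against the code), the same flags should apply:

python agent.py --env_name ant-dir --total_timesteps 2000000 --n_tasks 10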

Getting Started

python main.py --env_config ./configs/ant-dir.json

Experiments are configured via JSON configuration files located in ./configs. You can run the other experiments by pointing env_config at a different configuration file, such as cheetah-dir.json. You can also manually adjust other parameters in main.py.
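For example, to run the half-cheetah direction experiment (assuming the configuration file is named cheetah-dir.json, matching the other configs):

python main.py --env_config ./configs/cheetah-dir.json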

Contact

Please open an issue on the issue tracker to report problems or ask questions, or send an email to me, outshine-J.

Reference

@article{PBOMAC,
  title={{PBOMAC}: PAC-Bayesian offline Meta-reinforcement learning},
  author={Zheng Sun and Chenheng Jing and Shangqi Guo and Lingling An},
  journal={Applied Intelligence},
  year={2023},
  url={https://doi.org/10.1007/s10489-023-04911-y}
}
