Paper · Website · Code · Tutorial (under active development) · Issue
This benchmark aims to advance robust reinforcement learning (RL) for real-world applications and domain adaptation. It provides a comprehensive set of tasks that cover various robustness requirements under uncertainty in states, actions, rewards, and environmental dynamics, and that span diverse applications including control, robot manipulation, dexterous hands, and more. (This repository is under active development. We appreciate any constructive comments and suggestions.)
🔥 Benchmark Features:
- High Modularity: It is designed for flexible adaptation to a variety of research needs, featuring high modularity to support a wide range of experiments.
- Task Coverage: It provides a comprehensive set of tasks to evaluate robustness across different RL scenarios (at least 170 tasks).
- High Compatibility: It is seamlessly compatible with a wide range of existing environments.
- Support for Vectorized Environments: It enables parallel processing of multiple environments for efficient experimentation.
- Support for New Gym API: It fully supports the latest standards in Gym API, facilitating easy integration and expansion.
- LLMs Guide Robust Learning: Leverage LLMs to set robust parameters (LLMs as adversary policies).
🔥 Benchmark Tasks:
- Robust MuJoCo Tasks: Tackle complex simulations with enhanced robustness.
- Robust Box2D Tasks: Engage with 2D physics environments designed for robustness evaluation.
- Robust Robot Manipulation Tasks: Robust robotic manipulation with Kuka and Franka robots.
- Robust Safety Tasks: Prioritize safety in robustness evaluation.
- Robust Adroit Hand Tasks: Explore sophisticated hand manipulation challenges in robust settings.
- Robust Dexterous Tasks: Advance the robust capabilities in dexterous robotics.
- Robust Fetch Manipulation Tasks: Robust object manipulation with Fetch robots.
- Robust Robot Kitchen Tasks: Robust manipulation in Kitchen environments with robots.
- Robust Maze Tasks: Robust navigation with robots in maze environments.
- Robust Humanoid Robot Tasks: Humanoid robot control with robust settings.
- Robust Multi-Agent Tasks: Facilitate robust coordination among multiple agents.
Each of these robust tasks incorporates robust elements such as robust observations, actions, reward signals, and dynamics to evaluate the robustness of RL algorithms.
🔥 Our Vision: We hope this benchmark serves as a useful platform for pushing the boundaries of RL in real-world problems by promoting robustness and domain adaptation ability!
Any suggestions and issues are welcome. If you have any questions, please open an issue or pull request, or contact us directly via email at [email protected]; we will respond within one week.
Content
- Introduction
- Environments and Tasks
- Disruptor Module for Perturbations
- Tutorials
- Selected Demos
- Citation
- Acknowledgments
A reinforcement learning (RL) problem is formulated as an agent seeking a policy that optimizes the long-term expected return by interacting with an environment. While standard RL has been heavily investigated recently, its use in practice can be significantly hampered by noise, malicious attacks, the sim-to-real gap, domain generalization requirements, or a combination of these and other factors. Consequently, in addition to maximizing cumulative reward, robustness to unexpected uncertainty/perturbation emerges as another critical goal for RL, especially in high-stakes applications such as robotics, financial investment, and autonomous driving. This has led to a surge of work on more robust RL algorithms, termed robust RL, for different problems, including but not limited to single-agent RL, safe RL, and multi-agent RL.
Robust RL problems typically consist of three modules:
- An agent (a policy): tries to learn a strategy $\pi$ (a policy) based on the observations from the environment to achieve the optimal long-term return.
- An environment/task: a task that determines the agent's immediate reward $r(\cdot |s,a)$ and the physical or logical dynamics (transition function $P_t( \cdot | s,a)$).
- The disruptor module: represents the uncertainty/perturbation events that happen during any part of the interaction process between the agent and the environment, with different modes, sources, and frequencies.
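The three modules above can be sketched as a minimal interaction loop. This is a toy illustration only: `ToyEnvironment`, `ToyAgent`, `ToyDisruptor`, and `rollout` are hypothetical names invented for this sketch and are not part of the benchmark's API. The disruptor here perturbs only the state the agent observes, while the environment keeps evolving from the true state.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D task: the state is a number the agent should drive to 0."""

    def __init__(self, start=5.0):
        self.state = start

    def step(self, action):
        # Dynamics: the action shifts the state; reward penalizes distance from 0.
        self.state += action
        reward = -abs(self.state)
        return self.state, reward


class ToyAgent:
    """Hypothetical fixed policy: move one unit toward 0 based on the observation."""

    def act(self, observed_state):
        if observed_state > 0:
            return -1.0
        if observed_state < 0:
            return 1.0
        return 0.0


class ToyDisruptor:
    """Hypothetical disruptor that adds uniform noise to whatever passes through it."""

    def __init__(self, scale, seed=0):
        self.scale = scale
        self.rng = random.Random(seed)

    def perturb(self, value):
        return value + self.rng.uniform(-self.scale, self.scale)


def rollout(env, agent, disruptor, steps=10):
    """Run the agent-environment loop with the disruptor on the observation path."""
    total_reward = 0.0
    state = env.state
    for _ in range(steps):
        observed_state = disruptor.perturb(state)  # the agent sees a perturbed state
        action = agent.act(observed_state)
        state, reward = env.step(action)  # the environment evolves from the true state
        total_reward += reward
    return total_reward


clean = rollout(ToyEnvironment(), ToyAgent(), ToyDisruptor(scale=0.0))
noisy = rollout(ToyEnvironment(), ToyAgent(), ToyDisruptor(scale=3.0))
print("return without perturbation:", clean)
print("return with observation noise:", noisy)
```

In this toy task the unperturbed policy is already optimal, so observation noise can only keep the return the same or lower it. That gap between nominal and perturbed return is the kind of quantity a robustness evaluation measures.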
As an example, we illustrate the framework of robust RL for single-agent problems:
This benchmark supports various 1) environments/tasks and 2) disruptors (perturbations to the interaction process). This allows users to design and evaluate different algorithms in different application scenarios under diverse sources of uncertainty. See the sections below for a quick overview of the environments and perturbations that Robust-Gymnasium supports.
Tasks: Random, Adversary, Semantic Tasks (Robot Manipulation Tasks).
Robust MuJoCo Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward | Robust Dynamics |
---|---|---|---|---|
Ant-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
HalfCheetah-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Hopper-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Walker2d-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Swimmer-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Humanoid-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
HumanoidStandup-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Pusher-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Reacher-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
InvertedDoublePendulum-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
InvertedPendulum-v2-v3-v4-v5 | ✅ | ✅ | ✅ | ✅ |
Robust Box2D Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
CarRacing-v2 | ✅ | ✅ | ✅ |
LunarLanderContinuous-v3 | ✅ | ✅ | ✅ |
BipedalWalker-v3 | ✅ | ✅ | ✅ |
LunarLander-v3 (Discrete Task) | ✅ | ✅ | ✅ |
Robust Robot Manipulation Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
RobustLift | ✅ | ✅ | ✅ |
RobustDoor | ✅ | ✅ | ✅ |
RobustNutAssembly | ✅ | ✅ | ✅ |
RobustPickPlace | ✅ | ✅ | ✅ |
RobustStack | ✅ | ✅ | ✅ |
RobustWipe | ✅ | ✅ | ✅ |
RobustToolHang | ✅ | ✅ | ✅ |
RobustTwoArmLift | ✅ | ✅ | ✅ |
RobustTwoArmPegInHole | ✅ | ✅ | ✅ |
RobustTwoArmHandover | ✅ | ✅ | ✅ |
RobustTwoArmTransport | ✅ | ✅ | ✅ |
MultiRobustDoor | ✅ | ✅ | ✅ |
Robust Safety Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
RobustSafetyAnt-v4 | ✅ | ✅ | ✅ |
RobustSafetyHalfCheetah-v4 | ✅ | ✅ | ✅ |
RobustSafetyHopper-v4 | ✅ | ✅ | ✅ |
RobustSafetyWalker2d-v4 | ✅ | ✅ | ✅ |
RobustSafetySwimmer-v4 | ✅ | ✅ | ✅ |
RobustSafetyHumanoid-v4 | ✅ | ✅ | ✅ |
RobustSafetyHumanoidStandup-v4 | ✅ | ✅ | ✅ |
RobustSafetyPusher-v4 | ✅ | ✅ | ✅ |
RobustSafetyReacher-v4 | ✅ | ✅ | ✅ |
Robust Adroit Hand Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
RobustAdroitHandDoor-v1 | ✅ | ✅ | ✅ |
RobustAdroitHandHammer-v1 | ✅ | ✅ | ✅ |
RobustAdroitHandPen-v1 | ✅ | ✅ | ✅ |
RobustAdroitHandRelocate-v1 | ✅ | ✅ | ✅ |
Robust Dexterous Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
RobustHandManipulateEgg_BooleanTouchSensors-v1 | ✅ | ✅ | ✅ |
RobustHandReach-v2 | ✅ | ✅ | ✅ |
RobustHandManipulateBlock-v1 | ✅ | ✅ | ✅ |
RobustHandManipulateEgg-v1 | ✅ | ✅ | ✅ |
RobustHandManipulatePen-v1 | ✅ | ✅ | ✅ |
Robust Fetch Manipulation Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
RobustFetchPush-v3 | ✅ | ✅ | ✅ |
RobustFetchReach-v3 | ✅ | ✅ | ✅ |
RobustFetchSlide-v3 | ✅ | ✅ | ✅ |
RobustFetchPickAndPlace-v3 | ✅ | ✅ | ✅ |
Robust Robot Kitchen Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
FrankaKitchen-v1 | ✅ | ✅ | ✅ |
Robust Maze Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
AntMaze_UMaze-v4 | ✅ | ✅ | ✅ |
PointMaze_UMaze-v3 | ✅ | ✅ | ✅ |
Robust Multi-Agent Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
MA-Ant-2x4, 2x4d, 4x2, 4x1 | ✅ | ✅ | ✅ |
MA-HalfCheetah-2x3, 6x1 | ✅ | ✅ | ✅ |
MA-Hopper-3x1 | ✅ | ✅ | ✅ |
MA-Walker2d-2x3 | ✅ | ✅ | ✅ |
MA-Swimmer-2x1 | ✅ | ✅ | ✅ |
MA-Humanoid-9\|8 | ✅ | ✅ | ✅ |
MA-HumanoidStandup-v4 | ✅ | ✅ | ✅ |
MA-Pusher-3p | ✅ | ✅ | ✅ |
MA-Reacher-2x1 | ✅ | ✅ | ✅ |
Many-MA-Swimmer-10x2, 5x4, 6x1, 1x2 | ✅ | ✅ | ✅ |
Many-MA-Ant-2x3, 3x1 | ✅ | ✅ | ✅ |
CoupledHalfCheetah-p1p | ✅ | ✅ | ✅ |
Robust Humanoid Tasks
Tasks\Robust type | Robust State | Robust Action | Robust Reward |
---|---|---|---|
Robusth1hand-reach-v0 | ✅ | ✅ | ✅ |
Robusth1hand-push-v0 | ✅ | ✅ | ✅ |
h1hand-truck-v0 | ✅ | ✅ | ✅ |
Robusth1hand-slide-v0 | ✅ | ✅ | ✅ |
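Before introducing the disruptor module, we recall that an RL problem can be formulated as a process involving several key concepts: an agent, state, action, reward, and an environment. Specifically, at each time step $t$, the environment generates a state $s_t$ and a reward $r_t$ and sends them to the agent; the agent then chooses an action $a_t$ and sends it back to the environment, which generates the next state $s_{t+1}$ conditioned on the current state $s_t$ and the action $a_t$.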
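In symbols, one step of this nominal (unperturbed) process can be written as follows, reusing the reward $r$ and the transition function $P_t$ from the module description above. Treating the policy $\pi$ as a distribution over actions is an assumption made here for concreteness; a deterministic policy is the special case $a_t = \pi(s_t)$.

$$
a_t \sim \pi(\cdot \mid s_t), \qquad r_t \sim r(\cdot \mid s_t, a_t), \qquad s_{t+1} \sim P_t(\cdot \mid s_t, a_t).
$$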
Considering this, the benchmark covers a wide range of potential uncertainty/disturbance/generalization events that can occur in this process (in both the training and testing phases), at any place, in any mode, and at any time, as summarized in the following table.
Perturbation modes\sources | Observed state | Observed reward | Action | Environment/task |
---|---|---|---|---|
Random | ✅ | ✅ | ✅ | ✅ |
Adversarial | ✅ | \ | ✅ | \ |
Set arbitrarily | \ | \ | \ | ✅ |
Semantic Domain shift | \ | \ | \ | ✅ |
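For scripting experiments, the table above can be transcribed into a small lookup. The sketch below is only an illustration: the dictionary keys and the helpers `is_supported` and `modes_for` are hypothetical names chosen here, not identifiers from the benchmark.

```python
# Transcription of the table above: which perturbation mode applies to which source.
# All string keys are hypothetical labels chosen for this sketch.
SUPPORTED = {
    "random": {"state", "reward", "action", "environment"},
    "adversarial": {"state", "action"},
    "set_arbitrarily": {"environment"},
    "semantic_domain_shift": {"environment"},
}


def is_supported(mode, source):
    """Return True if the table lists `mode` as applicable to `source`."""
    if mode not in SUPPORTED:
        raise ValueError(f"unknown perturbation mode: {mode!r}")
    return source in SUPPORTED[mode]


def modes_for(source):
    """List every mode the table allows for a given perturbation source."""
    return sorted(mode for mode, sources in SUPPORTED.items() if source in sources)


print(is_supported("adversarial", "reward"))  # False
print(modes_for("environment"))
```

A check like this can fail fast on an unsupported combination, for example adversarial reward perturbation, before an experiment is launched.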
These perturbation events can be categorized from three perspectives:
- Sources: which component is perturbed/attacked.
  - Agent's observed state: The agent observes a noisy/attacked 'state' $\widetilde{s}_t$ (which diverges from the real state $s_t$) and uses it as the input of its policy to determine the action.
  - Agent's observed reward: The agent observes a noisy/attacked 'reward' $\widetilde{r}_t$ (which differs from the real immediate reward $r_t$ obtained from the environment) and constructs its policy according to it.
  - Action: The action $a_t$ chosen by the agent is contaminated before being sent to the environment. Namely, a perturbed action $\widetilde{a}_t$ serves as the input of the environment for the next step.
  - Environment: An environment includes both the immediate reward function $r$ and the dynamics function $P_t$. An agent may interact with a shifted or nonstationary environment.
- Modes: what kind of perturbation is imposed.
  - Random: Random noise following some distribution, such as a Gaussian or uniform distribution, is added to the nominal variable. This mode can be applied to all perturbation sources.
  - Adversarial: An adversarial attacker chooses the perturbed output within some admissible set to degrade the agent's performance. This mode can be applied to perturbations of the observation and the action.
  - Set arbitrarily: An environment can be set to any fixed one within some prescribed uncertainty set of environments.
  - Semantic-domain-shifted: We offer some partially similar environments/tasks with some semantic diversity (such as different goals) for domain generalization or transfer learning tasks.
- Frequency: when the perturbation happens. Viewed through the lens of time, perturbations can happen at different periods during the training and testing process, and with different frequencies. We provide interactive modes that support step-wise varying interaction between disruptors, agents, and environments, so the user can apply perturbations at any point in time and in any way.
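The modes and the frequency dimension described above can be illustrated with a small, self-contained sketch. It is not the benchmark's implementation: `gaussian_disruptor`, `adversarial_disruptor`, and `apply_with_frequency` are hypothetical helpers, and the "adversary" is a toy that searches a three-point admissible set rather than a learned attacker.

```python
import random


def gaussian_disruptor(std, seed=0):
    """Random mode: return a function that adds zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return lambda value: value + rng.gauss(0.0, std)


def adversarial_disruptor(radius, loss):
    """Adversarial mode (toy): pick the candidate in the admissible set
    {value - radius, value, value + radius} that maximizes `loss`."""

    def perturb(value):
        candidates = (value - radius, value, value + radius)
        return max(candidates, key=loss)

    return perturb


def apply_with_frequency(perturb, values, every):
    """Frequency: perturb only the time steps t with t % every == 0."""
    return [perturb(v) if t % every == 0 else v for t, v in enumerate(values)]


true_states = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Random noise on every second observed state; the other steps pass through unchanged.
noisy_states = apply_with_frequency(gaussian_disruptor(std=0.5), true_states, every=2)

# The toy attacker pushes every third observed state as far from 0 as it can.
attacked_states = apply_with_frequency(
    adversarial_disruptor(radius=1.0, loss=abs), true_states, every=3
)

print(noisy_states)
print(attacked_states)  # [2.0, 2.0, 3.0, 5.0, 5.0, 6.0]
```

The same `perturb` functions could be applied to actions or rewards instead of states, which is how the source of a perturbation varies independently of its mode and frequency.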
Not all environments support all kinds of disruptors (perturbations). Please refer to the above section (Environments and Tasks) for more information.
Here, we provide a step-by-step tutorial for creating and using a domain-shifted/noisy task: choose any environment/task and combine it with any uncertainty factor to perturb the original environment (see the link).
- Create an environment (requires Conda installation): We are currently developing our environments on an Ubuntu system. The operating system version on our server is 20.04.3 LTS.

  Use the following command to create a new Conda environment named robustgymnasium with Python 3.11:

      conda create -n robustgymnasium python=3.11

  Activate the newly created environment:

      conda activate robustgymnasium

- Install dependency packages: Install the necessary packages using pip. Make sure you are in the project directory where the setup.py file is located:

      pip install -r requirements.txt
      pip install -e .

To run the tests, navigate to the examples directory and execute the test script, e.g.,

    cd examples/robust_action/mujoco/
    chmod +x test.sh
    ./test.sh

Follow these steps to set up and test the environment. Adjust paths and versions as necessary for your specific setup.
If you encounter any issues, please check the existing solutions to previously reported issues, which may help you resolve yours.
These demonstrations are from version 4 of the MuJoCo tasks with robust settings.
These demonstrations are from Robust MuJoCo variant tasks with robust settings.
These demonstrations are from robot manipulation tasks with robust settings.
These demonstrations are from dexterous hand and maze tasks with robust settings.
If you find the repository useful, please cite the study:
@article{robustrl2024,
title={Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning},
author={Gu, Shangding and Shi, Laixi and Wen, Muning and Jin, Ming and Mazumdar, Eric and Chi, Yuejie and Wierman, Adam and Spanos, Costas},
journal={Github},
year={2024}
}
We thank the contributors from MuJoCo, Gymnasium, Humanoid-bench and Robosuite.