MAVRL: Learn to Fly in Cluttered Environments with Varying Speed

1. Introduction

Many existing obstacle avoidance algorithms overlook the crucial balance between safety and agility, especially in environments of varying complexity. In our study, we introduce an obstacle avoidance pipeline based on reinforcement learning that enables drones to adapt their flying speed to the environmental complexity. Moreover, to improve obstacle avoidance performance in cluttered environments, we propose a novel latent representation that is explicitly trained to retain memory of previous depth map observations. Our findings confirm that varying speed leads to a superior balance of success rate and agility in cluttered environments. Additionally, our memory-augmented latent representation outperforms the latent representation commonly used in reinforcement learning. Finally, after minimal fine-tuning, we successfully deployed our network on a real drone for enhanced obstacle avoidance.

2. Installation

2.1 Install AvoidBench

Please refer to AvoidBench and check its installation dependencies. Run the following commands to set it up:

# install Open3D
sudo apt update
sudo apt install git libtool build-essential cmake
git clone --recursive -b v0.9.0
cd Open3D
mkdir build
cd build
cmake ..
make -j
sudo make install
sudo apt update
sudo apt install libzmqpp-dev libopencv-dev unzip python3-catkin-tools
sudo apt install libgoogle-glog-dev protobuf-compiler ros-noetic-octomap-msgs ros-noetic-octomap-ros python3-vcstool
git clone git@github.com:tudelft/AvoidBench.git
cd AvoidBench/src/avoidbench/unity_scene/
wget -O
unzip -o
echo "export AVOIDBENCH_PATH=path_to_this_project/AvoidBench/src/avoidbench" >> ~/.bashrc
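Because later steps rely on AVOIDBENCH_PATH, it can help to confirm that the variable is set and points to an existing directory before launching anything. The following is a minimal sketch and not part of AvoidBench: the function name check_avoidbench_path is hypothetical, and the unity_scene check assumes the directory layout used in the commands above.

```python
import os
from pathlib import Path


def check_avoidbench_path(env=None):
    """Return the AvoidBench root as a Path, or raise with a readable message.

    Hypothetical helper: it only inspects the AVOIDBENCH_PATH variable
    exported in ~/.bashrc above and calls nothing from AvoidBench itself.
    """
    env = os.environ if env is None else env
    raw = env.get("AVOIDBENCH_PATH")
    if not raw:
        raise RuntimeError("AVOIDBENCH_PATH is not set; open a new shell or source ~/.bashrc")
    root = Path(raw).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"AVOIDBENCH_PATH points to a missing directory: {root}")
    # Assumption: the Unity standalone is unpacked into <root>/unity_scene.
    if not (root / "unity_scene").is_dir():
        raise RuntimeError(f"no unity_scene folder found under {root}")
    return root
```

Calling check_avoidbench_path() in a fresh shell should return the path you exported; a RuntimeError usually means the placeholder path_to_this_project was not replaced or ~/.bashrc was not re-sourced.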

2.2 Install MAVRL

Get the mavrl ros package:

cd AvoidBench/src
git clone git@github.com:tudelft/mavrl.git

Create conda environment:

cd mavrl
conda env create -f environment.yaml

Install mav_baselines:

cd mavrl
pip install .

Install reinforcement learning environment:

conda activate mavrl
cd avoidbench/avoidlib/build
cmake ..
make -j
pip install .

3. Training

Our pipeline comprises three main components: the VAE, LSTM, and PPO. The training process is as follows:

  • We begin by training a basic PPO policy while the VAE and LSTM components remain randomly initialized. This foundational policy allows the drone to navigate to the target in environments without obstacles.
  • This initial policy is utilized to gather a dataset, focused primarily on capturing a multitude of depth image sequences without the concern of collisions. Subsequently, we use this dataset for the training of the VAE, bypassing the LSTM phase in this step.
  • Once the VAE is trained, we maintain the encoder in a fixed state and proceed to train the LSTM using the dataset generated by the initial policy.
  • After training both the VAE and LSTM, we freeze them and retrain the PPO, adapting it to environments of varying complexity.
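The steps above can be summarized as a small schedule that records, for each stage, which component is trained, which components stay frozen, and whether the Unity standalone has to be running. This is only an illustrative sketch: the Stage class, the stage names, and the split of the second step into data collection and VAE training are choices made here, not identifiers from the mavrl code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    trained: tuple      # components whose weights are updated in this stage
    frozen: tuple       # previously trained components kept fixed
    needs_unity: bool   # True if the Unity standalone must be running


# Illustrative summary of the training steps described above.
SCHEDULE = (
    Stage("initial obstacle-free policy", ("PPO",), (), True),
    Stage("dataset collection with the initial policy", (), ("PPO",), True),
    Stage("VAE training on collected depth sequences", ("VAE",), (), False),
    Stage("LSTM training on the same dataset", ("LSTM",), ("VAE encoder",), False),
    Stage("policy retraining in cluttered scenes", ("PPO",), ("VAE", "LSTM"), True),
)


def describe(schedule=SCHEDULE):
    """Return one readable line per stage."""
    lines = []
    for index, stage in enumerate(schedule, start=1):
        trained = ", ".join(stage.trained) or "nothing"
        frozen = ", ".join(stage.frozen) or "nothing"
        unity = "Unity required" if stage.needs_unity else "offline"
        lines.append(f"{index}. {stage.name}: train {trained}; freeze {frozen}; {unity}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe()))
```

Only the stages that interact with the simulator need Unity, so the VAE and LSTM stages can run on a machine without the standalone.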

Start a terminal and run the Unity standalone:

cd AvoidBench/src/avoidbench/unity_scene/

Start another terminal to train an initial policy:

cd AvoidBench/src/mavrl/
python --retrain 0 --train 1 --scene_id 1 # scene_id=0: indoor warehouse, scene_id=1: outdoor forest

We suggest training for around 200 iterations and using the last weight file as the initial policy. Then use the initial policy to collect datasets for the perception part (the Unity standalone must be running first):

python --trial 1 --iter 200 --scene_id 1

where trial=1 and iter=200 mean that the weights are loaded from saved/RecurrentPPO_1/Policy/iter_00200.pth. Set different --scene_id values to collect both indoor and outdoor data.

Train the Variational AutoEncoder (VAE) (this training process does not need the Unity standalone):


Make sure you have created a folder exp_vae in mavrl. You can also download the VAE weight file that we already trained from here.

Then load the VAE and train the LSTM (this training process does not need the Unity standalone):

python --trial 1 --iter 200 --recon 1 1 0 --lstm_exp LSTM_110_0

where trial=1 and iter=200 mean that the weights are loaded from saved/RecurrentPPO_1/Policy/iter_00200.pth. The argument recon decides whether to reconstruct the past, current, and future depth; --recon 1 1 0 means reconstructing the past and current depth. lstm_exp defines the output folder name of the LSTM training.

Load the VAE and LSTM training results and retrain the policy (the Unity standalone must be running first):

python --retrain 1 --trial 1 --iter 1950 --scene_id 1 --nocontrol 1

where trial=1 and iter=1950 mean that the weights are loaded from saved/RecurrentPPO_1/Policy/iter_01950.pth. Make sure to rename the output folder LSTM_xxx_x_0 to RecurrentPPO_x before retraining the policy. The argument --nocontrol decides whether to load the initial policy or to train the policy from a randomly initialized policy network.
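The trial/iter naming and the three --recon values used in the commands above follow a simple pattern, sketched below. The helper names checkpoint_path and recon_targets are hypothetical and are not functions from the mavrl package; the path pattern is inferred only from the two examples given in this README (iter_00200.pth and iter_01950.pth).

```python
from pathlib import Path

# Order of the three --recon values as described in the text.
RECON_TARGETS = ("past", "current", "future")


def checkpoint_path(trial, iteration, root="saved"):
    """Build saved/RecurrentPPO_<trial>/Policy/iter_<5-digit iteration>.pth."""
    return Path(root) / f"RecurrentPPO_{trial}" / "Policy" / f"iter_{iteration:05d}.pth"


def recon_targets(flags):
    """Map the three 0/1 values given to --recon onto the depth frames they enable."""
    if len(flags) != 3 or any(flag not in (0, 1) for flag in flags):
        raise ValueError("--recon expects exactly three 0/1 values")
    return [name for name, enabled in zip(RECON_TARGETS, flags) if enabled]


if __name__ == "__main__":
    print(checkpoint_path(1, 200).as_posix())   # saved/RecurrentPPO_1/Policy/iter_00200.pth
    print(checkpoint_path(1, 1950).as_posix())  # saved/RecurrentPPO_1/Policy/iter_01950.pth
    print(recon_targets([1, 1, 0]))             # ['past', 'current']
```

Checking that checkpoint_path(trial, iteration) exists before starting a run is a cheap way to catch a folder that has not yet been renamed to RecurrentPPO_x.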

As discussed in the article, MAVRL uses a varying speed strategy, and we recommend that users train their own policies with different flight speeds. We also provide the intermediate training weights of the initial obstacle-free policy, VAE, and LSTM, as well as the collected datasets, here. You can use these intermediate results to start training from any step.

4. Test and Benchmarking

4.1 Policy Test without dynamics and controller

Start a terminal and run the Unity standalone:

cd AvoidBench/src/avoidbench/unity_scene/

Run the evaluation environments

python --trial 2 --iter 20 --scene_id 1

4.2 Test on AvoidBench

Before running the network for benchmarking, specify the checkpoint you trained earlier in the config file config.yaml:

  input_update_freq: 10
  use_depth: true
  velocity_frame: wf
  seq_len: 1
  goal_obs_dim: 7
  trial: 2 # loads the checkpoint from the folder 'RecurrentPPO_2'
  iter: 200 # loads the checkpoint named 'iter_00200.pth'
  pre_steps: 4

Compile AvoidBench

cd AvoidBench
catkin build

Start AvoidBench:

source devel/setup.bash
# run the launch file
roslaunch avoid_manage rotors_gazebo.launch

Open another terminal and start the controller from Agilicious:

source devel/setup.bash
roslaunch mavrl flying_hummingbird.launch

Open another terminal, run network inference:

source devel/setup.bash
roscd mavrl

5. Real World Implementation (code for real drone coming soon).