diff --git a/content/appendix/_appendix.tex b/content/appendix/_appendix.tex
index f39f750..749cba2 100644
--- a/content/appendix/_appendix.tex
+++ b/content/appendix/_appendix.tex
@@ -9,8 +9,11 @@ \chapter*{Appendices}
 \renewcommand\thefigure{\thesection.\arabic{figure}}
 \input{content/appendix/joint_trajectory_controller}
+\newpage
 \input{content/appendix/hyperparameters}
+\newpage
 \input{content/appendix/camera_pose_calibration}
 \newpage
 \input{content/appendix/camera_configuration_and_postprocessing}
+\newpage
 \input{content/appendix/feature_extraction_from_rgb_and_rgbd_observations}
diff --git a/content/introduction.tex b/content/introduction.tex
index 8d77f7e..4bc3682 100644
--- a/content/introduction.tex
+++ b/content/introduction.tex
@@ -16,7 +16,7 @@ \chapter{Introduction}
 The primary focus of this work is to apply DRL to robotic grasping of diverse objects with the use of compact 3D observations in form of octrees. The key contributions are listed below.
 \begin{itemize}
-    \item \textbf{Simulated Environment for Grasping with Domain Randomization} -- A novel simulated environment for robotic grasping in the context of RL research is developed in this work. It utilises realistic 3D scanned objects and extensive domain randomization in order to enable sim-to-real transfer. The environment is developed on top of Ignition Gazebo\footnote{\href{https://ignitionrobotics.org}{https://ignitionrobotics.org}} robotics simulator that is interfaced by the use of Gym-Ignition \cite{ferigo_gym-ignition_2020} to provide compatibility with other OpenAI Gym environments \cite{brockman_openai_2016}.
+    \item \textbf{Simulation Environment for Grasping with Domain Randomization} -- A novel simulation environment for robotic grasping in the context of RL research is developed in this work. It utilises realistic 3D scanned objects and domain randomization in order to enable sim-to-real transfer. The environment is developed on top of the Ignition Gazebo\footnote{\href{https://ignitionrobotics.org}{https://ignitionrobotics.org}} robotics simulator, which is interfaced through Gym-Ignition \cite{ferigo_gym-ignition_2020} to provide compatibility with other OpenAI Gym environments \cite{brockman_openai_2016}.
     \item \textbf{Octree Observations for End-to-End Grasping with DRL} -- This work introduces a novel approach for utilising octree-based visual observations for end-to-end robotic grasping with DRL. Octrees provide an efficient 3D data representation with a regular structure that enables the use of 3D convolutions to extract spatial features. Furthermore, the use of 3D representation promotes invariance to camera pose, which further improves sim-to-real transfer to various real-world setups.
     \item \textbf{Invariance to Robots} -- The same combination of RL algorithm, observations and hyperparameters can be used to train robots with different kinematic chains and gripper designs. Furthermore, transfer of a policy trained on one robot to another is also investigated in addition to evaluating the sim-to-real transfer to a real robot.
     \item \textbf{Comparison of Three Actor-Critic RL Algorithms} -- Three off-policy actor-critic RL algorithms are compared on the developed grasping environment with the proposed octree observations. The compared algorithms are TD3, SAC and TQC.
diff --git a/content/preamble/summary.tex b/content/preamble/summary.tex
index 510889f..1b54361 100644
--- a/content/preamble/summary.tex
+++ b/content/preamble/summary.tex
@@ -1,2 +1,12 @@
 \chapter*{Summary}
 \addcontentsline{toc}{chapter}{Summary}
+
+In this work, deep reinforcement learning is applied to the task of vision-based robotic grasping with a focus on generalisation to diverse objects in varying scenes. Model-free reinforcement learning is employed to learn an end-to-end policy that directly maps visual observations to continuous actions in Cartesian space. For observations, octrees are utilised in a novel approach to provide an efficient representation of the 3D scene. In order to allow the agent to generalise over spatial positions and orientations, a 3D convolutional neural network is designed to extract abstract features. An agent is then trained by combining such a feature extractor with off-policy actor-critic reinforcement learning algorithms.
+
+As training of robotic agents in the real world is expensive and potentially unsafe, a new simulation environment for robotic grasping is created. This environment is developed on top of the open-source Ignition Gazebo robotics simulator in order to provide high-fidelity physics and photorealistic rendering. Sim-to-real transfer of a learned policy is made possible by combining a dataset of realistic 3D scanned objects and textures with domain randomisation. Among other aspects, this includes randomising the pose of a virtual RGB-D camera with the aim of simplifying the transfer of a simulated setup to the real-world domain.
+
+The results of the experimental evaluation indicate that deep reinforcement learning can be applied to learn an end-to-end policy with octree-based observations, while providing noteworthy advantages over the traditionally used RGB and RGB-D images. On novel scenes with a static camera pose, an agent with octree observations is able to reach a success rate of~81.5\%, whereas an agent with RGB-D observations and an analogous feature extractor achieves~59\%. However, the advantage of 3D observations emerges with invariance to camera pose, where both RGB and RGB-D observations struggle to learn a policy while octrees still retain a success rate of~77\%.
+
+The same policy can be successfully transferred to a real robot without any need for retraining. On scenes with previously unseen everyday objects, a policy trained solely inside simulation can achieve a success rate of~68.3\%. The invariance to camera pose enables a simple transfer without requiring the real-world setup to match its digital counterpart. In some cases, octree-based observations furthermore allow the transfer of a policy trained on one robot to another with a different gripper design and kinematic chain, while achieving almost identical performance to a policy that was trained on the target robot.
+
+Besides the aforementioned experiments, this work compares the actor-critic algorithms TD3, SAC and TQC for continuous control, and studies the benefits of several ablations and configurations such as the use of demonstrations, curriculum learning and proprioceptive observations.
diff --git a/master_thesis.pdf b/master_thesis.pdf
index 23a305c..fbacd36 100644
Binary files a/master_thesis.pdf and b/master_thesis.pdf differ
diff --git a/master_thesis.tex b/master_thesis.tex
index ca3f590..d3c66e9 100644
--- a/master_thesis.tex
+++ b/master_thesis.tex
@@ -34,8 +34,8 @@
 %%% Include the main content
 \include{content/_content}
 
-%%% Include bibliography, use single line spacing for compactness
-\singlespacing
+%%% Include bibliography
+\setstretch{1.175}
 \bibliography{bibliography/bibliography}
 \setstretch{\blocklinespacing}