Review (pt. 1)
Signed-off-by: Andrej Orsula <[email protected]>
AndrejOrsula committed Jun 2, 2021
1 parent f9b211d commit 308189f
Showing 12 changed files with 82 additions and 86 deletions.
5 changes: 1 addition & 4 deletions _frontmatter/titlepage.tex
@@ -37,8 +37,5 @@
\end{tabular}
\capstarttrue%
\vfill
-\makeatletter
-\def\blfootnote{\gdef\@thefnmark{}\@footnotetext}
-\makeatother
-\blfootnote{\href{mailto:\thesisauthormail}{{\includegraphics[height=6pt]{_misc/email_logo.pdf}}~\thesisauthormail}}
+{\noindent\tiny\href{mailto:\thesisauthormail}{{\includegraphics[height=6pt]{_misc/email_logo.pdf}}~\thesisauthormail}}
\cleardoublepage
2 changes: 1 addition & 1 deletion _style/_style.tex
@@ -10,7 +10,7 @@
\usepackage{framed}
\usepackage{geometry}
\usepackage[dvips]{graphicx}
-\usepackage{hyperref}
+\usepackage[hyperfootnotes=false]{hyperref}
\usepackage[all]{hypcap}
\usepackage[utf8]{inputenc}
\usepackage{lastpage}
1 change: 0 additions & 1 deletion content/appendix/_appendix.tex
@@ -13,5 +13,4 @@ \chapter*{Appendices}
\input{content/appendix/camera_pose_calibration}
\newpage
\input{content/appendix/camera_configuration_and_postprocessing}
-\newpage
\input{content/appendix/feature_extraction_from_rgb_and_rgbd_observations}
4 changes: 2 additions & 2 deletions content/appendix/camera_configuration_and_postprocessing.tex
@@ -1,8 +1,8 @@
\section{Camera Configuration and Post-Processing}\label{app:camera_configuration_and_postprocessing}

-In order to improve success of sim-to-real transfer, the quality of visual observations is of great importance. However, the default configuration of the utilised D435 camera produced a very noisy depth map with many holes. Primary reason for this is the utilised workspace setup that consisted of a reflective surface inside a laboratory with large amount of ambient illumination. Not only does the polished metallic surface of the workspace result in a specular reflection of ceiling lights, the pattern projected by the laser emitter of the camera is completely reflected. Lack of such pattern results in limited material texture of the surface, which further decreases the attainable depth quality.
+In order to improve success of sim-to-real transfer, the quality of visual observations is of great importance. However, the default configuration of the utilised D435 camera produces a very noisy depth map with many holes. Primary reason for this is the utilised workspace setup that consisted of a reflective surface inside a laboratory with large amount of ambient illumination. Not only does the smooth metallic surface of the workspace result in a specular reflection of ceiling lights, but the pattern projected by the laser emitter of the camera is completely reflected. Lack of such pattern results in limited material texture of the surface, which further decreases the attainable depth quality.

-To improve quality of the raw depth map, few steps are taken. First, automatic expose of the camera's IR sensors is configured for a region of interest that covers only the workspace. This significantly reduces hot-spot clipping caused by the specular reflection, which in turn decreases the amount of holes. To mitigate noise, spatial and temporal filters are applied to the depth image. In order to achieve best results, these filters are applied to a corresponding disparity map with a high resolution of~1280~\(\times\)720~px at~30~FPS. Furthermore, the depth map is clipped only to depth rage of interest in order to reduce computational load. Once filtered, the image is decimated to a more manageable resolution of~320~\(\times\)180~px and converted to a point cloud, which can then be converted to an octree. Post-processed point cloud can be seen in \autoref{app_fig:camera_config_and_post_processing}.
+To improve quality of the raw depth map, few steps are taken. First, automatic expose of the camera's IR sensors is configured for a region of interest that covers only the workspace. This significantly reduces hot-spot clipping caused by the specular reflection, which in turn decreases the amount of holes. To mitigate noise, spatial and temporal filters are applied to the depth image. In order to achieve best results, these filters are applied to a corresponding disparity map with a high resolution of~1280~\(\times\)720~px at~30~FPS. Furthermore, the depth map is clipped only to the range of interest in order to reduce computational load. Once filtered, the image is decimated to a more manageable resolution of~320~\(\times\)180~px and converted to a point cloud, which can then be converted to an octree. Post-processed point cloud can be seen in \autoref{app_fig:camera_config_and_post_processing}.

\setcounter{figure}{0}
\begin{figure}[ht]
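
The post-processing chain described in this hunk (ROI-driven auto-exposure, spatial and temporal filtering in disparity space, depth clipping, decimation, point-cloud conversion) maps closely onto the librealsense filter API. The following Python sketch shows one possible arrangement; it assumes the pyrealsense2 bindings, and the option values are illustrative rather than the configuration used in the thesis.

# Illustrative D435 post-processing chain (assumed pyrealsense2 API);
# option values are placeholders, not the thesis' actual configuration.
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
pipeline.start(config)
# ROI-based auto-exposure of the IR sensors is configured on the depth
# sensor itself so that only the workspace drives the exposure (omitted here).

clip = rs.threshold_filter()                    # keep only the depth range of interest
clip.set_option(rs.option.min_distance, 0.2)
clip.set_option(rs.option.max_distance, 1.5)
to_disparity = rs.disparity_transform(True)     # filter in disparity space
spatial = rs.spatial_filter()
temporal = rs.temporal_filter()
to_depth = rs.disparity_transform(False)
decimate = rs.decimation_filter()
decimate.set_option(rs.option.filter_magnitude, 4)  # 1280x720 -> 320x180
point_cloud = rs.pointcloud()

for _ in range(30):
    depth = pipeline.wait_for_frames().get_depth_frame()
    for f in (clip, to_disparity, spatial, temporal, to_depth, decimate):
        depth = f.process(depth)
    points = point_cloud.calculate(depth)  # point cloud; octree conversion happens downstream
pipeline.stop()
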
2 changes: 1 addition & 1 deletion content/appendix/camera_pose_calibration.tex
@@ -1,6 +1,6 @@
\section{Camera Pose Calibration}\label{app:camera_pose_calibration}

-For evaluation of sim-to-real transfer, the camera pose is calibrated with respect to the robot base frame. For this, a calibration board with ArUcO markers \cite{garrido-jurado_automatic_2014} is used as an intermediate reference. \autoref{app_fig:calibration_setup} shows the utilised setup. Position of this intermediate reference is first found in the robot coordinate system by positioning robot's tool centre point above origin of the calibration board, and using robot's joint encoders together with forward kinematics. Hereafter, ArUcO pattern is detected from RGB images of the utilised camera. The perceived pixel positions of the pattern are then used with its known design to solve perspective-n-point problem and determine camera pose with respect to the pattern. Once known, pose of the camera is determined with respect to the robot and the calibration board is removed from the scene.
+For evaluation of sim-to-real transfer, the camera pose is calibrated with respect to the robot base frame. For this, a calibration board with ArUcO markers \cite{garrido-jurado_automatic_2014} is used as an intermediate reference. \autoref{app_fig:calibration_setup} shows the utilised setup. Position of this intermediate reference is first found in the robot coordinate system by positioning robot's tool centre point above origin of the calibration board, and using robot's joint encoders together with forward kinematics. Hereafter, ArUcO pattern is detected from RGB images of the utilised camera. The perceived pixel positions of the pattern are then used with its known design to solve a perspective-n-point problem and determine camera pose with respect to the pattern. Once known, pose of the camera is determined with respect to the robot and the calibration board is removed from the scene.

\setcounter{figure}{0}
\begin{figure}[ht]
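
The calibration steps described in this hunk (detecting the ArUcO pattern in an RGB image, solving a perspective-n-point problem against the board's known geometry, and composing the result with the board pose obtained from the robot's forward kinematics) can be sketched with OpenCV roughly as follows. The function signature, marker dictionary and board-corner lookup are placeholders introduced for illustration, not the actual implementation.

# Rough sketch of the camera-pose calibration (OpenCV ArUcO + PnP);
# the interface and the marker dictionary below are placeholders.
import cv2
import numpy as np

def calibrate_camera_pose(image, camera_matrix, dist_coeffs, board_corners_3d, T_robot_board):
    # board_corners_3d: dict mapping ArUcO id -> (4, 3) corner positions in the board frame
    # T_robot_board: 4x4 pose of the board measured via the robot's TCP and forward kinematics
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

    # Pair the detected 2D corners with their known 3D positions on the board.
    obj_pts = np.concatenate([board_corners_3d[int(i)] for i in ids.ravel()]).astype(np.float32)
    img_pts = np.concatenate([c.reshape(4, 2) for c in corners]).astype(np.float32)

    # Perspective-n-point: pose of the board expressed in the camera frame.
    _, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)
    T_cam_board = np.eye(4)
    T_cam_board[:3, :3] = R
    T_cam_board[:3, 3] = tvec.ravel()

    # Invert and compose with the known board pose to get the camera in the robot base frame.
    return T_robot_board @ np.linalg.inv(T_cam_board)
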
6 changes: 3 additions & 3 deletions content/background.tex
@@ -5,7 +5,7 @@ \chapter{Background}\label{ch:background}

\section{Markov Decision Process}

-The goal of RL agent is to maximize the total reward that is accumulated during a sequential interaction with the environment. This paradigm can be expressed with a classical formulation of Markov decision process (MDP), where \autoref{fig:bg_mdp_loop} illustrates its basic interaction loop. In MDPs, actions of agent within the environment make it traverse different states and receive corresponding rewards. MDP is an extension of Markov chains, with an addition that agents are allowed to select the actions they execute. Both of these satisfy the Markov property, which assumes that each state is only dependent on the previous state, i.e.~a memoryless property where each state contains all information that is necessary to predict the next state. Therefore, MDP formulation is commonly used within the context of RL because it captures a variety of tasks that general-purpose RL algorithms can be applied to, including robotic manipulation tasks.
+The goal of RL agent is to maximise the total reward that is accumulated during a sequential interaction with the environment. This paradigm can be expressed with a classical formulation of Markov decision process (MDP), where \autoref{fig:bg_mdp_loop} illustrates its basic interaction loop. In MDPs, actions of agent within the environment make it traverse different states and receive corresponding rewards. MDP is an extension of Markov chains, with an addition that agents are allowed to select the actions they execute. Both of these satisfy the Markov property, which assumes that each state is only dependent on the previous state, i.e.~a memoryless property where each state contains all information that is necessary to predict the next state. Therefore, MDP formulation is commonly used within the context of RL because it captures a variety of tasks that general-purpose RL algorithms can be applied to, including robotic manipulation tasks.
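
For reference, the total accumulated reward that the agent maximises is typically formalised as the expected discounted return; a standard form of this quantity (shown here for context, not quoted from the thesis) is
\begin{equation}
G_{t} = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
\end{equation}
where~\(\gamma \in [0, 1]\) is the discount factor.
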

It should be noted that partially observable Markov decision process (POMDP) is a more accurate characterisation of most robotics tasks because the states are commonly unobservable or only partially observable, however, the difficulty of solving POMDPs limits their usage \cite{kroemer_review_2021}. Therefore, this chapter presents only on MDPs where observations and states are considered to be the same.

@@ -85,7 +85,7 @@ \subsection{Value-Based Methods}\label{subsec:bg_value_based_methods}

\subsection{Policy-Based Methods}

-Instead of determining actions based on their value, policy-based methods directly optimize a stochastic policy~\(\pi\) as a probability distribution~\(\pi(a \vert s, \theta)\) that is parameterised by~\(\theta\).
+Instead of determining actions based on their value, policy-based methods directly optimise a stochastic policy~\(\pi\) as a probability distribution~\(\pi(a \vert s, \theta)\) that is parameterised by~\(\theta\).
\begin{equation}
\pi(a \vert s, \theta) = \Pr\{A_{t}{=}a \vert S_{t}{=}s, \theta_{t}{=}\theta \}
\end{equation}
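
To illustrate how such a parameterised policy is optimised directly, the policy-gradient theorem gives the gradient of the expected return with respect to~\(\theta\); one standard (REINFORCE-style) form, added here for context rather than quoted from the thesis, is
\begin{equation}
\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi} \left[ G_{t} \nabla_{\theta} \ln \pi(A_{t} \vert S_{t}, \theta) \right]
\end{equation}
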
Expand All @@ -98,7 +98,7 @@ \subsection{Actor-Critic Methods}

In contrast to value- and policy-based methods as the two primary categories, actor-critic methods include algorithms that utilise both a parameterised policy, i.e.~actor, and a value function, critic. This is achieved by using separate networks, where the actor and critic can sometimes share some common parameters. Such combination allows actor-critic algorithms to simultaneously possess advantages of both approaches such as sample efficiency and continuous action space. Therefore, these properties have made actor-critic methods popular for robotic manipulation while achieving state of the art performance among other RL approaches in this domain.

-Similar to policy-based methods, the actor network learns the probability of selecting a specific action~\(a\) in a given state~\(s\) as~\(\pi(a \vert s, \theta)\). The critic network estimates action-value function~\(Q(s, a)\) by minimising TD error~\(\delta_{t}\) via \autoref{eq:q_learning}, which is used to critique the actor based on how good the selected action is. This process is visualized in \autoref{fig:bg_actor_critic_loop}. It is however argued that the co-dependence of each other's output distribution can result in instability during learning and make them difficult to tune \cite{quillen_deep_2018}. Despite of this, actor-critic model-free RL algorithms are utilised in this work.
+Similar to policy-based methods, the actor network learns the probability of selecting a specific action~\(a\) in a given state~\(s\) as~\(\pi(a \vert s, \theta)\). The critic network estimates action-value function~\(Q(s, a)\) by minimising TD error~\(\delta_{t}\) via \autoref{eq:q_learning}, which is used to critique the actor based on how good the selected action is. This process is visualised in \autoref{fig:bg_actor_critic_loop}. It is however argued that the co-dependence of each other's output distribution can result in instability during learning and make them difficult to tune \cite{quillen_deep_2018}. Despite of this, actor-critic model-free RL algorithms are utilised in this work.
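
For context, the TD error~\(\delta_{t}\) that the critic minimises and feeds back to the actor commonly takes the Q-learning form (a standard formulation, not quoted from the thesis)
\begin{equation}
\delta_{t} = R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_{t}, A_{t})
\end{equation}
with the actor then updated in the direction suggested by the critic, e.g.~\(\theta \leftarrow \theta + \alpha \, \delta_{t} \nabla_{\theta} \ln \pi(A_{t} \vert S_{t}, \theta)\).
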

\begin{figure}[ht]
\centering