Adds two new guides for 3D deep learning architectures + updates to old 3D DL guides. #2209

Open
wants to merge 1 commit into base: next
@@ -14,7 +14,7 @@
},
"source": [
"<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Introduction\" data-toc-modified-id=\"Introduction-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href=\"#RandLA-Net-architecture\" data-toc-modified-id=\"RandLA-Net-architecture-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>RandLA-Net architecture</a></span><ul class=\"toc-item\"><li><span><a href=\"#Local-Feature-Aggregation\" data-toc-modified-id=\"Local-Feature-Aggregation-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Local Feature Aggregation</a></span></li></ul></li><li><span><a href=\"#Implementation-in-arcgis.learn\" data-toc-modified-id=\"Implementation-in-arcgis.learn-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Implementation in <code>arcgis.learn</code></a></span><ul class=\"toc-item\"><li><span><a href=\"#For-advanced-users\" data-toc-modified-id=\"For-advanced-users-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>For advanced users</a></span></li></ul></li><li><span><a href=\"#Setting-up-the-environment\" data-toc-modified-id=\"Setting-up-the-environment-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Setting up the environment</a></span><ul class=\"toc-item\"><li><ul class=\"toc-item\"><li><span><a href=\"#For-ArcGIS-Pro-users:\" data-toc-modified-id=\"For-ArcGIS-Pro-users:-4.0.1\"><span class=\"toc-item-num\">4.0.1&nbsp;&nbsp;</span>For ArcGIS Pro users:</a></span></li><li><span><a href=\"#For-Anaconda-users-(Windows-and-Linux-platforms):\" data-toc-modified-id=\"For-Anaconda-users-(Windows-and-Linux-platforms):-4.0.2\"><span class=\"toc-item-num\">4.0.2&nbsp;&nbsp;</span>For Anaconda users (Windows and Linux platforms):</a></span></li></ul></li></ul></li><li><span><a href=\"#Best-practices-for-RandLA-Net-workflow\" data-toc-modified-id=\"Best-practices-for-RandLA-Net-workflow-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Best practices for RandLA-Net workflow</a></span></li><li><span><a href=\"#References\" data-toc-modified-id=\"References-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>References</a></span></li></ul></div>"
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Introduction\" data-toc-modified-id=\"Introduction-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href=\"#RandLA-Net-architecture\" data-toc-modified-id=\"RandLA-Net-architecture-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>RandLA-Net architecture</a></span><ul class=\"toc-item\"><li><span><a href=\"#Local-Feature-Aggregation\" data-toc-modified-id=\"Local-Feature-Aggregation-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Local Feature Aggregation</a></span></li></ul></li><li><span><a href=\"#Implementation-in-arcgis.learn\" data-toc-modified-id=\"Implementation-in-arcgis.learn-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Implementation in <code>arcgis.learn</code></a></span><ul class=\"toc-item\"><li><span><a href=\"#For-advanced-users\" data-toc-modified-id=\"For-advanced-users-3.1\"><span class=\"toc-item-num\">3.1&nbsp;&nbsp;</span>For advanced users</a></span></li></ul></li><li><span><a href=\"#Best-practices-for-RandLA-Net-workflow\" data-toc-modified-id=\"Best-practices-for-RandLA-Net-workflow-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Best practices for RandLA-Net workflow</a></span></li><li><span><a href=\"#References\" data-toc-modified-id=\"References-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>References</a></span></li></ul></div>"
]
},
{
@@ -52,7 +52,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Point cloud classification is a task where each point in the point cloud is assigned a label, representing a real-world entity (see Figure 1.). And similar to how it's done in traditional methods, for deep learning, the point cloud classification process involves training – where the neural network learns from an already classified (labeled) point cloud dataset, where each point has a unique class code. These class codes are used to represent the features that we want the neural network to recognize. \n",
"Point cloud classification is a task where each point in the point cloud is assigned a label, representing a real-world entity (see Figure 1). And similar to how it's done in traditional methods, for deep learning, the point cloud classification process involves training – where the neural network learns from an already classified (labeled) point cloud dataset, where each point has a unique class code. These class codes are used to represent the features that we want the neural network to recognize. \n",
"\n",
"In deep learning workflows for point cloud classification, one should not use a ‘thinned-out’ representation of a point cloud dataset that preserves only class codes of interest but drops a majority of the undesired return points, as we would like the neural network to learn and be able to differentiate points of interest and those that are not. Likewise, additional attributes that are present in training datasets, for example, Intensity, RGB, number of returns, etc. will improve the model’s accuracy but could inversely affect it if those parameters are not correct in the datasets that are used for inferencing."
]
@@ -90,7 +90,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"RandLA-Net is an architecture that allows for the learning of point features within a point cloud by using an encoder-decoder sequence with skip connections. The network applies shared MLP layers along with four encoding and decoding layers, as well as three fully-connected layers and a dropout layer to predict the semantic label of each point (see Figure 2.).\n",
"RandLA-Net is an architecture that allows for the learning of point features within a point cloud by using an encoder-decoder sequence with skip connections. The network applies shared MLP layers along with four encoding and decoding layers, as well as three fully-connected layers and a dropout layer to predict the semantic label of each point (see Figure 2).\n",
"\n",
"\n",
"- The input to the architecture is a large-scale point cloud consisting of N points with feature dimensions of d<sub>in</sub>, where the batch dimension is dropped for simplicity.\n",
@@ -140,7 +140,7 @@
"- attentive pooling,\n",
"- and dilated residual block.\n",
"\n",
"These units work together to learn complex local structures by preserving local geometric features while progressively increasing the receptive field size in each neural layer (see Figure 3.). The LocSE unit is introduced first to capture the local spatial encoding of the point. Then, the attentive pooling unit is leveraged to select the most useful local features that contribute the most to the classification task. Finally, the multiple LocSE and attentive pooling units are stacked together as a dilated residual block to further enhance the effective receptive field for each point in a computationally efficient way."
"These units work together to learn complex local structures by preserving local geometric features while progressively increasing the receptive field size in each neural layer (see Figure 3). The LocSE unit is introduced first to capture the local spatial encoding of the point. Then, the attentive pooling unit is leveraged to select the most useful local features that contribute the most to the classification task. Finally, the multiple LocSE and attentive pooling units are stacked together as a dilated residual block to further enhance the effective receptive field for each point in a computationally efficient way."
]
},
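To make the Local Feature Aggregation description above concrete, below is a minimal, illustrative PyTorch-style sketch of the LocSE (Local Spatial Encoding) unit. It is not the arcgis.learn implementation: tensor shapes are simplified and the k-nearest-neighbor gathering is assumed to have been done beforehand.

```python
import torch
import torch.nn as nn

class LocalSpatialEncoding(nn.Module):
    """Sketch of RandLA-Net's LocSE unit: encode relative neighbor geometry and
    concatenate it with the neighboring point features."""

    def __init__(self, d_feat):
        super().__init__()
        # 10 = 3 (center xyz) + 3 (neighbor xyz) + 3 (relative offset) + 1 (distance)
        self.mlp = nn.Sequential(nn.Linear(10, d_feat), nn.ReLU())

    def forward(self, center_xyz, neighbor_xyz, neighbor_feats):
        # center_xyz: (B, N, 3), neighbor_xyz: (B, N, K, 3), neighbor_feats: (B, N, K, d_feat)
        k = neighbor_xyz.shape[2]
        center = center_xyz.unsqueeze(2).expand(-1, -1, k, -1)         # repeat the center for each neighbor
        offset = center - neighbor_xyz                                  # relative position
        dist = offset.norm(dim=-1, keepdim=True)                        # Euclidean distance
        rel = torch.cat([center, neighbor_xyz, offset, dist], dim=-1)   # (B, N, K, 10)
        rel_enc = self.mlp(rel)                                         # relative point position encoding
        return torch.cat([rel_enc, neighbor_feats], dim=-1)             # (B, N, K, 2 * d_feat)
```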
{
@@ -164,7 +164,7 @@
"\n",
"In an attentive pooling unit, the attention mechanism is used to automatically learn important local features and aggregate neighboring point features while avoiding the loss of crucial information. It also maintains the focus on the overall objective, which is to learn complex local structures in a point cloud by considering the relative importance of neighboring point features.\n",
"\n",
"Lastly in the dilated residual block unit, the receptive field is increased for each point by stacking multiple LocSE and Attentive Pooling units. This dilated residual block operates by cheaply dilating the receptive field and expanding the effective neighborhood through feature propagation (see Figure 4.). Stacking more and more units enhances the receptive field and makes the block more powerful, which may compromise the overall computation efficiency and lead to overfitting. Hence, in RandLA-Net, two sets of LocSE and Attentive Pooling are stacked as a standard residual block to achieve a balance between efficiency and effectiveness <a href=\"#References\">[1]</a>."
"Lastly in the dilated residual block unit, the receptive field is increased for each point by stacking multiple LocSE and Attentive Pooling units. This dilated residual block operates by cheaply dilating the receptive field and expanding the effective neighborhood through feature propagation (see Figure 4). Stacking more and more units enhances the receptive field and makes the block more powerful, which may compromise the overall computation efficiency and lead to overfitting. Hence, in RandLA-Net, two sets of LocSE and Attentive Pooling are stacked as a standard residual block to achieve a balance between efficiency and effectiveness <a href=\"#References\">[1]</a>."
]
},
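The attentive pooling mechanism described above can be sketched just as briefly. Again, this is only an illustration of the idea (score each neighbor with a shared function, softmax the scores over the K neighbors, and aggregate by weighted sum), not the library's actual code:

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    """Sketch of RandLA-Net's attentive pooling: score each neighbor feature,
    normalize the scores across the K neighbors, and aggregate by weighted sum."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.score_fn = nn.Linear(d_in, d_in, bias=False)               # shared scoring function
        self.mlp = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())

    def forward(self, neighbor_feats):
        # neighbor_feats: (B, N, K, d_in), e.g. the output of the LocSE unit
        scores = torch.softmax(self.score_fn(neighbor_feats), dim=2)    # attention weights over the K neighbors
        pooled = (scores * neighbor_feats).sum(dim=2)                   # weighted sum -> (B, N, d_in)
        return self.mlp(pooled)
```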
{
@@ -198,7 +198,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For this step of exporting the data into an intermediate format, use <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/prepare-point-cloud-training-data.htm\" target=\"_blank\">Prepare Point Cloud Training Data</a> tool, in the <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/an-overview-of-the-3d-analyst-toolbox.htm\" target=\"_blank\">3D Analyst extension</a>, available from ArcGIS Pro 2.8 onwards (see Figure 5.)."
"For this step of exporting the data into an intermediate format, use <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/prepare-point-cloud-training-data.htm\" target=\"_blank\">Prepare Point Cloud Training Data</a> tool, in the <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/an-overview-of-the-3d-analyst-toolbox.htm\" target=\"_blank\">3D Analyst extension</a> (see Figure 5)."
]
},
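For scripted workflows, the same export step can be run from Python. The snippet below is a sketch only: the tool lives in the 3D Analyst (`arcpy.ddd`) module, but the parameter names and values shown are assumptions and should be checked against the tool documentation linked above.

```python
import arcpy

arcpy.CheckOutExtension("3D")

# Export labeled LAS points into the intermediate training format (*.pctd).
# Paths, block size, and parameter keywords below are illustrative only.
arcpy.ddd.PreparePointCloudTrainingData(
    in_point_cloud=r"C:\data\training.lasd",
    block_size="50 Meters",
    out_training_data=r"C:\data\randlanet_training.pctd",
)
```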
{
@@ -232,7 +232,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For inferencing, use <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/classify-point-cloud-using-trained-model.htm\" target=\"_blank\">Classify Points Using Trained Model</a> tool, in the <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/an-overview-of-the-3d-analyst-toolbox.htm\" target=\"_blank\">3D Analyst extension</a>, available from ArcGIS Pro 2.8 onwards (see Figure 6.).\n",
"For inferencing, use <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/classify-point-cloud-using-trained-model.htm\" target=\"_blank\">Classify Points Using Trained Model</a> tool, in the <a href=\"https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/an-overview-of-the-3d-analyst-toolbox.htm\" target=\"_blank\">3D Analyst extension</a> (see Figure 6).\n",
"\n",
"Main features available during the inferencing step:\n",
" \n",
@@ -285,64 +285,6 @@
"```"
]
},
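Putting the pieces together, a minimal arcgis.learn training loop might look like the following sketch. Paths, batch size, and epoch count are placeholders, and the collapsed sections of this guide cover the individual parameters in more detail.

```python
from arcgis.learn import prepare_data, RandLANet

# Point to the data exported by the Prepare Point Cloud Training Data tool (path is illustrative)
data = prepare_data(r"C:\data\randlanet_training.pctd", dataset_type="PointCloud", batch_size=2)

model = RandLANet(data)

lr = model.lr_find()            # suggest a learning rate
model.fit(20, lr=lr)            # train for 20 epochs
model.show_results(rows=2)      # inspect predictions on validation blocks
model.save("randlanet_20e")     # saves a deep learning package (.dlpk) for use in ArcGIS Pro
```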
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up the environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<i>Make sure to update your 'GPU driver' to a recent version and use 'Administrator Rights' for all the steps, written in this guide.</i>\n",
"\n",
"_**Below, are the instructions to set up the required 'conda environment':**_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### For ArcGIS Pro users:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://github.com/esri/deep-learning-frameworks\" target=\"_blank\">Deep learning frameworks</a>\n",
"can be used to install all the required dependencies in ArcGIS Pro's default python environment using an MSI installer. \n",
"\n",
"Alternatively, \n",
"for a cloned environment of ArcGIS Pro's default environment, `deep-learning-essentials` metapackage can be used to install the required dependencies which can be done using the following command, in the _`Python Command Prompt`_ <i>(included with ArcGIS Pro)</i>:\n",
"\n",
"`conda install deep-learning-essentials`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### For Anaconda users (Windows and Linux platforms):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`arcgis_learn` metapackage can be used for both `windows` and `linux` installations of `Anaconda` in a new environment.\n",
"\n",
"The following command will update `Anaconda` to the latest version. \n",
"\n",
"`conda update conda`\n",
"\n",
"After that, metapackage can be installed using the command below:\n",
"\n",
"`conda install -c esri arcgis_learn=3.9`"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -410,7 +352,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"- `mask_class` functionality in `show_results()` can be used for analyzing any inter-class noises present in the validation output. This can be used to understand which classes need more diversity in training data or need an increase in its number of labeled points _(As shown below, in Figure 7.)_.\n",
"- `mask_class` functionality in `show_results()` can be used for analyzing any inter-class noises present in the validation output. This can be used to understand which classes need more diversity in training data or need an increase in its number of labeled points _(See Figure 7)_.\n",
"\n",
"\n",
"<p align=\"center\"><center><img src=\"../../static/img/pointcnn_guide_gif_1.gif\" /></center></p>\n",
@@ -471,7 +413,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"[1] Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z., Trigoni, N., & Markham, A. (2020). Randla-Net: Efficient semantic segmentation of large-scale point clouds. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 11105–11114. https://doi.org/10.1109/CVPR42600.2020.01112"
]
}
@@ -492,7 +433,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.11.10"
},
"toc": {
"base_numbering": 1,