diff --git a/README.md b/README.md
index b864c854c..30bf55876 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ Vision primitives, such as [`imageNet`](imageNet.h) for image recognition, [`det
* [System Setup](#system-setup)
* [Building from Source on Jetson](#building-from-source-on-jetson)
* [Classifying Images with ImageNet](#classifying-images-with-imagenet)
- * [Using the Console Program](#using-the-console-program)
+ * [Using the Console Program on Jetson](#using-the-console-program-on-jetson)
* [Running the Live Camera Recognition Demo](#running-the-live-camera-recognition-demo)
* [Re-training the Network with DIGITS](#re-training-the-network-with-DIGITS)
* [Locating Object Coordinates using DetectNet](#locating-object-coordinates-using-detectNet)
@@ -327,7 +327,7 @@ There are multiple types of deep learning networks available, including recognit
The [`imageNet`](imageNet.h) object accepts an input image and outputs the probability for each class. Having been trained on ImageNet database of **[1000 objects](data/networks/ilsvrc12_synset_words.txt)**, the standard AlexNet and GoogleNet networks are downloaded during [step 2](#configuring-with-cmake) from above. As examples of using [`imageNet`](imageNet.h) we provide a command-line interface called [`imagenet-console`](imagenet-console/imagenet-console.cpp) and a live camera program called [`imagenet-camera`](imagenet-camera/imagenet-camera.cpp).
-### Using the Console Program
+### Using the Console Program on Jetson
First, use the [`imagenet-console`](imagenet-console/imagenet-console.cpp) program to test imageNet recognition on some example images. After [building](#building-from-source-on-jetson), make sure your terminal is located in the aarch64/bin directory:
@@ -390,57 +390,284 @@ Then, while creating the new network model in DIGITS, copy the [GoogleNet protot
The network training should now converge faster than if it were trained from scratch. After the desired accuracy has been reached, copy the new model checkpoint back over to your Jetson and proceed as before, but now with the added classes available for recognition.
## Locating Object Coordinates using DetectNet
-The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability we're highlighting in this tutorial is detecting multiple objects, and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet' - or object detection / localization network.
+The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability we're highlighting in this tutorial is detecting objects, and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet' - or object detection / localization network.
-The [`detectNet`](detectNet.h) object accepts as input the 2D image, and outputs a list of coordinates of the detected bounding boxes. Three example detection network models are are automatically downloaded during the repo [source configuration](#configuring):
+The [`detectNet`](detectNet.h) object accepts a 2D image as input and outputs a list of coordinates of the detected bounding boxes. To train an object detection model, a pretrained ImageNet recognition model (like Googlenet) is first used as the starting point, with bounding-box coordinate labels included in the training dataset in addition to the source imagery.
+
+The following pretrained DetectNet models are included with the tutorial:
1. **ped-100** (single-class pedestrian detector)
2. **multiped-500** (multi-class pedestrian + baggage detector)
3. **facenet-120** (single-class facial recognition detector)
+4. **coco-airplane** (MS COCO airplane class)
+5. **coco-bottle** (MS COCO bottle class)
+6. **coco-chair** (MS COCO chair class)
+7. **coco-dog** (MS COCO dog class)
As with the previous examples, provided are a console program and a camera streaming program for using detectNet.
-### Processing Images from the Command Line
-To process test images with [`detectNet`](detectNet.h) and TensorRT, use the [`detectnet-console`](detectnet-console/detectnet-console.cpp) program. [`detectnet-console`](detectnet-console/detectnet-console.cpp) accepts command-line arguments representing the path to the input image and path to the output image (with the bounding box overlays rendered). Some test images are included with the repo:
+### Detection Data Formats with DIGITS
+
+Example object detection datasets include [KITTI](http://www.cvlibs.net/datasets/kitti/eval_object.php), [MS-COCO](http://mscoco.org/), and others. To use the KITTI dataset, follow this [DIGITS object detection tutorial with KITTI](https://github.com/NVIDIA/DIGITS/blob/digits-4.0/digits/extensions/data/objectDetection/README.md).
+
+Regardless of the dataset, DIGITS uses the KITTI metadata format for ingesting the detection bounding-box labels. The labels are text files whose frame numbers correspond to the image filenames, with contents such as:
+
+```
+dog 0 0 0 528.63 315.22 569.09 354.18 0 0 0 0 0 0 0
+sheep 0 0 0 235.28 300.59 270.52 346.55 0 0 0 0 0 0 0
+```
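+
+Each label line contains 15 whitespace-delimited fields in KITTI order (class name, truncation, occlusion, alpha, the four bounding-box pixel coordinates left/top/right/bottom, then the 3D dimension, location, and rotation fields). For DIGITS object detection only the class name and the four bounding-box coordinates need real values; the remaining fields may be left as zero. As a minimal sketch, a label file for a hypothetical frame `000001` could be written like so:
+
+``` bash
+# one object per line; only the class name and the bbox (left top right bottom,
+# in pixels) are used by DIGITS detection, the remaining fields can stay 0
+$ echo "dog 0 0 0 528.63 315.22 569.09 354.18 0 0 0 0 0 0 0" > 000001.txt
+```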
+
+[Read more](https://github.com/NVIDIA/DIGITS/blob/digits-4.0/digits/extensions/data/objectDetection/README.md) about the folder structure and KITTI label format that DIGITS uses.
+
+### Downloading the Detection Dataset
+
+Let's explore using [MS-COCO](http://mscoco.org/) to train and deploy networks that detect the locations of everyday objects in camera feeds. See the [coco2kitty.py](tools/coco2kitty.py) script for converting MS-COCO object classes to KITTI format. Once the classes are in the DIGITS folder structure, they can be imported into DIGITS as datasets.
+
+To get started, from a terminal on your DIGITS server, download and extract the [sample MS-COCO classes](https://nvidia.box.com/shared/static/tdrvaw3fd2cwst2zu2jsi0u43vzk8ecu.gz) already converted to DIGITS/KITTI format:
+
+```bash
+$ wget --no-check-certificate https://nvidia.box.com/shared/static/tdrvaw3fd2cwst2zu2jsi0u43vzk8ecu.gz -O coco.tar.gz
+
+HTTP request sent, awaiting response... 200 OK
+Length: 5140413391 (4.5G) [application/octet-stream]
+Saving to: ‘coco.tar.gz’
+
+coco 100%[======================================>] 4.5G 3.33MB/s in 28m 22s
+
+2017-04-17 10:41:19 (2.5 MB/s) - ‘coco.tar.gz’ saved [5140413391/5140413391]
+
+$ tar -xzvf coco.tar.gz
+```
+
+Included is the training data in DIGITS format for the airplane, bottle, chair, and dog classes. `coco2kitty.py` can be used to convert other classes.
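+
+To sanity-check the layout before importing, note that the extracted archive follows the `coco/<split>/<images|labels>/<class>` structure referenced in the next step. A representative listing (based on the folder paths used below):
+
+``` bash
+$ ls coco
+train  val
+$ ls coco/train
+images  labels
+$ ls coco/train/images
+airplane  bottle  chair  dog
+```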
+
+### Importing the Detection Dataset into DIGITS
+
+Navigate your browser to your DIGITS server instance and choose to create a new `Detection Dataset` from the drop-down in the Datasets tab:
+
+
+
+In the form fields, specify the following options and the paths to the image and label folders under the location where you extracted the COCO dataset:
+
+* Training image folder: `coco/train/images/dog`
+* Training label folder: `coco/train/labels/dog`
+* Validation image folder: `coco/val/images/dog`
+* Validation label folder: `coco/val/labels/dog`
+* Pad image (Width x Height): 640 x 640
+* Custom classes: dontcare, dog
+* Group Name: MS-COCO
+* Dataset Name: coco-dog
+
+
+
+Name the dataset whatever you choose and click the `Create` button at the bottom of the page to launch the importing job. Next we'll create the new detection model and begin training it.
+
+### Creating DetectNet Model with DIGITS
+
+When the previous data import job is complete, return to the DIGITS home screen. Select the `Models` tab and choose to create a new `Detection Model` from the drop-down:
+
+
+
+Make the following settings in the form:
+
+* Select Dataset: `coco-dog`
+* Training epochs: `100`
+* Subtract Mean: `none`
+* Solver Type: `Adam`
+* Base learning rate: `2.5e-05`
+* Select `Show advanced learning options`
+ * Policy: `Exponential Decay`
+ * Gamma: `0.99`
+
+#### Selecting DetectNet Batch Size
+
+DetectNet's default batch size of 10 consumes up to 12GB of GPU memory during training. However, by using the `Batch Accumulation` field (the effective batch size is `Batch Size` x `Batch Accumulation`), you can also train DetectNet on a GPU with less than 12GB of memory. Use the settings in the table below depending on the amount of GPU memory available in your DIGITS server:
+
+| GPU Memory | Batch Size | Batch Accumulation |
+| -------------- |:-------------------------:|:-------------------:|
+| 4GB | 2 | 5 |
+| 8GB | 5 | 2 |
+| 12GB or larger | `[network defaults]` (10) | Leave blank (1) |
+
+If you're training on a card with 12GB of memory or more, leave the `Batch Size` as the default and leave the `Batch Accumulation` blank. For GPUs with less memory, use the settings from above.
+
+#### Specifying the DetectNet Prototxt
+
+In the network area, select the `Custom Network` tab and then copy/paste the contents of [`detectnet.prototxt`](data/networks/detectnet.prototxt).
+
+
+
+#### Training the Model with Pretrained Googlenet
+
+Since DetectNet is derived from Googlenet, it is strongly recommended to use pretrained weights from Googlenet, as this significantly speeds up and stabilizes training. Download the model from [here](http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel) or by running the following command from your DIGITS server:
+
+```bash
+wget http://dl.caffe.berkeleyvision.org/bvlc_googlenet.caffemodel
+```
+
+Then specify the path to your Googlenet under the `Pretrained Model` field.
+
+Select a GPU to train on and set a name and group for the model:
+
+* Group Name `MS-COCO`
+* Model Name `DetectNet-COCO-Dog`
+
+Finally, click the `Create` button at the bottom of the form to begin training.
+
+
+
+### Testing DetectNet Model in DIGITS
+
+Leave the training job to run for a while, say 50 epochs, until the mAP (`Mean Average Precision`) plot begins to increase. Note that due to the way mAP is calculated by the DetectNet loss function, the scale of mAP isn't necessarily 0-100, and even an mAP between 5 and 10 may indicate the model is functional. With the size of the example COCO datasets we are using, it should take a couple of hours of training on a recent GPU to complete all 100 epochs.
+
+
+
+At this point, we can try testing our new model's inference on some example images in DIGITS. On the same page as the plot above, scroll down under the `Trained Models` section. Set the `Visualization Model` to *Bounding Boxes* and under `Test a Single Image`, select an image to try (for example from the COCO validation set, `/coco/val/images/dog/000074.png`):
+
+
+
+Press the `Test One` button and you should see a display similar to:
+
+
+
+
+### Download the Model Snapshot to Jetson
+
+Next, download and extract the trained model snapshot to Jetson. From the browser on your Jetson TX1/TX2, navigate to your DIGITS server and the `DetectNet-COCO-Dog` model. Under the `Trained Models` section, select the desired snapshot from the drop-down (usually the one with the highest epoch) and click the `Download Model` button.
+
+
+
+Alternatively, if your Jetson and DIGITS server aren't accessible from the same network, you can use the step above to download the snapshot to an intermediary machine and then use SCP or a USB stick to copy it to the Jetson. You can then extract the archive with a command similar to:
+
+``` bash
+$ cd
+$ tar -xzvf 20170504-190602-879f_epoch_100.0.tar.gz
+```
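+
+Among the extracted files you should find the `deploy.prototxt` and a `snapshot_iter_*.caffemodel`, which are the two files referenced in the remaining steps. If you'd like to verify the contents before extracting, listing the archive is a quick sanity check:
+
+``` bash
+$ tar -tzf 20170504-190602-879f_epoch_100.0.tar.gz | grep -E 'deploy\.prototxt|\.caffemodel'
+```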
+
+### DetectNet Patches for TensorRT
+
+The original DetectNet prototxt contains a Python clustering layer which isn't available in TensorRT, so it should be deleted from the `deploy.prototxt` included in the snapshot. In this repo, the [`detectNet`](detectNet.h) class handles the clustering instead of the Python layer.
+
+At the end of `deploy.prototxt`, delete the layer named `cluster`:
+
+```
+layer {
+ name: "cluster"
+ type: "Python"
+ bottom: "coverage"
+ bottom: "bboxes"
+ top: "bbox-list"
+ python_param {
+ module: "caffe.layers.detectnet.clustering"
+ layer: "ClusterDetections"
+ param_str: "640, 640, 16, 0.6, 2, 0.02, 22, 1"
+ }
+}
+```
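+
+If you prefer to make this edit from the terminal rather than a text editor, a minimal sketch is below. It assumes the `cluster` layer is the last layer in `deploy.prototxt` (as noted above) and truncates the file just before that layer's opening `layer {` line; double-check the result before proceeding:
+
+``` bash
+$ N=$(grep -n 'name: "cluster"' deploy.prototxt | cut -d: -f1)    # line of the name: field
+$ head -n $((N - 2)) deploy.prototxt > deploy.trimmed && mv deploy.trimmed deploy.prototxt
+```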
+
+The snapshot can now be imported into TensorRT.
+
+
+### Processing Images from the Command Line on Jetson
+
+To process test images with [`detectNet`](detectNet.h) and TensorRT, use the [`detectnet-console`](detectnet-console/detectnet-console.cpp) program. [`detectnet-console`](detectnet-console/detectnet-console.cpp) accepts command-line arguments representing the path to the input image and path to the output image (with the bounding box overlays rendered). Some test images are also included with the repo.
+
+To use the model that you downloaded from DIGITS, run `detectnet-console` with the syntax below. First, for convenience, set the path to your extracted snapshot in a `$NET` variable:
``` bash
-$ ./detectnet-console peds-007.png output-7.png
+$ NET=20170504-190602-879f_epoch_100
+
+$ ./detectnet-console dog_0.jpg output_0.jpg \
+--prototxt=$NET/deploy.prototxt \
+--model=$NET/snapshot_iter_38600.caffemodel \
+--input_blob=data \
+--output_cvg=coverage \
+--output_bbox=bboxes
```
-
+> **note**: the `input_blob`, `output_cvg`, and `output_bbox` arguments may be omitted if your DetectNet layer names match the defaults above (i.e. if you are using the prototxt from this tutorial). They are provided for cases where you are using a customized DetectNet with different layer names.
+
+
+
+Alternatively, to load one of the pretrained snapshots that come with the repo, you can specify the pretrained model name as the 3rd argument to `detectnet-console`:
+
+``` bash
+$ ./detectnet-console dog_1.jpg output_1.jpg coco-dog
+```
+
+The above command will process `dog_1.jpg` and save the result to `output_1.jpg` using the pretrained DetectNet-COCO-Dog model. This shortcut means you don't have to wait for your own model to finish training if you don't want to.
+
+
+
+#### Pretrained DetectNet Models Available
+
+Below is a table of the pretrained DetectNet snapshots downloaded with the repo (located in the `data/networks` directory after running the `cmake` step) and the associated argument to `detectnet-console` used to load each pretrained model:
+
+| DIGITS model | CLI argument | classes |
+| ------------------------|---------------|----------------------|
+| DetectNet-COCO-Airplane | coco-airplane | airplanes |
+| DetectNet-COCO-Bottle | coco-bottle | bottles |
+| DetectNet-COCO-Chair | coco-chair | chairs |
+| DetectNet-COCO-Dog | coco-dog | dogs |
+| ped-100 | pednet | pedestrians |
+| multiped-500 | multiped | pedestrians, luggage |
+| facenet-120 | facenet | faces |
-To change the network that [`detectnet-console`](detectnet-console/detectnet-console.cpp) uses, modify [`detectnet-console.cpp`](detectnet-console/detectnet-console.cpp) (beginning line 33):
-``` c
-detectNet* net = detectNet::Create( detectNet::PEDNET_MULTI ); // uncomment to enable one of these
-//detectNet* net = detectNet::Create( detectNet::PEDNET );
-//detectNet* net = detectNet::Create( detectNet::FACENET );
+All of these snapshots already have the Python layer patch described above applied.
+
+#### Running Other MS-COCO Models on Jetson
+
+Let's try running some of the other COCO models. The training data for these are all included in the dataset downloaded above. Although the DIGITS training example above was for the coco-dog model, the same procedure can be followed to train DetectNet on the other classes included in the sample COCO dataset.
+
+``` bash
+$ ./detectnet-console bottle_0.jpg output_2.jpg coco-bottle
```
-Then to recompile, navigate to the `jetson-inference/build` directory and run `make`.
+
+
+
+
+``` bash
+$ ./detectnet-console airplane_0.jpg output_3.jpg coco-airplane
+```
+
+
+
+#### Running Pedestrian Models on Jetson
+
+Also included in the repo are DetectNet models pretrained to detect humans. The `pednet` and `multiped` models recognize pedestrians, while `facenet` recognizes faces (from [FDDB](http://vis-www.cs.umass.edu/fddb/)). Here's an example of detecting multiple humans simultaneously in a crowded space:
+
+
+``` bash
+$ ./detectnet-console peds-007.png output_7.png multiped
+```
+
+
+
### Multi-class Object Detection
-When using the multiped-500 model (`PEDNET_MULTI`), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.
+When using the multiped model (`PEDNET_MULTI`), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.
+
``` bash
-$ ./detectnet-console peds-008.png output-8.png
+$ ./detectnet-console peds-008.png output_8.png multiped
```
-### Running the Live Camera Detection Demo
+### Running the Live Camera Detection Demo on Jetson
Similar to the previous example, [`detectnet-camera`](detectnet-camera/detectnet-camera.cpp) runs the object detection networks on live video feed from the Jetson onboard camera. Launch it from command line along with the type of desired network:
``` bash
+$ ./detectnet-camera coco-bottle # detect bottles/soda cans in the camera
+$ ./detectnet-camera coco-dog # detect dogs in the camera
$ ./detectnet-camera multiped # run using multi-class pedestrian/luggage detector
-$ ./detectnet-camera ped-100 # run using original single-class pedestrian detector
+$ ./detectnet-camera pednet # run using original single-class pedestrian detector
$ ./detectnet-camera facenet # run using facial recognition network
$ ./detectnet-camera # by default, program will run using multiped
```
-> **note**: to achieve maximum performance while running detectnet, increase the Jetson TX1 clock limits by running the script:
+> **note**: to achieve maximum performance while running detectnet, increase the Jetson clock limits by running the script:
> `sudo ~/jetson_clocks.sh`
-> **note**: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the `DEFAULT_CAMERA` define at the top of [`detectnet-camera.cpp`](detectnet-camera/detectnet-camera.cpp) to reflect the /dev/video V4L2 device of your USB camera. The model it's tested with is Logitech C920.
+> **note**: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the `DEFAULT_CAMERA` define at the top of [`detectnet-camera.cpp`](detectnet-camera/detectnet-camera.cpp) to reflect the /dev/video V4L2 device of your USB camera and recompile. The webcam model it's tested with is Logitech C920.
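+
+After changing `DEFAULT_CAMERA`, rebuild before re-running the demo. A minimal sketch, assuming the standard `jetson-inference/build` directory from the earlier build steps:
+
+``` bash
+$ cd jetson-inference/build    # wherever you cloned and built the repo
+$ make
+```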
### Re-training DetectNet with DIGITS
@@ -535,7 +762,7 @@ At this point, we can try testing our new model's inference on some example imag
### Testing Inference Model in DIGITS
-Before transfering the trained model to Jetson, let's test it first in DIGITS. On the same page as previous plot, scroll down under the `Trained Models` section. Set the `Visualization Model` to Image Segmentation and under `Test a Single Image`, select an image to try (for example `/NVIDIA-Aerial-Drone-Dataset/FPV/SFWA/720p/images/0428.png`):
+Before transferring the trained model to Jetson, let's first test it in DIGITS. On the same page as the previous plot, scroll down to the `Trained Models` section. Set the `Visualization Model` to *Image Segmentation* and under `Test a Single Image`, select an image to try (for example `/NVIDIA-Aerial-Drone-Dataset/FPV/SFWA/720p/images/0428.png`):
diff --git a/data/images/dog_2.jpg b/data/images/dog_2.jpg
new file mode 100644
index 000000000..7eabe4c41
Binary files /dev/null and b/data/images/dog_2.jpg differ
diff --git a/data/networks/detectnet.prototxt b/data/networks/detectnet.prototxt
new file mode 100644
index 000000000..f20add1e7
--- /dev/null
+++ b/data/networks/detectnet.prototxt
@@ -0,0 +1,2548 @@
+# DetectNet network
+
+# Data/Input layers
+name: "DetectNet"
+layer {
+ name: "train_data"
+ type: "Data"
+ top: "data"
+ data_param {
+ backend: LMDB
+ source: "examples/kitti/kitti_train_images.lmdb"
+ batch_size: 10
+ }
+ include: { phase: TRAIN }
+}
+layer {
+ name: "train_label"
+ type: "Data"
+ top: "label"
+ data_param {
+ backend: LMDB
+ source: "examples/kitti/kitti_train_labels.lmdb"
+ batch_size: 10
+ }
+ include: { phase: TRAIN }
+}
+layer {
+ name: "val_data"
+ type: "Data"
+ top: "data"
+ data_param {
+ backend: LMDB
+ source: "examples/kitti/kitti_test_images.lmdb"
+ batch_size: 6
+ }
+ include: { phase: TEST stage: "val" }
+}
+layer {
+ name: "val_label"
+ type: "Data"
+ top: "label"
+ data_param {
+ backend: LMDB
+ source: "examples/kitti/kitti_test_labels.lmdb"
+ batch_size: 6
+ }
+ include: { phase: TEST stage: "val" }
+}
+layer {
+ name: "deploy_data"
+ type: "Input"
+ top: "data"
+ input_param {
+ shape {
+ dim: 1
+ dim: 3
+ dim: 640
+ dim: 640
+ }
+ }
+ include: { phase: TEST not_stage: "val" }
+}
+
+# Data transformation layers
+layer {
+ name: "train_transform"
+ type: "DetectNetTransformation"
+ bottom: "data"
+ bottom: "label"
+ top: "transformed_data"
+ top: "transformed_label"
+ detectnet_groundtruth_param: {
+ stride: 16
+ scale_cvg: 0.4
+ gridbox_type: GRIDBOX_MIN
+ coverage_type: RECTANGULAR
+ min_cvg_len: 20
+ obj_norm: true
+ image_size_x: 640
+ image_size_y: 640
+ crop_bboxes: false
+ object_class: { src: 1 dst: 0} # obj class 1 -> cvg index 0
+ }
+ detectnet_augmentation_param: {
+ crop_prob: 1
+ shift_x: 32
+ shift_y: 32
+ flip_prob: 0.5
+ rotation_prob: 0
+ max_rotate_degree: 5
+ scale_prob: 0.4
+ scale_min: 0.8
+ scale_max: 1.2
+ hue_rotation_prob: 0.8
+ hue_rotation: 30
+ desaturation_prob: 0.8
+ desaturation_max: 0.8
+ }
+ transform_param: {
+ mean_value: 127
+ }
+ include: { phase: TRAIN }
+}
+layer {
+ name: "val_transform"
+ type: "DetectNetTransformation"
+ bottom: "data"
+ bottom: "label"
+ top: "transformed_data"
+ top: "transformed_label"
+ detectnet_groundtruth_param: {
+ stride: 16
+ scale_cvg: 0.4
+ gridbox_type: GRIDBOX_MIN
+ coverage_type: RECTANGULAR
+ min_cvg_len: 20
+ obj_norm: true
+ image_size_x: 640
+ image_size_y: 640
+ crop_bboxes: false
+ object_class: { src: 1 dst: 0} # obj class 1 -> cvg index 0
+ }
+ transform_param: {
+ mean_value: 127
+ }
+ include: { phase: TEST stage: "val" }
+}
+layer {
+ name: "deploy_transform"
+ type: "Power"
+ bottom: "data"
+ top: "transformed_data"
+ power_param {
+ shift: -127
+ }
+ include: { phase: TEST not_stage: "val" }
+}
+
+# Label conversion layers
+layer {
+ name: "slice-label"
+ type: "Slice"
+ bottom: "transformed_label"
+ top: "foreground-label"
+ top: "bbox-label"
+ top: "size-label"
+ top: "obj-label"
+ top: "coverage-label"
+ slice_param {
+ slice_dim: 1
+ slice_point: 1
+ slice_point: 5
+ slice_point: 7
+ slice_point: 8
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "coverage-block"
+ type: "Concat"
+ bottom: "foreground-label"
+ bottom: "foreground-label"
+ bottom: "foreground-label"
+ bottom: "foreground-label"
+ top: "coverage-block"
+ concat_param {
+ concat_dim: 1
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "size-block"
+ type: "Concat"
+ bottom: "size-label"
+ bottom: "size-label"
+ top: "size-block"
+ concat_param {
+ concat_dim: 1
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "obj-block"
+ type: "Concat"
+ bottom: "obj-label"
+ bottom: "obj-label"
+ bottom: "obj-label"
+ bottom: "obj-label"
+ top: "obj-block"
+ concat_param {
+ concat_dim: 1
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "bb-label-norm"
+ type: "Eltwise"
+ bottom: "bbox-label"
+ bottom: "size-block"
+ top: "bbox-label-norm"
+ eltwise_param {
+ operation: PROD
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "bb-obj-norm"
+ type: "Eltwise"
+ bottom: "bbox-label-norm"
+ bottom: "obj-block"
+ top: "bbox-obj-label-norm"
+ eltwise_param {
+ operation: PROD
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+
+######################################################################
+# Start of convolutional network
+######################################################################
+
+layer {
+ name: "conv1/7x7_s2"
+ type: "Convolution"
+ bottom: "transformed_data"
+ top: "conv1/7x7_s2"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ pad: 3
+ kernel_size: 7
+ stride: 2
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "conv1/relu_7x7"
+ type: "ReLU"
+ bottom: "conv1/7x7_s2"
+ top: "conv1/7x7_s2"
+}
+
+layer {
+ name: "pool1/3x3_s2"
+ type: "Pooling"
+ bottom: "conv1/7x7_s2"
+ top: "pool1/3x3_s2"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 2
+ }
+}
+
+layer {
+ name: "pool1/norm1"
+ type: "LRN"
+ bottom: "pool1/3x3_s2"
+ top: "pool1/norm1"
+ lrn_param {
+ local_size: 5
+ alpha: 0.0001
+ beta: 0.75
+ }
+}
+
+layer {
+ name: "conv2/3x3_reduce"
+ type: "Convolution"
+ bottom: "pool1/norm1"
+ top: "conv2/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "conv2/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "conv2/3x3_reduce"
+ top: "conv2/3x3_reduce"
+}
+
+layer {
+ name: "conv2/3x3"
+ type: "Convolution"
+ bottom: "conv2/3x3_reduce"
+ top: "conv2/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 192
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "conv2/relu_3x3"
+ type: "ReLU"
+ bottom: "conv2/3x3"
+ top: "conv2/3x3"
+}
+
+layer {
+ name: "conv2/norm2"
+ type: "LRN"
+ bottom: "conv2/3x3"
+ top: "conv2/norm2"
+ lrn_param {
+ local_size: 5
+ alpha: 0.0001
+ beta: 0.75
+ }
+}
+
+layer {
+ name: "pool2/3x3_s2"
+ type: "Pooling"
+ bottom: "conv2/norm2"
+ top: "pool2/3x3_s2"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 2
+ }
+}
+
+layer {
+ name: "inception_3a/1x1"
+ type: "Convolution"
+ bottom: "pool2/3x3_s2"
+ top: "inception_3a/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_3a/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_3a/1x1"
+ top: "inception_3a/1x1"
+}
+
+layer {
+ name: "inception_3a/3x3_reduce"
+ type: "Convolution"
+ bottom: "pool2/3x3_s2"
+ top: "inception_3a/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 96
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_3a/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_3a/3x3_reduce"
+ top: "inception_3a/3x3_reduce"
+}
+
+layer {
+ name: "inception_3a/3x3"
+ type: "Convolution"
+ bottom: "inception_3a/3x3_reduce"
+ top: "inception_3a/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_3a/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_3a/3x3"
+ top: "inception_3a/3x3"
+}
+
+layer {
+ name: "inception_3a/5x5_reduce"
+ type: "Convolution"
+ bottom: "pool2/3x3_s2"
+ top: "inception_3a/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 16
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3a/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_3a/5x5_reduce"
+ top: "inception_3a/5x5_reduce"
+}
+layer {
+ name: "inception_3a/5x5"
+ type: "Convolution"
+ bottom: "inception_3a/5x5_reduce"
+ top: "inception_3a/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 32
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3a/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_3a/5x5"
+ top: "inception_3a/5x5"
+}
+
+layer {
+ name: "inception_3a/pool"
+ type: "Pooling"
+ bottom: "pool2/3x3_s2"
+ top: "inception_3a/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+
+layer {
+ name: "inception_3a/pool_proj"
+ type: "Convolution"
+ bottom: "inception_3a/pool"
+ top: "inception_3a/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 32
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3a/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_3a/pool_proj"
+ top: "inception_3a/pool_proj"
+}
+
+layer {
+ name: "inception_3a/output"
+ type: "Concat"
+ bottom: "inception_3a/1x1"
+ bottom: "inception_3a/3x3"
+ bottom: "inception_3a/5x5"
+ bottom: "inception_3a/pool_proj"
+ top: "inception_3a/output"
+}
+
+layer {
+ name: "inception_3b/1x1"
+ type: "Convolution"
+ bottom: "inception_3a/output"
+ top: "inception_3b/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_3b/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_3b/1x1"
+ top: "inception_3b/1x1"
+}
+
+layer {
+ name: "inception_3b/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_3a/output"
+ top: "inception_3b/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3b/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_3b/3x3_reduce"
+ top: "inception_3b/3x3_reduce"
+}
+layer {
+ name: "inception_3b/3x3"
+ type: "Convolution"
+ bottom: "inception_3b/3x3_reduce"
+ top: "inception_3b/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 192
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3b/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_3b/3x3"
+ top: "inception_3b/3x3"
+}
+
+layer {
+ name: "inception_3b/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_3a/output"
+ top: "inception_3b/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 32
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3b/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_3b/5x5_reduce"
+ top: "inception_3b/5x5_reduce"
+}
+layer {
+ name: "inception_3b/5x5"
+ type: "Convolution"
+ bottom: "inception_3b/5x5_reduce"
+ top: "inception_3b/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 96
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3b/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_3b/5x5"
+ top: "inception_3b/5x5"
+}
+
+layer {
+ name: "inception_3b/pool"
+ type: "Pooling"
+ bottom: "inception_3a/output"
+ top: "inception_3b/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_3b/pool_proj"
+ type: "Convolution"
+ bottom: "inception_3b/pool"
+ top: "inception_3b/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_3b/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_3b/pool_proj"
+ top: "inception_3b/pool_proj"
+}
+layer {
+ name: "inception_3b/output"
+ type: "Concat"
+ bottom: "inception_3b/1x1"
+ bottom: "inception_3b/3x3"
+ bottom: "inception_3b/5x5"
+ bottom: "inception_3b/pool_proj"
+ top: "inception_3b/output"
+}
+
+layer {
+ name: "pool3/3x3_s2"
+ type: "Pooling"
+ bottom: "inception_3b/output"
+ top: "pool3/3x3_s2"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 2
+ }
+}
+
+layer {
+ name: "inception_4a/1x1"
+ type: "Convolution"
+ bottom: "pool3/3x3_s2"
+ top: "inception_4a/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 192
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_4a/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_4a/1x1"
+ top: "inception_4a/1x1"
+}
+
+layer {
+ name: "inception_4a/3x3_reduce"
+ type: "Convolution"
+ bottom: "pool3/3x3_s2"
+ top: "inception_4a/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 96
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_4a/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_4a/3x3_reduce"
+ top: "inception_4a/3x3_reduce"
+}
+
+layer {
+ name: "inception_4a/3x3"
+ type: "Convolution"
+ bottom: "inception_4a/3x3_reduce"
+ top: "inception_4a/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 208
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_4a/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_4a/3x3"
+ top: "inception_4a/3x3"
+}
+
+layer {
+ name: "inception_4a/5x5_reduce"
+ type: "Convolution"
+ bottom: "pool3/3x3_s2"
+ top: "inception_4a/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 16
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4a/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_4a/5x5_reduce"
+ top: "inception_4a/5x5_reduce"
+}
+layer {
+ name: "inception_4a/5x5"
+ type: "Convolution"
+ bottom: "inception_4a/5x5_reduce"
+ top: "inception_4a/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 48
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4a/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_4a/5x5"
+ top: "inception_4a/5x5"
+}
+layer {
+ name: "inception_4a/pool"
+ type: "Pooling"
+ bottom: "pool3/3x3_s2"
+ top: "inception_4a/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_4a/pool_proj"
+ type: "Convolution"
+ bottom: "inception_4a/pool"
+ top: "inception_4a/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4a/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_4a/pool_proj"
+ top: "inception_4a/pool_proj"
+}
+layer {
+ name: "inception_4a/output"
+ type: "Concat"
+ bottom: "inception_4a/1x1"
+ bottom: "inception_4a/3x3"
+ bottom: "inception_4a/5x5"
+ bottom: "inception_4a/pool_proj"
+ top: "inception_4a/output"
+}
+
+layer {
+ name: "inception_4b/1x1"
+ type: "Convolution"
+ bottom: "inception_4a/output"
+ top: "inception_4b/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 160
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_4b/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_4b/1x1"
+ top: "inception_4b/1x1"
+}
+layer {
+ name: "inception_4b/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_4a/output"
+ top: "inception_4b/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 112
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4b/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_4b/3x3_reduce"
+ top: "inception_4b/3x3_reduce"
+}
+layer {
+ name: "inception_4b/3x3"
+ type: "Convolution"
+ bottom: "inception_4b/3x3_reduce"
+ top: "inception_4b/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 224
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4b/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_4b/3x3"
+ top: "inception_4b/3x3"
+}
+layer {
+ name: "inception_4b/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_4a/output"
+ top: "inception_4b/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 24
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4b/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_4b/5x5_reduce"
+ top: "inception_4b/5x5_reduce"
+}
+layer {
+ name: "inception_4b/5x5"
+ type: "Convolution"
+ bottom: "inception_4b/5x5_reduce"
+ top: "inception_4b/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4b/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_4b/5x5"
+ top: "inception_4b/5x5"
+}
+layer {
+ name: "inception_4b/pool"
+ type: "Pooling"
+ bottom: "inception_4a/output"
+ top: "inception_4b/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_4b/pool_proj"
+ type: "Convolution"
+ bottom: "inception_4b/pool"
+ top: "inception_4b/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4b/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_4b/pool_proj"
+ top: "inception_4b/pool_proj"
+}
+layer {
+ name: "inception_4b/output"
+ type: "Concat"
+ bottom: "inception_4b/1x1"
+ bottom: "inception_4b/3x3"
+ bottom: "inception_4b/5x5"
+ bottom: "inception_4b/pool_proj"
+ top: "inception_4b/output"
+}
+
+layer {
+ name: "inception_4c/1x1"
+ type: "Convolution"
+ bottom: "inception_4b/output"
+ top: "inception_4c/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_4c/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_4c/1x1"
+ top: "inception_4c/1x1"
+}
+
+layer {
+ name: "inception_4c/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_4b/output"
+ top: "inception_4c/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+
+layer {
+ name: "inception_4c/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_4c/3x3_reduce"
+ top: "inception_4c/3x3_reduce"
+}
+layer {
+ name: "inception_4c/3x3"
+ type: "Convolution"
+ bottom: "inception_4c/3x3_reduce"
+ top: "inception_4c/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 256
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4c/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_4c/3x3"
+ top: "inception_4c/3x3"
+}
+layer {
+ name: "inception_4c/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_4b/output"
+ top: "inception_4c/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 24
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4c/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_4c/5x5_reduce"
+ top: "inception_4c/5x5_reduce"
+}
+layer {
+ name: "inception_4c/5x5"
+ type: "Convolution"
+ bottom: "inception_4c/5x5_reduce"
+ top: "inception_4c/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4c/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_4c/5x5"
+ top: "inception_4c/5x5"
+}
+layer {
+ name: "inception_4c/pool"
+ type: "Pooling"
+ bottom: "inception_4b/output"
+ top: "inception_4c/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_4c/pool_proj"
+ type: "Convolution"
+ bottom: "inception_4c/pool"
+ top: "inception_4c/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4c/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_4c/pool_proj"
+ top: "inception_4c/pool_proj"
+}
+layer {
+ name: "inception_4c/output"
+ type: "Concat"
+ bottom: "inception_4c/1x1"
+ bottom: "inception_4c/3x3"
+ bottom: "inception_4c/5x5"
+ bottom: "inception_4c/pool_proj"
+ top: "inception_4c/output"
+}
+
+layer {
+ name: "inception_4d/1x1"
+ type: "Convolution"
+ bottom: "inception_4c/output"
+ top: "inception_4d/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 112
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4d/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_4d/1x1"
+ top: "inception_4d/1x1"
+}
+layer {
+ name: "inception_4d/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_4c/output"
+ top: "inception_4d/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 144
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4d/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_4d/3x3_reduce"
+ top: "inception_4d/3x3_reduce"
+}
+layer {
+ name: "inception_4d/3x3"
+ type: "Convolution"
+ bottom: "inception_4d/3x3_reduce"
+ top: "inception_4d/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 288
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4d/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_4d/3x3"
+ top: "inception_4d/3x3"
+}
+layer {
+ name: "inception_4d/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_4c/output"
+ top: "inception_4d/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 32
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4d/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_4d/5x5_reduce"
+ top: "inception_4d/5x5_reduce"
+}
+layer {
+ name: "inception_4d/5x5"
+ type: "Convolution"
+ bottom: "inception_4d/5x5_reduce"
+ top: "inception_4d/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4d/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_4d/5x5"
+ top: "inception_4d/5x5"
+}
+layer {
+ name: "inception_4d/pool"
+ type: "Pooling"
+ bottom: "inception_4c/output"
+ top: "inception_4d/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_4d/pool_proj"
+ type: "Convolution"
+ bottom: "inception_4d/pool"
+ top: "inception_4d/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 64
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4d/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_4d/pool_proj"
+ top: "inception_4d/pool_proj"
+}
+layer {
+ name: "inception_4d/output"
+ type: "Concat"
+ bottom: "inception_4d/1x1"
+ bottom: "inception_4d/3x3"
+ bottom: "inception_4d/5x5"
+ bottom: "inception_4d/pool_proj"
+ top: "inception_4d/output"
+}
+
+layer {
+ name: "inception_4e/1x1"
+ type: "Convolution"
+ bottom: "inception_4d/output"
+ top: "inception_4e/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 256
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4e/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_4e/1x1"
+ top: "inception_4e/1x1"
+}
+layer {
+ name: "inception_4e/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_4d/output"
+ top: "inception_4e/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 160
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4e/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_4e/3x3_reduce"
+ top: "inception_4e/3x3_reduce"
+}
+layer {
+ name: "inception_4e/3x3"
+ type: "Convolution"
+ bottom: "inception_4e/3x3_reduce"
+ top: "inception_4e/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 320
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4e/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_4e/3x3"
+ top: "inception_4e/3x3"
+}
+layer {
+ name: "inception_4e/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_4d/output"
+ top: "inception_4e/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 32
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4e/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_4e/5x5_reduce"
+ top: "inception_4e/5x5_reduce"
+}
+layer {
+ name: "inception_4e/5x5"
+ type: "Convolution"
+ bottom: "inception_4e/5x5_reduce"
+ top: "inception_4e/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4e/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_4e/5x5"
+ top: "inception_4e/5x5"
+}
+layer {
+ name: "inception_4e/pool"
+ type: "Pooling"
+ bottom: "inception_4d/output"
+ top: "inception_4e/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_4e/pool_proj"
+ type: "Convolution"
+ bottom: "inception_4e/pool"
+ top: "inception_4e/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_4e/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_4e/pool_proj"
+ top: "inception_4e/pool_proj"
+}
+layer {
+ name: "inception_4e/output"
+ type: "Concat"
+ bottom: "inception_4e/1x1"
+ bottom: "inception_4e/3x3"
+ bottom: "inception_4e/5x5"
+ bottom: "inception_4e/pool_proj"
+ top: "inception_4e/output"
+}
+
+
+
+layer {
+ name: "inception_5a/1x1"
+ type: "Convolution"
+ bottom: "inception_4e/output"
+ top: "inception_5a/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 256
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5a/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_5a/1x1"
+ top: "inception_5a/1x1"
+}
+
+layer {
+ name: "inception_5a/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_4e/output"
+ top: "inception_5a/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 160
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.09
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5a/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_5a/3x3_reduce"
+ top: "inception_5a/3x3_reduce"
+}
+
+layer {
+ name: "inception_5a/3x3"
+ type: "Convolution"
+ bottom: "inception_5a/3x3_reduce"
+ top: "inception_5a/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 320
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5a/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_5a/3x3"
+ top: "inception_5a/3x3"
+}
+layer {
+ name: "inception_5a/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_4e/output"
+ top: "inception_5a/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 32
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.2
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5a/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_5a/5x5_reduce"
+ top: "inception_5a/5x5_reduce"
+}
+layer {
+ name: "inception_5a/5x5"
+ type: "Convolution"
+ bottom: "inception_5a/5x5_reduce"
+ top: "inception_5a/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5a/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_5a/5x5"
+ top: "inception_5a/5x5"
+}
+layer {
+ name: "inception_5a/pool"
+ type: "Pooling"
+ bottom: "inception_4e/output"
+ top: "inception_5a/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_5a/pool_proj"
+ type: "Convolution"
+ bottom: "inception_5a/pool"
+ top: "inception_5a/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5a/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_5a/pool_proj"
+ top: "inception_5a/pool_proj"
+}
+layer {
+ name: "inception_5a/output"
+ type: "Concat"
+ bottom: "inception_5a/1x1"
+ bottom: "inception_5a/3x3"
+ bottom: "inception_5a/5x5"
+ bottom: "inception_5a/pool_proj"
+ top: "inception_5a/output"
+}
+
+layer {
+ name: "inception_5b/1x1"
+ type: "Convolution"
+ bottom: "inception_5a/output"
+ top: "inception_5b/1x1"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 384
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5b/relu_1x1"
+ type: "ReLU"
+ bottom: "inception_5b/1x1"
+ top: "inception_5b/1x1"
+}
+layer {
+ name: "inception_5b/3x3_reduce"
+ type: "Convolution"
+ bottom: "inception_5a/output"
+ top: "inception_5b/3x3_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 1
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 192
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5b/relu_3x3_reduce"
+ type: "ReLU"
+ bottom: "inception_5b/3x3_reduce"
+ top: "inception_5b/3x3_reduce"
+}
+layer {
+ name: "inception_5b/3x3"
+ type: "Convolution"
+ bottom: "inception_5b/3x3_reduce"
+ top: "inception_5b/3x3"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 384
+ pad: 1
+ kernel_size: 3
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5b/relu_3x3"
+ type: "ReLU"
+ bottom: "inception_5b/3x3"
+ top: "inception_5b/3x3"
+}
+layer {
+ name: "inception_5b/5x5_reduce"
+ type: "Convolution"
+ bottom: "inception_5a/output"
+ top: "inception_5b/5x5_reduce"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 48
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5b/relu_5x5_reduce"
+ type: "ReLU"
+ bottom: "inception_5b/5x5_reduce"
+ top: "inception_5b/5x5_reduce"
+}
+layer {
+ name: "inception_5b/5x5"
+ type: "Convolution"
+ bottom: "inception_5b/5x5_reduce"
+ top: "inception_5b/5x5"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ pad: 2
+ kernel_size: 5
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5b/relu_5x5"
+ type: "ReLU"
+ bottom: "inception_5b/5x5"
+ top: "inception_5b/5x5"
+}
+layer {
+ name: "inception_5b/pool"
+ type: "Pooling"
+ bottom: "inception_5a/output"
+ top: "inception_5b/pool"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3
+ stride: 1
+ pad: 1
+ }
+}
+layer {
+ name: "inception_5b/pool_proj"
+ type: "Convolution"
+ bottom: "inception_5b/pool"
+ top: "inception_5b/pool_proj"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 128
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.1
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.2
+ }
+ }
+}
+layer {
+ name: "inception_5b/relu_pool_proj"
+ type: "ReLU"
+ bottom: "inception_5b/pool_proj"
+ top: "inception_5b/pool_proj"
+}
+layer {
+ name: "inception_5b/output"
+ type: "Concat"
+ bottom: "inception_5b/1x1"
+ bottom: "inception_5b/3x3"
+ bottom: "inception_5b/5x5"
+ bottom: "inception_5b/pool_proj"
+ top: "inception_5b/output"
+}
+layer {
+ name: "pool5/drop_s1"
+ type: "Dropout"
+ bottom: "inception_5b/output"
+ top: "pool5/drop_s1"
+ dropout_param {
+ dropout_ratio: 0.4
+ }
+}
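+
+# DetectNet prediction heads: a 1x1 convolution followed by a sigmoid produces the
+# per-grid-cell coverage map, and a second 1x1 convolution regresses four
+# bounding-box coordinates for each grid cell.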
+layer {
+ name: "cvg/classifier"
+ type: "Convolution"
+ bottom: "pool5/drop_s1"
+ top: "cvg/classifier"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 1
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.
+ }
+ }
+}
+layer {
+ name: "coverage/sig"
+ type: "Sigmoid"
+ bottom: "cvg/classifier"
+ top: "coverage"
+}
+layer {
+ name: "bbox/regressor"
+ type: "Convolution"
+ bottom: "pool5/drop_s1"
+ top: "bboxes"
+ param {
+ lr_mult: 1
+ decay_mult: 1
+ }
+ param {
+ lr_mult: 2
+ decay_mult: 0
+ }
+ convolution_param {
+ num_output: 4
+ kernel_size: 1
+ weight_filler {
+ type: "xavier"
+ std: 0.03
+ }
+ bias_filler {
+ type: "constant"
+ value: 0.
+ }
+ }
+}
+
+######################################################################
+# End of convolutional network
+######################################################################
+
+# Convert bboxes
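+# The Eltwise products below multiply the raw bbox predictions by the coverage,
+# size and obj blocks produced earlier in the network, masking and normalizing
+# them so that the bbox loss is restricted to grid cells that contain objects
+# (TRAIN and validation phases only).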
+layer {
+ name: "bbox_mask"
+ type: "Eltwise"
+ bottom: "bboxes"
+ bottom: "coverage-block"
+ top: "bboxes-masked"
+ eltwise_param {
+ operation: PROD
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "bbox-norm"
+ type: "Eltwise"
+ bottom: "bboxes-masked"
+ bottom: "size-block"
+ top: "bboxes-masked-norm"
+ eltwise_param {
+ operation: PROD
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "bbox-obj-norm"
+ type: "Eltwise"
+ bottom: "bboxes-masked-norm"
+ bottom: "obj-block"
+ top: "bboxes-obj-masked-norm"
+ eltwise_param {
+ operation: PROD
+ }
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+
+# Loss layers
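+# Training minimizes the sum of an L1 loss on the masked, normalized bbox
+# coordinates (loss_weight: 2) and a Euclidean loss on the predicted coverage map.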
+layer {
+ name: "bbox_loss"
+ type: "L1Loss"
+ bottom: "bboxes-obj-masked-norm"
+ bottom: "bbox-obj-label-norm"
+ top: "loss_bbox"
+ loss_weight: 2
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+layer {
+ name: "coverage_loss"
+ type: "EuclideanLoss"
+ bottom: "coverage"
+ bottom: "coverage-label"
+ top: "loss_coverage"
+ include { phase: TRAIN }
+ include { phase: TEST stage: "val" }
+}
+
+# Cluster bboxes
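+# At test time a DIGITS Python layer (ClusterDetections) groups the per-cell
+# coverage/bbox predictions into final detections; the first three param_str
+# values are the 640x640 input size and the 16-pixel network stride, and the
+# remaining values are the DIGITS clustering parameters.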
+layer {
+ type: 'Python'
+ name: 'cluster'
+ bottom: 'coverage'
+ bottom: 'bboxes'
+ top: 'bbox-list'
+ python_param {
+ module: 'caffe.layers.detectnet.clustering'
+ layer: 'ClusterDetections'
+ param_str : '640, 640, 16, 0.6, 2, 0.02, 22, 1'
+ }
+ include: { phase: TEST }
+}
+
+# Calculate mean average precision
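+# During validation the clustered detections are scored against clustered
+# ground-truth boxes to report mAP, precision and recall.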
+layer {
+ type: 'Python'
+ name: 'cluster_gt'
+ bottom: 'coverage-label'
+ bottom: 'bbox-label'
+ top: 'bbox-list-label'
+ python_param {
+ module: 'caffe.layers.detectnet.clustering'
+ layer: 'ClusterGroundtruth'
+ param_str : '640, 640, 16, 1'
+ }
+ include: { phase: TEST stage: "val" }
+}
+layer {
+ type: 'Python'
+ name: 'score'
+ bottom: 'bbox-list-label'
+ bottom: 'bbox-list'
+ top: 'bbox-list-scored'
+ python_param {
+ module: 'caffe.layers.detectnet.mean_ap'
+ layer: 'ScoreDetections'
+ }
+ include: { phase: TEST stage: "val" }
+}
+layer {
+ type: 'Python'
+ name: 'mAP'
+ bottom: 'bbox-list-scored'
+ top: 'mAP'
+ top: 'precision'
+ top: 'recall'
+ python_param {
+ module: 'caffe.layers.detectnet.mean_ap'
+ layer: 'mAP'
+ param_str : '640, 640, 16'
+ }
+ include: { phase: TEST stage: "val" }
+}
diff --git a/docs/images/detectnet-digits-custom-network.jpg b/docs/images/detectnet-digits-custom-network.jpg
new file mode 100644
index 000000000..711724b45
Binary files /dev/null and b/docs/images/detectnet-digits-custom-network.jpg differ
diff --git a/docs/images/detectnet-digits-infer-dog.png b/docs/images/detectnet-digits-infer-dog.png
new file mode 100644
index 000000000..5407cd7bf
Binary files /dev/null and b/docs/images/detectnet-digits-infer-dog.png differ
diff --git a/docs/images/detectnet-digits-model-dog.png b/docs/images/detectnet-digits-model-dog.png
new file mode 100644
index 000000000..ab1ed6f92
Binary files /dev/null and b/docs/images/detectnet-digits-model-dog.png differ
diff --git a/docs/images/detectnet-digits-model-download-dog.png b/docs/images/detectnet-digits-model-download-dog.png
new file mode 100644
index 000000000..88fe40110
Binary files /dev/null and b/docs/images/detectnet-digits-model-download-dog.png differ
diff --git a/docs/images/detectnet-digits-new-dataset-dog.png b/docs/images/detectnet-digits-new-dataset-dog.png
new file mode 100644
index 000000000..d02166550
Binary files /dev/null and b/docs/images/detectnet-digits-new-dataset-dog.png differ
diff --git a/docs/images/detectnet-digits-new-model-dog.png b/docs/images/detectnet-digits-new-model-dog.png
new file mode 100644
index 000000000..3eb231d04
Binary files /dev/null and b/docs/images/detectnet-digits-new-model-dog.png differ
diff --git a/docs/images/detectnet-digits-visualization-options-dog.png b/docs/images/detectnet-digits-visualization-options-dog.png
new file mode 100644
index 000000000..c9095a586
Binary files /dev/null and b/docs/images/detectnet-digits-visualization-options-dog.png differ
diff --git a/docs/images/detectnet-tensorRT-airplane-0.jpg b/docs/images/detectnet-tensorRT-airplane-0.jpg
new file mode 100644
index 000000000..09f4c588d
Binary files /dev/null and b/docs/images/detectnet-tensorRT-airplane-0.jpg differ
diff --git a/docs/images/detectnet-tensorRT-bottle-0.jpg b/docs/images/detectnet-tensorRT-bottle-0.jpg
new file mode 100644
index 000000000..a67c64030
Binary files /dev/null and b/docs/images/detectnet-tensorRT-bottle-0.jpg differ
diff --git a/docs/images/detectnet-tensorRT-dog-0.jpg b/docs/images/detectnet-tensorRT-dog-0.jpg
new file mode 100644
index 000000000..dcdccca5b
Binary files /dev/null and b/docs/images/detectnet-tensorRT-dog-0.jpg differ
diff --git a/docs/images/detectnet-tensorRT-dog-1.jpg b/docs/images/detectnet-tensorRT-dog-1.jpg
new file mode 100644
index 000000000..8264b178f
Binary files /dev/null and b/docs/images/detectnet-tensorRT-dog-1.jpg differ
diff --git a/tools/coco2kitti.py b/tools/coco2kitti.py
new file mode 100644
index 000000000..50d3e0828
--- /dev/null
+++ b/tools/coco2kitti.py
@@ -0,0 +1,68 @@
+"""coco2kitti.py: Converts MS COCO annotation files to
+ Kitti format bounding box label files
+__author__ = "Jon Barker"
+"""
+
+import os
+from pycocotools.coco import COCO
+
+def coco2kitti(catNms, annFile):
+
+ # initialize COCO api for instance annotations
+ coco = COCO(annFile)
+
+ # Create an index for the category names
+ cats = coco.loadCats(coco.getCatIds())
+ cat_idx = {}
+ for c in cats:
+ cat_idx[c['id']] = c['name']
+
+ for img in coco.imgs:
+
+ # Get all annotation IDs for the image
+ catIds = coco.getCatIds(catNms=catNms)
+ annIds = coco.getAnnIds(imgIds=[img], catIds=catIds)
+
+ # If there are annotations, create a label file
+ if len(annIds) > 0:
+ # Get image filename
+ img_fname = coco.imgs[img]['file_name']
+ # open text file
+ with open('./labels/' + img_fname.split('.')[0] + '.txt','w') as label_file:
+ anns = coco.loadAnns(annIds)
+ for a in anns:
+ bbox = a['bbox']
+ # Convert COCO bbox coords to Kitti ones
+ bbox = [bbox[0], bbox[1], bbox[2] + bbox[0], bbox[3] + bbox[1]]
+ bbox = [str(b) for b in bbox]
+ catname = cat_idx[a['category_id']]
+ # Format line in label file
+ # Note: all whitespace will be removed from class names
+ out_str = [catname.replace(" ","")
+ + ' ' + ' '.join(['0']*3)
+ + ' ' + ' '.join([b for b in bbox])
+ + ' ' + ' '.join(['0']*8)
+ +'\n']
+ label_file.write(out_str[0])
+
+if __name__ == '__main__':
+
+ # These settings assume this script is in the annotations directory
+ dataDir = '..'
+ dataType = 'train2014'
+ annFile = '%s/annotations/instances_%s.json' % (dataDir, dataType)
+
+ # If this list is populated then label files will only be produced
+ # for images containing the listed classes and only the listed classes
+ # will be in the label file
+ # EXAMPLE:
+ #catNms = ['person', 'dog', 'skateboard']
+ catNms = []
+
+ # Check if a labels file exists and, if not, make one
+ # If it exists already, exit to avoid overwriting
+ if os.path.isdir('./labels'):
+ print('Labels folder already exists - exiting to prevent badness')
+ else:
+ os.mkdir('./labels')
+ coco2kitti(catNms, annFile)
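+
+# Example usage (assumes pycocotools is installed and that this script is run from
+# the COCO 'annotations' directory, e.g. alongside instances_train2014.json):
+#   python coco2kitti.py
+# KITTI-style .txt label files are written to ./labels, one per image containing
+# annotations for the selected classes.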