
SSD-MobilenetV1 is an object detection model that uses the Single Shot MultiBox Detector (SSD) approach to predict bounding boxes and the object class for each box.

SSD is a CNN-based detector that finds multiple objects in an image in a single forward pass, and MobileNet is a CNN base network that provides the high-level features used for detection. Combining the two produces an efficient, accurate detection model with a low computational cost.

The SSD-MobilenetV1 is suitable for mobile and embedded vision applications.


| Model | Download | Download (with sample test data) | ONNX version | Opset version |
|---|---|---|---|---|
| SSD-MobilenetV1 | 29.3 MB | 27.9 MB | 1.7.0 | 10 |


TensorFlow SSD-MobileNetV1 ==> ONNX SSD-MobileNetV1


Running inference

Refer to this conversion and inference notebook for more details on how to run inference on this model with onnxruntime and how to define the environment variables for the model.

import os

import onnxruntime as rt

# produce outputs in this order
outputs = ["num_detections:0", "detection_boxes:0", "detection_scores:0", "detection_classes:0"]

# Load model and run inference
# Starting with ORT 1.10, ORT requires explicitly setting the providers parameter if you want to use execution providers
# other than the default CPU provider (as opposed to the previous behavior of providers getting set/registered by default
# based on the build flags) when instantiating InferenceSession.
# For example, if an NVIDIA GPU is available and the ORT Python package is built with CUDA, call the API as follows:
# rt.InferenceSession(path/to/model, providers=['CUDAExecutionProvider'])
# WORK and MODEL are assumed to be defined beforehand (see the notebook).
sess = rt.InferenceSession(os.path.join(WORK, MODEL + ".onnx"))
result = sess.run(outputs, {"image_tensor:0": img_data})
num_detections, detection_boxes, detection_scores, detection_classes = result

# print number of detections
print(num_detections)
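The note on execution providers can be made concrete with a small helper. This is a minimal sketch, not part of the original instructions: `pick_providers` is a hypothetical name and the preference order is an assumption. It works on provider name strings only, so it runs without onnxruntime installed; in practice the `available` list would come from `onnxruntime.get_available_providers()`.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in order.

    Hypothetical helper: `available` is expected to be the list returned by
    onnxruntime.get_available_providers().
    """
    chosen = [p for p in preferred if p in available]
    # Fall back to the default CPU provider if nothing preferred is available.
    return chosen or ["CPUExecutionProvider"]


# Usage (requires onnxruntime, so it is left as a comment):
#   import onnxruntime as rt
#   providers = pick_providers(rt.get_available_providers())
#   sess = rt.InferenceSession("path/to/model.onnx", providers=providers)
```

Passing the resulting list as `providers=` keeps the same script usable on both CPU-only and CUDA-enabled installations.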


This model does not require fixed image dimensions. The input batch size is 1, with 3 color channels. The input tensor has the shape (batch_size, height, width, channels).
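The input constraints described here can be checked before calling the model. The following is a minimal sketch: `check_input` is a hypothetical helper, and the `uint8` dtype requirement is an assumption taken from the preprocessing code in this README rather than from a formal model specification.

```python
import numpy as np


def check_input(img_data):
    """Validate that img_data looks like a (1, height, width, 3) uint8 batch."""
    if img_data.ndim != 4:
        raise ValueError(f"expected 4 dimensions (NHWC), got {img_data.ndim}")
    batch, height, width, channels = img_data.shape
    if batch != 1:
        raise ValueError(f"expected batch size 1, got {batch}")
    if channels != 3:
        raise ValueError(f"expected 3 color channels, got {channels}")
    if img_data.dtype != np.uint8:
        raise ValueError(f"expected dtype uint8, got {img_data.dtype}")
    return height, width


# Any height and width pass; only batch size, channel count, and dtype are checked.
dummy = np.zeros((1, 480, 640, 3), dtype=np.uint8)
height, width = check_input(dummy)
```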


The following code shows how preprocessing is done. For more detail and a complete example, see the tf2onnx conversion and inference notebook for this model.

import numpy as np
from PIL import Image, ImageDraw, ImageColor
import math
import matplotlib.pyplot as plt

# open the image file
img = Image.open("image file")

# reshape the flat array returned by img.getdata() to HWC and then add an additional
# dimension to make NHWC, aka a batch of images with 1 image in it
img_data = np.array(img.getdata()).reshape(img.size[1], img.size[0], 3)
img_data = np.expand_dims(img_data.astype(np.uint8), axis=0)


The result is the image annotated with bounding boxes and labels. The full list of classes can be found in the COCO dataset.

For each batch of images, the model returns 4 tensors:

num_detections: the number of detections.

detection_boxes: a list of bounding boxes. Each list item describes a box with top, left, bottom, right relative to the image size.

detection_scores: the score for each detection, a value between 0 and 1 representing the probability that the class was detected.

detection_classes: an array of 10 class indices, stored as floating-point values, each identifying a class label from the COCO classes.
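The four outputs can be combined into one record per detection. This is a minimal sketch under stated assumptions: `collect_detections` is a hypothetical helper, `min_score=0.5` is an illustrative threshold rather than a recommended value, and the sample values are made up. It casts `num_detections` and the class values to `int`, as the drawing loop in this README does.

```python
def collect_detections(num_detections, boxes, scores, classes, min_score=0.5):
    """Flatten the batched outputs into a list of dicts, one per kept detection."""
    results = []
    for batch in range(len(num_detections)):
        # Cast the detection count to int before using it as a loop bound.
        for i in range(int(num_detections[batch])):
            score = float(scores[batch][i])
            if score < min_score:
                continue  # drop low-confidence detections
            top, left, bottom, right = (float(v) for v in boxes[batch][i])
            results.append({
                "batch": batch,
                "class_id": int(classes[batch][i]),
                "score": score,
                "box": (top, left, bottom, right),  # relative to the image size
            })
    return results


# Plain lists stand in for the arrays returned by the session (made-up values).
example = collect_detections(
    num_detections=[2.0],
    boxes=[[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]]],
    scores=[[0.9, 0.3]],
    classes=[[18.0, 1.0]],
)
print(example)
```

Because the function only indexes and iterates, it should accept the NumPy arrays returned by the session as well as nested lists.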


# draw the bounding box and label for each detection
# coco_classes is assumed to map a class index to its label (see the notebook)
def draw_detection(draw, d, c):
    width, height = draw.im.size
    # the box is relative to the image size so we multiply with height and width to get pixels
    top = d[0] * height
    left = d[1] * width
    bottom = d[2] * height
    right = d[3] * width
    # round to the nearest pixel and clamp to the image bounds
    top = max(0, np.floor(top + 0.5).astype('int32'))
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(height, np.floor(bottom + 0.5).astype('int32'))
    right = min(width, np.floor(right + 0.5).astype('int32'))
    # class indices are returned as floats, so cast before the lookup
    label = coco_classes[int(c)]
    # draw.textsize was removed in Pillow 10; newer versions provide draw.textbbox instead
    label_size = draw.textsize(label)
    # place the label above the box if it fits, otherwise just inside the top edge
    if top - label_size[1] >= 0:
        text_origin = tuple(np.array([left, top - label_size[1]]))
    else:
        text_origin = tuple(np.array([left, top + 1]))
    color = ImageColor.getrgb("red")
    thickness = 0
    draw.rectangle([left + thickness, top + thickness, right - thickness, bottom - thickness],
                   outline=color)
    draw.text(text_origin, label, fill=color)

# loop over the results - each returned tensor is a batch
batch_size = num_detections.shape[0]
draw = ImageDraw.Draw(img)
for batch in range(0, batch_size):
    for detection in range(0, int(num_detections[batch])):
        c = detection_classes[batch][detection]
        d = detection_boxes[batch][detection]
        draw_detection(draw, d, c)

# show image file with object detection bounding boxes and labels
plt.figure(figsize=(80, 40))
plt.imshow(img)
plt.show()

Model Creation

Dataset (Train and validation)

The model was trained using MS COCO 2017 Train Images, Val Images, and Train/Val annotations.


Details of the SSD-MobileNet model's training and preprocessing can be found in this tutorial notebook. The notebook also details how the ONNX model was converted.


TensorFlow to ONNX conversion tutorial. The notebook shows how to run an evaluation on the SSD-MobilenetV1 model and export it as a saved model. It also details how to convert the TensorFlow model to ONNX and how to run the preprocessing and postprocessing code for its inputs and outputs.


Shirley Su


MIT License