A beginner-friendly tutorial to start an object detection deep learning project with PyTorch & the Faster-RCNN architecture. Based on the blog series "Train your own object detector with Faster-RCNN & PyTorch".
A complete Jupyter notebook for training can be found in the training script. Alternatively, the same training script is available as a .py file.
Besides the training script, I provide Jupyter notebooks to create & explore a dataset, run inference, and visualize anchor boxes.
The visual.py script contains the code to visualize a dataset, a list of images, or anchor boxes, and to create annotations for a dataset. The provided code for this script was written around napari 0.4.9. Other dependencies can be found in requirements.txt.
- Set up a new environment with an environment manager (recommended)
- Install the libraries: `pip install -r requirements.txt`
- Start a Jupyter server: `jupyter-notebook` or `jupyter-lab`
Note: This will install the CPU version of torch. If you want to use a GPU or TPU, please refer to the instructions on the PyTorch website. To check whether PyTorch uses the NVIDIA GPU, check whether `torch.cuda.is_available()` returns `True` in a Python shell.
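For example:

```python
import torch

# True only if this PyTorch build was compiled with CUDA support
# and an NVIDIA GPU is visible to the driver.
print(torch.cuda.is_available())
```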
Windows users: If you cannot start jupyter-lab or jupyter-notebook on Windows because of `ImportError: DLL load failed while importing win32api`, try running `conda install pywin32` with the conda package manager.
These are the libraries that are used in this project:
- High-level deep learning library for PyTorch: PyTorch Lightning
- Visualization software: Custom code with the image-viewer Napari
- [OPTIONAL] Experiment tracking software/logging module: Neptune
If you want to use Neptune for your own experiments, add the NEPTUNE environment variable to your system. Otherwise, deactivate it in the scripts.
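As a minimal sketch (assuming the scripts read the API token from an environment variable literally named NEPTUNE, as stated above), a script could pick it up like this:

```python
import os

# Assumption: the Neptune API token is stored in an environment variable
# named "NEPTUNE"; adjust the name if your setup differs.
neptune_api_key = os.environ.get("NEPTUNE")

if neptune_api_key is None:
    # No token found: disable/skip the Neptune logger in the training script.
    print("NEPTUNE is not set - Neptune logging will be skipped.")
```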
The dataset consists of 20 selfie images randomly selected from the internet.
Most of the model's code is based on PyTorch's Faster-RCNN implementation. Metrics can be computed based on the PASCAL VOC (Visual Object Classes) evaluator in the metrics section.
Anchor sizes/aspect ratios are really important for training a Faster-RCNN model (but also for similar models like SSD or YOLO). These "default" boxes are compared to those output by the network, so choosing adequate sizes/ratios can be critical for the success of a project. The PyTorch implementation of the AnchorGenerator (and also the helper classes here) generally expects the following format:
- anchor_size: Tuple[Tuple[int, ...], ...]
- aspect_ratios: Tuple[Tuple[float, ...], ...]
The ResNet backbone without the FPN always returns a single feature map that is used to create anchor boxes. Because of that, we must create a Tuple that contains a single Tuple: e.g. ((32, 64, 128, 256, 512),) or ((32, 64),)
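A minimal sketch with torchvision's AnchorGenerator (import path valid for recent torchvision versions; the aspect ratios below are placeholder values):

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Single feature map (ResNet without FPN) -> exactly one inner tuple of sizes.
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)
```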
With FPN we can use 4 feature maps (output from a ResNet + FPN) and map our anchor sizes to the feature maps. Because of that, we must create a Tuple that contains exactly 4 Tuples: e.g. ((32,), (64,), (128,), (256,)) or ((8, 16, 32), (32, 64), (32, 64, 128, 256, 512), (200, 300))
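The corresponding sketch for the FPN case (again with placeholder aspect ratios):

```python
from torchvision.models.detection.anchor_utils import AnchorGenerator

# Four feature maps (ResNet + FPN) -> four inner tuples, one per feature map.
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 4,
)
```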
Examples of how to create a Faster-RCNN model with a pretrained ResNet backbone (ImageNet) are given in the tests section. Pay special attention to test_faster_RCNN.py::test_get_faster_rcnn_resnet. Recommendation: run the test in debugger mode.
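Outside the tests, a rough torchvision-only sketch of such a model (not this repo's own get_faster_rcnn_resnet helper; the argument names below follow the torchvision >= 0.13 API) could look like this:

```python
from torchvision.models import ResNet50_Weights
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ImageNet-pretrained ResNet-50 with an FPN on top (torchvision >= 0.13 API;
# older releases use pretrained=True instead of weights=...).
backbone = resnet_fpn_backbone(
    backbone_name="resnet50",
    weights=ResNet50_Weights.IMAGENET1K_V1,
)

# num_classes includes the background, e.g. 2 = background + one object class.
model = FasterRCNN(backbone, num_classes=2)
```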
- Sliders in the inference script do not work right now due to dependency updates.