This repo gives you a tutorial on how to use a custom backbone for Panoptic-DeepLab with Detectron2.

- Install Detectron2 following the official installation instructions.
- Install panopticapi by: `pip install git+https://github.com/cocodataset/panopticapi.git`.
- Note: you will need to install the latest Detectron2 (after commit id fa1bc0). The latest v0.3 release of Detectron2 does not support DepthwiseSeparableConv2d or the COCO dataset.
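If you are unsure whether your Detectron2 build is new enough, a minimal sanity-check sketch (not part of this repo) is to look for the DepthwiseSeparableConv2d layer:

```python
# Sanity-check sketch: verify the installed Detectron2 is new enough
# to provide DepthwiseSeparableConv2d (only present after commit fa1bc0).
import detectron2
from detectron2 import layers

print("detectron2 version:", detectron2.__version__)
assert hasattr(layers, "DepthwiseSeparableConv2d"), (
    "Detectron2 is too old; install it from source after commit fa1bc0."
)
```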
Visualization of Panoptic-DeepLab predictions from `demo.py`.
Detectron2 has builtin support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`. Under this directory, detectron2 will look for datasets in the structure described below, if needed.

```
$DETECTRON2_DATASETS/
  coco/
  lvis/
  cityscapes/
  VOC20{07,12}/
```

You can set the location for builtin datasets with `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.
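If you prefer to set the dataset root from Python rather than the shell, a minimal equivalent sketch is:

```python
import os

# Must be set before detectron2 loads any builtin dataset;
# equivalent to `export DETECTRON2_DATASETS=/path/to/datasets`.
os.environ["DETECTRON2_DATASETS"] = "/path/to/datasets"
```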
First, prepare the Cityscapes dataset following this expected dataset structure:

```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/
```
Install cityscapesScripts by:

```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```

Note: to create labelTrainIds.png, first prepare the above structure, then run cityscapesScripts with:

```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
```

Note: to generate the Cityscapes panoptic dataset, run cityscapesScripts with:

```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
```
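After both preparation scripts finish, a quick sanity-check sketch is to count the generated files (the expected count of 2975 assumes the standard Cityscapes train split):

```python
from pathlib import Path

# Count the annotation files produced by the two preparation scripts.
root = Path("/path/to/abovementioned/cityscapes")
train_ids = list((root / "gtFine" / "train").rglob("*_labelTrainIds.png"))
panoptic = list((root / "gtFine" / "cityscapes_panoptic_train").glob("*.png"))
print(f"labelTrainIds files: {len(train_ids)} (expect 2975)")
print(f"panoptic train PNGs: {len(panoptic)} (expect 2975)")
```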
You probably need to use `convert-pretrain-model-to-d2.py` to convert your pre-trained backbone to the correct format first.

For Xception-65:

```
# download your pretrained model:
wget https://github.com/LikeLy-Journey/SegmenTron/releases/download/v0.1.0/tf-xception65-270e81cf.pth -O x65.pth
# run the conversion
python convert-pretrain-model-to-d2.py x65.pth x65.pkl
```

For HRNet-48:

```
# download your pretrained model:
wget https://optgaw.dm.files.1drv.com/y4mWNpya38VArcDInoPaL7GfPMgcop92G6YRkabO1QTSWkCbo7djk8BFZ6LK_KHHIYE8wqeSAChU58NVFOZEvqFaoz392OgcyBrq_f8XGkusQep_oQsuQ7DPQCUrdLwyze_NlsyDGWot0L9agkQ-M_SfNr10ETlCF5R7BdKDZdupmcMXZc-IE3Ysw1bVHdOH4l-XEbEKFAi6ivPUbeqlYkRMQ -O h48.pth
# run the conversion
python convert-pretrain-model-to-d2.py h48.pth h48.pkl
```
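If you are curious what the conversion involves, the sketch below mirrors the pattern of Detectron2's existing converter scripts (load the PyTorch checkpoint and re-save it as a pickled dict that the Detectron2 checkpointer recognizes). It illustrates the idea and is not necessarily identical to `convert-pretrain-model-to-d2.py`:

```python
# Sketch of a typical "-to-d2" converter, following the pattern of
# Detectron2's own conversion scripts; not necessarily identical to the
# script shipped in this repo.
import pickle
import sys

import torch

if __name__ == "__main__":
    src, dst = sys.argv[1], sys.argv[2]
    state_dict = torch.load(src, map_location="cpu")
    res = {
        "model": state_dict,
        "__author__": "third_party",
        # Tells the Detectron2 checkpointer to fuzzy-match parameter names.
        "matching_heuristics": True,
    }
    with open(dst, "wb") as f:
        pickle.dump(res, f)
```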
Note: the only difference is that we rename `train_net.py` to `train_panoptic_deeplab.py`.

To train a model with 8 GPUs, run:

```
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 8
```

Model evaluation can be done similarly:

```
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
```

If you want to benchmark the network speed without post-processing, you can run the evaluation script with `MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True`:

```
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True
```
The decoder for Panoptic-DeepLab is defined in this file: https://github.com/facebookresearch/detectron2/blob/master/projects/Panoptic-DeepLab/panoptic_deeplab/panoptic_seg.py. It includes both the semantic branch and the instance branch.
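For a quick qualitative check of a trained checkpoint, the sketch below runs single-image inference with Detectron2's `DefaultPredictor`. It assumes a recent Detectron2 that ships the Panoptic-DeepLab project as `detectron2.projects.panoptic_deeplab`; adjust the import if you use this repo's local package instead:

```python
# Minimal inference sketch; assumes detectron2.projects.panoptic_deeplab is
# importable (recent Detectron2 builds ship it). Adjust the import if you use
# this repo's local panoptic_deeplab package instead.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.panoptic_deeplab import add_panoptic_deeplab_config

cfg = get_cfg()
add_panoptic_deeplab_config(cfg)
cfg.merge_from_file(
    "config/Cityscapes-PanopticSegmentation/"
    "panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml"
)
cfg.MODEL.WEIGHTS = "/path/to/model_checkpoint"

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))
panoptic_seg, segments_info = outputs["panoptic_seg"]  # label map + per-segment metadata
```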
Cityscapes models are trained with ImageNet pretraining.
Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | download
---|---|---|---|---|---|---|---|---
Panoptic-DeepLab | X65-DC5 | 1024×2048 | 62.6 | 81.5 | 75.7 | 79.4 | 32.8 | model
Panoptic-DeepLab | HRNet-48 | 1024×2048 | 63.3 | 82.2 | 76.0 | 80.3 | 35.9 | model
Note:
- X65: Xception-65. It is converted from the TensorFlow model. You need to convert it with `convert-pretrained-model-to-d2.py` first.
- DC5 means using dilated convolution in `res5`.
- HRNet-48: HighResolutionNet-w48. This checkpoint comes from its original implementation. You need to convert it with `convert-pretrained-model-to-d2.py` first.
- This implementation currently uses a much heavier head (with regular Conv2d) than the original paper.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | download
---|---|---|---|---|---|---|---|---
Panoptic-DeepLab (DSConv) | X65-DC5 | 1024×2048 | 61.4 | 81.4 | 74.3 | 79.8 | 32.6 | model
Panoptic-DeepLab (DSConv) | HRNet-48 | 1024×2048 | 63.4 | 81.9 | 76.4 | 80.6 | 36.2 | model
Note:
- This implementation uses DepthwiseSeparableConv2d (DSConv) in ASPP and the decoder, which is the same as the original paper; see the usage sketch after this list.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
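The sketch below shows what such a depthwise-separable block looks like with Detectron2's `DepthwiseSeparableConv2d` layer; the channel sizes and kernel size are illustrative, not taken from the configs in this repo:

```python
# Illustrative use of Detectron2's DepthwiseSeparableConv2d: a depthwise conv
# followed by a 1x1 pointwise conv, replacing a regular Conv2d in ASPP /
# decoder heads. Channel sizes and kernel size here are made up for the
# example, not copied from this repo's configs.
import torch
from detectron2.layers import DepthwiseSeparableConv2d

dsconv = DepthwiseSeparableConv2d(in_channels=256, out_channels=256, kernel_size=5, padding=2)
x = torch.randn(1, 256, 64, 128)
print(dsconv(x).shape)  # torch.Size([1, 256, 64, 128])
```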
COCO models are trained with ImageNet pretraining.
Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | download
---|---|---|---|---|---|---|---|---
Panoptic-DeepLab (DSConv) | X65-DC5 | 640×640 | 36.7 | 77.4 | 45.8 | 19.9 | 20.5 | model
Panoptic-DeepLab (DSConv) | HRNet-48 | 640×640 | 37.8 | 78.1 | 46.9 | 21.6 | 22.3 | model
Note:
- These results are trained with the old COCO config files (with `MAX_SIZE_TRAIN` set to 640 instead of 960); I will try to update these numbers as soon as I have machines to train models.
- This implementation uses DepthwiseSeparableConv2d (DSConv) in ASPP and the decoder, which is the same as the original paper.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
- The reproduced numbers are still lower than the original paper; this is probably due to slightly different data preprocessing.
Note: the only difference is that we rename `train_net.py` to `train_deeplab.py`.

To train a model with 8 GPUs, run:

```
python train_deeplab.py --config-file config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml --num-gpus 8
```

Model evaluation can be done similarly:

```
python train_deeplab.py --config-file config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
```
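As with the panoptic model above, a trained DeepLabV3+ checkpoint can be checked on a single image; this sketch assumes `detectron2.projects.deeplab` is importable (recent Detectron2 builds ship it):

```python
# Minimal semantic-segmentation inference sketch; assumes
# detectron2.projects.deeplab is importable (recent Detectron2 builds ship it).
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config

cfg = get_cfg()
add_deeplab_config(cfg)
cfg.merge_from_file(
    "config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml"
)
cfg.MODEL.WEIGHTS = "/path/to/model_checkpoint"

predictor = DefaultPredictor(cfg)
sem_seg = predictor(cv2.imread("input.jpg"))["sem_seg"]  # (num_classes, H, W) scores
pred = sem_seg.argmax(dim=0)  # per-pixel class ids
```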
Cityscapes models are trained with ImageNet pretraining.
Method | Backbone | Output resolution | mIoU | download
---|---|---|---|---
DeepLabV3+ | X65-DC5 | 1024×2048 | 80.1 | model
DeepLabV3+ | HRNet-48 | 1024×2048 | 80.9 | model
Note:
- X65: Xception-65. It is converted from the TensorFlow model. You need to convert it with `convert-pretrained-model-to-d2.py` first.
- DC5 means using dilated convolution in `res5`.
- HRNet-48: HighResolutionNet-w48. This checkpoint comes from its original implementation. You need to convert it with `convert-pretrained-model-to-d2.py` first.