This repo gives you a tutorial on how to use a custom backbone for Panoptic-DeepLab with Detectron2.

- Install Detectron2 following the official installation instructions.
- Install panopticapi by: `pip install git+https://github.com/cocodataset/panopticapi.git`.
- Note: you will need to install the latest Detectron2 (after commit id fa1bc0). The latest v0.3 release of Detectron2 does not support DepthwiseSeparableConv2d or the COCO dataset.
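If you are unsure whether your Detectron2 build is new enough, a minimal sanity-check sketch (not part of this repo) is to look for the DepthwiseSeparableConv2d layer:

```python
# Sanity-check sketch: verify the installed Detectron2 is new enough
# to provide DepthwiseSeparableConv2d (only present after commit fa1bc0).
import detectron2
from detectron2 import layers

print("detectron2 version:", detectron2.__version__)
assert hasattr(layers, "DepthwiseSeparableConv2d"), (
    "Detectron2 is too old; install it from source after commit fa1bc0."
)
```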
Visualization of Panoptic-DeepLab predictions from `demo.py`.
Detectron2 has builtin support for a few datasets. The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`. Under this directory, detectron2 will look for datasets in the structure described below, if needed.

```
$DETECTRON2_DATASETS/
  coco/
  lvis/
  cityscapes/
  VOC20{07,12}/
```

You can set the location for builtin datasets with `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.
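If you prefer to set the dataset root from Python rather than the shell, a minimal equivalent sketch is:

```python
import os

# Must be set before detectron2 loads any builtin dataset;
# equivalent to `export DETECTRON2_DATASETS=/path/to/datasets`.
os.environ["DETECTRON2_DATASETS"] = "/path/to/datasets"
```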
First, prepare the Cityscapes dataset following this expected dataset structure:

```
cityscapes/
  gtFine/
    train/
      aachen/
        color.png, instanceIds.png, labelIds.png, polygons.json,
        labelTrainIds.png
      ...
    val/
    test/
    cityscapes_panoptic_train.json
    cityscapes_panoptic_train/
    cityscapes_panoptic_val.json
    cityscapes_panoptic_val/
    cityscapes_panoptic_test.json
    cityscapes_panoptic_test/
  leftImg8bit/
    train/
    val/
    test/
```
Install cityscapesScripts by:

```
pip install git+https://github.com/mcordts/cityscapesScripts.git
```

Note: to create labelTrainIds.png, first prepare the above structure, then run cityscapesScripts with:

```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
```

Note: to generate the Cityscapes panoptic dataset, run cityscapesScripts with:

```
CITYSCAPES_DATASET=/path/to/abovementioned/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py
```
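After both preparation scripts finish, a quick sanity-check sketch is to count the generated files (the expected count of 2975 assumes the standard Cityscapes train split):

```python
from pathlib import Path

# Count the annotation files produced by the two preparation scripts.
root = Path("/path/to/abovementioned/cityscapes")
train_ids = list((root / "gtFine" / "train").rglob("*_labelTrainIds.png"))
panoptic = list((root / "gtFine" / "cityscapes_panoptic_train").glob("*.png"))
print(f"labelTrainIds files: {len(train_ids)} (expect 2975)")
print(f"panoptic train PNGs: {len(panoptic)} (expect 2975)")
```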
You probably need to use `convert-pretrain-model-to-d2.py` to convert your pre-trained backbone to the correct format first.

For Xception-65:

```
# download your pretrained model:
wget https://github.com/LikeLy-Journey/SegmenTron/releases/download/v0.1.0/tf-xception65-270e81cf.pth -O x65.pth
# run the conversion
python convert-pretrain-model-to-d2.py x65.pth x65.pkl
```

For HRNet-48:

```
# download your pretrained model:
wget https://optgaw.dm.files.1drv.com/y4mWNpya38VArcDInoPaL7GfPMgcop92G6YRkabO1QTSWkCbo7djk8BFZ6LK_KHHIYE8wqeSAChU58NVFOZEvqFaoz392OgcyBrq_f8XGkusQep_oQsuQ7DPQCUrdLwyze_NlsyDGWot0L9agkQ-M_SfNr10ETlCF5R7BdKDZdupmcMXZc-IE3Ysw1bVHdOH4l-XEbEKFAi6ivPUbeqlYkRMQ -O h48.pth
# run the conversion
python convert-pretrain-model-to-d2.py h48.pth h48.pkl
```
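If you are curious what the conversion involves, the sketch below mirrors the pattern of Detectron2's existing converter scripts (load the PyTorch checkpoint and re-save it as a pickled dict that the Detectron2 checkpointer recognizes). It illustrates the idea and is not necessarily identical to `convert-pretrain-model-to-d2.py`:

```python
# Sketch of a typical "-to-d2" converter, following the pattern of
# Detectron2's own conversion scripts; not necessarily identical to the
# script shipped in this repo.
import pickle
import sys

import torch

if __name__ == "__main__":
    src, dst = sys.argv[1], sys.argv[2]
    state_dict = torch.load(src, map_location="cpu")
    res = {
        "model": state_dict,
        "__author__": "third_party",
        # Tells the Detectron2 checkpointer to fuzzy-match parameter names.
        "matching_heuristics": True,
    }
    with open(dst, "wb") as f:
        pickle.dump(res, f)
```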
Note: the only difference is that we rename `train_net.py` to `train_panoptic_deeplab.py`.

To train a model with 8 GPUs, run:

```
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 8
```

Model evaluation can be done similarly:

```
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
```

If you want to benchmark the network speed without post-processing, you can run the evaluation script with `MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True`:

```
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True
```
The decoder for Panoptic-DeepLab is defined in this file: https://github.com/facebookresearch/detectron2/blob/master/projects/Panoptic-DeepLab/panoptic_deeplab/panoptic_seg.py. It includes both the semantic branch and the instance branch.
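For a quick qualitative check of a trained checkpoint, the sketch below runs single-image inference with Detectron2's `DefaultPredictor`. It assumes a recent Detectron2 that ships the Panoptic-DeepLab project as `detectron2.projects.panoptic_deeplab`; adjust the import if you use this repo's local package instead:

```python
# Minimal inference sketch; assumes detectron2.projects.panoptic_deeplab is
# importable (recent Detectron2 builds ship it). Adjust the import if you use
# this repo's local panoptic_deeplab package instead.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.panoptic_deeplab import add_panoptic_deeplab_config

cfg = get_cfg()
add_panoptic_deeplab_config(cfg)
cfg.merge_from_file(
    "config/Cityscapes-PanopticSegmentation/"
    "panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml"
)
cfg.MODEL.WEIGHTS = "/path/to/model_checkpoint"

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("input.jpg"))
panoptic_seg, segments_info = outputs["panoptic_seg"]  # label map + per-segment metadata
```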
Cityscapes models are trained with ImageNet pretraining.
Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | download
---|---|---|---|---|---|---|---|---
Panoptic-DeepLab | X65-DC5 | 1024×2048 | 62.6 | 81.5 | 75.7 | 79.4 | 32.8 | model
Panoptic-DeepLab | HRNet-48 | 1024×2048 | 63.3 | 82.2 | 76.0 | 80.3 | 35.9 | model
Note:
- X65: Xception-65. It is converted from the TensorFlow model. You need to convert it with `convert-pretrained-model-to-d2.py` first.
- DC5 means using dilated convolution in `res5`.
- HRNet-48: HighResolutionNet-w48. This checkpoint comes from its original implementation. You need to convert it with `convert-pretrained-model-to-d2.py` first.
- This implementation currently uses a much heavier head (with regular Conv2d) than the original paper.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | download
---|---|---|---|---|---|---|---|---
Panoptic-DeepLab (DSConv) | X65-DC5 | 1024×2048 | 61.4 | 81.4 | 74.3 | 79.8 | 32.6 | model
Panoptic-DeepLab (DSConv) | HRNet-48 | 1024×2048 | 63.4 | 81.9 | 76.4 | 80.6 | 36.2 | model
Note:
- This implementation uses DepthwiseSeparableConv2d (DSConv) in ASPP and the decoder, which is the same as the original paper; see the usage sketch after this list.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
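The sketch below shows what such a depthwise-separable block looks like with Detectron2's `DepthwiseSeparableConv2d` layer; the channel sizes and kernel size are illustrative, not taken from the configs in this repo:

```python
# Illustrative use of Detectron2's DepthwiseSeparableConv2d: a depthwise conv
# followed by a 1x1 pointwise conv, replacing a regular Conv2d in ASPP /
# decoder heads. Channel sizes and kernel size here are made up for the
# example, not copied from this repo's configs.
import torch
from detectron2.layers import DepthwiseSeparableConv2d

dsconv = DepthwiseSeparableConv2d(in_channels=256, out_channels=256, kernel_size=5, padding=2)
x = torch.randn(1, 256, 64, 128)
print(dsconv(x).shape)  # torch.Size([1, 256, 64, 128])
```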
COCO models are trained with ImageNet pretraining.
Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | download
---|---|---|---|---|---|---|---|---
Panoptic-DeepLab (DSConv) | X65-DC5 | 640×640 | 36.7 | 77.4 | 45.8 | 19.9 | 20.5 | model
Panoptic-DeepLab (DSConv) | HRNet-48 | 640×640 | 37.8 | 78.1 | 46.9 | 21.6 | 22.3 | model
Note:
- These results are trained with the old COCO config files (with `MAX_SIZE_TRAIN` set to 640 instead of 960); I will try to update these numbers as soon as I have machines to train models.
- This implementation uses DepthwiseSeparableConv2d (DSConv) in ASPP and the decoder, which is the same as the original paper.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes more time than the network itself.
- The reproduced numbers are still lower than the original paper; this is probably due to slightly different data preprocessing.
Note: the only difference is that we rename `train_net.py` to `train_deeplab.py`.

To train a model with 8 GPUs, run:

```
python train_deeplab.py --config-file config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml --num-gpus 8
```

Model evaluation can be done similarly:

```
python train_deeplab.py --config-file config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint
```
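As with the panoptic model above, a trained DeepLabV3+ checkpoint can be checked on a single image; this sketch assumes `detectron2.projects.deeplab` is importable (recent Detectron2 builds ship it):

```python
# Minimal semantic-segmentation inference sketch; assumes
# detectron2.projects.deeplab is importable (recent Detectron2 builds ship it).
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config

cfg = get_cfg()
add_deeplab_config(cfg)
cfg.merge_from_file(
    "config/Cityscapes-SemanticSegmentation/deeplab_v3_plus_X_65_os16_mg124_poly_90k_bs16.yaml"
)
cfg.MODEL.WEIGHTS = "/path/to/model_checkpoint"

predictor = DefaultPredictor(cfg)
sem_seg = predictor(cv2.imread("input.jpg"))["sem_seg"]  # (num_classes, H, W) scores
pred = sem_seg.argmax(dim=0)  # per-pixel class ids
```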
Cityscapes models are trained with ImageNet pretraining.
Method | Backbone | Output resolution | mIoU | download
---|---|---|---|---
DeepLabV3+ | X65-DC5 | 1024×2048 | 80.1 | model
DeepLabV3+ | HRNet-48 | 1024×2048 | 80.9 | model
Note:
- X65: Xception-65. It is converted from the TensorFlow model. You need to convert it with `convert-pretrained-model-to-d2.py` first.
- DC5 means using dilated convolution in `res5`.
- HRNet-48: HighResolutionNet-w48. This checkpoint comes from its original implementation. You need to convert it with `convert-pretrained-model-to-d2.py` first.