This repository contains the public release of my Master's thesis: the Python implementation of the Camera-LiDAR-Map-Fusion model network for 3D object detection.
If you use this code, please cite as follows:
@mastersthesis{SYang2022,
  author = {Shilu Yang},
  title  = {Implementation of a machine learning algorithm for heterogeneous data},
  school = {University of Stuttgart},
  year   = 2022,
  month  = 7
}
The Camera-LiDAR-Map-Fusion model is a multi-modal 3D detection network with one feature extraction stage and two fusion stages:

a) Feature Extraction: built on the OpenPCDet and MMDetection codebases, which provide a modular framework supporting a variety of popular 2D and 3D feature extractors as well as 3D datasets.
b) First Fusion (MapFusion): feature-level fusion of LiDAR and map data.
c) Second Fusion (Proposal Fusion): late fusion of camera 2D proposals with LiDAR 3D proposals.

Experiments on a subset of the nuScenes dataset showed that, compared to the baseline SOTA 3D detector of 2021 (CenterPoint), MapFusion improves accuracy by 2.4% mAP, and adding the Late Fusion improves it by a further 5.7% mAP. The results on the KITTI dataset and on the self-built Robotino dataset showed similar accuracy gains. These results suggest that the Camera-LiDAR-Map-Fusion model is a feasible fusion model for 3D object detection, with good usability and extensibility.
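For orientation, here is a minimal sketch of that three-step data flow in Python. All names (`lidar_backbone`, `map_encoder`, `detector_2d`, `fuse_proposals`, ...) are illustrative placeholders and do not map one-to-one onto modules in this repository:

```python
import torch

def detect_3d(lidar_points, camera_image, hd_map_raster,
              lidar_backbone, map_encoder, detection_head,
              detector_2d, fuse_proposals):
    """Illustrative end-to-end flow; every callable is a placeholder."""
    # Feature extraction: LiDAR BEV features and rasterized map features.
    bev_features = lidar_backbone(lidar_points)              # (C1, H, W)
    map_features = map_encoder(hd_map_raster)                # (C2, H, W)

    # First fusion (MapFusion): feature-level fusion of LiDAR and map data.
    fused = torch.cat([bev_features, map_features], dim=0)   # (C1 + C2, H, W)
    proposals_3d = detection_head(fused)                     # 3D boxes + scores

    # Second fusion (Proposal Fusion): late fusion with camera 2D proposals.
    proposals_2d = detector_2d(camera_image)                 # 2D boxes + scores
    return fuse_proposals(proposals_3d, proposals_2d)        # rescored 3D boxes
```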
Baseline: CenterPoint + YOLOX
Car detection (nuScenes, AP at matching thresholds of 0.5 / 1.0 / 2.0 / 4.0 m):

| Model | AP@0.5 | AP@1.0 | AP@2.0 | AP@4.0 | mean AP | Improvement |
|-------------|-------|-------|-------|-------|-------|------|
| Baseline    | 53.31 | 65.24 | 72.21 | 74.50 | 66.32 | 0.00 |
| MapFusion   | 54.52 | 68.14 | 74.89 | 77.26 | 68.70 | 2.38 |
| Late Fusion | 59.35 | 74.01 | 81.15 | 82.82 | 74.33 | 8.02 |

| Model | ATE | ASE | AOE | AVE | AAE | NDS |
|-------------|--------|--------|--------|--------|--------|--------|
| Baseline    | 0.2510 | 0.1850 | 0.3040 | 0.6170 | 0.2430 | 0.6716 |
| MapFusion   | 0.2570 | 0.1860 | 0.2870 | 0.7110 | 0.2510 | 0.6743 |
| Late Fusion | 0.2580 | 0.1830 | 0.3110 | 0.7370 | 0.2580 | 0.6970 |
Baseline: SECOND + YOLOX
Car AP_R40 @ 0.70, 0.70, 0.70:

| Benchmark    | Easy          | Moderate      | Hard          |
|--------------|---------------|---------------|---------------|
| bbox AP@0.70 | 96.83 / 95.42 | 91.16 / 89.21 | 88.24 / 88.03 |
| bev AP@0.70  | 93.60 / 93.46 | 87.23 / 86.58 | 84.16 / 84.27 |
| 3d AP@0.70   | 86.48 / 84.08 | 74.24 / 72.04 | 70.78 / 68.83 |
| aos AP       | 96.76 / 95.35 | 90.90 / 88.98 | 87.85 / 87.65 |
Baseline: PointPillars + EfficientDet
Robotino Car AP@0.70, 0.70, 0.70 / AP@0.50, 0.50, 0.50:

| Benchmark | Overlap_0.7   | Overlap_0.5   |
|-----------|---------------|---------------|
| bbox AP   | 77.45 / 74.82 | 77.45 / 74.82 |
| bev AP    | 38.83 / 35.61 | 42.11 / 38.78 |
| 3d AP     | 31.25 / 31.43 | 42.11 / 38.78 |
| aos AP    | 66.42 / 61.48 | 66.42 / 61.48 |
This repository contains the implementation of the LiDAR-Camera-Map fusion model on the nuScenes dataset, with CenterPoint as the baseline 3D detector.
Codebase: CenterPoint
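To illustrate the feature-level map fusion idea, the following is a hypothetical PyTorch block that concatenates rasterized HD-map channels with the LiDAR BEV feature map and mixes them with a 1x1 convolution. The class name and channel counts are assumptions for the sketch, not the exact layers used in this codebase:

```python
import torch
import torch.nn as nn

class MapFusionBlock(nn.Module):
    """Hypothetical feature-level fusion of LiDAR BEV and HD-map features."""

    def __init__(self, bev_channels: int, map_channels: int):
        super().__init__()
        # A 1x1 convolution projects the concatenated features back to the
        # channel count the downstream detection head expects.
        self.mix = nn.Conv2d(bev_channels + map_channels, bev_channels,
                             kernel_size=1)

    def forward(self, bev_feat: torch.Tensor, map_feat: torch.Tensor):
        # bev_feat: (B, C_bev, H, W); map_feat: (B, C_map, H, W),
        # both rendered on the same BEV grid.
        fused = torch.cat([bev_feat, map_feat], dim=1)
        return self.mix(fused)

# Usage with arbitrary example shapes:
# block = MapFusionBlock(bev_channels=256, map_channels=6)
# out = block(torch.randn(2, 256, 128, 128), torch.randn(2, 6, 128, 128))
```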
This is the implementation of the late fusion model CLOCs on the KITTI and Robotino datasets, with SECOND and PointPillars as baseline 3D detectors.
Codebase: OpenPCDet
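The sketch below illustrates the late fusion idea in simplified form: 3D candidates are projected to the image plane and matched to 2D detections by IoU, and agreeing 2D evidence raises the 3D confidence. Note that CLOCs itself learns this fusion with a small network over a sparse candidate tensor; the heuristic rescoring here is only a stand-in:

```python
import numpy as np

def iou_2d(a, b):
    """IoU of two axis-aligned image boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def late_fuse(boxes_3d_img, scores_3d, boxes_2d, scores_2d, iou_thresh=0.5):
    """Rescore 3D candidates using overlapping 2D detections.

    boxes_3d_img are the image-plane projections of the 3D boxes.
    """
    fused = np.asarray(scores_3d, dtype=float).copy()
    for i, box3d in enumerate(boxes_3d_img):
        ious = np.array([iou_2d(box3d, b2d) for b2d in boxes_2d])
        if ious.size and ious.max() >= iou_thresh:
            j = int(ious.argmax())
            # Agreeing 2D evidence raises confidence in the 3D candidate;
            # candidates with no 2D support keep their original score.
            fused[i] = 1.0 - (1.0 - fused[i]) * (1.0 - scores_2d[j])
    return fused
```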
This is the repository we used for training the 2D detectors (YOLOX, Cascade R-CNN, YOLOv3).
It is forked from open-mmlab/mmdetection; we have implemented the KITTI dataset in the corresponding trainable 2D dataset format (see the conversion sketch below). For training on nuScenes and Robotino, simply follow the COCO format.
The SOTA detector EfficientDet, which we also used, is included as well.
For the usage of mmdetection, please follow the official tutorial. It is also fine to use the newest version from the official repo.
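A minimal sketch of such a KITTI-to-COCO label conversion follows. The class mapping and the hard-coded image size (a typical KITTI resolution) are assumptions; a real converter would read the true dimensions from each image:

```python
import json
from pathlib import Path

# KITTI label columns: type trunc occl alpha x1 y1 x2 y2 h w l x y z ry
CLASSES = {"Car": 1, "Pedestrian": 2, "Cyclist": 3}  # assumed mapping

def kitti_to_coco(label_dir: str, out_json: str):
    """Convert KITTI 2D label .txt files into a COCO-style JSON file."""
    images, annotations = [], []
    ann_id = 1
    for img_id, txt in enumerate(sorted(Path(label_dir).glob("*.txt")), 1):
        images.append({"id": img_id, "file_name": txt.stem + ".png",
                       "width": 1242, "height": 375})  # typical KITTI size
        for line in txt.read_text().splitlines():
            fields = line.split()
            if fields[0] not in CLASSES:
                continue  # skip DontCare and unused classes
            x1, y1, x2, y2 = map(float, fields[4:8])
            annotations.append({
                "id": ann_id, "image_id": img_id,
                "category_id": CLASSES[fields[0]],
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses x, y, w, h
                "area": (x2 - x1) * (y2 - y1), "iscrowd": 0})
            ann_id += 1
    coco = {"images": images, "annotations": annotations,
            "categories": [{"id": v, "name": k} for k, v in CLASSES.items()]}
    Path(out_json).write_text(json.dumps(coco))
```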
MapFusion: A General Framework for 3D Object Detection with HDMaps
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
Leveraging HD Maps for 3D Object Detection