This repo provides a reference implementation of “SGDCL: Semantic-Guided Dynamic Correlation Learning for Explainable Autonomous Driving”.
By learning expressive representations, deep learning (DL) has revolutionized autonomous driving (AD). Despite significant advancements, the inherent opacity of DL models engenders public distrust, impeding their widespread adoption. For explainable autonomous driving, current studies primarily concentrate on extracting features from input scenes to predict driving actions and their corresponding explanations. However, these methods underutilize semantics and correlation information within actions and explanations (collectively called categories in this work), leading to suboptimal performance. To address this issue, we propose Semantic-Guided Dynamic Correlation Learning (SGDCL), a novel approach that effectively exploits the semantic richness and dynamic interactions intrinsic to categories. SGDCL employs a semantic-guided learning module to obtain category-specific representations and a dynamic correlation learning module to adaptively capture intricate correlations among categories. Additionally, we introduce an innovative loss term that leverages fine-grained co-occurrence statistics of categories for refined regularization. We extensively evaluate SGDCL on two well-established benchmarks, demonstrating its superiority over seven state-of-the-art baselines and a large vision-language model. SGDCL significantly promotes explainable autonomous driving.
BDD-OIA, a subset of BDD100K, contains 22,924 video frames, each annotated with 4 action decisions and 21 human-defined explanations.
Head to X-OIA to download the dataset.
Following Xu et al., only the final frame of each video clip is used, leading to a training set of 16,082 images, a validation set of 2,270 images and a test set of 4,572 images.
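As a quick sanity check, the three split sizes reported above sum to the full 22,924 frames:

```python
# Reported BDD-OIA split sizes (final frame of each clip, following Xu et al.)
train, val, test = 16_082, 2_270, 4_572
total = train + val + test
assert total == 22_924  # matches the full BDD-OIA frame count
print(f"train/val/test fractions: {train/total:.2%} / {val/total:.2%} / {test/total:.2%}")
```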
PSI includes 11,902 keyframes, each annotated with 3 actions and explanations provided in natural language.
Head to PSI to download the dataset.
Following Zhang et al., all samples are split into training, validation, and test sets with a ratio of 7/1/2.
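The 7/1/2 split can be reproduced with a simple index partition. This is an illustrative sketch only; the actual split used by Zhang et al. may differ in seed and ordering, and `split_711` is a hypothetical helper, not a function from this repo:

```python
import random

def split_711(indices, seed=42):
    """Partition sample indices into train/val/test with a 7/1/2 ratio.

    Illustrative only -- seed and shuffling order are our own assumptions.
    """
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n = len(idx)
    n_train = int(n * 0.7)
    n_val = int(n * 0.1)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_711(range(11_902))
print(len(train), len(val), len(test))  # 8331 1190 2381
```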
Our experiments are conducted on Ubuntu 22.04 with a single NVIDIA GeForce RTX 3080 GPU, 64 GB RAM, and an Intel i7-11700K CPU. SGDCL is implemented with Python 3.9, PyTorch 2.0.1, and CUDA 11.7.
Step 1: Install Anaconda
Step 2: Create a virtual environment and install the required packages
# create a new environment
conda create -n SGDCL python=3.9
# activate environment
conda activate SGDCL
# install Pytorch
pip install torch torchvision torchaudio
# install other required packages
pip install -r requirements.txt
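After installation, a quick import check confirms that the key packages from the steps above are available in the environment:

```python
import importlib.util

# Check that the packages installed above are importable in this environment.
for pkg in ("torch", "torchvision", "torchaudio"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'OK' if found else 'MISSING'}")
```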
Step 0: Create the required folders
mkdir bddoia log save_model weight
Step 1: Download the datasets and put the BDD-OIA data in the bddoia folder.
Step 2: Download the pre-trained weights from NLE-DM and put them in the weight folder.
Step 3: Generate adjacency information.
python GenAdj.py
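GenAdj.py derives the category adjacency used by the dynamic correlation learning module from label co-occurrence statistics. The sketch below illustrates the general idea with a row-normalized co-occurrence matrix; the actual script's normalization and any thresholding may differ, and `build_adjacency` is our own naming:

```python
def build_adjacency(labels, eps=1e-8):
    """Build a conditional co-occurrence adjacency from multi-hot labels.

    labels: list of binary rows, one per sample, one entry per category.
    Entry (i, j) of the result approximates P(category j | category i).
    Illustrative sketch only.
    """
    n_cat = len(labels[0])
    cooc = [[0.0] * n_cat for _ in range(n_cat)]
    for row in labels:
        for i in range(n_cat):
            if row[i]:
                for j in range(n_cat):
                    if row[j]:
                        cooc[i][j] += 1
    adj = [[cooc[i][j] / (cooc[i][i] + eps) for j in range(n_cat)]
           for i in range(n_cat)]
    for i in range(n_cat):
        adj[i][i] = 1.0  # a category always co-occurs with itself
    return adj

# Toy example: 3 samples, 4 categories (e.g. actions + explanations)
toy = [[1, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 1, 1]]
adj = build_adjacency(toy)
for row in adj:
    print([round(v, 2) for v in row])
```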
Step 4: Generate sentence embeddings.
python LabelSemantic.py
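LabelSemantic.py encodes the category names (actions and explanations) into semantic vectors for the semantic-guided learning module. The sketch below substitutes a toy bag-of-words encoder for the real sentence encoder, just to show the interface: one fixed-size vector per category. The function name and example categories are illustrative only:

```python
from collections import Counter

def toy_sentence_embedding(sentences):
    """Map each category description to a fixed-size vector.

    A stand-in for a real sentence encoder (e.g. a pretrained language
    model); only the input/output shape matters here.
    """
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    embeddings = []
    for s in sentences:
        counts = Counter(s.lower().split())
        embeddings.append([counts.get(w, 0) for w in vocab])
    return embeddings  # shape: (num_categories, vocab_size)

categories = ["move forward", "stop", "traffic light is red"]
vecs = toy_sentence_embedding(categories)
print(len(vecs), len(vecs[0]))  # one vector per category
```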
Step 5: Train model.
python train_OIA.py
Step 6: Test model.
python prediction_OIA.py
Unless otherwise specified, we use the following default hyperparameter settings.
Param | Value | Param | Value |
---|---|---|---|
learning rate | 0.001 | batch_size | 2 |
momentum | 0.9 | epochs | 50 |
weight decay | 0.0001 | cross attention dim | 8 |
GNN hidden dim | 8 | GNN output dim | 16 |
GNN attention head | 8 | classifier hidden dim | 64 |
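The table above translates directly into a config. A sketch, with dictionary keys of our own naming (not necessarily the identifiers used in the repo's code):

```python
# Default hyperparameters from the table above (key names are illustrative).
DEFAULT_CONFIG = {
    "learning_rate": 1e-3,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "batch_size": 2,
    "epochs": 50,
    "cross_attention_dim": 8,
    "gnn_hidden_dim": 8,
    "gnn_output_dim": 16,
    "gnn_attention_heads": 8,
    "classifier_hidden_dim": 64,
}
```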
We would like to thank the authors of NLE-DM for their great work and code repository.
If you find our paper or code useful for your research, please consider citing us:
@inproceedings{cao2024sgdcl,
  author    = {Chengtai Cao and Xinhong Chen and Jianping Wang and Qun Song and Rui Tan and Yung-Hui Li},
  title     = {{SGDCL}: Semantic-Guided Dynamic Correlation Learning for Explainable Autonomous Driving},
  booktitle = {International Joint Conference on Artificial Intelligence},
  pages     = {596--604},
  year      = {2024}
}
Any comments and feedback are appreciated.