This is the official implementation of our paper accepted to ACM Multimedia 2023: ATM: Action Temporality Modeling for Video Question Answering.
The code is mainly developed from VGT. We thank the authors for their great work and code.
Assuming you have installed Anaconda, run the following to set up the environment:
>conda create -n videoqa python==3.8
>conda activate videoqa
>pip install -r requirements.txt
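As a quick sanity check of the environment, you can verify that the interpreter and GPU stack are visible. This is a minimal sketch assuming PyTorch is among the packages pinned in requirements.txt:

```bash
conda activate videoqa
# print the Python version, the installed PyTorch version, and whether CUDA is available
python -c "import sys, torch; print(sys.version); print(torch.__version__, torch.cuda.is_available())"
```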
Create the data annotation folders inside `data/` and download the released files into them:
- Download the csv files from annotations into `data/dataset/nextqa/`.
- Download the folders from features into `data/features/nextqa/`.
- Download the checkpoints into `data/save_models/nextqa/`.
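If these folders do not exist yet, a minimal way to create the layout implied above (folder names taken from the paths in this section) is:

```bash
# create the target folders for annotations, features, and checkpoints
mkdir -p data/dataset/nextqa data/features/nextqa data/save_models/nextqa
```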
Evaluation:

sh ./shell/next_test.sh 0

Training:

sh ./shells/next_train.sh 0

Fine-tuning:

sh ./shells/next_ft.sh 0
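The trailing `0` appears to be a positional argument passed to the script, presumably the GPU index. For example, to run evaluation on a different device (assuming that is what the argument controls):

```bash
# hypothetical example: evaluate on GPU 1 instead of GPU 0
sh ./shell/next_test.sh 1
```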
@inproceedings{chen2023atm,
  title={ATM: Action Temporality Modeling for Video Question Answering},
  author={Chen, Junwen and Zhu, Jie and Kong, Yu},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}