Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, and Limin Wang
This paper proposes TimeSuite, a collection of new designs that adapt existing short-form video MLLMs to long video understanding. It includes a simple yet efficient framework for processing long video sequences, a high-quality video dataset for the grounded tuning of MLLMs, and a carefully designed instruction-tuning task that explicitly incorporates grounding supervision into the traditional QA format.
State-of-the-art performance: VideoChat-T achieves strong results on both long-form video question answering and temporal grounding.
Highly efficient model architecture with fast inference: each video frame is encoded into just 3 tokens, so the FLOPs of VideoChat-T are only 5.1% of those of LLaVA-OneVision.
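For intuition on why 3 tokens per frame cuts compute so sharply, here is a minimal sketch of a per-frame token compressor. It assumes a simple pooling-based design; the class name, pooling choice, and dimensions are illustrative and not the actual TimeSuite module.

```python
import torch
import torch.nn as nn

class FrameTokenCompressor(nn.Module):
    """Hypothetical sketch: pool each frame's visual tokens down to a fixed,
    small number of tokens (3 here) before they enter the LLM."""

    def __init__(self, hidden_dim: int, tokens_per_frame: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(tokens_per_frame)
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, frame_tokens: torch.Tensor) -> torch.Tensor:
        # frame_tokens: (num_frames, patches_per_frame, hidden_dim)
        x = frame_tokens.transpose(1, 2)   # (F, D, P)
        x = self.pool(x).transpose(1, 2)   # (F, 3, D)
        return self.proj(x)                # 3 tokens per frame

# Example: 128 frames x 256 patch tokens become 128 x 3 tokens for the LLM.
compressor = FrameTokenCompressor(hidden_dim=1024)
video = torch.randn(128, 256, 1024)
print(compressor(video).shape)  # torch.Size([128, 3, 1024])
```

Because the LLM's attention cost grows with the length of the visual token sequence, shrinking each frame from hundreds of patch tokens to 3 is what drives the FLOPs reduction described above.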
High-quality data
- We introduced TimePro, a comprehensive dataset covering 9 task types with video sources drawn from 15 different datasets.
- We designed a novel Temporal Grounded Caption fine-tuning task that effectively mitigates hallucinations in MLLMs (an illustrative sample format is sketched below).
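To make the Temporal Grounded Caption idea concrete, a grounded-caption sample pairs a caption with explicit timestamps so the model's descriptions stay anchored to the video timeline. The field names and format below are assumptions for illustration only, not the released TimePro schema.

```python
# Hypothetical example of a Temporal Grounded Caption training sample.
sample = {
    "video": "example_video.mp4",
    "task": "temporal_grounded_caption",
    "conversations": [
        {
            "from": "human",
            "value": "Describe the video and give the timestamps of each event you mention.",
        },
        {
            "from": "gpt",
            "value": (
                "A person unboxes a package (00:05-00:32), assembles the parts "
                "(00:32-01:48), and tests the finished device (01:48-02:10)."
            ),
        },
    ],
}
```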
TODO
If you find this project useful in your research, please consider citing:
@article{zeng2024timesuite,
title={TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning},
author={Zeng, Xiangyu and Li, Kunchang and Wang, Chenting and Li, Xinhao and Jiang, Tianxiang and Yan, Ziang and Li, Songze and Shi, Yansong and Yue, Zhengrong and Wang, Yi and others},
journal={arXiv preprint arXiv:2410.19702},
year={2024}
}