🏠 LMMs-Lab Homepage | discord/lmms-eval | 🎓 Project Page | 📝 arXiv Paper
- [2025-01] 🎉🎉 We introduce VideoMMMU, a massive, multi-modal, multi-disciplinary video benchmark that evaluates knowledge acquisition from educational videos.

The evaluation of VideoMMMU is integrated into LMMs-Eval. Detailed instructions for running the evaluation are provided below.
For formal usage, you can install the package from PyPI by running the following command:

```bash
pip install lmms-eval
```
For development, you can install the package by cloning the repository and running the following commands:

```bash
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval
pip install -e .
```
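Once installed, you can check that the VideoMMMU tasks are registered. This is a minimal sanity check; it assumes your lmms-eval version supports the `--tasks list` flag for printing all registered task names.

```bash
# List all registered tasks and filter for the VideoMMMU entries
# (assumes the --tasks list flag is available in your lmms-eval version).
python -m lmms_eval --tasks list | grep video_mmmu
```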
If you want to test LLaVA, you will have to clone the LLaVA-NeXT repository and install it:

```bash
git clone https://github.com/LLaVA-VL/LLaVA-NeXT
cd LLaVA-NeXT
pip install -e .
```
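As a quick sanity check that both packages are importable from the same environment (a minimal sketch; the module names `lmms_eval` and `llava` are taken from the two repositories above):

```bash
# Verify that lmms-eval and LLaVA-NeXT are installed in the active environment.
python -c "import lmms_eval, llava; print('lmms-eval and LLaVA-NeXT are importable')"
```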
Evaluation of LLaVA-OneVision on VideoMMMU
```bash
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
    --tasks video_mmmu \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix debug \
    --output_path ./logs/
```
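The command above runs a single process. If you have multiple GPUs, you can shard the evaluation across them by raising `--num_processes`; the sketch below assumes one data-parallel process per GPU (here 8) and is otherwise identical to the command above.

```bash
# Example multi-GPU launch: 8 data-parallel processes, one per GPU (adjust to your machine).
accelerate launch --num_processes=8 --main_process_port 12345 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
    --tasks video_mmmu \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix debug \
    --output_path ./logs/
```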
Evaluate a single track of VideoMMMU
```bash
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
    --tasks video_mmmu_perception \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix debug \
    --output_path ./logs/
```
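The other tracks follow the same pattern with a different task name. The names `video_mmmu_comprehension` and `video_mmmu_adaptation` below are assumptions based on the track naming; confirm them against the task registry (e.g. `--tasks list`) before running.

```bash
# Same command as above with the task swapped to the (assumed) Comprehension-track task name.
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=32,torch_dtype=bfloat16 \
    --tasks video_mmmu_comprehension \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix debug \
    --output_path ./logs/
```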
Evaluate the question_only track of VideoMMMU (Knowledge Acquisition Experiment)
```bash
accelerate launch --num_processes=1 --main_process_port 12345 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=1,torch_dtype=bfloat16 \
    --tasks video_mmmu_adaptation_question_only \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix debug \
    --output_path ./logs/
```
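The question_only run provides the "before video" accuracy for the knowledge acquisition experiment. The sketch below shows how the Δknowledge column in the table can then be derived, assuming the normalized-gain definition from the paper, Δknowledge = (Acc_after − Acc_before) / (100 − Acc_before) × 100, where Acc_before is the question_only score and Acc_after is the Adaptation-track score with video; the accuracy values are placeholders.

```bash
# Placeholder accuracies (replace with your model's scores, in percent):
#   acc_before - Adaptation-track accuracy without video (video_mmmu_adaptation_question_only)
#   acc_after  - Adaptation-track accuracy with video
acc_before=40.0
acc_after=50.0
# Normalized knowledge gain (assumed formula): (after - before) / (100 - before) * 100
awk -v b="$acc_before" -v a="$acc_after" \
    'BEGIN { printf "Delta_knowledge = %+.1f\n", (a - b) / (100 - b) * 100 }'
```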
We evaluate various open-source and proprietary LMMs; the table below provides a detailed comparison. To submit your model's results, please contact Kairui Hu at [email protected] or Bo Li at [email protected].
Model | Overall | Perception | Comprehension | Adaptation | Δknowledge |
---|---|---|---|---|---|
Human Expert | 74.44 | 84.33 | 78.67 | 60.33 | +33.1 |
Claude-3.5-Sonnet | 65.78 | 72.00 | 69.67 | 55.67 | +11.4 |
GPT-4o | 61.22 | 66.00 | 62.00 | 55.67 | +15.6 |
Qwen-2.5-VL-72B | 60.22 | 69.33 | 61.00 | 50.33 | +9.7 |
Gemini 1.5 Pro | 53.89 | 59.00 | 53.33 | 49.33 | +8.7 |
Aria | 50.78 | 65.67 | 46.67 | 40.00 | +3.2 |
Gemini 1.5 Flash | 49.78 | 57.33 | 49.00 | 43.00 | -3.3 |
LLaVA-Video-72B | 49.67 | 59.67 | 46.00 | 43.33 | +7.1 |
LLaVA-OneVision-72B | 48.33 | 59.67 | 42.33 | 43.00 | +6.6 |
Qwen-2.5-VL-7B | 47.44 | 58.33 | 44.33 | 39.67 | +2.2 |
mPLUG-Owl3-7B | 42.00 | 49.33 | 38.67 | 38.00 | +7.5 |
MAmmoTH-VL-8B | 41.78 | 51.67 | 40.00 | 33.67 | +1.5 |
InternVL2-8B | 37.44 | 47.33 | 33.33 | 31.67 | -8.5 |
LLaVA-Video-7B | 36.11 | 41.67 | 33.33 | 33.33 | -5.3 |
VILA1.5-40B | 34.00 | 38.67 | 30.67 | 32.67 | +9.4 |
Llama-3.2-11B | 30.00 | 35.67 | 32.33 | 22.00 | - |
LongVA-7B | 23.98 | 24.00 | 24.33 | 23.67 | -7.0 |
VILA1.5-8B | 20.89 | 20.33 | 17.33 | 25.00 | +5.9 |
```bibtex
@article{hu2025videommmu,
  title={Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos},
  author={Kairui Hu and Penghao Wu and Fanyi Pu and Wang Xiao and Yuanhan Zhang and Xiang Yue and Bo Li and Ziwei Liu},
  journal={arXiv preprint arXiv:2501.13826},
  year={2025},
  url={https://arxiv.org/abs/2501.13826}
}
```