[WIP][Benchmark] Add MEGA-Bench core and core_single_image support #724

TianhaoLiang2000 · 2025-01-15T17:33:14Z

Hi, VLMEvalKit team,

This PR incorporates our recent work, MEGA-Bench, a multimodal evaluation suite with over 500 real-world tasks and 45 metrics.

The evaluation process involves two steps: 1) run VLMEvalKit to produce the response/submission file; 2) run our evaluator with 45 diverse metrics to get the scores.
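As a rough illustration of step 2, the sketch below scores a submission by dispatching each record to a named metric and averaging per task. It is not the MEGA-Bench evaluator: the record fields (`task`, `metric`, `response`, `answer`), the two metric functions, and the task names are hypothetical stand-ins for the real submission format and the 45 metrics.

```python
from collections import defaultdict


def exact_match(response: str, answer: str) -> float:
    """Hypothetical metric: 1.0 if the normalized strings are identical."""
    return float(response.strip().lower() == answer.strip().lower())


def jaccard(response: str, answer: str) -> float:
    """Hypothetical metric: word-set overlap between response and answer."""
    a = set(response.lower().split())
    b = set(answer.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Assumed registry mapping a metric name to its scoring function.
METRICS = {"exact_match": exact_match, "jaccard": jaccard}


def score_submission(records):
    """Score each record with its task's metric, then average per task."""
    per_task = defaultdict(list)
    for rec in records:
        metric = METRICS[rec["metric"]]
        per_task[rec["task"]].append(metric(rec["response"], rec["answer"]))
    return {task: sum(vals) / len(vals) for task, vals in per_task.items()}


# Step 1 output (made-up records standing in for a real submission file).
submission = [
    {"task": "chart_qa", "metric": "exact_match", "response": " 42 ", "answer": "42"},
    {"task": "chart_qa", "metric": "exact_match", "response": "41", "answer": "42"},
    {"task": "object_list", "metric": "jaccard", "response": "cat dog", "answer": "dog cat bird"},
]

scores = score_submission(submission)
print(scores)
```

Keeping the metric lookup in a registry means a new metric can be added without touching the scoring loop, which is one plausible way to manage a large metric set.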

Example usage:

python3 run.py \
    --data MEGABench_core_single_image_16frame \
    --model Qwen2-VL-7B-Instruct \
    --verbose \
    --work-dir ~/LMUData
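For scripted runs, the same invocation can be assembled as an argument list. This is a minimal sketch that uses only the flags shown above; the helper name `build_command` is hypothetical, and the `subprocess` call is left commented out because it assumes the script is run from a VLMEvalKit checkout.

```python
import os
import shlex


def build_command(data: str, model: str, work_dir: str, verbose: bool = True):
    """Build the run.py argument list from the flags used in the example."""
    argv = ["python3", "run.py", "--data", data, "--model", model]
    if verbose:
        argv.append("--verbose")
    # Expand "~" explicitly, since no shell is involved when passing a list.
    argv += ["--work-dir", os.path.expanduser(work_dir)]
    return argv


cmd = build_command(
    data="MEGABench_core_single_image_16frame",
    model="Qwen2-VL-7B-Instruct",
    work_dir="~/LMUData",
)
print(shlex.join(cmd))

# To actually launch the run (assumes the current directory contains run.py):
# import subprocess
# subprocess.run(cmd, check=True)
```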

This PR implements the MEGA-Bench core and core_single_image subsets. The core_single_image subset has been tested end to end with the Qwen2-VL-7B model. Support for the open-ended subset will be added to this PR soon.

TianhaoLiang2000 added 3 commits January 15, 2025 12:21

add MEGA-Bench core dataset support

298a8fa

add MEGA-Bench core dataset support

39fed6c

add MEGA-Bench core dataset support

512a821

TianhaoLiang2000 closed this Jan 15, 2025

TianhaoLiang2000 reopened this Jan 15, 2025

TianhaoLiang2000 changed the title ~~[Benchmark] Add MEGA-Bench core and core_single_image support~~ [WIP][Benchmark] Add MEGA-Bench core and core_single_image support Jan 15, 2025
