[Improvement] Launch Evaluation w. Config (open-compass#610)
* [Improvement] Support Launching Eval w. --config

* Fix Inference Scripts

* [Fix] Fix build_from_config issues

* improve run.py -h help message

* Refactor Doc

* add doc for the config system

* [Fix] aria on mvbench and llava_onevision in video benchmark

* [Fix] Idefics on mlvu, tempcompass and mvbench

* [Model] Support SmolVLM (open-compass#615)

* adding smolvlm

* changing model path

* fix

* pre-commit fixes

* [Fix] Enable --reuse with resume from original pkl files with same commit id (open-compass#613)

* [Fix] Enable --reuse with resume from original pkl files with same commit id

* only transfer target dataset .pkl file

* update reuse logic

---------

Co-authored-by: kennymckormick <[email protected]>

* [Benchmark] Support MM-Math (open-compass#618)

* [Benchmark] Support MM-Math Evaluation

* update README

* update README

* [Benchmark] Measuring Quantitative Spatial Reasoning with the Q-Spatial Bench🔥 (open-compass#569)

* add Q-Spatial Bench

* perform pre-commit

* use spatialprompt_single

---------

Co-authored-by: Haodong Duan <[email protected]>

* update README

* [Improvement] Support Launching Eval w. --config

---------

Co-authored-by: FangXinyu-0913 <[email protected]>
Co-authored-by: Miquel Farré <[email protected]>
Co-authored-by: Andrew <[email protected]>
4 people authored Nov 21, 2024
1 parent f9bbe27 commit 1306be9
Showing 15 changed files with 357 additions and 124 deletions.
57 changes: 57 additions & 0 deletions docs/en/ConfigSystem.md
@@ -0,0 +1,57 @@
# Config System

By default, VLMEvalKit launches the evaluation by setting the model name(s) (defined in `vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) in the `run.py` script with the `--model` and `--data` arguments. This approach is simple and efficient in most scenarios; however, it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.

To address this, VLMEvalKit provides a more flexible config system. The user can specify the model and dataset settings in a JSON file and pass the path of that file to the `run.py` script with the `--config` argument. Here is a sample config JSON:

```json
{
"model": {
"GPT4o_20240806_T00_HIGH": {
"class": "GPT4V",
"model": "gpt-4o-2024-08-06",
"temperature": 0,
"img_detail": "high"
},
"GPT4o_20240806_T10_Low": {
"class": "GPT4V",
"model": "gpt-4o-2024-08-06",
"temperature": 1.0,
"img_detail": "low"
}
},
"data": {
"MME-RealWorld-Lite": {
"class": "MMERealWorld",
"dataset": "MME-RealWorld-Lite"
},
"MMBench_DEV_EN_V11": {
"class": "ImageMCQDataset",
"dataset": "MMBench_DEV_EN_V11"
}
}
}
```

Explanation of the config json:

1. Currently, two fields are supported: `model` and `data`, each of which is a dictionary. Each key is a user-chosen name for the model / dataset, and each value holds the settings of that model / dataset.
2. For items in `model`, the value is a dictionary containing the following keys:
- `class`: The class name of the model, which should be a class name defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
- Other kwargs: Model-specific parameters; please refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, and `img_detail` are arguments of the `GPT4V` class. Note that the `model` argument is required by most model classes.
3. For the dictionary `data`, we suggest using the official dataset name as the key (or part of the key), since the post-processing / judging settings are frequently determined based on the dataset name. For items in `data`, the value is a dictionary containing the following keys:
- `class`: The class name of the dataset, which should be a class name defined in `vlmeval/dataset/__init__.py`.
- Other kwargs: Dataset-specific parameters; please refer to the definition of the dataset class for detailed usage. The `dataset` argument is required by most dataset classes.
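
The `class` plus kwargs convention described above can be sketched roughly as follows. This is a minimal illustration, not VLMEvalKit's actual loading code: `FakeGPT4V`, `REGISTRY`, and `build_entry` are hypothetical names introduced here, and the real `GPT4V` class may accept different arguments.

```python
from typing import Any, Callable, Dict


class FakeGPT4V:
    """Stand-in for an API model class; NOT the real vlmeval GPT4V."""

    def __init__(self, model: str, temperature: float = 0, img_detail: str = "low"):
        self.model = model
        self.temperature = temperature
        self.img_detail = img_detail


# Hypothetical registry mapping `class` strings to constructors.
REGISTRY: Dict[str, Callable[..., Any]] = {"GPT4V": FakeGPT4V}


def build_entry(settings: Dict[str, Any], registry: Dict[str, Callable[..., Any]]) -> Any:
    """Instantiate one config entry: `class` picks the constructor, the rest are kwargs."""
    kwargs = dict(settings)         # copy so the caller's config is not mutated
    cls_name = kwargs.pop("class")  # raises KeyError if `class` is missing
    if cls_name not in registry:
        raise ValueError(f"Unknown class: {cls_name!r}")
    return registry[cls_name](**kwargs)


entry = {
    "class": "GPT4V",
    "model": "gpt-4o-2024-08-06",
    "temperature": 0,
    "img_detail": "high",
}
model = build_entry(entry, REGISTRY)
print(type(model).__name__, model.model, model.img_detail)
```

Copying the settings before popping `class` keeps the original config intact, which matters if the same config is later reused or written back to disk.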

After saving the example config to `config.json`, you can launch the evaluation with:

```bash
python run.py --config config.json
```

This generates the following output files under the working directory `$WORK_DIR` (following the format `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`):

- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MMBench_DEV_EN_V11*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MMBench_DEV_EN_V11*`
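
The naming pattern above is one prefix per (model, dataset) pair. As a rough sketch, the expected prefixes can be enumerated from a config as shown below. `expected_prefixes` is a hypothetical helper and `outputs` is a placeholder working directory; neither comes from VLMEvalKit, and the entry settings are left empty because only the keys matter here.

```python
from itertools import product
from typing import Dict, List


def expected_prefixes(config: Dict[str, dict], work_dir: str) -> List[str]:
    """Return one output-file prefix per (model name, dataset name) pair."""
    return [
        f"{work_dir}/{model_name}/{model_name}_{dataset_name}"
        for model_name, dataset_name in product(config["model"], config["data"])
    ]


# Only the user-chosen keys are needed to derive the prefixes.
config = {
    "model": {"GPT4o_20240806_T00_HIGH": {}, "GPT4o_20240806_T10_Low": {}},
    "data": {"MME-RealWorld-Lite": {}, "MMBench_DEV_EN_V11": {}},
}

prefixes = expected_prefixes(config, "outputs")
for prefix in prefixes:
    print(prefix + "*")
```

Two models and two datasets give four prefixes, matching the four entries listed above.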
File renamed without changes.
@@ -1,4 +1,6 @@
# 🛠️ How to implement a new Benchmark / VLM in VLMEvalKit?
# Develop new Benchmark / MLLM

> 🛠️ How to implement a new Benchmark / VLM in VLMEvalKit?
## Implement a new benchmark

File renamed without changes.
23 changes: 8 additions & 15 deletions docs/en/index.rst
@@ -17,29 +17,22 @@ We always welcome users' PRs (Pull Requests) and Issues to improve VLMEvalKit!
:maxdepth: 1
:caption: Start Your First Step

get_started/Quickstart.md


.. .. _Tutorials:
.. .. toctree::
.. :maxdepth: 1
.. :caption: Tutorials
.. user_guides/framework_overview.md
Quickstart.md

.. _Advanced Tutorial:
.. toctree::
:maxdepth: 1
:caption: Advanced Tutorial

advanced_guides/Development.md
Development.md
ConfigSystem.md

.. .. _Other Notes:
.. .. toctree::
.. :maxdepth: 1
.. :caption: Other Notes
.. _Other Notes:
.. toctree::
:maxdepth: 1
:caption: Other Notes

.. notes/contribution_guide.md
Contributors.md

Index and Tables
==================
59 changes: 59 additions & 0 deletions docs/zh-CN/ConfigSystem.md
@@ -0,0 +1,59 @@

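# Config System

By default, VLMEvalKit launches the evaluation by setting the model name(s) (defined in `vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) in the `run.py` script with the `--model` and `--data` arguments. This approach is simple and efficient in most cases, but it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.

To address this, VLMEvalKit provides a more flexible config system. The user can specify the model and dataset settings in a JSON file and pass the path of that file to the `run.py` script with the `--config` argument. Here is a sample config JSON: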

```json
{
"model": {
"GPT4o_20240806_T00_HIGH": {
"class": "GPT4V",
"model": "gpt-4o-2024-08-06",
"temperature": 0,
"img_detail": "high"
},
"GPT4o_20240806_T10_Low": {
"class": "GPT4V",
"model": "gpt-4o-2024-08-06",
"temperature": 1.0,
"img_detail": "low"
}
},
"data": {
"MME-RealWorld-Lite": {
"class": "MMERealWorld",
"dataset": "MME-RealWorld-Lite"
},
"MMBench_DEV_EN_V11": {
"class": "ImageMCQDataset",
"dataset": "MMBench_DEV_EN_V11"
}
}
}
```

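Explanation of the config JSON:

1. Currently, two fields are supported: `model` and `data`, each of which is a dictionary. Each key is a user-chosen name for the model / dataset, and each value holds the settings of that model / dataset.
2. For items in `model`, the value is a dictionary containing the following keys:
- `class`: The class name of the model, which should be a class name defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
- Other kwargs: Model-specific parameters; please refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, and `img_detail` are arguments of the `GPT4V` class. Note that the `model` argument is required by most model classes.
3. For the dictionary `data`, we suggest using the official dataset name as the key (or part of the key), since the post-processing / judging settings are frequently determined based on the dataset name. For items in `data`, the value is a dictionary containing the following keys:
- `class`: The class name of the dataset, which should be a class name defined in `vlmeval/dataset/__init__.py`.
- Other kwargs: Dataset-specific parameters; please refer to the definition of the dataset class for detailed usage. The `dataset` argument is required by most dataset classes.

After saving the example config to `config.json`, you can launch the evaluation with: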

```bash
python run.py --config config.json
```

This generates the following output files under the working directory `$WORK_DIR` (following the format `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`):

- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MMBench_DEV_EN_V11*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MMBench_DEV_EN_V11*`
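
Before launching a long run, it may help to check that a config follows the structure described in this document: two top-level dictionaries, `model` and `data`, whose entries each carry a `class` key. The sketch below is an assumption-laden illustration, not part of VLMEvalKit. `find_config_problems` and `ALLOWED_FIELDS` are hypothetical names, and the check does not verify that the class names actually exist.

```python
import json
from typing import Any, Dict, List

# The two top-level fields described in this document.
ALLOWED_FIELDS = ("model", "data")


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems: List[str] = []
    for field in config:
        if field not in ALLOWED_FIELDS:
            problems.append(f"unexpected top-level field: {field}")
    for field in ALLOWED_FIELDS:
        entries = config.get(field)
        if not isinstance(entries, dict):
            problems.append(f"`{field}` must be a dictionary")
            continue
        for name, settings in entries.items():
            # Every entry needs a `class` key naming the model / dataset class.
            if not isinstance(settings, dict) or "class" not in settings:
                problems.append(f"{field}.{name} is missing `class`")
    return problems


# Entry "B" deliberately omits `class` to show what a report looks like.
sample = json.loads("""
{
  "model": {
    "A": {"class": "GPT4V", "model": "gpt-4o-2024-08-06"},
    "B": {"model": "gpt-4o-2024-08-06"}
  },
  "data": {
    "MMBench_DEV_EN_V11": {"class": "ImageMCQDataset", "dataset": "MMBench_DEV_EN_V11"}
  }
}
""")

print(find_config_problems(sample))
```

Collecting all problems in a list, rather than raising on the first one, lets a user fix a config in a single pass.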
-
File renamed without changes.
File renamed without changes.
9 changes: 5 additions & 4 deletions docs/zh-CN/index.rst
@@ -12,12 +12,12 @@ VLMEvalKit Getting-Started Roadmap

We always warmly welcome users' PRs and Issues to improve VLMEvalKit!

.. _Start Your First Step:
.. _Quick Start:
.. toctree::
:maxdepth: 1
:caption: Start Your First Step
:caption: Quick Start

get_started/Quickstart.md
Quickstart.md


.. .. _Tutorials:
@@ -32,7 +32,8 @@ VLMEvalKit Getting-Started Roadmap
:maxdepth: 1
:caption: Advanced Tutorial

advanced_guides/Development.md
Development.md
ConfigSystem.md

.. .. _Other Notes:
.. .. toctree::