[Improvement] Launch Evaluation w. Config (open-compass#610)

* [Improvement] Support Launching Eval w. --config
* Fix Inference Scripts
* [Fix] Fix build_from_config issues
* improve run.py -h help message
* Refactor Doc
* add doc for the config system
* [Fix] aria on mvbench and llava_onevision in video benchmark
* [Fix] Idefics on mlvu, tempcompass and mvbench
* [Model] Support SmolVLM (open-compass#615)
* adding smolvlm
* changing model path
* fix
* pre-commit fixes
* [Fix] Enable --reuse with resume from original pkl files with same commit id (open-compass#613)
* [Fix] Enable --reuse with resume from original pkl files with same commit id
* only transfer target dataset .pkl file
* update reuse logic

---------

Co-authored-by: kennymckormick <[email protected]>

* [Benchmark] Support MM-Math (open-compass#618)
* [Benchmark] Support MM-Math Evaluation
* update README
* update README
* [Benchmark] Measuring Quantitative Spatial Reasoning with the Q-Spatial Bench🔥 (open-compass#569)
* add Q-Spatial Bench
* perform pre-commit
* use spatialprompt_single

---------

Co-authored-by: Haodong Duan <[email protected]>

* update README
* [Improvement] Support Launching Eval w. --config

---------

Co-authored-by: FangXinyu-0913 <[email protected]>
Co-authored-by: Miquel Farré <[email protected]>
Co-authored-by: Andrew <[email protected]>
Myhs-phz · Nov 21, 2024 · 1306be9
1 parent f9bbe27
Showing 15 changed files with 357 additions and 124 deletions.
diff --git a/docs/en/ConfigSystem.md b/docs/en/ConfigSystem.md
@@ -0,0 +1,57 @@
+# Config System
+
+By default, VLMEvalKit launches an evaluation from the model name(s) (defined in `/vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) passed to the `run.py` script via the `--model` and `--data` arguments. This approach is simple and efficient in most scenarios; however, it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.
+
+To address this, VLMEvalKit provides a more flexible config system. The user can specify the model and dataset settings in a JSON file and pass the path of that file to the `run.py` script with the `--config` argument. Here is a sample config JSON:
+
+```json
+{
+    "model": {
+        "GPT4o_20240806_T00_HIGH": {
+            "class": "GPT4V",
+            "model": "gpt-4o-2024-08-06",
+            "temperature": 0,
+            "img_detail": "high"
+        },
+        "GPT4o_20240806_T10_Low": {
+            "class": "GPT4V",
+            "model": "gpt-4o-2024-08-06",
+            "temperature": 1.0,
+            "img_detail": "low"
+        }
+    },
+    "data": {
+        "MME-RealWorld-Lite": {
+            "class": "MMERealWorld",
+            "dataset": "MME-RealWorld-Lite"
+        },
+        "MMBench_DEV_EN_V11": {
+            "class": "ImageMCQDataset",
+            "dataset": "MMBench_DEV_EN_V11"
+        }
+    }
+}
+```
+
+Explanation of the config JSON:
+
+1. Two fields are currently supported: `model` and `data`, each of which is a dictionary. Each key is the name of a model / dataset (chosen by the user), and each value holds the settings for that model / dataset.
+2. For items in `model`, the value is a dictionary containing the following keys:
+    - `class`: The class name of the model, which should be a class name defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
+    - Other kwargs: Model-specific parameters; refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, and `img_detail` are arguments of the `GPT4V` class. Note that the `model` argument is required by most model classes.
+3. For the dictionary `data`, we suggest using the official dataset name as the key (or part of the key), since the post-processing / judging settings are frequently determined from the dataset name. For items in `data`, the value is a dictionary containing the following keys:
+    - `class`: The class name of the dataset, which should be a class name defined in `vlmeval/dataset/__init__.py`.
+    - Other kwargs: Dataset-specific parameters; refer to the definition of the dataset class for detailed usage. The `dataset` argument is required by most dataset classes.
+
+After saving the example config JSON to `config.json`, you can launch the evaluation with:
+
+```bash
+python run.py --config config.json
+```
+
+This generates the following output files under the working directory `$WORK_DIR` (following the format `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`):
+
+- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MME-RealWorld-Lite*`
+- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MME-RealWorld-Lite*`
+- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MMBench_DEV_EN_V11*`
+- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MMBench_DEV_EN_V11*`
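
As an illustration of the `class` + kwargs convention documented in `ConfigSystem.md` (not part of this commit), the sketch below shows one way such a config entry could be turned into an object: `class` selects a constructor and every remaining key is forwarded as a keyword argument. The `REGISTRY` dictionary, the `build_entry` helper, and the two stub classes are hypothetical stand-ins; the real classes live in `vlmeval/vlm`, `vlmeval/api`, and `vlmeval/dataset`, and the toolkit's own construction logic may differ.

```python
import json


# Hypothetical stubs standing in for the real model / dataset classes.
class GPT4V:
    def __init__(self, model, temperature=0, img_detail="low"):
        self.model = model
        self.temperature = temperature
        self.img_detail = img_detail


class ImageMCQDataset:
    def __init__(self, dataset):
        self.dataset = dataset


# Hypothetical name -> class lookup table (an assumption for this sketch).
REGISTRY = {"GPT4V": GPT4V, "ImageMCQDataset": ImageMCQDataset}


def build_entry(entry, registry=REGISTRY):
    """Instantiate one config entry.

    `class` selects the constructor; all other keys become kwargs.
    """
    kwargs = dict(entry)            # copy so the caller's dict is untouched
    cls_name = kwargs.pop("class")  # raises KeyError if `class` is missing
    if cls_name not in registry:
        raise ValueError(f"unknown class: {cls_name}")
    return registry[cls_name](**kwargs)


CONFIG_TEXT = """
{
    "model": {
        "GPT4o_20240806_T00_HIGH": {
            "class": "GPT4V",
            "model": "gpt-4o-2024-08-06",
            "temperature": 0,
            "img_detail": "high"
        }
    },
    "data": {
        "MMBench_DEV_EN_V11": {
            "class": "ImageMCQDataset",
            "dataset": "MMBench_DEV_EN_V11"
        }
    }
}
"""

config = json.loads(CONFIG_TEXT)
models = {name: build_entry(e) for name, e in config["model"].items()}
datasets = {name: build_entry(e) for name, e in config["data"].items()}
```

Keeping the user-chosen name as the dictionary key and the class name inside the entry is what lets the sample config define two differently configured instances of the same `GPT4V` class.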
diff --git a/docs/en/advanced_guides/Contributors.md b/docs/en/Contributors.md
diff --git a/docs/en/advanced_guides/Development.md b/docs/en/Development.md
@@ -1,4 +1,6 @@
-# 🛠️ How to implement a new Benchmark / VLM in VLMEvalKit?
+# Develop new Benchmark / MLLM
+
+>  🛠️ How to implement a new Benchmark / VLM in VLMEvalKit?
 
 ## Implement a new benchmark
 

diff --git a/docs/en/get_started/Quickstart.md b/docs/en/Quickstart.md
diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -17,29 +17,22 @@ We always welcome users' PRs (Pull Requests) and Issues to improve VLMEvalKit!
    :maxdepth: 1
    :caption: Start Your First Step
 
-   get_started/Quickstart.md
-
-
-.. .. _Tutorials:
-.. .. toctree::
-..    :maxdepth: 1
-..    :caption: Tutorials
-
-..    user_guides/framework_overview.md
+   Quickstart.md
 
 .. _Advanced Tutorial:
 .. toctree::
    :maxdepth: 1
    :caption: Advanced Tutorial
 
-   advanced_guides/Development.md
+   Development.md
+   ConfigSystem.md
 
-.. .. _Other Notes:
-.. .. toctree::
-..    :maxdepth: 1
-..    :caption: Other Notes
+.. _Other Notes:
+.. toctree::
+   :maxdepth: 1
+   :caption: Other Notes
 
-..    notes/contribution_guide.md
+   Contributors.md
 
 Index and Tables
 ==================

diff --git a/docs/zh-CN/ConfigSystem.md b/docs/zh-CN/ConfigSystem.md
@@ -0,0 +1,59 @@
+
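+# Config System
+
+By default, VLMEvalKit launches an evaluation by setting the model name(s) (defined in `/vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) through the `--model` and `--data` arguments of the `run.py` script. This approach is simple and efficient in most cases, but it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.
+
+To address this, VLMEvalKit provides a more flexible config system. The user can specify the model and dataset settings in a JSON file and pass the path of the config file to the `run.py` script via the `--config` argument. Here is a sample config JSON: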
+
+```json
+{
+    "model": {
+        "GPT4o_20240806_T00_HIGH": {
+            "class": "GPT4V",
+            "model": "gpt-4o-2024-08-06",
+            "temperature": 0,
+            "img_detail": "high"
+        },
+        "GPT4o_20240806_T10_Low": {
+            "class": "GPT4V",
+            "model": "gpt-4o-2024-08-06",
+            "temperature": 1.0,
+            "img_detail": "low"
+        }
+    },
+    "data": {
+        "MME-RealWorld-Lite": {
+            "class": "MMERealWorld",
+            "dataset": "MME-RealWorld-Lite"
+        },
+        "MMBench_DEV_EN_V11": {
+            "class": "ImageMCQDataset",
+            "dataset": "MMBench_DEV_EN_V11"
+        }
+    }
+}
+```
+
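+Explanation of the config JSON:
+
+1. Two fields are currently supported: `model` and `data`, each of which is a dictionary. Each key is the name of a model / dataset (set by the user), and each value holds the settings of that model / dataset.
+2. For items in `model`, the value is a dictionary containing the following keys:
+    - `class`: The class name of the model, which should be a class name defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
+    - Other kwargs: Model-specific parameters; refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, and `img_detail` are arguments of the `GPT4V` class. Note that most model classes require the `model` argument.
+3. For the dictionary `data`, we suggest using the official dataset name as the key (or part of the key), since the post-processing / judging settings are often determined from the dataset name. For items in `data`, the value is a dictionary containing the following keys:
+    - `class`: The class name of the dataset, which should be a class name defined in `vlmeval/dataset/__init__.py`.
+    - Other kwargs: Dataset-specific parameters; refer to the definition of the dataset class for detailed usage. Most dataset classes require the `dataset` argument.
+
+After saving the example config JSON as `config.json`, you can launch the evaluation with the following command: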
+
+```bash
+python run.py --config config.json
+```
+
+This generates the following output files under the working directory `$WORK_DIR` (in the format `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`):
+
+- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MME-RealWorld-Lite*`
+- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MME-RealWorld-Lite*`
+- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MMBench_DEV_EN_V11*`
+- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MMBench_DEV_EN_V11*`
+-
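
For illustration only (not part of this commit): the naming scheme documented in `ConfigSystem.md`, `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`, implies one output prefix per model / dataset pair. The sketch below enumerates those prefixes from a config dictionary. The `expected_prefixes` helper is hypothetical, and the iteration order simply mirrors the order of the list in the docs; it is an assumption, not a statement about how `run.py` schedules its work.

```python
import itertools


def expected_prefixes(config, work_dir):
    """Return one output-file prefix per (dataset, model) pair.

    Follows the documented pattern {work_dir}/{model}/{model}_{dataset};
    the trailing `*` in the docs stands for file-specific suffixes.
    """
    return [
        f"{work_dir}/{model}/{model}_{dataset}"
        for dataset, model in itertools.product(config["data"], config["model"])
    ]


# Only the keys matter here, so the per-entry settings are left empty.
sample = {
    "model": {"GPT4o_20240806_T00_HIGH": {}, "GPT4o_20240806_T10_Low": {}},
    "data": {"MME-RealWorld-Lite": {}, "MMBench_DEV_EN_V11": {}},
}

for prefix in expected_prefixes(sample, "$WORK_DIR"):
    print(prefix + "*")
```

With two models and two datasets this yields four prefixes, matching the four entries listed in the docs.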
diff --git a/docs/zh-CN/advanced_guides/Development.md b/docs/zh-CN/Development.md
diff --git a/docs/zh-CN/get_started/Quickstart.md b/docs/zh-CN/Quickstart.md
diff --git a/docs/zh-CN/index.rst b/docs/zh-CN/index.rst
@@ -12,12 +12,12 @@ VLMEvalKit 上手路线
 
 我们始终非常欢迎用户的 PRs 和 Issues 来完善 VLMEvalKit！
 
-.. _开始你的第一步:
+.. _快速开始:
 .. toctree::
    :maxdepth: 1
-   :caption: 开始你的第一步
+   :caption: 快速开始
 
-   get_started/Quickstart.md
+   Quickstart.md
 
 
 .. .. _教程:
@@ -32,7 +32,8 @@ VLMEvalKit 上手路线
    :maxdepth: 1
    :caption: 进阶教程
 
-   advanced_guides/Development.md
+   Development.md
+   ConfigSystem.md
 
 .. .. _其他说明:
 .. .. toctree::