forked from open-compass/VLMEvalKit
[Improvement] Launch Evaluation w. Config (open-compass#610)
* [Improvement] Support Launching Eval w. --config
* Fix Inference Scripts
* [Fix] Fix build_from_config issues
* improve run.py -h help message
* Refactor Doc
* add doc for the config system
* [Fix] aria on mvbench and llava_onevision in video benchmark
* [Fix] Idefics on mlvu, tempcompass and mvbench
* [Model] Support SmolVLM (open-compass#615)
  * adding smolvlm
  * changing model path
  * fix
  * pre-commit fixes
* [Fix] Enable --reuse with resume from original pkl files with same commit id (open-compass#613)
  * [Fix] Enable --reuse with resume from original pkl files with same commit id
  * only transfer target dataset .pkl file
  * update reuse logic

  Co-authored-by: kennymckormick
* [Benchmark] Support MM-Math (open-compass#618)
  * [Benchmark] Support MM-Math Evaluation
  * update README
  * update README
* [Benchmark] Measuring Quantitative Spatial Reasoning with the Q-Spatial Bench🔥 (open-compass#569)
  * add Q-Spatial Bench
  * perform pre-commit
  * use spatialprompt_single

  Co-authored-by: Haodong Duan
* update README
* [Improvement] Support Launching Eval w. --config

Co-authored-by: FangXinyu-0913
Co-authored-by: Miquel Farré
Co-authored-by: Andrew

Commit 1306be9 (1 parent: f9bbe27)

Showing 15 changed files with 357 additions and 124 deletions.

# Config System

By default, VLMEvalKit launches an evaluation by passing model name(s) (defined in `vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) to the `run.py` script through the `--model` and `--data` arguments. This approach is simple and efficient in most scenarios; however, it may not be flexible enough when the user wants to evaluate multiple models / datasets with different settings.

To address this, VLMEvalKit provides a more flexible config system. The user can specify the model and dataset settings in a JSON file and pass the path to that file to the `run.py` script with the `--config` argument. Here is a sample config JSON:

```json
{
    "model": {
        "GPT4o_20240806_T00_HIGH": {
            "class": "GPT4V",
            "model": "gpt-4o-2024-08-06",
            "temperature": 0,
            "img_detail": "high"
        },
        "GPT4o_20240806_T10_Low": {
            "class": "GPT4V",
            "model": "gpt-4o-2024-08-06",
            "temperature": 1.0,
            "img_detail": "low"
        }
    },
    "data": {
        "MME-RealWorld-Lite": {
            "class": "MMERealWorld",
            "dataset": "MME-RealWorld-Lite"
        },
        "MMBench_DEV_EN_V11": {
            "class": "ImageMCQDataset",
            "dataset": "MMBench_DEV_EN_V11"
        }
    }
}
```
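
Because the config is plain JSON, it can also be generated programmatically, which is convenient when sweeping one setting across several model entries. The following is a minimal sketch that rebuilds the sample config above from a list of (name, temperature, detail) variants and writes it to `config.json`. The helper `make_gpt4v_entry` is hypothetical and not part of VLMEvalKit; only the resulting JSON layout follows this document.

```python
import json


def make_gpt4v_entry(temperature: float, img_detail: str) -> dict:
    """Hypothetical helper: build one `GPT4V` model entry for the config."""
    return {
        "class": "GPT4V",
        "model": "gpt-4o-2024-08-06",
        "temperature": temperature,
        "img_detail": img_detail,
    }


# (user-chosen entry name, temperature, img_detail) for each model variant
variants = [
    ("GPT4o_20240806_T00_HIGH", 0, "high"),
    ("GPT4o_20240806_T10_Low", 1.0, "low"),
]

config = {
    "model": {name: make_gpt4v_entry(t, d) for name, t, d in variants},
    "data": {
        "MME-RealWorld-Lite": {"class": "MMERealWorld", "dataset": "MME-RealWorld-Lite"},
        "MMBench_DEV_EN_V11": {"class": "ImageMCQDataset", "dataset": "MMBench_DEV_EN_V11"},
    },
}

# Write the config so it can be passed to `run.py --config config.json`
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```

Adding another variant is then a one-line change to `variants` rather than a hand-edited JSON block.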

Explanation of the config JSON:

1. Two top-level fields are currently supported: `model` and `data`, each of which is a dictionary. Each key is the name of a model / dataset (chosen by the user), and each value holds the settings of that model / dataset.
2. For items in `model`, the value is a dictionary containing the following keys:
   - `class`: The class name of the model, which should be a class defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
   - Other kwargs: Model-specific parameters; refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, and `img_detail` are arguments of the `GPT4V` class. Note that most model classes require the `model` argument.
3. For the `data` dictionary, we suggest using the official dataset name as the key (or as part of the key), since the post-processing / judging settings are frequently determined from the dataset name. For items in `data`, the value is a dictionary containing the following keys:
   - `class`: The class name of the dataset, which should be a class defined in `vlmeval/dataset/__init__.py`.
   - Other kwargs: Dataset-specific parameters; refer to the definition of the dataset class for detailed usage. Most dataset classes require the `dataset` argument.
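
The rules above amount to a simple pattern: look up the class named by `class`, then pass every remaining key as a keyword argument to its constructor. Below is a minimal sketch of that pattern, assuming a hypothetical `REGISTRY` dict and stand-in `GPT4V` / `ImageMCQDataset` classes in place of VLMEvalKit's real ones; the toolkit's own builder may differ in details such as error handling.

```python
class GPT4V:
    """Stand-in for the real API model class (hypothetical signature)."""

    def __init__(self, model, temperature=0, img_detail="low"):
        self.model = model
        self.temperature = temperature
        self.img_detail = img_detail


class ImageMCQDataset:
    """Stand-in for the real dataset class (hypothetical signature)."""

    def __init__(self, dataset):
        self.dataset = dataset


# Hypothetical registry mapping the `class` string to a Python class
REGISTRY = {"GPT4V": GPT4V, "ImageMCQDataset": ImageMCQDataset}


def build_entries(section: dict) -> dict:
    """Instantiate every entry of a `model` or `data` section."""
    built = {}
    for name, settings in section.items():
        kwargs = dict(settings)  # shallow copy so the caller's config is not mutated
        if "class" not in kwargs:
            raise KeyError(f"entry {name!r} is missing the 'class' key")
        cls_name = kwargs.pop("class")
        if cls_name not in REGISTRY:
            raise KeyError(f"unknown class {cls_name!r} for entry {name!r}")
        # All remaining keys are forwarded as constructor kwargs
        built[name] = REGISTRY[cls_name](**kwargs)
    return built


config = {
    "model": {
        "GPT4o_20240806_T00_HIGH": {"class": "GPT4V", "model": "gpt-4o-2024-08-06",
                                    "temperature": 0, "img_detail": "high"},
    },
    "data": {
        "MMBench_DEV_EN_V11": {"class": "ImageMCQDataset", "dataset": "MMBench_DEV_EN_V11"},
    },
}

models = build_entries(config["model"])
datasets = build_entries(config["data"])
```

Copying each entry before popping `class` leaves the loaded config intact, so it can still be logged or reused after the objects are built.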

After saving the example config as `config.json`, you can launch the evaluation with:

```bash
python run.py --config config.json
```

This generates the following output files under the working directory `$WORK_DIR`, following the format `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`, where `$MODEL_NAME` and `$DATASET_NAME` are the keys defined in the config:

- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MMBench_DEV_EN_V11*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MMBench_DEV_EN_V11*`
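
Since every model entry is evaluated on every dataset entry, the output prefixes are the Cartesian product of the two key sets. A minimal sketch that predicts them from the naming format above, assuming `outputs` as an example working directory; it only computes name prefixes, and the actual file suffixes depend on the model and dataset.

```python
import itertools
import os


def output_prefixes(work_dir: str, model_names, dataset_names) -> list:
    """Return `{WORK_DIR}/{MODEL}/{MODEL}_{DATASET}` for every model/dataset pair."""
    return [
        os.path.join(work_dir, m, f"{m}_{d}")
        for m, d in itertools.product(model_names, dataset_names)
    ]


prefixes = output_prefixes(
    "outputs",  # assumed example value for $WORK_DIR
    ["GPT4o_20240806_T00_HIGH", "GPT4o_20240806_T10_Low"],
    ["MME-RealWorld-Lite", "MMBench_DEV_EN_V11"],
)
for p in prefixes:
    print(p + "*")
```

With two model entries and two dataset entries this yields four prefixes, matching the list above.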
File renamed without changes.
`docs/en/advanced_guides/Development.md` → `docs/en/Development.md` (4 changes: 3 additions, 1 deletion)
File renamed without changes.
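
# Config System

By default, VLMEvalKit launches an evaluation by setting the model name(s) (defined in `vlmeval/config.py`) and dataset name(s) (defined in `vlmeval/dataset/__init__.py`) with the `--model` and `--data` arguments of the `run.py` script. This approach is simple and efficient in most cases, but it may not be flexible enough when users want to evaluate multiple models / datasets with different settings.

To solve this problem, VLMEvalKit provides a more flexible config system. Users can specify model and dataset settings in a JSON file and pass the path of the config file to the `run.py` script via the `--config` argument. Here is a sample config JSON: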

```json
{
    "model": {
        "GPT4o_20240806_T00_HIGH": {
            "class": "GPT4V",
            "model": "gpt-4o-2024-08-06",
            "temperature": 0,
            "img_detail": "high"
        },
        "GPT4o_20240806_T10_Low": {
            "class": "GPT4V",
            "model": "gpt-4o-2024-08-06",
            "temperature": 1.0,
            "img_detail": "low"
        }
    },
    "data": {
        "MME-RealWorld-Lite": {
            "class": "MMERealWorld",
            "dataset": "MME-RealWorld-Lite"
        },
        "MMBench_DEV_EN_V11": {
            "class": "ImageMCQDataset",
            "dataset": "MMBench_DEV_EN_V11"
        }
    }
}
```
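
Explanation of the config JSON:

1. Two fields are currently supported: `model` and `data`, each of which is a dictionary. The keys of each dictionary are model / dataset names (set by the user), and the values are the model / dataset settings.
2. For items in `model`, the value is a dictionary containing the following keys:
   - `class`: The class name of the model, which should be a class defined in `vlmeval/vlm/__init__.py` (open-source models) or `vlmeval/api/__init__.py` (API models).
   - Other kwargs: Model-specific parameters; refer to the definition of the model class for detailed usage. For example, `model`, `temperature`, and `img_detail` are arguments of the `GPT4V` class. Note that most model classes require the `model` argument.
3. For the `data` dictionary, we suggest using the official dataset name as the key (or as part of the key), since the post-processing / judging settings are often determined from the dataset name. For items in `data`, the value is a dictionary containing the following keys:
   - `class`: The class name of the dataset, which should be a class defined in `vlmeval/dataset/__init__.py`.
   - Other kwargs: Dataset-specific parameters; refer to the definition of the dataset class for detailed usage. Most dataset classes require the `dataset` argument.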

After saving the example config JSON as `config.json`, you can launch the evaluation with the following command:

```bash
python run.py --config config.json
```

This will generate the following output files under the working directory `$WORK_DIR` (in the format `{$WORK_DIR}/{$MODEL_NAME}/{$MODEL_NAME}_{$DATASET_NAME}_*`):

- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MME-RealWorld-Lite*`
- `$WORK_DIR/GPT4o_20240806_T00_HIGH/GPT4o_20240806_T00_HIGH_MMBench_DEV_EN_V11*`
- `$WORK_DIR/GPT4o_20240806_T10_Low/GPT4o_20240806_T10_Low_MMBench_DEV_EN_V11*`
File renamed without changes.
File renamed without changes.