Commit
[Feature] Add support for MiniMax API (#548)
* update requirement

* update requirement

* update with minimax

* update api model

* Update readme

* fix error

---------

Co-authored-by: zhangsongyang <[email protected]>
tonysy authored Nov 6, 2023
1 parent 1ccdfaa commit 239c2a3
Showing 17 changed files with 368 additions and 8 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -38,6 +38,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2023.11.06\]** We have supported several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax and Xunfei. Welcome to the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
- **\[2023.10.24\]** We release a new benchmark for evaluating LLMs’ multi-turn dialogue capabilities. Welcome to [BotChat](https://github.com/open-compass/BotChat) for more details. 🔥🔥🔥.
- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
- **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
@@ -46,7 +47,6 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
- **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.

> [More](docs/en/notes/news.md)
2 changes: 1 addition & 1 deletion README_zh-CN.md
@@ -38,6 +38,7 @@
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>

- **\[2023.11.06\]** We have supported several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax and Xunfei. Welcome to the [Models](https://opencompass.readthedocs.io/en/latest/user_guides/models.html) section for more details. 🔥🔥🔥.
- **\[2023.10.24\]** We have released BotChat, a new benchmark for evaluating the multi-turn dialogue capabilities of large language models. Welcome to [BotChat](https://github.com/open-compass/BotChat) for more details. 🔥🔥🔥.
- **\[2023.09.26\]** We update the leaderboard with [Qwen](https://github.com/QwenLM/Qwen), one of the best-performing open-source models currently available. Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
- **\[2023.09.20\]** We update the leaderboard with [InternLM-20B](https://github.com/InternLM/InternLM). Welcome to our [homepage](https://opencompass.org.cn) for more details. 🔥🔥🔥.
@@ -46,7 +47,6 @@
- **\[2023.09.08\]** We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our [homepage](https://opencompass.org.cn) for more details.
- **\[2023.09.06\]** The [**Baichuan2**](https://github.com/baichuan-inc/Baichuan2) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.09.02\]** We have supported the evaluation of [Qwen-VL](https://github.com/QwenLM/Qwen-VL) in OpenCompass.
- **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.

> [More](docs/zh_cn/notes/news.md)
37 changes: 37 additions & 0 deletions configs/eval_minimax.py
@@ -0,0 +1,37 @@
from mmengine.config import read_base
from opencompass.models import MiniMax
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.runners.local_api import LocalAPIRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    # from .datasets.collections.chat_medium import datasets
    from .summarizers.medium import summarizer
    from .datasets.ceval.ceval_gen import ceval_datasets

datasets = [
    *ceval_datasets,
]

models = [
    dict(
        abbr='minimax_abab5.5-chat',
        type=MiniMax,
        path='abab5.5-chat',
        key='xxxxxxx',  # please provide your API key
        group_id='xxxxxxxx',  # please provide your group_id
        query_per_second=1,
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8),
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalAPIRunner,
        max_num_workers=4,
        concurrent_users=4,
        task=dict(type=OpenICLInferTask)),
)
50 changes: 50 additions & 0 deletions configs/eval_xunfei.py
@@ -0,0 +1,50 @@
from mmengine.config import read_base
from opencompass.models.xunfei_api import XunFei
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.runners.local_api import LocalAPIRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    # from .datasets.collections.chat_medium import datasets
    from .summarizers.medium import summarizer
    from .datasets.ceval.ceval_gen import ceval_datasets

datasets = [
    *ceval_datasets,
]

models = [
    dict(
        abbr='Spark-v1-1',
        type=XunFei,
        appid="xxxx",
        path='ws://spark-api.xf-yun.com/v1.1/chat',
        api_secret="xxxxxxx",
        api_key="xxxxxxx",
        query_per_second=1,
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8),
    dict(
        abbr='Spark-v3-1',
        type=XunFei,
        appid="xxxx",
        domain='generalv3',
        path='ws://spark-api.xf-yun.com/v3.1/chat',
        api_secret="xxxxxxxx",
        api_key="xxxxxxxxx",
        query_per_second=1,
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8),
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalAPIRunner,
        max_num_workers=2,
        concurrent_users=2,
        task=dict(type=OpenICLInferTask)),
)
36 changes: 36 additions & 0 deletions configs/eval_zhihu.py
@@ -0,0 +1,36 @@
from mmengine.config import read_base
from opencompass.models import ZhiPuAI
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.runners.local_api import LocalAPIRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    # from .datasets.collections.chat_medium import datasets
    from .summarizers.medium import summarizer
    from .datasets.ceval.ceval_gen import ceval_datasets

datasets = [
    *ceval_datasets,
]

models = [
    dict(
        abbr='chatglm_pro',
        type=ZhiPuAI,
        path='chatglm_pro',
        key='xxxxxxxxxxxx',
        query_per_second=1,
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8),
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalAPIRunner,
        max_num_workers=2,
        concurrent_users=2,
        task=dict(type=OpenICLInferTask)),
)
2 changes: 1 addition & 1 deletion docs/en/index.rst
@@ -69,7 +69,7 @@ We always welcome *PRs* and *Issues* for the betterment of OpenCompass.
.. _Tools:
.. toctree::
   :maxdepth: 1
   :caption: tools
   :caption: Tools

   tools.md

1 change: 1 addition & 0 deletions docs/en/notes/news.md
@@ -1,5 +1,6 @@
# News

- **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been released, which is a lightweight framework for building LLM-based agents. We are working with the Lagent team to support the evaluation of general tool-use capability, stay tuned!
- **\[2023.08.18\]** We have supported evaluation for **multi-modality learning**, including **MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA** and so on. The leaderboard is on the way. Feel free to try multi-modality evaluation with OpenCompass!
- **\[2023.08.18\]** The [dataset card](https://opencompass.org.cn/dataset-detail/MMLU) is now online. We welcome new evaluation benchmarks to join OpenCompass!
13 changes: 12 additions & 1 deletion docs/en/user_guides/models.md
@@ -70,7 +70,9 @@ model = HuggingFaceCausalLM(
Currently, OpenCompass supports API-based model inference for the following:

- OpenAI (`opencompass.models.OpenAI`)
- More coming soon
- ChatGLM (`opencompass.models.ZhiPuAI`)
- ABAB-Chat from MiniMax (`opencompass.models.MiniMax`)
- Spark from XunFei (`opencompass.models.XunFei`)

Let's take the OpenAI configuration file as an example to see how API-based models are used in the
configuration file.
@@ -94,6 +96,15 @@ models = [
]
```

We have provided several example configurations for API-based models. Please refer to:

```bash
configs
├── eval_zhihu.py
├── eval_xunfei.py
└── eval_minimax.py
```
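
For reference, the MiniMax example added in this commit (`configs/eval_minimax.py`) follows the same pattern as the OpenAI entry above; a minimal sketch with placeholder credentials (the inline comments describe the apparent role of each field):

```python
from opencompass.models import MiniMax

models = [
    dict(
        abbr='minimax_abab5.5-chat',  # name shown in result tables
        type=MiniMax,                 # MiniMax API wrapper added in this commit
        path='abab5.5-chat',          # MiniMax model identifier
        key='xxxxxxx',                # your MiniMax API key
        group_id='xxxxxxxx',          # your MiniMax group id
        query_per_second=1,           # client-side request throttle
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8),
]
```

As with other configs, such a file would typically be launched via the repository's entry script, e.g. `python run.py configs/eval_minimax.py`.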

## Custom Models

If the above methods do not support your model evaluation requirements, you can refer to
1 change: 1 addition & 0 deletions docs/zh_cn/notes/news.md
@@ -1,5 +1,6 @@
# News

- **\[2023.08.25\]** The [**TigerBot**](https://github.com/TigerResearch/TigerBot) team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
- **\[2023.08.21\]** [**Lagent**](https://github.com/InternLM/lagent) has been officially released. It is a lightweight, open-source framework for building LLM-based agents. We are working closely with the Lagent team to support the evaluation of tool-use capability based on Lagent, stay tuned!
- **\[2023.08.18\]** OpenCompass now supports **multi-modality evaluation** with 10+ multi-modal datasets, including **MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA** and so on. The multi-modality leaderboard is coming soon, stay tuned!
- **\[2023.08.18\]** The [dataset card](https://opencompass.org.cn/dataset-detail/MMLU) page is now live on the OpenCompass website. We welcome more community evaluation datasets to join OpenCompass!
13 changes: 12 additions & 1 deletion docs/zh_cn/user_guides/models.md
@@ -63,7 +63,9 @@ model = HuggingFaceCausalLM(
Currently, OpenCompass supports API-based model inference for the following:

- OpenAI (`opencompass.models.OpenAI`)
- Coming soon
- ChatGLM@Zhipu (`opencompass.models.ZhiPuAI`)
- ABAB-Chat@MiniMax (`opencompass.models.MiniMax`)
- Spark@XunFei (`opencompass.models.XunFei`)

Below, we take the OpenAI configuration file as an example to show how API-based models are used in the configuration file.

@@ -86,6 +88,15 @@ models = [
]
```

We also provide evaluation examples for API-based models. Please refer to:

```bash
configs
├── eval_zhihu.py
├── eval_xunfei.py
└── eval_minimax.py
```
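
Similarly, the ChatGLM Pro example added in this commit (`configs/eval_zhihu.py`) reduces to a single model entry; a minimal sketch with a placeholder key:

```python
from opencompass.models import ZhiPuAI

models = [
    dict(
        abbr='chatglm_pro',      # name shown in result tables
        type=ZhiPuAI,            # Zhipu API wrapper
        path='chatglm_pro',      # Zhipu model identifier
        key='xxxxxxxxxxxx',      # your Zhipu API key
        query_per_second=1,
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8),
]
```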

## Custom Models

If the above methods do not meet your model evaluation needs, please refer to [Supporting New Models](../advanced_guides/new_model.md) to add support for new models in OpenCompass.
3 changes: 2 additions & 1 deletion opencompass/models/__init__.py
@@ -6,6 +6,7 @@
from .huggingface import HuggingFaceCausalLM # noqa: F401, F403
from .intern_model import InternLM # noqa: F401, F403
from .llama2 import Llama2, Llama2Chat # noqa: F401, F403
from .minimax_api import MiniMax # noqa: F401
from .openai_api import OpenAI # noqa: F401
from .xunfei_api import XunFei # noqa: F401
from .zhipuai import ZhiPuAI # noqa: F401
from .zhipuai_api import ZhiPuAI # noqa: F401
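
With these lines, the new API wrappers are re-exported from the package root, so a quick import is enough to confirm the wiring; a minimal check, assuming this revision of opencompass is installed:

```python
# All three API wrappers touched by this commit resolve through the
# package root rather than their private modules.
from opencompass.models import MiniMax, XunFei, ZhiPuAI

print(MiniMax.__module__, XunFei.__module__, ZhiPuAI.__module__)
```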