[Feature] Support downloading dataset from OpenMind #1792
Motivation
Modelers is a popular open-source community that has recently gained a lot of attention. It hosts popular datasets and models that support Ascend NPUs. With the accompanying openMind library, users can easily train models on Ascend NPUs.
Through this PR, we hope to integrate the dataset resources of the OpenMind community into opencompass. After that, we plan to integrate the evaluation capabilities of opencompass into the openMind library so that users can easily run model evaluations, which will also help promote opencompass.
By simply setting the environment variable `DATASET_SOURCE=OpenMind`, users can use datasets from the openMind community in opencompass.
Modification
This PR establishes the process for integrating opencompass with the dataset resources of the OpenMind community, using the GSM8K dataset as a pilot. More datasets will be supported soon.
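The dispatch on `DATASET_SOURCE` together with the new `om_id` key can be sketched as follows. This is a minimal sketch, not the actual opencompass code: the helper `resolve_dataset_id`, the simplified mapping, and its `local` key and path are hypothetical. Only `DATASET_SOURCE=OpenMind`, the `om_id` key, and the `OpenCompass/gsm8k` ID come from this PR. The sketch only chooses which ID to load and does not call the openMind library.

```python
import os

# Simplified, hypothetical excerpt of a DATASETS_MAPPING-style table.
# Only the "om_id" key and its value "OpenCompass/gsm8k" come from this PR;
# the "local" key and path are illustrative placeholders.
DATASETS_MAPPING = {
    "opencompass/gsm8k": {
        "om_id": "OpenCompass/gsm8k",
        "local": "./data/gsm8k/",
    },
}


def resolve_dataset_id(dataset_id: str) -> str:
    """Hypothetical helper: choose which ID/path to load for dataset_id.

    Returns the OpenMind dataset ID when DATASET_SOURCE=OpenMind and the
    local path otherwise. A real loader would then pass the OpenMind ID
    to the openMind library to download the data.
    """
    entry = DATASETS_MAPPING[dataset_id]
    if os.environ.get("DATASET_SOURCE") == "OpenMind":
        return entry["om_id"]
    return entry["local"]


if __name__ == "__main__":
    os.environ["DATASET_SOURCE"] = "OpenMind"
    print(resolve_dataset_id("opencompass/gsm8k"))  # prints OpenCompass/gsm8k
```

Keeping the per-source IDs in one mapping means that supporting another OpenMind dataset only requires adding an `om_id` entry, without changing the loader logic.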
The modifications are:
- `opencompass/datasets/gsm8k.py`: Support using the openMind library to automatically download the GSM8K dataset from the OpenMind community when the environment variable `DATASET_SOURCE=OpenMind` is set.
- `opencompass/utils/datasets.py`: Support getting the dataset ID from the OpenMind community in the variable `DATASETS_MAPPING` via a new key, `om_id` (`om` is short for OpenMind).
- `opencompass/utils/datasets_info.py`: Add `"om_id": "OpenCompass/gsm8k",` to the `opencompass/gsm8k` dict. The string `OpenCompass/gsm8k` refers to the GSM8K dataset in the OpenMind community.
- `tests/dataset/test_om_datasets.py`: Add a test script for datasets from the OpenMind community.
BC-breaking (Optional)
Not related.
Use cases (Optional)
We verified this PR with Python 3.10.16 on Windows as follows:
Install dependencies and launch:
Running result:
Checklist
Before PR:
After PR: