-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docs] new_dataset.md update #1827
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,5 +54,43 @@ | |
eval_cfg=mydataset_eval_cfg) | ||
] | ||
``` | ||
|
||
- 为了使用户提供的数据集能够被其他使用者更方便的获取,需要用户在配置文件中给出下载数据集的渠道。具体的方式是首先在`mydataset_datasets`配置中的`path`字段填写用户指定的数据集名称,该名称将以mapping的方式映射到`opencompass/utils/datasets_info.py`中的实际下载路径。具体示例如下: | ||
|
||
```python | ||
mmlu_datasets = [ | ||
dict( | ||
..., | ||
path='opencompass/mmlu', | ||
..., | ||
) | ||
] | ||
``` | ||
|
||
- 接着,需要在`opencompass/utils/datasets_info.py`中创建对应名称的字典字段。如果用户已将数据集托管到huggingface或modelscope,那么请在`DATASETS_MAPPING`字典中添加对应名称的字段,并将对应的huggingface或modelscope数据集地址填入`ms_id`和`hf_id`;另外,还允许指定一个默认的`local`地址。具体示例如下: | ||
|
||
```python | ||
"opencompass/mmlu": { | ||
"ms_id": "opencompass/mmlu", | ||
"hf_id": "opencompass/mmlu", | ||
"local": "./data/mmlu/", | ||
} | ||
``` | ||
|
||
Comment on lines
+70
to
+79
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 用户需要在 load 函数中实现根据不同的环境变量切换数据源的操作,具体可以参考 cmmlu 中的实现 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 已添加 |
||
- 如果希望提供的数据集在其他用户使用时能够直接从OpenCompass官方的OSS仓库获取,则需要在Pull Request阶段向我们提交数据集文件,我们将代为传输数据集至OSS,并在`DATASET_URL`新建字段。 | ||
|
||
- 为了确保数据来源的可选择性,用户需要根据所提供数据集的下载路径类型来完善数据集脚本`mydataset.py`中的`load`方法的功能。具体而言,需要用户实现根据环境变量`DATASET_SOURCE`的不同设置来切换不同的下载数据源的功能。需要注意的是,若未设置`DATASET_SOURCE`的值,将默认从OSS仓库下载数据。`opencompass/dataset/cmmlu.py`中的具体示例如下: | ||
|
||
```python | ||
def load(path: str, name: str, **kwargs): | ||
... | ||
if environ.get('DATASET_SOURCE') == 'ModelScope': | ||
... | ||
else: | ||
... | ||
return dataset | ||
``` | ||
|
||
|
||
|
||
详细的数据集配置文件以及其他需要的配置文件可以参考[配置文件](../user_guides/config.md)教程,启动任务相关的教程可以参考[快速开始](../get_started/quick_start.md)教程。 | ||
详细的数据集配置文件以及其他需要的配置文件可以参考[配置文件](../user_guides/config.md)教程,启动任务相关的教程可以参考[快速开始](../get_started/quick_start.md)教程。 |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
强调一下这里是个 mapping 的映射