Add new dataset MMLU-SR tasks (EleutherAI#2032)
* add mmlusr tasks
* renamed all tasks names in mmlusr
* edit format and readme
* added mmlu_sr
* mmlu_sr -> mmlusr
* update

Co-authored-by: lintangsutawika <[email protected]>
1 parent cdd954f, commit d5f39bf
Showing 183 changed files with 1,653 additions and 0 deletions.
@@ -0,0 +1,64 @@
# MMLU-SR

## Paper
Title: [Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models](https://arxiv.org/abs/2406.15468v1)


We propose MMLU-SR, a novel dataset designed to measure the true comprehension abilities of Large Language Models (LLMs) by challenging their performance in question-answering tasks with modified terms. We reasoned that an agent that "truly" understands a concept can still evaluate it when key terms are replaced by suitably defined alternate terms, and sought to differentiate such comprehension from mere text replacement. In our study, we modified standardized test questions by replacing a key term with a dummy word along with its definition. The key term could be in the context of questions, answers, or both questions and answers.
Notwithstanding the high scores achieved by recent popular LLMs on the MMLU leaderboard, we found a substantial reduction in model performance after such replacement, suggesting poor comprehension. This new dataset provides a rigorous benchmark for testing true model comprehension, and poses a challenge to the broader scientific community.
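To make the replacement idea concrete, here is a toy sketch of swapping a key term for a dummy word and supplying its definition. The function name, the "Suppose ... means ..." wording, and the sample question are all assumptions made for illustration; this is not the authors' generation pipeline or the dataset's actual format.

```python
import re


def replace_term(text: str, term: str, dummy: str, definition: str) -> str:
    """Swap whole-word occurrences of `term` for `dummy` and prepend a definition.

    Hypothetical helper: the sentence template below is an assumption,
    not the wording used in the MMLU-SR dataset.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
    swapped = pattern.sub(dummy, text)
    return f"Suppose '{dummy}' means '{definition}'. {swapped}"


# Made-up question, used only to show the transformation.
question = "Which organelle produces most of the ATP in a cell?"
print(replace_term(question, "ATP", "Dummy", "the main energy-carrying molecule"))
```

A model that relies on the surface form of the key term loses that cue after the swap, which is the effect the benchmark is designed to expose.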

GitHub Homepage: [https://github.com/Wang-ML-Lab/MMLU-SR](https://github.com/Wang-ML-Lab/MMLU-SR)
Hugging Face Dataset: [https://huggingface.co/datasets/NiniCat/MMLU-SR](https://huggingface.co/datasets/NiniCat/MMLU-SR)


## Citation
```bib
@misc{wang2024reasoningsimplytokenprediction,
      title={Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models},
      author={Wentian Wang and Paul Kantor and Jacob Feldman and Lazaros Gallos and Hao Wang},
      year={2024},
      eprint={2406.15468},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.15468},
}
```

### Groups and Tasks

#### Groups

- `mmlusr`: MMLU variant where the terminology in both the questions and the answers is modified.
- `mmlusr_answer_only`: MMLU variant where the terminology in the answers is modified.
- `mmlusr_question_only`: MMLU variant where the terminology in the questions is modified.

#### Tasks

There are 57 symbol-replaced subjects in each group. You can run a single task by name:

* `mmlusr_question_only_abstract_algebra`

Or by category:

* `mmlusr_question_only_stem_tasks`
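
As a rough sketch of how these names might be passed to the harness, the helper below assembles a command line for a single task or a category tag. It assumes the harness's `lm_eval` entry point with its `--model`, `--model_args`, `--tasks`, and `--num_fewshot` options; the helper name, the `pretrained=your-org/your-model` placeholder, and the few-shot count of 5 are illustrative assumptions, not values taken from this commit.

```python
import shlex
from typing import List, Optional


def build_lm_eval_command(
    tasks: List[str],
    model_args: str,
    num_fewshot: Optional[int] = None,
) -> List[str]:
    """Return an argv list that runs the given MMLU-SR task or tag names.

    Hypothetical helper for illustration; it only builds the command.
    """
    if not tasks:
        raise ValueError("at least one task or tag name is required")
    argv = [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
    ]
    if num_fewshot is not None:
        argv += ["--num_fewshot", str(num_fewshot)]
    return argv


# "pretrained=your-org/your-model" is a placeholder, not a real checkpoint.
single = build_lm_eval_command(
    ["mmlusr_question_only_abstract_algebra"],
    "pretrained=your-org/your-model",
)
category = build_lm_eval_command(
    ["mmlusr_question_only_stem_tasks"],
    "pretrained=your-org/your-model",
    num_fewshot=5,
)
print(shlex.join(single))
print(shlex.join(category))
```

The resulting list can be handed to `subprocess.run` once the harness is installed and the placeholder is replaced with a real model.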

### Checklist

The checklist is the following:

For adding novel benchmarks/datasets to the library:
* [x] Is the task an existing benchmark in the literature?
  * [x] Have you referenced the original paper that introduced the task?
  * [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
    * The implementation in the original paper is one where the model is first fine-tuned on the data. They do have a few-shot evaluation for GPT-3; however, the few-shot context used here is sourced from [Lewkowycz et al.](https://arxiv.org/abs/2206.14858). The achieved accuracy on Llama-2 models is comparable to that provided in the paper, though not identical.


If other tasks on this dataset are already supported:
* [x] Is the "Main" variant of this task clearly denoted?
* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [x] Have you noted which, if any, published evaluation setups are matched by this variant?

### Variant Wishlist

- [ ] zero-shot variant
@@ -0,0 +1,44 @@
group: mmlusr_answer_only
group_alias: MMLU-SR (Answer Only)
task:
  - group: mmlusr_ao_stem
    group_alias: STEM (Answer Only)
    task:
      - mmlusr_answer_only_stem_tasks
    aggregate_metric_list:
      - metric: acc
        weight_by_size: True
    metadata:
      version: 1
  - group: mmlusr_ao_other
    group_alias: Other (Answer Only)
    task:
      - mmlusr_answer_only_other_tasks
    aggregate_metric_list:
      - metric: acc
        weight_by_size: True
    metadata:
      version: 1
  - group: mmlusr_ao_social_sciences
    group_alias: Social Sciences (Answer Only)
    task:
      - mmlusr_answer_only_social_sciences_tasks
    aggregate_metric_list:
      - metric: acc
        weight_by_size: True
    metadata:
      version: 1
  - group: mmlusr_ao_humanities
    group_alias: Humanities (Answer Only)
    task:
      - mmlusr_answer_only_humanities_tasks
    aggregate_metric_list:
      - metric: acc
        weight_by_size: True
    metadata:
      version: 1
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
metadata:
  version: 1
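
The `weight_by_size: True` entries in the group config above suggest that each group's `acc` is a mean weighted by subtask size rather than a plain average of subtask scores. The sketch below shows that arithmetic under this assumption; the function name and the sample numbers are made up, and this is not the harness's own aggregation code.

```python
from typing import Iterable, Tuple


def size_weighted_accuracy(results: Iterable[Tuple[float, int]]) -> float:
    """Combine (accuracy, n_examples) pairs into one size-weighted accuracy.

    Hypothetical helper: equivalent to pooling every example from every
    subtask and computing accuracy over the pooled set.
    """
    pairs = list(results)
    total = sum(n for _, n in pairs)
    if total == 0:
        raise ValueError("cannot aggregate over zero examples")
    return sum(acc * n for acc, n in pairs) / total


# Made-up scores: a small subtask at 50% and a larger one at 100%.
subtasks = [(0.5, 100), (1.0, 300)]
weighted = size_weighted_accuracy(subtasks)
unweighted = sum(acc for acc, _ in subtasks) / len(subtasks)
print(weighted, unweighted)
```

With weighting, larger subjects pull the group score toward their own accuracy, so the two numbers printed above differ.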
@@ -0,0 +1,16 @@
dataset_path: NiniCat/MMLU-SR
test_split: test
fewshot_split: train
fewshot_config:
  sampler: first_n
output_type: multiple_choice
process_docs: !function utils.process_docs
doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
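
To show what the `doc_to_text` template above produces, the following plain-Python sketch mirrors it for one document. The field names `question` and `choices` come from the template; the sample document and the assumption that `answer` is an integer index into the choices are illustrative only, and the real output of `utils.process_docs` may differ.

```python
from typing import Any, Dict

LETTERS = ["A", "B", "C", "D"]  # same labels as doc_to_choice


def render_prompt(doc: Dict[str, Any]) -> str:
    """Build the prompt string that the doc_to_text template describes."""
    lines = [doc["question"].strip()]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, doc["choices"])]
    lines.append("Answer:")
    return "\n".join(lines)


# Made-up document; `answer` as an integer index is an assumption.
doc = {"question": " What is 2 + 2? ", "choices": ["3", "4", "5", "6"], "answer": 1}
print(render_prompt(doc))
print("target:", LETTERS[doc["answer"]])
```

Because `output_type` is `multiple_choice`, the model is scored on which of the four letters it rates most likely after `Answer:`.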
lm_eval/tasks/mmlusr/answer_only/answer_only_abstract_algebra.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_abstract_algebra"
"description": "The following are multiple choice questions (with answers) about abstract\
  \ algebra.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_abstract_algebra"
"task_alias": "abstract algebra"
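
The per-subject files in this commit all follow the same seven-line pattern, differing only in subject name and category tag. The sketch below reproduces that pattern as a Python dict; the function name and its parameters are hypothetical, and this is not the script the authors used to generate the files.

```python
from typing import Dict


def make_subject_config(
    variant: str,
    subject: str,
    category: str,
    include: str = "_mmlusr_a_yml",
) -> Dict[str, str]:
    """Return the fields of one per-subject task config.

    Hypothetical helper: `variant` is e.g. "answer_only", `subject` is the
    snake_case subject name, `category` is e.g. "stem" or "humanities".
    """
    readable = subject.replace("_", " ")
    return {
        "dataset_name": f"{variant}_{subject}",
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
        "tag": f"mmlusr_{variant}_{category}_tasks",
        "include": include,
        "task": f"mmlusr_{variant}_{subject}",
        "task_alias": readable,
    }


config = make_subject_config("answer_only", "abstract_algebra", "stem")
for key, value in config.items():
    print(f"{key}: {value!r}")
```

The `tag` field is what ties each subject to its category, which in turn is what the group config aggregates over.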
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_anatomy"
"description": "The following are multiple choice questions (with answers) about anatomy.\n\
  \n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_anatomy"
"task_alias": "anatomy"
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_astronomy"
"description": "The following are multiple choice questions (with answers) about astronomy.\n\
  \n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_astronomy"
"task_alias": "astronomy"
lm_eval/tasks/mmlusr/answer_only/answer_only_business_ethics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_business_ethics"
"description": "The following are multiple choice questions (with answers) about business\
  \ ethics.\n\n"
"tag": "mmlusr_answer_only_other_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_business_ethics"
"task_alias": "business ethics"
lm_eval/tasks/mmlusr/answer_only/answer_only_clinical_knowledge.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_clinical_knowledge"
"description": "The following are multiple choice questions (with answers) about clinical\
  \ knowledge.\n\n"
"tag": "mmlusr_answer_only_other_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_clinical_knowledge"
"task_alias": "clinical knowledge"
lm_eval/tasks/mmlusr/answer_only/answer_only_college_biology.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_college_biology"
"description": "The following are multiple choice questions (with answers) about college\
  \ biology.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_college_biology"
"task_alias": "college biology"
lm_eval/tasks/mmlusr/answer_only/answer_only_college_chemistry.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_college_chemistry"
"description": "The following are multiple choice questions (with answers) about college\
  \ chemistry.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_college_chemistry"
"task_alias": "college chemistry"
lm_eval/tasks/mmlusr/answer_only/answer_only_college_computer_science.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_college_computer_science"
"description": "The following are multiple choice questions (with answers) about college\
  \ computer science.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_college_computer_science"
"task_alias": "college computer science"
lm_eval/tasks/mmlusr/answer_only/answer_only_college_mathematics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_college_mathematics"
"description": "The following are multiple choice questions (with answers) about college\
  \ mathematics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_college_mathematics"
"task_alias": "college mathematics"
lm_eval/tasks/mmlusr/answer_only/answer_only_college_medicine.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_college_medicine"
"description": "The following are multiple choice questions (with answers) about college\
  \ medicine.\n\n"
"tag": "mmlusr_answer_only_other_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_college_medicine"
"task_alias": "college medicine"
lm_eval/tasks/mmlusr/answer_only/answer_only_college_physics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_college_physics"
"description": "The following are multiple choice questions (with answers) about college\
  \ physics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_college_physics"
"task_alias": "college physics"
lm_eval/tasks/mmlusr/answer_only/answer_only_computer_security.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_computer_security"
"description": "The following are multiple choice questions (with answers) about computer\
  \ security.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_computer_security"
"task_alias": "computer security"
lm_eval/tasks/mmlusr/answer_only/answer_only_conceptual_physics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_conceptual_physics"
"description": "The following are multiple choice questions (with answers) about conceptual\
  \ physics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_conceptual_physics"
"task_alias": "conceptual physics"
lm_eval/tasks/mmlusr/answer_only/answer_only_econometrics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_econometrics"
"description": "The following are multiple choice questions (with answers) about econometrics.\n\
  \n"
"tag": "mmlusr_answer_only_social_sciences_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_econometrics"
"task_alias": "econometrics"
lm_eval/tasks/mmlusr/answer_only/answer_only_electrical_engineering.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_electrical_engineering"
"description": "The following are multiple choice questions (with answers) about electrical\
  \ engineering.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_electrical_engineering"
"task_alias": "electrical engineering"
lm_eval/tasks/mmlusr/answer_only/answer_only_elementary_mathematics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_elementary_mathematics"
"description": "The following are multiple choice questions (with answers) about elementary\
  \ mathematics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_elementary_mathematics"
"task_alias": "elementary mathematics"
lm_eval/tasks/mmlusr/answer_only/answer_only_formal_logic.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_formal_logic"
"description": "The following are multiple choice questions (with answers) about formal\
  \ logic.\n\n"
"tag": "mmlusr_answer_only_humanities_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_formal_logic"
"task_alias": "formal logic"
lm_eval/tasks/mmlusr/answer_only/answer_only_global_facts.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_global_facts"
"description": "The following are multiple choice questions (with answers) about global\
  \ facts.\n\n"
"tag": "mmlusr_answer_only_other_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_global_facts"
"task_alias": "global facts"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_biology.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_biology"
"description": "The following are multiple choice questions (with answers) about high\
  \ school biology.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_biology"
"task_alias": "high school biology"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_chemistry.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_chemistry"
"description": "The following are multiple choice questions (with answers) about high\
  \ school chemistry.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_chemistry"
"task_alias": "high school chemistry"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_computer_science.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_computer_science"
"description": "The following are multiple choice questions (with answers) about high\
  \ school computer science.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_computer_science"
"task_alias": "high school computer science"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_european_history.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_european_history"
"description": "The following are multiple choice questions (with answers) about high\
  \ school european history.\n\n"
"tag": "mmlusr_answer_only_humanities_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_european_history"
"task_alias": "high school european history"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_geography.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_geography"
"description": "The following are multiple choice questions (with answers) about high\
  \ school geography.\n\n"
"tag": "mmlusr_answer_only_social_sciences_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_geography"
"task_alias": "high school geography"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_government_and_politics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_government_and_politics"
"description": "The following are multiple choice questions (with answers) about high\
  \ school government and politics.\n\n"
"tag": "mmlusr_answer_only_social_sciences_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_government_and_politics"
"task_alias": "high school government and politics"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_macroeconomics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_macroeconomics"
"description": "The following are multiple choice questions (with answers) about high\
  \ school macroeconomics.\n\n"
"tag": "mmlusr_answer_only_social_sciences_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_macroeconomics"
"task_alias": "high school macroeconomics"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_mathematics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_mathematics"
"description": "The following are multiple choice questions (with answers) about high\
  \ school mathematics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_mathematics"
"task_alias": "high school mathematics"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_microeconomics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_microeconomics"
"description": "The following are multiple choice questions (with answers) about high\
  \ school microeconomics.\n\n"
"tag": "mmlusr_answer_only_social_sciences_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_microeconomics"
"task_alias": "high school microeconomics"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_physics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_physics"
"description": "The following are multiple choice questions (with answers) about high\
  \ school physics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_physics"
"task_alias": "high school physics"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_psychology.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_psychology"
"description": "The following are multiple choice questions (with answers) about high\
  \ school psychology.\n\n"
"tag": "mmlusr_answer_only_social_sciences_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_psychology"
"task_alias": "high school psychology"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_statistics.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_statistics"
"description": "The following are multiple choice questions (with answers) about high\
  \ school statistics.\n\n"
"tag": "mmlusr_answer_only_stem_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_statistics"
"task_alias": "high school statistics"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_us_history.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_us_history"
"description": "The following are multiple choice questions (with answers) about high\
  \ school us history.\n\n"
"tag": "mmlusr_answer_only_humanities_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_us_history"
"task_alias": "high school us history"
lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_world_history.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_high_school_world_history"
"description": "The following are multiple choice questions (with answers) about high\
  \ school world history.\n\n"
"tag": "mmlusr_answer_only_humanities_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_high_school_world_history"
"task_alias": "high school world history"
lm_eval/tasks/mmlusr/answer_only/answer_only_human_aging.yaml
@@ -0,0 +1,7 @@
"dataset_name": "answer_only_human_aging"
"description": "The following are multiple choice questions (with answers) about human\
  \ aging.\n\n"
"tag": "mmlusr_answer_only_other_tasks"
"include": "_mmlusr_a_yml"
"task": "mmlusr_answer_only_human_aging"
"task_alias": "human aging"