diff --git a/lm_eval/tasks/README.md b/lm_eval/tasks/README.md
index 1eb7fdc57c..c6e4c4d73f 100644
--- a/lm_eval/tasks/README.md
+++ b/lm_eval/tasks/README.md
@@ -67,6 +67,7 @@
 | [mgsm](mgsm/README.md) | Benchmark of multilingual grade-school math problems. | Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, Telugu |
 | [minerva_math](minerva_math/README.md) | Mathematics-focused tasks requiring numerical reasoning and problem-solving skills. | English |
 | mmlu | Massive Multitask Language Understanding benchmark for broad domain language evaluation. Several variants are supported. | English |
+| [mmlusr](mmlusr/README.md) | Variation of MMLU designed to be a more rigorous test of comprehension. | English |
 | model_written_evals | Evaluation tasks auto-generated for evaluating a collection of AI Safety concerns. | |
 | [mutual](mutual/README.md) | A retrieval-based dataset for multi-turn dialogue reasoning. | English |
 | [nq_open](nq_open/README.md) | Open domain question answering tasks based on the Natural Questions dataset. | English |
diff --git a/lm_eval/tasks/mmlusr/README.md b/lm_eval/tasks/mmlusr/README.md
new file mode 100644
index 0000000000..6d8a79fbab
--- /dev/null
+++ b/lm_eval/tasks/mmlusr/README.md
@@ -0,0 +1,64 @@
+# MMLU-SR
+
+## Paper
+Title: [Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models](https://arxiv.org/abs/2406.15468v1)
+
+We propose MMLU-SR, a novel dataset designed to measure the true comprehension abilities of Large Language Models (LLMs) by challenging their performance in question-answering tasks with modified terms. We reasoned that an agent that "truly" understands a concept can still evaluate it when key terms are replaced by suitably defined alternate terms, and sought to differentiate such comprehension from mere text replacement. In our study, we modified standardized test questions by replacing a key term with a dummy word along with its definition. The key term could appear in the question, in the answers, or in both. For instance, a question about prime numbers might replace every occurrence of "prime number" with a dummy word, prefaced by a definition of that word.
+Notwithstanding the high scores achieved by recent popular LLMs on the MMLU leaderboard, we found a substantial reduction in model performance after such replacement, suggesting poor comprehension. MMLU-SR thus provides a rigorous test of true model comprehension, and poses a challenge to the broader scientific community.
+
+GitHub homepage: [https://github.com/Wang-ML-Lab/MMLU-SR](https://github.com/Wang-ML-Lab/MMLU-SR)
+Hugging Face dataset: [https://huggingface.co/datasets/NiniCat/MMLU-SR](https://huggingface.co/datasets/NiniCat/MMLU-SR)
+
+## Citation
+```bib
+@misc{wang2024reasoningsimplytokenprediction,
+      title={Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models},
+      author={Wentian Wang and Paul Kantor and Jacob Feldman and Lazaros Gallos and Hao Wang},
+      year={2024},
+      eprint={2406.15468},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2406.15468},
+}
+```
+
+### Groups and Tasks
+
+#### Groups
+
+- `mmlusr`: MMLU variant where the terminology in both the question and the answers is modified.
+- `mmlusr_answer_only`: MMLU variant where the terminology in the answers is modified.
+- `mmlusr_question_only`: MMLU variant where the terminology in the question is modified.
+
+#### Tasks
+
+There are 57 symbol-replaced subjects in each group. You can run a single task by name (see the example invocation below):
+
+* `mmlusr_question_only_abstract_algebra`
+
+Or run a whole category:
+
+* `mmlusr_question_only_stem_tasks`
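+
+For example, an evaluation with a Hugging Face model might be launched as follows (the model and settings below are placeholders, not a recommended setup):
+
+```bash
+lm_eval --model hf \
+    --model_args pretrained=meta-llama/Llama-2-7b-hf \
+    --tasks mmlusr_question_only_abstract_algebra \
+    --num_fewshot 5 \
+    --batch_size 8
+```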
+
+### Checklist
+
+The checklist is the following:
+
+For adding novel benchmarks/datasets to the library:
+* [x] Is the task an existing benchmark in the literature?
+  * [x] Have you referenced the original paper that introduced the task?
+  * [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+    * The implementation in the original paper is one where the model is first fine-tuned on the data. They do have a few-shot evaluation for GPT-3; however, the few-shot context used here is sourced from [Lewkowycz et al](https://arxiv.org/abs/2206.14858). The accuracy achieved with Llama-2 models is comparable to that reported in the paper, though not identical.
+
+If other tasks on this dataset are already supported:
+* [x] Is the "Main" variant of this task clearly denoted?
+* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [x] Have you noted which, if any, published evaluation setups are matched by this variant?
+
+### Variant Wishlist
+
+- [ ] zero-shot variant
diff --git a/lm_eval/tasks/mmlusr/answer_only/_answer_only.yaml b/lm_eval/tasks/mmlusr/answer_only/_answer_only.yaml
new file mode 100644
index 0000000000..eef906aa20
--- /dev/null
+++ b/lm_eval/tasks/mmlusr/answer_only/_answer_only.yaml
@@ -0,0 +1,44 @@
+group: mmlusr_answer_only
+group_alias: MMLU-SR (Answer Only)
+task:
+  - group: mmlusr_ao_stem
+    group_alias: STEM (Answer Only)
+    task:
+      - mmlusr_answer_only_stem_tasks
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+    metadata:
+      version: 1
+  - group: mmlusr_ao_other
+    group_alias: Other (Answer Only)
+    task:
+      - mmlusr_answer_only_other_tasks
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+    metadata:
+      version: 1
+  - group: mmlusr_ao_social_sciences
+    group_alias: Social Sciences (Answer Only)
+    task:
+      - mmlusr_answer_only_social_sciences_tasks
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+    metadata:
+      version: 1
+  - group: mmlusr_ao_humanities
+    group_alias: Humanities (Answer Only)
+    task:
+      - mmlusr_answer_only_humanities_tasks
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+    metadata:
+      version: 1
+aggregate_metric_list:
+  - metric: acc
+    weight_by_size: True
+metadata:
+  version: 1
diff --git a/lm_eval/tasks/mmlusr/answer_only/_mmlusr_a_yml b/lm_eval/tasks/mmlusr/answer_only/_mmlusr_a_yml
new file mode 100644
index 0000000000..cd307413a8
--- /dev/null
+++ b/lm_eval/tasks/mmlusr/answer_only/_mmlusr_a_yml
@@ -0,0 +1,16 @@
+dataset_path: NiniCat/MMLU-SR
+test_split: test
+fewshot_split: train
+fewshot_config:
+  sampler: first_n
+output_type: multiple_choice
+process_docs: !function utils.process_docs
+doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nAnswer:" +doc_to_choice: ["A", "B", "C", "D"] +doc_to_target: answer +metric_list: + - metric: acc + aggregation: mean + higher_is_better: true +metadata: + version: 0.0 diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_abstract_algebra.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_abstract_algebra.yaml new file mode 100644 index 0000000000..527bc9cc1b --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_abstract_algebra.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_abstract_algebra" +"description": "The following are multiple choice questions (with answers) about abstract\ + \ algebra.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_abstract_algebra" +"task_alias": "abstract algebra" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_anatomy.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_anatomy.yaml new file mode 100644 index 0000000000..1e4acc8c38 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_anatomy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_anatomy" +"description": "The following are multiple choice questions (with answers) about anatomy.\n\ + \n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_anatomy" +"task_alias": "anatomy" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_astronomy.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_astronomy.yaml new file mode 100644 index 0000000000..068072de60 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_astronomy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_astronomy" +"description": "The following are multiple choice questions (with answers) about astronomy.\n\ + \n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_astronomy" +"task_alias": "astronomy" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_business_ethics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_business_ethics.yaml new file mode 100644 index 0000000000..1e836e31c5 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_business_ethics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_business_ethics" +"description": "The following are multiple choice questions (with answers) about business\ + \ ethics.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_business_ethics" +"task_alias": "business ethics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_clinical_knowledge.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_clinical_knowledge.yaml new file mode 100644 index 0000000000..1ef709675c --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_clinical_knowledge.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_clinical_knowledge" +"description": "The following are multiple choice questions (with answers) about clinical\ + \ knowledge.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_clinical_knowledge" +"task_alias": "clinical knowledge" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_college_biology.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_biology.yaml new file mode 100644 index 0000000000..b967895a70 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_biology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_college_biology" +"description": "The following are multiple choice questions (with answers) about 
college\ + \ biology.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_college_biology" +"task_alias": "college biology" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_college_chemistry.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_chemistry.yaml new file mode 100644 index 0000000000..8dd100e7bc --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_chemistry.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_college_chemistry" +"description": "The following are multiple choice questions (with answers) about college\ + \ chemistry.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_college_chemistry" +"task_alias": "college chemistry" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_college_computer_science.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_computer_science.yaml new file mode 100644 index 0000000000..bbd7e4c158 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_computer_science.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_college_computer_science" +"description": "The following are multiple choice questions (with answers) about college\ + \ computer science.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_college_computer_science" +"task_alias": "college computer science" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_college_mathematics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_mathematics.yaml new file mode 100644 index 0000000000..8d85c49dc1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_college_mathematics" +"description": "The following are multiple choice questions (with answers) about college\ + \ mathematics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_college_mathematics" +"task_alias": "college mathematics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_college_medicine.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_medicine.yaml new file mode 100644 index 0000000000..132e0b6041 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_medicine.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_college_medicine" +"description": "The following are multiple choice questions (with answers) about college\ + \ medicine.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_college_medicine" +"task_alias": "college medicine" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_college_physics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_physics.yaml new file mode 100644 index 0000000000..77b47241d0 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_college_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_college_physics" +"description": "The following are multiple choice questions (with answers) about college\ + \ physics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_college_physics" +"task_alias": "college physics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_computer_security.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_computer_security.yaml new file mode 100644 index 0000000000..ba3d60d51b --- /dev/null +++ 
b/lm_eval/tasks/mmlusr/answer_only/answer_only_computer_security.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_computer_security" +"description": "The following are multiple choice questions (with answers) about computer\ + \ security.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_computer_security" +"task_alias": "computer security" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_conceptual_physics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_conceptual_physics.yaml new file mode 100644 index 0000000000..e0a84ecc3c --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_conceptual_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_conceptual_physics" +"description": "The following are multiple choice questions (with answers) about conceptual\ + \ physics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_conceptual_physics" +"task_alias": "conceptual physics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_econometrics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_econometrics.yaml new file mode 100644 index 0000000000..996d44f46e --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_econometrics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_econometrics" +"description": "The following are multiple choice questions (with answers) about econometrics.\n\ + \n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_econometrics" +"task_alias": "econometrics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_electrical_engineering.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_electrical_engineering.yaml new file mode 100644 index 0000000000..ab695e6ab4 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_electrical_engineering.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_electrical_engineering" +"description": "The following are multiple choice questions (with answers) about electrical\ + \ engineering.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_electrical_engineering" +"task_alias": "electrical engineering" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_elementary_mathematics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_elementary_mathematics.yaml new file mode 100644 index 0000000000..dff9fbf25b --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_elementary_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_elementary_mathematics" +"description": "The following are multiple choice questions (with answers) about elementary\ + \ mathematics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_elementary_mathematics" +"task_alias": "elementary mathematics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_formal_logic.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_formal_logic.yaml new file mode 100644 index 0000000000..e26ed865bc --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_formal_logic.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_formal_logic" +"description": "The following are multiple choice questions (with answers) about formal\ + \ logic.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_formal_logic" +"task_alias": "formal logic" diff --git 
a/lm_eval/tasks/mmlusr/answer_only/answer_only_global_facts.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_global_facts.yaml new file mode 100644 index 0000000000..ec9c0f42b3 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_global_facts.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_global_facts" +"description": "The following are multiple choice questions (with answers) about global\ + \ facts.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_global_facts" +"task_alias": "global facts" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_biology.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_biology.yaml new file mode 100644 index 0000000000..41ed53cb9a --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_biology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_biology" +"description": "The following are multiple choice questions (with answers) about high\ + \ school biology.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_biology" +"task_alias": "high school biology" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_chemistry.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_chemistry.yaml new file mode 100644 index 0000000000..95a3303f3d --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_chemistry.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_chemistry" +"description": "The following are multiple choice questions (with answers) about high\ + \ school chemistry.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_chemistry" +"task_alias": "high school chemistry" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_computer_science.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_computer_science.yaml new file mode 100644 index 0000000000..e665fb3400 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_computer_science.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_computer_science" +"description": "The following are multiple choice questions (with answers) about high\ + \ school computer science.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_computer_science" +"task_alias": "high school computer science" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_european_history.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_european_history.yaml new file mode 100644 index 0000000000..9d7c1cb8da --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_european_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_european_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school european history.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_european_history" +"task_alias": "high school european history" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_geography.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_geography.yaml new file mode 100644 index 0000000000..8a49800601 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_geography.yaml @@ -0,0 +1,7 @@ 
+"dataset_name": "answer_only_high_school_geography" +"description": "The following are multiple choice questions (with answers) about high\ + \ school geography.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_geography" +"task_alias": "high school geography" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_government_and_politics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_government_and_politics.yaml new file mode 100644 index 0000000000..bf66e3a3a7 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_government_and_politics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_government_and_politics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school government and politics.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_government_and_politics" +"task_alias": "high school government and politics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_macroeconomics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_macroeconomics.yaml new file mode 100644 index 0000000000..95e35cd8b1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_macroeconomics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_macroeconomics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school macroeconomics.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_macroeconomics" +"task_alias": "high school macroeconomics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_mathematics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_mathematics.yaml new file mode 100644 index 0000000000..7da2d1859a --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_mathematics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school mathematics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_mathematics" +"task_alias": "high school mathematics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_microeconomics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_microeconomics.yaml new file mode 100644 index 0000000000..e3af9a2c79 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_microeconomics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_microeconomics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school microeconomics.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_microeconomics" +"task_alias": "high school microeconomics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_physics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_physics.yaml new file mode 100644 index 0000000000..52fb737792 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_physics" +"description": "The following are multiple choice questions 
(with answers) about high\ + \ school physics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_physics" +"task_alias": "high school physics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_psychology.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_psychology.yaml new file mode 100644 index 0000000000..df77619cbb --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_psychology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_psychology" +"description": "The following are multiple choice questions (with answers) about high\ + \ school psychology.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_psychology" +"task_alias": "high school psychology" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_statistics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_statistics.yaml new file mode 100644 index 0000000000..2119fb39d1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_statistics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_statistics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school statistics.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_statistics" +"task_alias": "high school statistics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_us_history.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_us_history.yaml new file mode 100644 index 0000000000..2287ae457a --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_us_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_us_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school us history.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_us_history" +"task_alias": "high school us history" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_world_history.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_world_history.yaml new file mode 100644 index 0000000000..5b8f4f37e2 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_high_school_world_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_high_school_world_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school world history.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_high_school_world_history" +"task_alias": "high school world history" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_human_aging.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_human_aging.yaml new file mode 100644 index 0000000000..6a188ddb65 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_human_aging.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_human_aging" +"description": "The following are multiple choice questions (with answers) about human\ + \ aging.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_human_aging" +"task_alias": "human aging" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_human_sexuality.yaml 
b/lm_eval/tasks/mmlusr/answer_only/answer_only_human_sexuality.yaml new file mode 100644 index 0000000000..18c45333c5 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_human_sexuality.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_human_sexuality" +"description": "The following are multiple choice questions (with answers) about human\ + \ sexuality.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_human_sexuality" +"task_alias": "human sexuality" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_international_law.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_international_law.yaml new file mode 100644 index 0000000000..05e482d168 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_international_law.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_international_law" +"description": "The following are multiple choice questions (with answers) about international\ + \ law.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_international_law" +"task_alias": "international law" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_jurisprudence.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_jurisprudence.yaml new file mode 100644 index 0000000000..73edd6cb29 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_jurisprudence.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_jurisprudence" +"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\ + \n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_jurisprudence" +"task_alias": "jurisprudence" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_logical_fallacies.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_logical_fallacies.yaml new file mode 100644 index 0000000000..ab18c9270e --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_logical_fallacies.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_logical_fallacies" +"description": "The following are multiple choice questions (with answers) about logical\ + \ fallacies.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_logical_fallacies" +"task_alias": "logical fallacies" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_machine_learning.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_machine_learning.yaml new file mode 100644 index 0000000000..1b833c706f --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_machine_learning.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_machine_learning" +"description": "The following are multiple choice questions (with answers) about machine\ + \ learning.\n\n" +"tag": "mmlusr_answer_only_stem_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_machine_learning" +"task_alias": "machine learning" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_management.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_management.yaml new file mode 100644 index 0000000000..26ec67401d --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_management.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_management" +"description": "The following are multiple choice questions (with answers) about management.\n\ + \n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_management" +"task_alias": "management" diff 
--git a/lm_eval/tasks/mmlusr/answer_only/answer_only_marketing.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_marketing.yaml new file mode 100644 index 0000000000..23fe03659b --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_marketing.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_marketing" +"description": "The following are multiple choice questions (with answers) about marketing.\n\ + \n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_marketing" +"task_alias": "marketing" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_medical_genetics.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_medical_genetics.yaml new file mode 100644 index 0000000000..63355c88aa --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_medical_genetics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_medical_genetics" +"description": "The following are multiple choice questions (with answers) about medical\ + \ genetics.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_medical_genetics" +"task_alias": "medical genetics" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_miscellaneous.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_miscellaneous.yaml new file mode 100644 index 0000000000..1215392998 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_miscellaneous.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_miscellaneous" +"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\ + \n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_miscellaneous" +"task_alias": "miscellaneous" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_moral_disputes.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_moral_disputes.yaml new file mode 100644 index 0000000000..2f09854fba --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_moral_disputes.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_moral_disputes" +"description": "The following are multiple choice questions (with answers) about moral\ + \ disputes.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_moral_disputes" +"task_alias": "moral disputes" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_moral_scenarios.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_moral_scenarios.yaml new file mode 100644 index 0000000000..dee1c01eb1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_moral_scenarios.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_moral_scenarios" +"description": "The following are multiple choice questions (with answers) about moral\ + \ scenarios.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_moral_scenarios" +"task_alias": "moral scenarios" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_nutrition.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_nutrition.yaml new file mode 100644 index 0000000000..a890f9331b --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_nutrition.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_nutrition" +"description": "The following are multiple choice questions (with answers) about nutrition.\n\ + \n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_nutrition" +"task_alias": "nutrition" diff --git 
a/lm_eval/tasks/mmlusr/answer_only/answer_only_philosophy.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_philosophy.yaml new file mode 100644 index 0000000000..538dea756c --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_philosophy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_philosophy" +"description": "The following are multiple choice questions (with answers) about philosophy.\n\ + \n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_philosophy" +"task_alias": "philosophy" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_prehistory.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_prehistory.yaml new file mode 100644 index 0000000000..a93b5c4ff7 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_prehistory.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_prehistory" +"description": "The following are multiple choice questions (with answers) about prehistory.\n\ + \n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_prehistory" +"task_alias": "prehistory" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_accounting.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_accounting.yaml new file mode 100644 index 0000000000..b9f45995cb --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_accounting.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_professional_accounting" +"description": "The following are multiple choice questions (with answers) about professional\ + \ accounting.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_professional_accounting" +"task_alias": "professional accounting" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_law.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_law.yaml new file mode 100644 index 0000000000..caccccf0de --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_law.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_professional_law" +"description": "The following are multiple choice questions (with answers) about professional\ + \ law.\n\n" +"tag": "mmlusr_answer_only_humanities_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_professional_law" +"task_alias": "professional law" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_medicine.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_medicine.yaml new file mode 100644 index 0000000000..374b239c3f --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_medicine.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_professional_medicine" +"description": "The following are multiple choice questions (with answers) about professional\ + \ medicine.\n\n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_professional_medicine" +"task_alias": "professional medicine" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_psychology.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_psychology.yaml new file mode 100644 index 0000000000..58a9fc2d31 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_professional_psychology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_professional_psychology" +"description": "The following are multiple choice questions (with answers) about professional\ + \ psychology.\n\n" +"tag": 
"mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_professional_psychology" +"task_alias": "professional psychology" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_public_relations.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_public_relations.yaml new file mode 100644 index 0000000000..86cc337b06 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_public_relations.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_public_relations" +"description": "The following are multiple choice questions (with answers) about public\ + \ relations.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_public_relations" +"task_alias": "public relations" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_security_studies.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_security_studies.yaml new file mode 100644 index 0000000000..5e72f02f55 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_security_studies.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_security_studies" +"description": "The following are multiple choice questions (with answers) about security\ + \ studies.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_security_studies" +"task_alias": "security studies" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_sociology.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_sociology.yaml new file mode 100644 index 0000000000..58fa3d8de1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_sociology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_sociology" +"description": "The following are multiple choice questions (with answers) about sociology.\n\ + \n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_sociology" +"task_alias": "sociology" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_us_foreign_policy.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_us_foreign_policy.yaml new file mode 100644 index 0000000000..91a6d66340 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_us_foreign_policy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_us_foreign_policy" +"description": "The following are multiple choice questions (with answers) about us\ + \ foreign policy.\n\n" +"tag": "mmlusr_answer_only_social_sciences_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_us_foreign_policy" +"task_alias": "us foreign policy" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_virology.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_virology.yaml new file mode 100644 index 0000000000..1400fb8421 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_virology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_virology" +"description": "The following are multiple choice questions (with answers) about virology.\n\ + \n" +"tag": "mmlusr_answer_only_other_tasks" +"include": "_mmlusr_a_yml" +"task": "mmlusr_answer_only_virology" +"task_alias": "virology" diff --git a/lm_eval/tasks/mmlusr/answer_only/answer_only_world_religions.yaml b/lm_eval/tasks/mmlusr/answer_only/answer_only_world_religions.yaml new file mode 100644 index 0000000000..6014213538 --- /dev/null +++ b/lm_eval/tasks/mmlusr/answer_only/answer_only_world_religions.yaml @@ -0,0 +1,7 @@ +"dataset_name": "answer_only_world_religions" +"description": "The following are multiple choice 
questions (with answers) about world\
+  \ religions.\n\n"
+"tag": "mmlusr_answer_only_humanities_tasks"
+"include": "_mmlusr_a_yml"
+"task": "mmlusr_answer_only_world_religions"
+"task_alias": "world religions"
diff --git a/lm_eval/tasks/mmlusr/answer_only/utils.py b/lm_eval/tasks/mmlusr/answer_only/utils.py
new file mode 100644
index 0000000000..f1341bd590
--- /dev/null
+++ b/lm_eval/tasks/mmlusr/answer_only/utils.py
@@ -0,0 +1,19 @@
+import datasets
+
+
+def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
+    def _helper(doc):
+        # The 'answer' field holds a numeric index 0-3 rather than a letter 'A'-'D'.
+        answer_list = ["A", "B", "C", "D"]
+        # Convert the numeric index to the corresponding letter.
+        answer_index = int(doc["answer"])  # make sure the answer is an integer
+        answer_letter = answer_list[answer_index]
+
+        out_doc = {
+            "question": doc["question"],
+            "choices": [doc["choice1"], doc["choice2"], doc["choice3"], doc["choice4"]],
+            "answer": answer_letter,  # letter target consumed by doc_to_target
+        }
+        return out_doc
+
+    return dataset.map(_helper)
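+
+
+if __name__ == "__main__":
+    # Illustrative sanity check: shows what process_docs yields on one
+    # fabricated row. The field values here are invented; only the column
+    # schema matches what the helper above assumes about the dataset.
+    sample = {
+        "question": "How many generators does a cyclic group of order 4 have?",
+        "choice1": "1",
+        "choice2": "2",
+        "choice3": "3",
+        "choice4": "4",
+        "answer": 1,  # numeric index: 0 -> A, 1 -> B, ...
+    }
+    processed = process_docs(datasets.Dataset.from_list([sample]))
+    print(processed[0]["choices"])  # ['1', '2', '3', '4']
+    print(processed[0]["answer"])   # 'B'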
["question_and_answer"] + + +def parse_args(): + parser = argparse.ArgumentParser( + description="Generate configuration YAML files for LM Evaluation Harness." + ) + # Path to the base YAML file from which to inherit settings + parser.add_argument( + "--base_yaml_path", + required=True, + help="Path to the base YAML configuration file.", + ) + + # Directory where the generated YAML files will be saved + parser.add_argument( + "--save_dir", + default="/data/local/cat/lm-evaluation-harness/lm_eval/tasks/mmlusr/question_and_answer", + ) + + # Optional prefix to add to task names in the YAML files + parser.add_argument("--task_prefix", default="") + + parser.add_argument("--cot_prompt_path", default=None) + + # Optional prefix to add to group names in the YAML files + parser.add_argument("--group_prefix", default="") + + return parser.parse_args() + + +if __name__ == "__main__": + args = parse_args() + + # Load base YAML configuration + base_yaml_name = os.path.basename(args.base_yaml_path) + with open(args.base_yaml_path, "r", encoding="utf-8") as f: + base_yaml = yaml.full_load(f) + + if args.cot_prompt_path is not None: + import json + + with open(args.cot_prompt_path, encoding="utf-8") as f: + cot_file = json.load(f) + + for group in GROUPS: + for subject, category in tqdm(SUBJECTS.items()): + if args.cot_prompt_path is not None: + description = cot_file[subject] + else: + description = f"The following are multiple choice questions (with answers) about {' '.join(subject.split('_'))}.\n\n" + + yaml_dict = { + "include": base_yaml_name, + "tag": f"mmlusr_{args.group_prefix}{group}_{category}" + if args.group_prefix + else f"mmlusr_{group}_{category}", + "task": f"mmlusr_{args.task_prefix}{group}_{subject}" + if args.task_prefix + else f"mmlusr_{group}_{subject}", + "task_alias": subject.replace("_", " "), + "description": description, + "dataset_name": f"{group}_{subject}", + } + + # File path for saving the generated YAML file + file_save_path = os.path.join(args.save_dir, f"{group}_{subject}.yaml") + with open(file_save_path, "w", encoding="utf-8") as yaml_file: + yaml.dump(yaml_dict, yaml_file, allow_unicode=True, default_style='"') + eval_logger.info(f"Saved YAML for {group} {subject} to {file_save_path}") + + # Save group configuration if specified + if args.group_prefix: + file_save_path = os.path.join( + args.save_prefix_path, args.group_prefix + ".yaml" + ) + eval_logger.info(f"Saving benchmark config to {file_save_path}") + with open(file_save_path, "w", encoding="utf-8") as yaml_file: + yaml.dump(yaml_dict, yaml_file, indent=4, default_flow_style=False) diff --git a/lm_eval/tasks/mmlusr/question_and_answer/_mmlusr_qna_yml b/lm_eval/tasks/mmlusr/question_and_answer/_mmlusr_qna_yml new file mode 100644 index 0000000000..cd307413a8 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/_mmlusr_qna_yml @@ -0,0 +1,16 @@ +dataset_path: NiniCat/MMLU-SR +test_split: test +fewshot_split: train +fewshot_config: + sampler: first_n +output_type: multiple_choice +process_docs: !function utils.process_docs +doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. 
{{choices[3]}}\nAnswer:" +doc_to_choice: ["A", "B", "C", "D"] +doc_to_target: answer +metric_list: + - metric: acc + aggregation: mean + higher_is_better: true +metadata: + version: 0.0 diff --git a/lm_eval/tasks/mmlusr/question_and_answer/_question_and_answer.yaml b/lm_eval/tasks/mmlusr/question_and_answer/_question_and_answer.yaml new file mode 100644 index 0000000000..0f757c34dd --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/_question_and_answer.yaml @@ -0,0 +1,44 @@ +group: mmlusr +group_alias: MMLU-SR (Question & Answer) +task: + - group: mmlusr_qa_stem + group_alias: STEM (Question & Answer) + task: + - mmlusr_question_and_answer_stem_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 + - group: mmlusr_qa_other + group_alias: Other (Question & Answer) + task: + - mmlusr_question_and_answer_other_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 + - group: mmlusr_qa_social_sciences + group_alias: Social Sciences (Question & Answer) + task: + - mmlusr_question_and_answer_social_sciences_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 + - group: mmlusr_qa_humanities + group_alias: Humanities (Question & Answer) + task: + - mmlusr_question_and_answer_humanities_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 +aggregate_metric_list: + - metric: acc + weight_by_size: True +metadata: + version: 1 diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_abstract_algebra.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_abstract_algebra.yaml new file mode 100644 index 0000000000..bfdd80e642 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_abstract_algebra.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_abstract_algebra" +"description": "The following are multiple choice questions (with answers) about abstract\ + \ algebra.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_abstract_algebra" +"task_alias": "abstract algebra" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_anatomy.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_anatomy.yaml new file mode 100644 index 0000000000..316bede423 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_anatomy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_anatomy" +"description": "The following are multiple choice questions (with answers) about anatomy.\n\ + \n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_anatomy" +"task_alias": "anatomy" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_astronomy.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_astronomy.yaml new file mode 100644 index 0000000000..e9f89e1c97 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_astronomy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_astronomy" +"description": "The following are multiple choice questions (with answers) about astronomy.\n\ + \n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_astronomy" +"task_alias": "astronomy" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_business_ethics.yaml 
b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_business_ethics.yaml new file mode 100644 index 0000000000..4a46298259 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_business_ethics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_business_ethics" +"description": "The following are multiple choice questions (with answers) about business\ + \ ethics.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_business_ethics" +"task_alias": "business ethics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_clinical_knowledge.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_clinical_knowledge.yaml new file mode 100644 index 0000000000..c43c9a3d54 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_clinical_knowledge.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_clinical_knowledge" +"description": "The following are multiple choice questions (with answers) about clinical\ + \ knowledge.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_clinical_knowledge" +"task_alias": "clinical knowledge" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_biology.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_biology.yaml new file mode 100644 index 0000000000..4f00615bdf --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_biology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_college_biology" +"description": "The following are multiple choice questions (with answers) about college\ + \ biology.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_college_biology" +"task_alias": "college biology" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_chemistry.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_chemistry.yaml new file mode 100644 index 0000000000..837bc22538 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_chemistry.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_college_chemistry" +"description": "The following are multiple choice questions (with answers) about college\ + \ chemistry.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_college_chemistry" +"task_alias": "college chemistry" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_computer_science.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_computer_science.yaml new file mode 100644 index 0000000000..bb1c76395f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_computer_science.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_college_computer_science" +"description": "The following are multiple choice questions (with answers) about college\ + \ computer science.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_college_computer_science" +"task_alias": "college computer science" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_mathematics.yaml 
b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_mathematics.yaml new file mode 100644 index 0000000000..08c6e334c2 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_college_mathematics" +"description": "The following are multiple choice questions (with answers) about college\ + \ mathematics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_college_mathematics" +"task_alias": "college mathematics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_medicine.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_medicine.yaml new file mode 100644 index 0000000000..5c44360ab9 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_medicine.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_college_medicine" +"description": "The following are multiple choice questions (with answers) about college\ + \ medicine.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_college_medicine" +"task_alias": "college medicine" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_physics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_physics.yaml new file mode 100644 index 0000000000..372d6b9f20 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_college_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_college_physics" +"description": "The following are multiple choice questions (with answers) about college\ + \ physics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_college_physics" +"task_alias": "college physics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_computer_security.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_computer_security.yaml new file mode 100644 index 0000000000..8f85146a96 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_computer_security.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_computer_security" +"description": "The following are multiple choice questions (with answers) about computer\ + \ security.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_computer_security" +"task_alias": "computer security" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_conceptual_physics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_conceptual_physics.yaml new file mode 100644 index 0000000000..7b37e1927f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_conceptual_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_conceptual_physics" +"description": "The following are multiple choice questions (with answers) about conceptual\ + \ physics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_conceptual_physics" +"task_alias": "conceptual physics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_econometrics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_econometrics.yaml new file mode 100644 index 
0000000000..d83b0ba9da --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_econometrics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_econometrics" +"description": "The following are multiple choice questions (with answers) about econometrics.\n\ + \n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_econometrics" +"task_alias": "econometrics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_electrical_engineering.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_electrical_engineering.yaml new file mode 100644 index 0000000000..1898c3a97e --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_electrical_engineering.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_electrical_engineering" +"description": "The following are multiple choice questions (with answers) about electrical\ + \ engineering.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_electrical_engineering" +"task_alias": "electrical engineering" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_elementary_mathematics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_elementary_mathematics.yaml new file mode 100644 index 0000000000..c828feb107 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_elementary_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_elementary_mathematics" +"description": "The following are multiple choice questions (with answers) about elementary\ + \ mathematics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_elementary_mathematics" +"task_alias": "elementary mathematics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_formal_logic.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_formal_logic.yaml new file mode 100644 index 0000000000..294e99a46b --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_formal_logic.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_formal_logic" +"description": "The following are multiple choice questions (with answers) about formal\ + \ logic.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_formal_logic" +"task_alias": "formal logic" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_global_facts.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_global_facts.yaml new file mode 100644 index 0000000000..79c1a879a6 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_global_facts.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_global_facts" +"description": "The following are multiple choice questions (with answers) about global\ + \ facts.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_global_facts" +"task_alias": "global facts" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_biology.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_biology.yaml new file mode 100644 index 0000000000..90fe29fa46 --- /dev/null +++ 
b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_biology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_biology" +"description": "The following are multiple choice questions (with answers) about high\ + \ school biology.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_biology" +"task_alias": "high school biology" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_chemistry.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_chemistry.yaml new file mode 100644 index 0000000000..3e4423ae0e --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_chemistry.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_chemistry" +"description": "The following are multiple choice questions (with answers) about high\ + \ school chemistry.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_chemistry" +"task_alias": "high school chemistry" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_computer_science.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_computer_science.yaml new file mode 100644 index 0000000000..fe1e3d4954 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_computer_science.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_computer_science" +"description": "The following are multiple choice questions (with answers) about high\ + \ school computer science.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_computer_science" +"task_alias": "high school computer science" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_european_history.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_european_history.yaml new file mode 100644 index 0000000000..933d46f021 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_european_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_european_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school european history.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_european_history" +"task_alias": "high school european history" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_geography.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_geography.yaml new file mode 100644 index 0000000000..fa99ad15d1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_geography.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_geography" +"description": "The following are multiple choice questions (with answers) about high\ + \ school geography.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_geography" +"task_alias": "high school geography" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_government_and_politics.yaml 
b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_government_and_politics.yaml new file mode 100644 index 0000000000..b4835f4e09 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_government_and_politics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_government_and_politics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school government and politics.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_government_and_politics" +"task_alias": "high school government and politics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_macroeconomics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_macroeconomics.yaml new file mode 100644 index 0000000000..252ba9ceaf --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_macroeconomics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_macroeconomics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school macroeconomics.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_macroeconomics" +"task_alias": "high school macroeconomics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_mathematics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_mathematics.yaml new file mode 100644 index 0000000000..f88bf56047 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_mathematics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school mathematics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_mathematics" +"task_alias": "high school mathematics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_microeconomics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_microeconomics.yaml new file mode 100644 index 0000000000..bef2656aff --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_microeconomics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_microeconomics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school microeconomics.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_microeconomics" +"task_alias": "high school microeconomics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_physics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_physics.yaml new file mode 100644 index 0000000000..f02cc7fa78 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_physics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school physics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" 
+"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_physics" +"task_alias": "high school physics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_psychology.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_psychology.yaml new file mode 100644 index 0000000000..df87039f7b --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_psychology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_psychology" +"description": "The following are multiple choice questions (with answers) about high\ + \ school psychology.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_psychology" +"task_alias": "high school psychology" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_statistics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_statistics.yaml new file mode 100644 index 0000000000..cbd9244a67 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_statistics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_statistics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school statistics.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_statistics" +"task_alias": "high school statistics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_us_history.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_us_history.yaml new file mode 100644 index 0000000000..efcf6898d2 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_us_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_us_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school us history.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_us_history" +"task_alias": "high school us history" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_world_history.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_world_history.yaml new file mode 100644 index 0000000000..a6488f284c --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_high_school_world_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_high_school_world_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school world history.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_high_school_world_history" +"task_alias": "high school world history" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_human_aging.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_human_aging.yaml new file mode 100644 index 0000000000..b103d60f71 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_human_aging.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_human_aging" +"description": "The following are multiple choice questions (with answers) about human\ + \ 
aging.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_human_aging" +"task_alias": "human aging" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_human_sexuality.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_human_sexuality.yaml new file mode 100644 index 0000000000..eac93d5626 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_human_sexuality.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_human_sexuality" +"description": "The following are multiple choice questions (with answers) about human\ + \ sexuality.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_human_sexuality" +"task_alias": "human sexuality" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_international_law.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_international_law.yaml new file mode 100644 index 0000000000..5f7d5403cf --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_international_law.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_international_law" +"description": "The following are multiple choice questions (with answers) about international\ + \ law.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_international_law" +"task_alias": "international law" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_jurisprudence.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_jurisprudence.yaml new file mode 100644 index 0000000000..775bee6338 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_jurisprudence.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_jurisprudence" +"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\ + \n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_jurisprudence" +"task_alias": "jurisprudence" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_logical_fallacies.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_logical_fallacies.yaml new file mode 100644 index 0000000000..1f2706a93f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_logical_fallacies.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_logical_fallacies" +"description": "The following are multiple choice questions (with answers) about logical\ + \ fallacies.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_logical_fallacies" +"task_alias": "logical fallacies" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_machine_learning.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_machine_learning.yaml new file mode 100644 index 0000000000..6299c4d06d --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_machine_learning.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_machine_learning" +"description": "The following are multiple choice questions (with answers) about machine\ + \ learning.\n\n" +"tag": "mmlusr_question_and_answer_stem_tasks" +"include": "_mmlusr_qna_yml" +"task": 
"mmlusr_question_and_answer_machine_learning" +"task_alias": "machine learning" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_management.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_management.yaml new file mode 100644 index 0000000000..60ae89e289 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_management.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_management" +"description": "The following are multiple choice questions (with answers) about management.\n\ + \n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_management" +"task_alias": "management" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_marketing.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_marketing.yaml new file mode 100644 index 0000000000..4399b96ea6 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_marketing.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_marketing" +"description": "The following are multiple choice questions (with answers) about marketing.\n\ + \n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_marketing" +"task_alias": "marketing" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_medical_genetics.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_medical_genetics.yaml new file mode 100644 index 0000000000..477b6cf9cf --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_medical_genetics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_medical_genetics" +"description": "The following are multiple choice questions (with answers) about medical\ + \ genetics.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_medical_genetics" +"task_alias": "medical genetics" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_miscellaneous.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_miscellaneous.yaml new file mode 100644 index 0000000000..204ea3ae36 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_miscellaneous.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_miscellaneous" +"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\ + \n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_miscellaneous" +"task_alias": "miscellaneous" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_moral_disputes.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_moral_disputes.yaml new file mode 100644 index 0000000000..4ceb216f67 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_moral_disputes.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_moral_disputes" +"description": "The following are multiple choice questions (with answers) about moral\ + \ disputes.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_moral_disputes" +"task_alias": "moral disputes" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_moral_scenarios.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_moral_scenarios.yaml 
new file mode 100644 index 0000000000..d434fb1cc1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_moral_scenarios.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_moral_scenarios" +"description": "The following are multiple choice questions (with answers) about moral\ + \ scenarios.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_moral_scenarios" +"task_alias": "moral scenarios" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_nutrition.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_nutrition.yaml new file mode 100644 index 0000000000..e564410f96 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_nutrition.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_nutrition" +"description": "The following are multiple choice questions (with answers) about nutrition.\n\ + \n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_nutrition" +"task_alias": "nutrition" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_philosophy.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_philosophy.yaml new file mode 100644 index 0000000000..bf9c19bc01 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_philosophy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_philosophy" +"description": "The following are multiple choice questions (with answers) about philosophy.\n\ + \n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_philosophy" +"task_alias": "philosophy" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_prehistory.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_prehistory.yaml new file mode 100644 index 0000000000..a966669031 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_prehistory.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_prehistory" +"description": "The following are multiple choice questions (with answers) about prehistory.\n\ + \n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_prehistory" +"task_alias": "prehistory" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_accounting.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_accounting.yaml new file mode 100644 index 0000000000..68973e3761 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_accounting.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_professional_accounting" +"description": "The following are multiple choice questions (with answers) about professional\ + \ accounting.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_professional_accounting" +"task_alias": "professional accounting" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_law.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_law.yaml new file mode 100644 index 0000000000..a158fd123b --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_law.yaml @@ -0,0 +1,7 @@ +"dataset_name": 
"question_and_answer_professional_law" +"description": "The following are multiple choice questions (with answers) about professional\ + \ law.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_professional_law" +"task_alias": "professional law" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_medicine.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_medicine.yaml new file mode 100644 index 0000000000..738e24e91d --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_medicine.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_professional_medicine" +"description": "The following are multiple choice questions (with answers) about professional\ + \ medicine.\n\n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_professional_medicine" +"task_alias": "professional medicine" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_psychology.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_psychology.yaml new file mode 100644 index 0000000000..26f42c50f8 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_professional_psychology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_professional_psychology" +"description": "The following are multiple choice questions (with answers) about professional\ + \ psychology.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_professional_psychology" +"task_alias": "professional psychology" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_public_relations.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_public_relations.yaml new file mode 100644 index 0000000000..c92e67290e --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_public_relations.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_public_relations" +"description": "The following are multiple choice questions (with answers) about public\ + \ relations.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_public_relations" +"task_alias": "public relations" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_security_studies.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_security_studies.yaml new file mode 100644 index 0000000000..9c5ba3c9c3 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_security_studies.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_security_studies" +"description": "The following are multiple choice questions (with answers) about security\ + \ studies.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_security_studies" +"task_alias": "security studies" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_sociology.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_sociology.yaml new file mode 100644 index 0000000000..3d41098618 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_sociology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_sociology" 
+"description": "The following are multiple choice questions (with answers) about sociology.\n\ + \n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_sociology" +"task_alias": "sociology" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_us_foreign_policy.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_us_foreign_policy.yaml new file mode 100644 index 0000000000..ced65cb6f0 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_us_foreign_policy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_us_foreign_policy" +"description": "The following are multiple choice questions (with answers) about us\ + \ foreign policy.\n\n" +"tag": "mmlusr_question_and_answer_social_sciences_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_us_foreign_policy" +"task_alias": "us foreign policy" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_virology.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_virology.yaml new file mode 100644 index 0000000000..da7c0ca54e --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_virology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_virology" +"description": "The following are multiple choice questions (with answers) about virology.\n\ + \n" +"tag": "mmlusr_question_and_answer_other_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_virology" +"task_alias": "virology" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_world_religions.yaml b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_world_religions.yaml new file mode 100644 index 0000000000..e44bd345d5 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/question_and_answer_world_religions.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_and_answer_world_religions" +"description": "The following are multiple choice questions (with answers) about world\ + \ religions.\n\n" +"tag": "mmlusr_question_and_answer_humanities_tasks" +"include": "_mmlusr_qna_yml" +"task": "mmlusr_question_and_answer_world_religions" +"task_alias": "world religions" diff --git a/lm_eval/tasks/mmlusr/question_and_answer/utils.py b/lm_eval/tasks/mmlusr/question_and_answer/utils.py new file mode 100644 index 0000000000..f1341bd590 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_and_answer/utils.py @@ -0,0 +1,19 @@ +import datasets + + +def process_docs(dataset: datasets.Dataset) -> datasets.Dataset: + def _helper(doc): + # Assuming that the 'answer' field in the dataset now contains numbers 0-3 instead of 'A', 'B', 'C', 'D' + answer_list = ["A", "B", "C", "D"] + # Convert numeric index to corresponding letter + answer_index = int(doc["answer"]) # Make sure the answer is an integer + answer_letter = answer_list[answer_index] + + out_doc = { + "questions": doc["question"], + "choices": [doc["choice1"], doc["choice2"], doc["choice3"], doc["choice4"]], + "answer": answer_letter, # Include the letter for clarity + } + return out_doc + + return dataset.map(_helper) diff --git a/lm_eval/tasks/mmlusr/question_only/_mmlusr_q_yml b/lm_eval/tasks/mmlusr/question_only/_mmlusr_q_yml new file mode 100644 index 0000000000..cd307413a8 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/_mmlusr_q_yml @@ -0,0 +1,16 @@ +dataset_path: NiniCat/MMLU-SR +test_split: test +fewshot_split: train +fewshot_config: + sampler: first_n 
+output_type: multiple_choice +process_docs: !function utils.process_docs +doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:" +doc_to_choice: ["A", "B", "C", "D"] +doc_to_target: answer +metric_list: + - metric: acc + aggregation: mean + higher_is_better: true +metadata: + version: 0.0 diff --git a/lm_eval/tasks/mmlusr/question_only/_question_only.yaml b/lm_eval/tasks/mmlusr/question_only/_question_only.yaml new file mode 100644 index 0000000000..8d049ade5f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/_question_only.yaml @@ -0,0 +1,44 @@ +group: mmlusr_question_only +group_alias: MMLU-SR (Question Only) +task: + - group: mmlusr_qo_stem + group_alias: STEM (Question Only) + task: + - mmlusr_question_only_stem_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 + - group: mmlusr_qo_other + group_alias: Other (Question Only) + task: + - mmlusr_question_only_other_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 + - group: mmlusr_qo_social_sciences + group_alias: Social Sciences (Question Only) + task: + - mmlusr_question_only_social_sciences_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 + - group: mmlusr_qo_humanities + group_alias: Humanities (Question Only) + task: + - mmlusr_question_only_humanities_tasks + aggregate_metric_list: + - metric: acc + weight_by_size: True + metadata: + version: 1 +aggregate_metric_list: + - metric: acc + weight_by_size: True +metadata: + version: 1 diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_abstract_algebra.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_abstract_algebra.yaml new file mode 100644 index 0000000000..3ae764f7b5 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_abstract_algebra.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_abstract_algebra" +"description": "The following are multiple choice questions (with answers) about abstract\ + \ algebra.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_abstract_algebra" +"task_alias": "abstract algebra" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_anatomy.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_anatomy.yaml new file mode 100644 index 0000000000..85fe75793d --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_anatomy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_anatomy" +"description": "The following are multiple choice questions (with answers) about anatomy.\n\ + \n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_anatomy" +"task_alias": "anatomy" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_astronomy.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_astronomy.yaml new file mode 100644 index 0000000000..e32ddfed16 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_astronomy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_astronomy" +"description": "The following are multiple choice questions (with answers) about astronomy.\n\ + \n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_astronomy" +"task_alias": "astronomy" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_business_ethics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_business_ethics.yaml new file mode 
100644 index 0000000000..2d6404156f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_business_ethics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_business_ethics" +"description": "The following are multiple choice questions (with answers) about business\ + \ ethics.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_business_ethics" +"task_alias": "business ethics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_clinical_knowledge.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_clinical_knowledge.yaml new file mode 100644 index 0000000000..3339834552 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_clinical_knowledge.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_clinical_knowledge" +"description": "The following are multiple choice questions (with answers) about clinical\ + \ knowledge.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_clinical_knowledge" +"task_alias": "clinical knowledge" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_college_biology.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_college_biology.yaml new file mode 100644 index 0000000000..940bddc28f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_college_biology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_college_biology" +"description": "The following are multiple choice questions (with answers) about college\ + \ biology.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_college_biology" +"task_alias": "college biology" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_college_chemistry.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_college_chemistry.yaml new file mode 100644 index 0000000000..dc7b6cdae3 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_college_chemistry.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_college_chemistry" +"description": "The following are multiple choice questions (with answers) about college\ + \ chemistry.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_college_chemistry" +"task_alias": "college chemistry" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_college_computer_science.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_college_computer_science.yaml new file mode 100644 index 0000000000..7feae9f0b1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_college_computer_science.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_college_computer_science" +"description": "The following are multiple choice questions (with answers) about college\ + \ computer science.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_college_computer_science" +"task_alias": "college computer science" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_college_mathematics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_college_mathematics.yaml new file mode 100644 index 0000000000..3c379c5f5f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_college_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_college_mathematics" +"description": "The following are multiple choice questions (with answers) about college\ + \ mathematics.\n\n" +"tag": 
"mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_college_mathematics" +"task_alias": "college mathematics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_college_medicine.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_college_medicine.yaml new file mode 100644 index 0000000000..3f035787e3 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_college_medicine.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_college_medicine" +"description": "The following are multiple choice questions (with answers) about college\ + \ medicine.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_college_medicine" +"task_alias": "college medicine" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_college_physics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_college_physics.yaml new file mode 100644 index 0000000000..84e9599e5c --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_college_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_college_physics" +"description": "The following are multiple choice questions (with answers) about college\ + \ physics.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_college_physics" +"task_alias": "college physics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_computer_security.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_computer_security.yaml new file mode 100644 index 0000000000..7ac0de044f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_computer_security.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_computer_security" +"description": "The following are multiple choice questions (with answers) about computer\ + \ security.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_computer_security" +"task_alias": "computer security" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_conceptual_physics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_conceptual_physics.yaml new file mode 100644 index 0000000000..75d50b14ca --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_conceptual_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_conceptual_physics" +"description": "The following are multiple choice questions (with answers) about conceptual\ + \ physics.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_conceptual_physics" +"task_alias": "conceptual physics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_econometrics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_econometrics.yaml new file mode 100644 index 0000000000..edd501fa06 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_econometrics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_econometrics" +"description": "The following are multiple choice questions (with answers) about econometrics.\n\ + \n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_econometrics" +"task_alias": "econometrics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_electrical_engineering.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_electrical_engineering.yaml new file mode 100644 index 0000000000..8be2f268be --- /dev/null +++ 
b/lm_eval/tasks/mmlusr/question_only/question_only_electrical_engineering.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_electrical_engineering" +"description": "The following are multiple choice questions (with answers) about electrical\ + \ engineering.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_electrical_engineering" +"task_alias": "electrical engineering" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_elementary_mathematics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_elementary_mathematics.yaml new file mode 100644 index 0000000000..0681dbc1df --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_elementary_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_elementary_mathematics" +"description": "The following are multiple choice questions (with answers) about elementary\ + \ mathematics.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_elementary_mathematics" +"task_alias": "elementary mathematics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_formal_logic.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_formal_logic.yaml new file mode 100644 index 0000000000..51ae64f4d6 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_formal_logic.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_formal_logic" +"description": "The following are multiple choice questions (with answers) about formal\ + \ logic.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_formal_logic" +"task_alias": "formal logic" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_global_facts.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_global_facts.yaml new file mode 100644 index 0000000000..4fe24005f6 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_global_facts.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_global_facts" +"description": "The following are multiple choice questions (with answers) about global\ + \ facts.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_global_facts" +"task_alias": "global facts" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_biology.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_biology.yaml new file mode 100644 index 0000000000..030fd2e090 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_biology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_biology" +"description": "The following are multiple choice questions (with answers) about high\ + \ school biology.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_biology" +"task_alias": "high school biology" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_chemistry.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_chemistry.yaml new file mode 100644 index 0000000000..0f7b38e0e9 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_chemistry.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_chemistry" +"description": "The following are multiple choice questions (with answers) about high\ + \ school chemistry.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": 
"mmlusr_question_only_high_school_chemistry" +"task_alias": "high school chemistry" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_computer_science.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_computer_science.yaml new file mode 100644 index 0000000000..12f9d626c3 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_computer_science.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_computer_science" +"description": "The following are multiple choice questions (with answers) about high\ + \ school computer science.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_computer_science" +"task_alias": "high school computer science" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_european_history.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_european_history.yaml new file mode 100644 index 0000000000..746d125e54 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_european_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_european_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school european history.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_european_history" +"task_alias": "high school european history" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_geography.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_geography.yaml new file mode 100644 index 0000000000..abe2d6f5ac --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_geography.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_geography" +"description": "The following are multiple choice questions (with answers) about high\ + \ school geography.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_geography" +"task_alias": "high school geography" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_government_and_politics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_government_and_politics.yaml new file mode 100644 index 0000000000..5a7fb24eed --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_government_and_politics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_government_and_politics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school government and politics.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_government_and_politics" +"task_alias": "high school government and politics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_macroeconomics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_macroeconomics.yaml new file mode 100644 index 0000000000..ecb0772234 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_macroeconomics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_macroeconomics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school macroeconomics.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": 
"_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_macroeconomics" +"task_alias": "high school macroeconomics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_mathematics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_mathematics.yaml new file mode 100644 index 0000000000..aacf362d2f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_mathematics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_mathematics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school mathematics.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_mathematics" +"task_alias": "high school mathematics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_microeconomics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_microeconomics.yaml new file mode 100644 index 0000000000..dc288c976b --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_microeconomics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_microeconomics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school microeconomics.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_microeconomics" +"task_alias": "high school microeconomics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_physics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_physics.yaml new file mode 100644 index 0000000000..aaa4236332 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_physics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_physics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school physics.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_physics" +"task_alias": "high school physics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_psychology.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_psychology.yaml new file mode 100644 index 0000000000..33085c5c2a --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_psychology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_psychology" +"description": "The following are multiple choice questions (with answers) about high\ + \ school psychology.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_psychology" +"task_alias": "high school psychology" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_statistics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_statistics.yaml new file mode 100644 index 0000000000..ae69628a60 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_statistics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_statistics" +"description": "The following are multiple choice questions (with answers) about high\ + \ school statistics.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_statistics" +"task_alias": "high school statistics" diff --git 
a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_us_history.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_us_history.yaml new file mode 100644 index 0000000000..cf226b5a43 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_us_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_us_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school us history.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_us_history" +"task_alias": "high school us history" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_high_school_world_history.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_world_history.yaml new file mode 100644 index 0000000000..37b67158f4 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_high_school_world_history.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_high_school_world_history" +"description": "The following are multiple choice questions (with answers) about high\ + \ school world history.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_high_school_world_history" +"task_alias": "high school world history" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_human_aging.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_human_aging.yaml new file mode 100644 index 0000000000..2dd67daf3f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_human_aging.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_human_aging" +"description": "The following are multiple choice questions (with answers) about human\ + \ aging.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_human_aging" +"task_alias": "human aging" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_human_sexuality.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_human_sexuality.yaml new file mode 100644 index 0000000000..bfaee537e7 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_human_sexuality.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_human_sexuality" +"description": "The following are multiple choice questions (with answers) about human\ + \ sexuality.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_human_sexuality" +"task_alias": "human sexuality" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_international_law.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_international_law.yaml new file mode 100644 index 0000000000..fde605633b --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_international_law.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_international_law" +"description": "The following are multiple choice questions (with answers) about international\ + \ law.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_international_law" +"task_alias": "international law" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_jurisprudence.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_jurisprudence.yaml new file mode 100644 index 0000000000..e2f95fd2b1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_jurisprudence.yaml @@ -0,0 +1,7 @@ +"dataset_name": 
"question_only_jurisprudence" +"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\ + \n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_jurisprudence" +"task_alias": "jurisprudence" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_logical_fallacies.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_logical_fallacies.yaml new file mode 100644 index 0000000000..8e07150c7f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_logical_fallacies.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_logical_fallacies" +"description": "The following are multiple choice questions (with answers) about logical\ + \ fallacies.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_logical_fallacies" +"task_alias": "logical fallacies" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_machine_learning.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_machine_learning.yaml new file mode 100644 index 0000000000..5bccaf4a41 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_machine_learning.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_machine_learning" +"description": "The following are multiple choice questions (with answers) about machine\ + \ learning.\n\n" +"tag": "mmlusr_question_only_stem_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_machine_learning" +"task_alias": "machine learning" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_management.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_management.yaml new file mode 100644 index 0000000000..ca72f214c4 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_management.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_management" +"description": "The following are multiple choice questions (with answers) about management.\n\ + \n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_management" +"task_alias": "management" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_marketing.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_marketing.yaml new file mode 100644 index 0000000000..a47f15b6b4 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_marketing.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_marketing" +"description": "The following are multiple choice questions (with answers) about marketing.\n\ + \n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_marketing" +"task_alias": "marketing" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_medical_genetics.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_medical_genetics.yaml new file mode 100644 index 0000000000..88829f61c1 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_medical_genetics.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_medical_genetics" +"description": "The following are multiple choice questions (with answers) about medical\ + \ genetics.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_medical_genetics" +"task_alias": "medical genetics" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_miscellaneous.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_miscellaneous.yaml new file mode 100644 index 0000000000..ad3de69466 --- 
/dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_miscellaneous.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_miscellaneous" +"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\ + \n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_miscellaneous" +"task_alias": "miscellaneous" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_moral_disputes.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_moral_disputes.yaml new file mode 100644 index 0000000000..4a84f61057 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_moral_disputes.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_moral_disputes" +"description": "The following are multiple choice questions (with answers) about moral\ + \ disputes.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_moral_disputes" +"task_alias": "moral disputes" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_moral_scenarios.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_moral_scenarios.yaml new file mode 100644 index 0000000000..56ef60495f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_moral_scenarios.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_moral_scenarios" +"description": "The following are multiple choice questions (with answers) about moral\ + \ scenarios.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_moral_scenarios" +"task_alias": "moral scenarios" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_nutrition.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_nutrition.yaml new file mode 100644 index 0000000000..2518b48dc9 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_nutrition.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_nutrition" +"description": "The following are multiple choice questions (with answers) about nutrition.\n\ + \n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_nutrition" +"task_alias": "nutrition" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_philosophy.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_philosophy.yaml new file mode 100644 index 0000000000..e7c17c5dd8 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_philosophy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_philosophy" +"description": "The following are multiple choice questions (with answers) about philosophy.\n\ + \n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_philosophy" +"task_alias": "philosophy" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_prehistory.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_prehistory.yaml new file mode 100644 index 0000000000..2297b0f122 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_prehistory.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_prehistory" +"description": "The following are multiple choice questions (with answers) about prehistory.\n\ + \n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_prehistory" +"task_alias": "prehistory" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_professional_accounting.yaml 
b/lm_eval/tasks/mmlusr/question_only/question_only_professional_accounting.yaml new file mode 100644 index 0000000000..a04374117f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_professional_accounting.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_professional_accounting" +"description": "The following are multiple choice questions (with answers) about professional\ + \ accounting.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_professional_accounting" +"task_alias": "professional accounting" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_professional_law.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_professional_law.yaml new file mode 100644 index 0000000000..8b8e572b9e --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_professional_law.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_professional_law" +"description": "The following are multiple choice questions (with answers) about professional\ + \ law.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_professional_law" +"task_alias": "professional law" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_professional_medicine.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_professional_medicine.yaml new file mode 100644 index 0000000000..c25aa01755 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_professional_medicine.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_professional_medicine" +"description": "The following are multiple choice questions (with answers) about professional\ + \ medicine.\n\n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_professional_medicine" +"task_alias": "professional medicine" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_professional_psychology.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_professional_psychology.yaml new file mode 100644 index 0000000000..89ebc81c7f --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_professional_psychology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_professional_psychology" +"description": "The following are multiple choice questions (with answers) about professional\ + \ psychology.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_professional_psychology" +"task_alias": "professional psychology" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_public_relations.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_public_relations.yaml new file mode 100644 index 0000000000..d23cb2b93d --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_public_relations.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_public_relations" +"description": "The following are multiple choice questions (with answers) about public\ + \ relations.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_public_relations" +"task_alias": "public relations" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_security_studies.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_security_studies.yaml new file mode 100644 index 0000000000..0ff913d961 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_security_studies.yaml @@ -0,0 +1,7 @@ +"dataset_name": 
"question_only_security_studies" +"description": "The following are multiple choice questions (with answers) about security\ + \ studies.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_security_studies" +"task_alias": "security studies" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_sociology.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_sociology.yaml new file mode 100644 index 0000000000..d705e8485c --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_sociology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_sociology" +"description": "The following are multiple choice questions (with answers) about sociology.\n\ + \n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_sociology" +"task_alias": "sociology" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_us_foreign_policy.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_us_foreign_policy.yaml new file mode 100644 index 0000000000..7a9a7b8743 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_us_foreign_policy.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_us_foreign_policy" +"description": "The following are multiple choice questions (with answers) about us\ + \ foreign policy.\n\n" +"tag": "mmlusr_question_only_social_sciences_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_us_foreign_policy" +"task_alias": "us foreign policy" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_virology.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_virology.yaml new file mode 100644 index 0000000000..034cfa8bdb --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_virology.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_virology" +"description": "The following are multiple choice questions (with answers) about virology.\n\ + \n" +"tag": "mmlusr_question_only_other_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_virology" +"task_alias": "virology" diff --git a/lm_eval/tasks/mmlusr/question_only/question_only_world_religions.yaml b/lm_eval/tasks/mmlusr/question_only/question_only_world_religions.yaml new file mode 100644 index 0000000000..4e66514c8a --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/question_only_world_religions.yaml @@ -0,0 +1,7 @@ +"dataset_name": "question_only_world_religions" +"description": "The following are multiple choice questions (with answers) about world\ + \ religions.\n\n" +"tag": "mmlusr_question_only_humanities_tasks" +"include": "_mmlusr_q_yml" +"task": "mmlusr_question_only_world_religions" +"task_alias": "world religions" diff --git a/lm_eval/tasks/mmlusr/question_only/utils.py b/lm_eval/tasks/mmlusr/question_only/utils.py new file mode 100644 index 0000000000..f1341bd590 --- /dev/null +++ b/lm_eval/tasks/mmlusr/question_only/utils.py @@ -0,0 +1,19 @@ +import datasets + + +def process_docs(dataset: datasets.Dataset) -> datasets.Dataset: + def _helper(doc): + # Assuming that the 'answer' field in the dataset now contains numbers 0-3 instead of 'A', 'B', 'C', 'D' + answer_list = ["A", "B", "C", "D"] + # Convert numeric index to corresponding letter + answer_index = int(doc["answer"]) # Make sure the answer is an integer + answer_letter = answer_list[answer_index] + + out_doc = { + "questions": doc["question"], + "choices": [doc["choice1"], doc["choice2"], doc["choice3"], doc["choice4"]], + "answer": answer_letter, # Include 
the letter for clarity + } + return out_doc + + return dataset.map(_helper)
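For reference, here is a minimal sketch of what `process_docs` does to a single row, using a toy in-memory dataset. The example row, the dummy term it contains, and the import path are illustrative assumptions, not part of the diff:

```python
import datasets

# Hypothetical import path, assuming the package layout shown in the diff above.
from lm_eval.tasks.mmlusr.question_only.utils import process_docs

# One toy row mimicking the MMLU-SR schema consumed by _helper: a 'question',
# four 'choiceN' columns, and a numeric 'answer' index.
toy = datasets.Dataset.from_dict(
    {
        "question": [
            "Define 'blorp' as a group with no proper nontrivial subgroups. "
            "Which of these is a blorp?"
        ],
        "choice1": ["Z_4"],
        "choice2": ["Z_5"],
        "choice3": ["Z_6"],
        "choice4": ["Z_8"],
        "answer": [1],
    }
)

processed = process_docs(toy)
print(processed[0]["choices"])  # ['Z_4', 'Z_5', 'Z_6', 'Z_8']
print(processed[0]["answer"])   # 'B' -- the numeric index 1 mapped to its letter
```

Any of the tasks defined above can then be run by name, e.g. `lm_eval --model hf --model_args pretrained=<model> --tasks mmlusr_question_only_world_religions`.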