forked from huggingface/lm-evaluation-harness
* multimedqa
* Update medqa.yaml
* move to benchmarks folder
* add README.md

Co-authored-by: Lintang Sutawika <[email protected]>
1 parent 692e0f8, commit 818c056
Showing 6 changed files with 115 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# MultiMedQA (multiple-choice subset) | ||
|
||
### Paper | ||
|
||
Title: Large Language Models Encode Clinical Knowledge | ||
|
||
Abstract: https://arxiv.org/abs/2212.13138 | ||
|
||
A benchmark combining four existing multiple-choice question answering datasets spanning professional medical exams and research queries. | ||
|
||
### Citation | ||
|
||
``` | ||
@Article{Singhal2023, | ||
author={Singhal, Karan and Azizi, Shekoofeh and Tu, Tao and Mahdavi, S. Sara and Wei, Jason and Chung, Hyung Won and Scales, Nathan and Tanwani, Ajay and Cole-Lewis, Heather and Pfohl, Stephen and Payne, Perry and Seneviratne, Martin and Gamble, Paul and Kelly, Chris and Babiker, Abubakr and Sch{\"a}rli, Nathanael and Chowdhery, Aakanksha and Mansfield, Philip and Demner-Fushman, Dina and Ag{\"u}era y Arcas, Blaise and Webster, Dale and Corrado, Greg S. and Matias, Yossi and Chou, Katherine and Gottweis, Juraj and Tomasev, Nenad and Liu, Yun and Rajkomar, Alvin and Barral, Joelle and Semturs, Christopher and Karthikesalingam, Alan and Natarajan, Vivek}, | ||
title={Large language models encode clinical knowledge}, | ||
journal={Nature}, | ||
year={2023}, | ||
month={Aug}, | ||
day={01}, | ||
volume={620}, | ||
number={7972}, | ||
pages={172-180}, | ||
issn={1476-4687}, | ||
doi={10.1038/s41586-023-06291-2}, | ||
url={https://doi.org/10.1038/s41586-023-06291-2} | ||
} | ||
``` | ||
|
||
### Tasks | ||
|
||
* [PubMedQA](https://pubmedqa.github.io/) - 1,000 expert-labeled Q&A pairs in which a question and a corresponding PubMed abstract are given as context and a yes/maybe/no answer must be produced. Unlike the rest of the tasks in this suite, PubMedQA is a closed-domain Q&A task. | ||
* [MedQA](https://github.com/jind11/MedQA) - US Medical Licensing Examination (USMLE) questions with 4 or 5 possible answers. Typically, only the 4-option questions are used. | ||
* [MedMCQA](https://medmcqa.github.io/) - 4-option multiple choice questions from Indian medical entrance examinations, >191k total questions. | ||
* [MMLU](https://arxiv.org/abs/2009.03300) - 4-option multiple choice exam questions from a variety of domains. The following 6 domains are utilized here: | ||
* Anatomy | ||
* Clinical Knowledge | ||
* College Medicine | ||
* Medical Genetics | ||
* Professional Medicine | ||
* College Biology | ||
|
||
Note that MultiMedQA also includes some short-form and long-form Q&A tasks (LiveQA, MedicationQA, HealthSearchQA). These tasks are usually evaluated by human experts rather than automatically, so they are omitted here. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
group: multimedqa | ||
task: | ||
- pubmedqa | ||
- medmcqa | ||
- medqa_4options | ||
- mmlu_anatomy | ||
- mmlu_clinical_knowledge | ||
- mmlu_college_medicine | ||
- mmlu_medical_genetics | ||
- mmlu_professional_medicine | ||
- mmlu_college_biology |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
task: medmcqa | ||
dataset_path: medmcqa | ||
output_type: multiple_choice | ||
training_split: train | ||
validation_split: validation | ||
test_split: validation | ||
doc_to_text: !function utils_medmcqa.doc_to_text | ||
doc_to_target: cop | ||
doc_to_choice: [ 'A','B','C','D' ] | ||
should_decontaminate: true | ||
doc_to_decontamination_query: "{{question}}" | ||
metric_list: | ||
- metric: acc | ||
aggregation: mean | ||
higher_is_better: true | ||
- metric: acc_norm | ||
aggregation: mean | ||
higher_is_better: true |
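The config above reports both `acc` and `acc_norm`. As a rough sketch of how the two can differ, the snippet below scores one document from per-choice log-likelihoods. It assumes that `acc_norm` divides each log-likelihood by the character length of the choice string before taking the argmax. That is an assumption about the harness rather than something this diff shows. `score_multiple_choice` and all numbers are hypothetical.

```python
def score_multiple_choice(loglikelihoods, choices, gold_index):
    """Return (acc, acc_norm) for a single document.

    Assumption: acc_norm normalizes each log-likelihood by the
    character length of its choice string before the argmax.
    """
    indices = range(len(choices))
    # acc: choice with the highest raw log-likelihood.
    pred = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: choice with the highest length-normalized log-likelihood.
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return float(pred == gold_index), float(pred_norm == gold_index)


# Single-letter choices, as in doc_to_choice above: every length is 1,
# so normalization cannot change the ranking.
letter_scores = score_multiple_choice(
    [-2.3, -0.7, -1.9, -3.1], ["A", "B", "C", "D"], gold_index=1
)

# Hypothetical choices of unequal length, where the two metrics disagree.
word_scores = score_multiple_choice(
    [-3.0, -4.0, -2.5], ["yes", "maybe", "no"], gold_index=1
)

print(letter_scores, word_scores)
```

Under that assumption, `acc` and `acc_norm` would coincide for any task whose choices are all single letters, which is the case for `[ 'A','B','C','D' ]`.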
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Copied from Master | ||
def doc_to_text(doc) -> str: | ||
""" | ||
Question: <question> | ||
Choices: | ||
A. <choice1> | ||
B. <choice2> | ||
C. <choice3> | ||
D. <choice4> | ||
Answer: | ||
""" | ||
choices = [doc["opa"], doc["opb"], doc["opc"], doc["opd"]] | ||
option_choices = {'A': choices[0], 'B': choices[1], 'C': choices[2], 'D': choices[3]} | ||
|
||
prompt = "Question: " + doc["question"] + "\nChoices:\n" | ||
for choice, option in option_choices.items(): | ||
prompt += f"{choice.upper()}. {option}\n" | ||
prompt += "Answer:" | ||
return prompt |
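To make the prompt layout concrete, the sketch below restates the `doc_to_text` helper in standalone form and applies it to a made-up record. The field names (`question`, `opa` to `opd`, `cop`) come from the diff. The sample question and options are invented for illustration. Treating `cop` as a zero-based index into `doc_to_choice` is an assumption.

```python
def doc_to_text(doc) -> str:
    """Standalone restatement of the MedMCQA prompt builder shown above."""
    options = [doc["opa"], doc["opb"], doc["opc"], doc["opd"]]
    prompt = "Question: " + doc["question"] + "\nChoices:\n"
    for letter, option in zip("ABCD", options):
        prompt += f"{letter}. {option}\n"
    prompt += "Answer:"
    return prompt


# Hypothetical record: contents invented for illustration only.
sample_doc = {
    "question": "Which vitamin deficiency causes scurvy?",
    "opa": "Vitamin A",
    "opb": "Vitamin B12",
    "opc": "Vitamin C",
    "opd": "Vitamin D",
    "cop": 2,  # assumed to be a zero-based index into ['A', 'B', 'C', 'D']
}

prompt = doc_to_text(sample_doc)
gold_letter = ["A", "B", "C", "D"][sample_doc["cop"]]
print(prompt)
print("Gold:", gold_letter)
```

The prompt ends with `Answer:` and no trailing space, so the harness is left to decide how each candidate letter is joined to the context.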
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
task: medqa_4options | ||
dataset_path: GBaker/MedQA-USMLE-4-options-hf | ||
output_type: multiple_choice | ||
training_split: train | ||
validation_split: validation | ||
test_split: test | ||
doc_to_text: !function preprocess_medqa.doc_to_text | ||
doc_to_target: !function preprocess_medqa.doc_to_target | ||
doc_to_choice: [ 'A', 'B', 'C', 'D' ] | ||
metric_list: | ||
- metric: acc | ||
aggregation: mean | ||
higher_is_better: true | ||
- metric: acc_norm | ||
aggregation: mean | ||
higher_is_better: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
def doc_to_text(doc) -> str: | ||
option_choices = {'A': doc["ending0"], 'B': doc["ending1"], 'C': doc["ending2"], 'D': doc["ending3"]} | ||
answers = "".join((f"{k}. {v}\n") for k, v in option_choices.items()) | ||
return f"Question: {doc['sent1']}\n{answers}Answer:" | ||
|
||
|
||
def doc_to_target(doc) -> int: | ||
return doc["label"] |
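As a quick check of the two MedQA helpers above, the sketch below copies them into a standalone script and runs them on a made-up record. The field names (`sent1`, `ending0` to `ending3`, `label`) come from the diff. The sample content is invented. Mapping the integer label onto `doc_to_choice` by position is an assumption about how the harness consumes it.

```python
def doc_to_text(doc) -> str:
    """Standalone copy of the MedQA prompt builder shown above."""
    option_choices = {
        "A": doc["ending0"],
        "B": doc["ending1"],
        "C": doc["ending2"],
        "D": doc["ending3"],
    }
    answers = "".join(f"{k}. {v}\n" for k, v in option_choices.items())
    return f"Question: {doc['sent1']}\n{answers}Answer:"


def doc_to_target(doc) -> int:
    """Return the integer label of the correct option."""
    return doc["label"]


# Hypothetical record: contents invented for illustration only.
sample_doc = {
    "sent1": "Which organ produces insulin?",
    "ending0": "Liver",
    "ending1": "Pancreas",
    "ending2": "Kidney",
    "ending3": "Spleen",
    "label": 1,
}

medqa_prompt = doc_to_text(sample_doc)
# Assumed: the label indexes into doc_to_choice = ['A', 'B', 'C', 'D'].
target_letter = ["A", "B", "C", "D"][doc_to_target(sample_doc)]
print(medqa_prompt)
print("Gold:", target_letter)
```

Unlike the MedMCQA prompt, this one has no `Choices:` line between the question and the options, so the two tasks present slightly different layouts to the model.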