forked from huggingface/lm-evaluation-harness
Added MedConceptsQA Benchmark (EleutherAI#2010)
* Added MedConceptsQA Benchmark
* pre-commit factor
* update group name
* update in naming
* changed name
* Changed mcqa to med_concepts_qa prefix
* Added med_concepts_qa to README.md
* Changed config files according the new format
* Updated README

---------

Co-authored-by: lintangsutawika <[email protected]>
1 parent a7a2923, commit 2b26690
Showing 25 changed files with 214 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# MedConceptsQA | ||
|
||
### Paper | ||
|
||
Title: `MedConceptsQA: Open Source Medical Concepts QA Benchmark` | ||
|
||
Abstract: https://arxiv.org/abs/2405.07348 | ||
|
||
MedConceptsQA is a dedicated open source benchmark for medical concepts question answering. The benchmark comprises questions about various medical concepts across different vocabularies: diagnoses, procedures, and drugs. | ||
|
||
The questions are categorized into three levels of difficulty: easy, medium, and hard. | ||
|
||
Our benchmark serves as a valuable resource for evaluating the | ||
abilities of Large Language Models to interpret medical codes and distinguish | ||
between medical concepts. | ||
|
||
### Citation | ||
|
||
``` | ||
@article{shoham2024medconceptsqa, | ||
title={MedConceptsQA--Open Source Medical Concepts QA Benchmark}, | ||
author={Shoham, Ofir Ben and Rappoport, Nadav}, | ||
journal={arXiv preprint arXiv:2405.07348}, | ||
year={2024} | ||
} | ||
``` | ||
|
||
### Groups and Tasks | ||
|
||
#### Groups | ||
|
||
* `med_concepts_qa`: Contains all the QA tasks (diagnoses, procedures, and drugs). | ||
|
||
#### Tasks | ||
|
||
|
||
* `med_concepts_qa_icd9cm` - ICD9-CM (diagnosis codes, ICD9 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) diagnosis codes. | ||
|
||
|
||
* `med_concepts_qa_icd10cm` - ICD10-CM (diagnosis codes, ICD10 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) diagnosis codes. | ||
|
||
|
||
* `med_concepts_qa_icd9proc` - ICD9-Proc (procedure codes, ICD9 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-9-CM Volume 3 (International Classification of Diseases, 9th Revision, Clinical Modification, procedure classification) procedure codes. | ||
|
||
|
||
* `med_concepts_qa_icd10proc` - ICD10-Proc (procedure codes, ICD10 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-10-PCS (International Classification of Diseases, 10th Revision, Procedure Coding System) procedure codes. | ||
|
||
|
||
* `med_concepts_qa_atc` - ATC (Anatomical Therapeutic Chemical Classification System) question-answering. This involves providing information, clarifications, and answering questions related to the ATC classification system, which is used for the classification of drugs and other medical products according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties. |
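
Each vocabulary listed above is further split into the three difficulty levels by the per-level config files in this commit. A minimal sketch of the resulting task names, assuming the `med_concepts_qa_{vocab}_{level}` pattern used by those files; the helper `expected_task_names` is hypothetical and not part of the harness:

```python
from itertools import product
from typing import List

# Vocabularies and difficulty levels as they appear in the config files.
VOCABS = ["icd9cm", "icd10cm", "icd9proc", "icd10proc", "atc"]
LEVELS = ["easy", "medium", "hard"]


def expected_task_names(vocabs: List[str] = VOCABS, levels: List[str] = LEVELS) -> List[str]:
    """Return one task name per (vocabulary, level) pair (hypothetical helper)."""
    return [f"med_concepts_qa_{vocab}_{level}" for vocab, level in product(vocabs, levels)]


names = expected_task_names()
print(len(names))  # 15
print(names[0])    # med_concepts_qa_icd9cm_easy
```

Any of these names, or one of the group names above, can be used wherever the harness expects a task identifier.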
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
dataset_path: ofir408/MedConceptsQA | ||
output_type: multiple_choice | ||
description: "Answer A,B,C,D according to the answer to this multiple choice question.\n" | ||
fewshot_split: dev | ||
fewshot_config: | ||
  sampler: first_n | ||
num_fewshot: 4 | ||
test_split: test | ||
doc_to_text: "{{question}}\nAnswer:" | ||
doc_to_target: answer_id | ||
doc_to_choice: ['A', 'B', 'C', 'D'] | ||
metric_list: | ||
  - metric: acc | ||
    aggregation: mean | ||
    higher_is_better: true |
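
The template above can be read as: render the question followed by `Answer:`, score the four letter choices, and report mean accuracy. A minimal pure-Python sketch of that reading, not the harness's actual implementation; it assumes `answer_id` holds one of the letters `A` to `D`, and the helpers `render_prompt`, `gold_index`, and `accuracy` as well as the example document are hypothetical:

```python
from typing import Dict, List

CHOICES: List[str] = ["A", "B", "C", "D"]  # mirrors doc_to_choice


def render_prompt(doc: Dict[str, str]) -> str:
    """Mirror doc_to_text: the question, a newline, then 'Answer:'."""
    return f"{doc['question']}\nAnswer:"


def gold_index(doc: Dict[str, str]) -> int:
    """Map the answer letter to its position in CHOICES (assumes a letter label)."""
    return CHOICES.index(doc["answer_id"])


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Mean of exact matches, as in 'metric: acc' with 'aggregation: mean'."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Placeholder document; the real dataset rows are not reproduced here.
example_doc = {"question": "<question text with options A-D>", "answer_id": "C"}
prompt = render_prompt(example_doc)
print(prompt)
print(gold_index(example_doc))      # 2
print(accuracy([2, 0, 1], [2, 0, 3]))
```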
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
from typing import List | ||
|
||
import yaml | ||
|
||
|
||
def generate_yaml_content(vocab_name: str, level: str): | ||
    # Build the config for a single (vocabulary, difficulty level) task. | ||
    content = { | ||
        "dataset_name": f"{vocab_name}_{level}", | ||
        "tag": f"med_concepts_qa_{vocab_name}_tasks", | ||
        "include": "_default_template_yaml", | ||
        "task": f"med_concepts_qa_{vocab_name}_{level}", | ||
        "task_alias": f"{vocab_name}_{level}", | ||
    } | ||
    return content | ||
|
||
|
||
def generate_yaml_files( | ||
    vocab_names: List[str], levels: List[str], file_name_prefix: str | ||
): | ||
    # Write one YAML config per (vocabulary, level) pair into the current directory. | ||
    for vocab_name in vocab_names: | ||
        for level in levels: | ||
            yaml_content = generate_yaml_content(vocab_name, level) | ||
            filename = f"{file_name_prefix}_{vocab_name}_{level}.yaml" | ||
            with open(filename, "w") as yaml_file: | ||
                yaml.dump(yaml_content, yaml_file, default_flow_style=False) | ||
            print(f"Generated {filename}") | ||
|
||
|
||
if __name__ == "__main__": | ||
    generate_yaml_files( | ||
        vocab_names=["icd9cm", "icd10cm", "icd9proc", "icd10proc", "atc"], | ||
        levels=["easy", "medium", "hard"], | ||
        file_name_prefix="med_concepts_qa", | ||
    ) |
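
The generated files in this commit list their keys alphabetically (`dataset_name`, `include`, `tag`, `task`, `task_alias`) rather than in the order of the dictionary above, because `yaml.dump` sorts keys by default. A small sketch of that effect without the PyYAML dependency; `dump_sorted` is a hypothetical stand-in that only handles flat string-valued mappings like this one:

```python
from typing import Dict


def generate_yaml_content(vocab_name: str, level: str) -> Dict[str, str]:
    """Same mapping as the generator script builds for one task."""
    return {
        "dataset_name": f"{vocab_name}_{level}",
        "tag": f"med_concepts_qa_{vocab_name}_tasks",
        "include": "_default_template_yaml",
        "task": f"med_concepts_qa_{vocab_name}_{level}",
        "task_alias": f"{vocab_name}_{level}",
    }


def dump_sorted(content: Dict[str, str]) -> str:
    """Stand-in for yaml.dump on a flat mapping: one 'key: value' line per key, sorted."""
    return "".join(f"{key}: {content[key]}\n" for key in sorted(content))


text = dump_sorted(generate_yaml_content("atc", "easy"))
print(text)
```

The printed lines should match the `atc_easy` config added by this commit.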
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
group: med_concepts_qa | ||
task: | ||
  - med_concepts_qa_icd9cm | ||
  - med_concepts_qa_icd10cm | ||
  - med_concepts_qa_icd9proc | ||
  - med_concepts_qa_icd10proc | ||
  - med_concepts_qa_atc | ||
aggregate_metric_list: | ||
  - metric: acc | ||
    aggregation: mean |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
group: med_concepts_qa_atc | ||
task: | ||
  - med_concepts_qa_atc_tasks | ||
aggregate_metric_list: | ||
  - metric: acc | ||
    aggregation: mean |
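
A group-level `mean` over subtask accuracies can be computed in two ways: weighting every subtask equally, or weighting each subtask by its number of questions. Which one the harness applies depends on its size-weighting setting, which these files leave unset. The sketch below computes both; the accuracies and question counts are hypothetical placeholders, not benchmark results:

```python
from typing import Dict, Tuple

# Hypothetical (accuracy, number of questions) per subtask; not real results.
subtask_results: Dict[str, Tuple[float, int]] = {
    "med_concepts_qa_atc_easy": (0.50, 100),
    "med_concepts_qa_atc_medium": (0.40, 100),
    "med_concepts_qa_atc_hard": (0.30, 200),
}


def macro_mean(results: Dict[str, Tuple[float, int]]) -> float:
    """Unweighted mean: every subtask counts equally."""
    return sum(acc for acc, _ in results.values()) / len(results)


def micro_mean(results: Dict[str, Tuple[float, int]]) -> float:
    """Size-weighted mean: every question counts equally."""
    total = sum(n for _, n in results.values())
    return sum(acc * n for acc, n in results.values()) / total


print(round(macro_mean(subtask_results), 3))  # 0.4
print(round(micro_mean(subtask_results), 3))  # 0.375
```

The two values differ whenever subtasks have different sizes, so it is worth checking which variant a reported group score uses.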
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
group: med_concepts_qa_icd10cm | ||
task: | ||
  - med_concepts_qa_icd10cm_tasks | ||
aggregate_metric_list: | ||
  - metric: acc | ||
    aggregation: mean |
6 changes: 6 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/_med_concepts_qa_icd10proc.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
group: med_concepts_qa_icd10proc | ||
task: | ||
  - med_concepts_qa_icd10proc_tasks | ||
aggregate_metric_list: | ||
  - metric: acc | ||
    aggregation: mean |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
group: med_concepts_qa_icd9cm | ||
task: | ||
  - med_concepts_qa_icd9cm_tasks | ||
aggregate_metric_list: | ||
  - metric: acc | ||
    aggregation: mean |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
group: med_concepts_qa_icd9proc | ||
task: | ||
  - med_concepts_qa_icd9proc_tasks | ||
aggregate_metric_list: | ||
  - metric: acc | ||
    aggregation: mean |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: atc_easy | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_atc_tasks | ||
task: med_concepts_qa_atc_easy | ||
task_alias: atc_easy |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: atc_hard | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_atc_tasks | ||
task: med_concepts_qa_atc_hard | ||
task_alias: atc_hard |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_atc_medium.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: atc_medium | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_atc_tasks | ||
task: med_concepts_qa_atc_medium | ||
task_alias: atc_medium |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd10cm_easy.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd10cm_easy | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd10cm_tasks | ||
task: med_concepts_qa_icd10cm_easy | ||
task_alias: icd10cm_easy |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd10cm_hard.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd10cm_hard | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd10cm_tasks | ||
task: med_concepts_qa_icd10cm_hard | ||
task_alias: icd10cm_hard |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd10cm_medium.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd10cm_medium | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd10cm_tasks | ||
task: med_concepts_qa_icd10cm_medium | ||
task_alias: icd10cm_medium |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd10proc_easy.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd10proc_easy | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd10proc_tasks | ||
task: med_concepts_qa_icd10proc_easy | ||
task_alias: icd10proc_easy |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd10proc_hard.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd10proc_hard | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd10proc_tasks | ||
task: med_concepts_qa_icd10proc_hard | ||
task_alias: icd10proc_hard |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd10proc_medium.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd10proc_medium | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd10proc_tasks | ||
task: med_concepts_qa_icd10proc_medium | ||
task_alias: icd10proc_medium |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd9cm_easy.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd9cm_easy | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd9cm_tasks | ||
task: med_concepts_qa_icd9cm_easy | ||
task_alias: icd9cm_easy |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd9cm_hard.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd9cm_hard | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd9cm_tasks | ||
task: med_concepts_qa_icd9cm_hard | ||
task_alias: icd9cm_hard |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd9cm_medium.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd9cm_medium | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd9cm_tasks | ||
task: med_concepts_qa_icd9cm_medium | ||
task_alias: icd9cm_medium |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd9proc_easy.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd9proc_easy | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd9proc_tasks | ||
task: med_concepts_qa_icd9proc_easy | ||
task_alias: icd9proc_easy |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd9proc_hard.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd9proc_hard | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd9proc_tasks | ||
task: med_concepts_qa_icd9proc_hard | ||
task_alias: icd9proc_hard |
5 changes: 5 additions & 0 deletions
lm_eval/tasks/med_concepts_qa/med_concepts_qa_icd9proc_medium.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
dataset_name: icd9proc_medium | ||
include: _default_template_yaml | ||
tag: med_concepts_qa_icd9proc_tasks | ||
task: med_concepts_qa_icd9proc_medium | ||
task_alias: icd9proc_medium |