Added MedConceptsQA Benchmark (EleutherAI#2010)
* Added MedConceptsQA Benchmark

* pre-commit factor

* update group name

* update in naming

* changed name

* Changed mcqa to med_concepts_qa prefix

* Added med_concepts_qa to README.md

* Changed config files according to the new format

* Updated README

---------

Co-authored-by: lintangsutawika <[email protected]>
Ofir408 and lintangsutawika authored Jul 14, 2024
1 parent a7a2923 commit 2b26690
Showing 25 changed files with 214 additions and 0 deletions.
1 change: 1 addition & 0 deletions lm_eval/tasks/README.md
Original file line number Diff line number Diff line change
@@ -62,6 +62,7 @@
| [logiqa2](logiqa2/README.md) | Large-scale logical reasoning dataset adapted from the Chinese Civil Service Examination. | English, Chinese |
| [mathqa](mathqa/README.md) | Question answering tasks involving mathematical reasoning and problem-solving. | English |
| [mc_taco](mc_taco/README.md) | Question-answer pairs that require temporal commonsense comprehension. | English |
| [med_concepts_qa](med_concepts_qa/README.md) | Benchmark for evaluating LLMs on their ability to interpret medical codes and distinguish between medical concepts. | English |
| medmcqa | Medical multiple choice questions assessing detailed medical knowledge. | English |
| medqa | Multiple choice question answering based on the United States Medical License Exams. | |
| [mgsm](mgsm/README.md) | Benchmark of multilingual grade-school math problems. | Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, Telugu |
49 changes: 49 additions & 0 deletions lm_eval/tasks/med_concepts_qa/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# MedConceptsQA

### Paper

Title: `MedConceptsQA: Open Source Medical Concepts QA Benchmark`

Abstract: https://arxiv.org/abs/2405.07348

MedConceptsQA is a dedicated open source benchmark for medical concepts question answering. The benchmark comprises questions about various medical concepts across different vocabularies: diagnoses, procedures, and drugs.

The questions are categorized into three levels of difficulty: easy, medium, and hard.

Our benchmark serves as a valuable resource for evaluating the
abilities of Large Language Models to interpret medical codes and distinguish
between medical concepts.

### Citation

```
@article{shoham2024medconceptsqa,
title={MedConceptsQA--Open Source Medical Concepts QA Benchmark},
author={Shoham, Ofir Ben and Rappoport, Nadav},
journal={arXiv preprint arXiv:2405.07348},
year={2024}
}
```

### Groups and Tasks

#### Groups

* `med_concepts_qa`: Contains all the QA tasks (diagnoses, procedures, and drugs).

#### Tasks


* `med_concepts_qa_icd9cm` - ICD9-CM (diagnosis codes, ICD9 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) diagnosis codes.


* `med_concepts_qa_icd10cm` - ICD10-CM (diagnosis codes, ICD10 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-10-CM (International Classification of Diseases, 10th Revision, Clinical Modification) diagnosis codes.


* `med_concepts_qa_icd9proc` - ICD9-Proc (procedure codes, ICD9 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-9-CM Volume 3 (International Classification of Diseases, 9th Revision, Clinical Modification, procedure classification) procedure codes.


* `med_concepts_qa_icd10proc` - ICD10-Proc (procedure codes, ICD10 format) question-answering. This involves providing information, clarifications, and answering questions related to ICD-10-PCS (International Classification of Diseases, 10th Revision, Procedure Coding System) procedure codes.


* `med_concepts_qa_atc` - ATC (Anatomical Therapeutic Chemical Classification System) question-answering. This involves providing information, clarifications, and answering questions related to the ATC classification system, which is used for the classification of drugs and other medical products according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties.
15 changes: 15 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_default_template_yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
dataset_path: ofir408/MedConceptsQA
output_type: multiple_choice
description: "Answer A,B,C,D according to the answer to this multiple choice question.\n"
fewshot_split: dev
fewshot_config:
  sampler: first_n
num_fewshot: 4
test_split: test
doc_to_text: "{{question}}\nAnswer:"
doc_to_target: answer_id
doc_to_choice: ['A', 'B', 'C', 'D']
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
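The template above can be read as a recipe for building a prompt and scoring it. The following is a minimal sketch of that recipe in plain Python, not the harness's actual implementation: the harness renders `doc_to_text` with Jinja2 and picks the choice with the highest log-likelihood, whereas this sketch uses f-strings and takes predictions as given. The sample documents are hypothetical, and it assumes `answer_id` holds the letter of the correct option and that few-shot examples are joined with a blank line.

```python
from typing import Dict, List

# Copied from the template's `description` field.
DESCRIPTION = (
    "Answer A,B,C,D according to the answer to this multiple choice question.\n"
)


def render_doc(doc: Dict[str, str]) -> str:
    """Mirror doc_to_text: the question followed by an 'Answer:' cue."""
    return f"{doc['question']}\nAnswer:"


def build_prompt(fewshot_docs: List[Dict[str, str]], doc: Dict[str, str]) -> str:
    """Description, then solved few-shot examples, then the open question."""
    # Assumption: each solved example ends with a space and the answer letter.
    shots = [f"{render_doc(d)} {d['answer_id']}" for d in fewshot_docs]
    return DESCRIPTION + "\n\n".join(shots + [render_doc(doc)])


def accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of predicted letters that match the target letters."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


if __name__ == "__main__":
    # Hypothetical documents; real ones come from the dataset's dev/test splits.
    shot = {"question": "Example question 1?\nA. w\nB. x\nC. y\nD. z", "answer_id": "B"}
    target = {"question": "Example question 2?\nA. w\nB. x\nC. y\nD. z", "answer_id": "A"}
    print(build_prompt([shot], target))
```

With `num_fewshot: 4` and `sampler: first_n`, the list passed as `fewshot_docs` would correspond to the first four documents of the `dev` split.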
34 changes: 34 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_generate_configs.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
from typing import List

import yaml


def generate_yaml_content(vocab_name: str, level: str):
    content = {
        "dataset_name": f"{vocab_name}_{level}",
        "tag": f"med_concepts_qa_{vocab_name}_tasks",
        "include": "_default_template_yaml",
        "task": f"med_concepts_qa_{vocab_name}_{level}",
        "task_alias": f"{vocab_name}_{level}",
    }
    return content


def generate_yaml_files(
    vocab_names: List[str], levels: List[str], file_name_prefix: str
):
    for vocab_name in vocab_names:
        for level in levels:
            yaml_content = generate_yaml_content(vocab_name, level)
            filename = f"{file_name_prefix}_{vocab_name}_{level}.yaml"
            with open(filename, "w") as yaml_file:
                yaml.dump(yaml_content, yaml_file, default_flow_style=False)
            print(f"Generated {filename}")


if __name__ == "__main__":
    generate_yaml_files(
        vocab_names=["icd9cm", "icd10cm", "icd9proc", "icd10proc", "atc"],
        levels=["easy", "medium", "hard"],
        file_name_prefix="med_concepts_qa",
    )
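To check what the generator script would produce without writing anything to disk, a dry run along the following lines may help. This is a sketch only: `expected_filenames` is a hypothetical helper that is not part of the commit, and it simply repeats the script's naming scheme over the same vocabularies and levels.

```python
from itertools import product
from typing import List

VOCAB_NAMES = ["icd9cm", "icd10cm", "icd9proc", "icd10proc", "atc"]
LEVELS = ["easy", "medium", "hard"]


def expected_filenames(
    vocab_names: List[str], levels: List[str], prefix: str = "med_concepts_qa"
) -> List[str]:
    """List the file names the generator would write, in its iteration order."""
    # product() iterates vocabularies in the outer loop and levels in the
    # inner loop, matching the nested for-loops of the script.
    return [f"{prefix}_{v}_{lvl}.yaml" for v, lvl in product(vocab_names, levels)]


if __name__ == "__main__":
    for name in expected_filenames(VOCAB_NAMES, LEVELS):
        print(name)
```

Five vocabularies times three levels gives 15 per-task files, which together with the template, the generator, the six group files, and the two READMEs accounts for the 25 changed files in this commit.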
10 changes: 10 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_med_concepts_qa.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
group: med_concepts_qa
task:
  - med_concepts_qa_icd9cm
  - med_concepts_qa_icd10cm
  - med_concepts_qa_icd9proc
  - med_concepts_qa_icd10proc
  - med_concepts_qa_atc
aggregate_metric_list:
  - metric: acc
    aggregation: mean
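The `aggregation: mean` entry above rolls subtask accuracies up into one group score. The sketch below illustrates two common ways to compute such a mean; which one the harness applies depends on its group-aggregation settings, so treat this as an illustration rather than a description of its behavior. The function name, scores, and sizes are all hypothetical.

```python
from typing import List


def aggregate_mean(
    scores: List[float], sizes: List[int], weight_by_size: bool
) -> float:
    """Combine per-subtask accuracies into one group-level accuracy."""
    if weight_by_size:
        # Micro-average: every question counts equally.
        return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
    # Macro-average: every subtask counts equally.
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical numbers, not results from the benchmark.
    scores = [0.5, 0.75]
    sizes = [100, 300]
    print(aggregate_mean(scores, sizes, weight_by_size=False))
    print(aggregate_mean(scores, sizes, weight_by_size=True))
```

The two averages diverge whenever subtasks differ in size, so it is worth stating which one is reported when comparing group-level numbers.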
6 changes: 6 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_med_concepts_qa_atc.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
group: med_concepts_qa_atc
task:
  - med_concepts_qa_atc_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
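The group file above lists a tag (`med_concepts_qa_atc_tasks`) rather than individual tasks; the per-level configs carry that tag. A rough sketch of how such a tag can be resolved to its leaf tasks follows. The in-memory `LEAF_CONFIGS` list and the `tasks_for_tag` helper are hypothetical stand-ins for the YAML files and are not the harness's loader.

```python
from typing import Dict, List

# Hypothetical in-memory copies of the `task` and `tag` fields from a few
# per-level config files in this commit.
LEAF_CONFIGS: List[Dict[str, str]] = [
    {"task": "med_concepts_qa_atc_easy", "tag": "med_concepts_qa_atc_tasks"},
    {"task": "med_concepts_qa_atc_medium", "tag": "med_concepts_qa_atc_tasks"},
    {"task": "med_concepts_qa_atc_hard", "tag": "med_concepts_qa_atc_tasks"},
    {"task": "med_concepts_qa_icd9cm_easy", "tag": "med_concepts_qa_icd9cm_tasks"},
]


def tasks_for_tag(configs: List[Dict[str, str]], tag: str) -> List[str]:
    """Return the sorted names of all tasks whose config carries the tag."""
    return sorted(c["task"] for c in configs if c["tag"] == tag)


if __name__ == "__main__":
    print(tasks_for_tag(LEAF_CONFIGS, "med_concepts_qa_atc_tasks"))
```

Under this reading, requesting the `med_concepts_qa_atc` group would evaluate the easy, medium, and hard ATC tasks and report their aggregated accuracy.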
6 changes: 6 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_med_concepts_qa_icd10cm.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
group: med_concepts_qa_icd10cm
task:
  - med_concepts_qa_icd10cm_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
6 changes: 6 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_med_concepts_qa_icd10proc.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
group: med_concepts_qa_icd10proc
task:
  - med_concepts_qa_icd10proc_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
6 changes: 6 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_med_concepts_qa_icd9cm.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
group: med_concepts_qa_icd9cm
task:
  - med_concepts_qa_icd9cm_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
6 changes: 6 additions & 0 deletions lm_eval/tasks/med_concepts_qa/_med_concepts_qa_icd9proc.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
group: med_concepts_qa_icd9proc
task:
  - med_concepts_qa_icd9proc_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
5 changes: 5 additions & 0 deletions lm_eval/tasks/med_concepts_qa/med_concepts_qa_atc_easy.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: atc_easy
include: _default_template_yaml
tag: med_concepts_qa_atc_tasks
task: med_concepts_qa_atc_easy
task_alias: atc_easy
5 changes: 5 additions & 0 deletions lm_eval/tasks/med_concepts_qa/med_concepts_qa_atc_hard.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: atc_hard
include: _default_template_yaml
tag: med_concepts_qa_atc_tasks
task: med_concepts_qa_atc_hard
task_alias: atc_hard
5 changes: 5 additions & 0 deletions lm_eval/tasks/med_concepts_qa/med_concepts_qa_atc_medium.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: atc_medium
include: _default_template_yaml
tag: med_concepts_qa_atc_tasks
task: med_concepts_qa_atc_medium
task_alias: atc_medium
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd10cm_easy
include: _default_template_yaml
tag: med_concepts_qa_icd10cm_tasks
task: med_concepts_qa_icd10cm_easy
task_alias: icd10cm_easy
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd10cm_hard
include: _default_template_yaml
tag: med_concepts_qa_icd10cm_tasks
task: med_concepts_qa_icd10cm_hard
task_alias: icd10cm_hard
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd10cm_medium
include: _default_template_yaml
tag: med_concepts_qa_icd10cm_tasks
task: med_concepts_qa_icd10cm_medium
task_alias: icd10cm_medium
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd10proc_easy
include: _default_template_yaml
tag: med_concepts_qa_icd10proc_tasks
task: med_concepts_qa_icd10proc_easy
task_alias: icd10proc_easy
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd10proc_hard
include: _default_template_yaml
tag: med_concepts_qa_icd10proc_tasks
task: med_concepts_qa_icd10proc_hard
task_alias: icd10proc_hard
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd10proc_medium
include: _default_template_yaml
tag: med_concepts_qa_icd10proc_tasks
task: med_concepts_qa_icd10proc_medium
task_alias: icd10proc_medium
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd9cm_easy
include: _default_template_yaml
tag: med_concepts_qa_icd9cm_tasks
task: med_concepts_qa_icd9cm_easy
task_alias: icd9cm_easy
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd9cm_hard
include: _default_template_yaml
tag: med_concepts_qa_icd9cm_tasks
task: med_concepts_qa_icd9cm_hard
task_alias: icd9cm_hard
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd9cm_medium
include: _default_template_yaml
tag: med_concepts_qa_icd9cm_tasks
task: med_concepts_qa_icd9cm_medium
task_alias: icd9cm_medium
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd9proc_easy
include: _default_template_yaml
tag: med_concepts_qa_icd9proc_tasks
task: med_concepts_qa_icd9proc_easy
task_alias: icd9proc_easy
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd9proc_hard
include: _default_template_yaml
tag: med_concepts_qa_icd9proc_tasks
task: med_concepts_qa_icd9proc_hard
task_alias: icd9proc_hard
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
dataset_name: icd9proc_medium
include: _default_template_yaml
tag: med_concepts_qa_icd9proc_tasks
task: med_concepts_qa_icd9proc_medium
task_alias: icd9proc_medium
