Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data (EleutherAI#1867)

* glianorex tasks

* Create README.md

* Update README.md

* Update README.md

* fix formatting

* fix internal formatting
maximegmd authored Jun 5, 2024
1 parent 070d31d commit 7257aa2
Showing 5 changed files with 87 additions and 0 deletions.
20 changes: 20 additions & 0 deletions lm_eval/tasks/glianorex/README.md
@@ -0,0 +1,20 @@
# Glianorex

The goal of this benchmark is to separate a model's test-taking ability from its knowledge of the content.

### Paper

Title: Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data

Abstract: https://arxiv.org/abs/2406.02394

To test whether MCQs are a relevant way to assess LLM performance when the model has had no prior exposure to the data, we created a fictional medical benchmark and knowledge base about a non-existent gland, the Glianorex. Using GPT-4, we generated a comprehensive textbook on the Glianorex in both English and French, and created multiple-choice questions in both languages.

### Tasks

All tasks are multiple-choice questions with four options, exactly one of which is correct.

- `glianorex`: Evaluates all tasks listed below.

- `glianorex_en`: Evaluates the accuracy on 264 questions in English.
- `glianorex_fr`: Evaluates the accuracy on 264 questions in French.
14 changes: 14 additions & 0 deletions lm_eval/tasks/glianorex/glianorex.yaml
@@ -0,0 +1,14 @@
task: glianorex
dataset_path: maximegmd/glianorex
output_type: multiple_choice
test_split: train
doc_to_text: !function preprocess_glianorex.doc_to_text
doc_to_target: !function preprocess_glianorex.doc_to_target
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
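
For intuition about the two metrics configured above, here is a minimal sketch. It assumes that `acc` selects the choice with the highest log-likelihood and that `acc_norm` selects the choice with the highest log-likelihood divided by the choice's character length, which is my understanding of the harness's multiple-choice scoring rather than something this commit states. The helper names and the log-likelihood values are invented for illustration.

```python
def pick_acc(loglikelihoods):
    # Index of the choice with the highest raw log-likelihood.
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


def pick_acc_norm(loglikelihoods, choices):
    # Index of the choice with the highest length-normalized log-likelihood
    # (assumption: normalization is by the choice's character count).
    return max(
        range(len(choices)),
        key=lambda i: loglikelihoods[i] / len(choices[i]),
    )


choices = ["A", "B", "C", "D"]          # same labels as doc_to_choice
loglikelihoods = [-2.3, -0.4, -1.9, -3.1]  # invented values for one question

acc_pick = pick_acc(loglikelihoods)
norm_pick = pick_acc_norm(loglikelihoods, choices)
print(choices[acc_pick], choices[norm_pick])
```

Under that assumption, every choice here is a single character, so the normalization divides by one and the two metrics would select the same option for each question.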
15 changes: 15 additions & 0 deletions lm_eval/tasks/glianorex/glianorex_en.yaml
@@ -0,0 +1,15 @@
task: glianorex_en
dataset_path: maximegmd/glianorex
output_type: multiple_choice
test_split: train
doc_to_text: !function preprocess_glianorex.doc_to_text
doc_to_target: !function preprocess_glianorex.doc_to_target
process_docs: !function preprocess_glianorex.filter_english
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
15 changes: 15 additions & 0 deletions lm_eval/tasks/glianorex/glianorex_fr.yaml
@@ -0,0 +1,15 @@
task: glianorex_fr
dataset_path: maximegmd/glianorex
output_type: multiple_choice
test_split: train
doc_to_text: !function preprocess_glianorex.doc_to_text
doc_to_target: !function preprocess_glianorex.doc_to_target
process_docs: !function preprocess_glianorex.filter_french
doc_to_choice: [ 'A', 'B', 'C', 'D' ]
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
23 changes: 23 additions & 0 deletions lm_eval/tasks/glianorex/preprocess_glianorex.py
@@ -0,0 +1,23 @@
import datasets


def doc_to_text(doc) -> str:
    option_choices = doc["options"]
    answers = "".join((f"{k}. {v}\n") for k, v in option_choices.items())
    return f"Question: {doc['question']}\n{answers}Answer:"


def doc_to_target(doc) -> int:
    return doc["answer_idx"]


def filter_dataset(dataset: datasets.Dataset, lang: str) -> datasets.Dataset:
    return dataset.filter(lambda example: example["language"].startswith(lang))


def filter_french(dataset: datasets.Dataset) -> datasets.Dataset:
    return filter_dataset(dataset, "fr")


def filter_english(dataset: datasets.Dataset) -> datasets.Dataset:
    return filter_dataset(dataset, "en")
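
To make the prompt format and the language filter concrete, the sketch below applies a standalone copy of `doc_to_text` to a made-up record and mimics the prefix match of `filter_dataset` on a plain list. The record contents, the `id` field, and the `keep_language` helper are hypothetical. The field names `question`, `options`, and `language` are inferred from the code in this commit, and the assumption that `options` maps letters to answer strings is mine, not confirmed against the dataset.

```python
def doc_to_text(doc) -> str:
    # Copy of the function from preprocess_glianorex.py so the sketch runs standalone.
    option_choices = doc["options"]
    answers = "".join((f"{k}. {v}\n") for k, v in option_choices.items())
    return f"Question: {doc['question']}\n{answers}Answer:"


def keep_language(records, lang):
    # Plain-Python stand-in (hypothetical) for the predicate used in filter_dataset:
    # keep records whose "language" value starts with the given prefix.
    return [r for r in records if r["language"].startswith(lang)]


# Invented record; values are for illustration only.
example_doc = {
    "question": "Which hormone is associated with the Glianorex?",
    "options": {"A": "Insulin", "B": "Cortisol", "C": "Equilibron", "D": "Thyroxine"},
    "language": "en",
}

prompt = doc_to_text(example_doc)
print(prompt)

# Invented records showing that the prefix match keeps only the requested language.
records = [
    {"id": 1, "language": "en"},
    {"id": 2, "language": "fr"},
    {"id": 3, "language": "en"},
]
english_ids = [r["id"] for r in keep_language(records, "en")]
french_ids = [r["id"] for r in keep_language(records, "fr")]
print(english_ids, french_ids)
```

The prompt ends with `Answer:` and no trailing space, so the model is scored on which of the labels from `doc_to_choice` it considers most likely to follow.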
