forked from huggingface/lm-evaluation-harness
Add new Lambada translations (EleutherAI#1897)
* added tasks and task family descriptors
* configs for the new lambada translations
* continue work on task list w/ links; slightly reorganize README
* Apply suggestions from code review
* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder
* Update new_task_guide.md
* Update README.md
* run linter
* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs
* fix typo
* update `lm_eval/tasks/README.md` with task description

---------

Co-authored-by: Harish Vadaparty <[email protected]>
Co-authored-by: anthony <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
1 parent 33eef48, commit b9d96b5
Showing 10 changed files with 96 additions and 19 deletions.
@@ -0,0 +1,56 @@
# LAMBADA

### Paper
The LAMBADA dataset: Word prediction requiring a broad discourse context
https://arxiv.org/pdf/1606.06031.pdf

LAMBADA is a dataset to evaluate the capabilities of computational models for text
understanding by means of a word prediction task. LAMBADA is a collection of narrative
passages sharing the characteristic that human subjects are able to guess their last
word if they are exposed to the whole passage, but not if they only see the last
sentence preceding the target word. To succeed on LAMBADA, computational models
cannot simply rely on local context, but must be able to keep track of information
in the broader discourse.

Homepage: https://zenodo.org/record/2630551#.X4Xzn5NKjUI

### Citation

@misc{
  author={Paperno, Denis and Kruszewski, Germán and Lazaridou, Angeliki and Pham, Quan Ngoc and Bernardi, Raffaella and Pezzelle, Sandro and Baroni, Marco and Boleda, Gemma and Fernández, Raquel},
  title={The LAMBADA dataset},
  DOI={10.5281/zenodo.2630551},
  publisher={Zenodo},
  year={2016},
  month={Aug}
}

@article{bellagente2024stable,
  title={Stable LM 2 1.6B Technical Report},
  author={Bellagente, Marco and Tow, Jonathan and Mahan, Dakota and Phung, Duy and Zhuravinskyi, Maksym and Adithyan, Reshinth and Baicoianu, James and Brooks, Ben and Cooper, Nathan and Datta, Ashish and others},
  journal={arXiv preprint arXiv:2402.17834},
  year={2024}
}

### Groups and Tasks

#### Groups

* `lambada_multilingual_stablelm`: Evaluates all `lambada_openai_mt_stablelm_X` tasks

#### Tasks

* `lambada_openai_mt_stablelm_{de, en, es, fr, it, nl, pt}`: Machine-translated versions of OpenAI's LAMBADA variant, as reported in "Stable LM 2 1.6B Technical Report" (Bellagente et al.). The task names match the `task:` fields of the YAML configs in this commit.

### Checklist

* [x] Is the task an existing benchmark in the literature?
  * [x] Have you referenced the original paper that introduced the task?
  * [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
  (This task is novel to the Evaluation Harness, and has been checked against v0.3.0 of the harness.)


If other tasks on this dataset are already supported:
* [x] Is the "Main" variant of this task clearly denoted?
* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [x] Have you noted which, if any, published evaluation setups are matched by this variant?

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_de.yaml (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
include: lambada_mt_stablelm_en.yaml
task: lambada_openai_mt_stablelm_de
dataset_name: de

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_en.yaml (20 additions, 0 deletions)
@@ -0,0 +1,20 @@
group:
  - lambada_multilingual_stablelm
task: lambada_openai_mt_stablelm_en
dataset_path: marcob/lambada_multilingual
dataset_name: en
output_type: loglikelihood
test_split: test
doc_to_text: "{{text.split(' ')[:-1]|join(' ')}}"
doc_to_target: "{{' '+text.split(' ')[-1]}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: perplexity
    aggregation: perplexity
    higher_is_better: false
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
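
The `doc_to_text` and `doc_to_target` templates in the config above split each passage into a context (everything except the last space-separated word) and a target (the last word with a leading space). The following is a minimal Python sketch of that split, plus the two metrics from `metric_list`. It is not harness code: the function names, the example sentence, and the log-likelihood values are hypothetical, and the perplexity formula (exponential of the negative mean target log-likelihood) is an assumption about how the `perplexity` aggregation behaves.

```python
import math


def split_doc(text: str) -> tuple[str, str]:
    """Mirror the doc_to_text / doc_to_target templates with plain string ops."""
    words = text.split(" ")
    context = " ".join(words[:-1])  # doc_to_text: all words except the last
    target = " " + words[-1]        # doc_to_target: last word, leading space kept
    return context, target


def perplexity(loglikelihoods: list[float]) -> float:
    """Assumed aggregation: exp of the negative mean target log-likelihood."""
    return math.exp(-sum(loglikelihoods) / len(loglikelihoods))


def accuracy(greedy_matches: list[bool]) -> float:
    """Mean aggregation: share of targets the model would produce greedily."""
    return sum(greedy_matches) / len(greedy_matches)


# Hypothetical passage, not taken from the dataset.
text = "She opened the door and saw her brother"
context, target = split_doc(text)

# Hypothetical per-document results a model might return.
example_ppl = perplexity([-1.0, -1.0])
example_acc = accuracy([True, False])
```

Concatenating `context` and `target` reproduces the original passage, so the model is scored only on the final word given the full preceding text.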

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_es.yaml (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
include: lambada_mt_stablelm_en.yaml
task: lambada_openai_mt_stablelm_es
dataset_name: es

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_fr.yaml (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
include: lambada_mt_stablelm_en.yaml
task: lambada_openai_mt_stablelm_fr
dataset_name: fr

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_it.yaml (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
include: lambada_mt_stablelm_en.yaml
task: lambada_openai_mt_stablelm_it
dataset_name: it

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_nl.yaml (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
include: lambada_mt_stablelm_en.yaml
task: lambada_openai_mt_stablelm_nl
dataset_name: nl

lm_eval/tasks/lambada_multilingual_stablelm/lambada_mt_stablelm_pt.yaml (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
include: lambada_mt_stablelm_en.yaml
task: lambada_openai_mt_stablelm_pt
dataset_name: pt
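
Each non-English config above follows the same three-line pattern: include the English base config, then override `task` and `dataset_name` with the language code. As an illustration of that pattern only, the sketch below rebuilds the file contents from a list of language codes. The helper `language_config` and the `configs` dictionary are hypothetical, and nothing in the commit suggests the files were generated this way.

```python
# Language codes of the non-English configs added in this commit.
LANGUAGES = ["de", "es", "fr", "it", "nl", "pt"]


def language_config(lang: str) -> str:
    """Return the three-line YAML override for one language (hypothetical helper)."""
    return (
        "include: lambada_mt_stablelm_en.yaml\n"
        f"task: lambada_openai_mt_stablelm_{lang}\n"
        f"dataset_name: {lang}\n"
    )


# Map each config file name to its contents.
configs = {
    f"lambada_mt_stablelm_{lang}.yaml": language_config(lang)
    for lang in LANGUAGES
}
```

Because every override inherits from `lambada_mt_stablelm_en.yaml`, changes to the prompt templates or metrics in the base file apply to all languages.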