Add the Arabic version of COPA, with a refactor moving Arabic PIQA into the alghafa folder (EleutherAI#1940)
OpenLLM-France · Jun 10, 2024 · 305fb63
1 parent bea1a85

Showing 4 changed files with 61 additions and 0 deletions.
diff --git a/lm_eval/tasks/alghafa/copa_ar/README.md b/lm_eval/tasks/alghafa/copa_ar/README.md
@@ -0,0 +1,40 @@
+# Arabic COPA
+
+### Paper
+
+Original Title: `COPA`
+
+
+
+The Choice Of Plausible Alternatives (COPA) evaluation provides researchers with a tool for assessing progress in open-domain commonsense causal reasoning.
+
+[Homepage](https://people.ict.usc.edu/~gordon/copa.html)
+
+AlGhafa has translated this dataset into Arabic ([AlGhafa paper](https://aclanthology.org/2023.arabicnlp-1.21.pdf)).
+
+Link to the Arabic version of the dataset: [copa_ar](https://gitlab.com/tiiuae/alghafa/-/tree/main/arabic-eval/copa_ar)
+
+### Citation
+
+### Groups and Tasks
+
+#### Groups
+
+* Not part of a group yet.
+
+#### Tasks
+
+* `copa_ar`
+
+### Checklist
+
+For adding novel benchmarks/datasets to the library:
+* [x] Is the task an existing benchmark in the literature?
+  * [x] Have you referenced the original paper that introduced the task?
+  * [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+
+
+If other tasks on this dataset are already supported:
+* [x] Is the "Main" variant of this task clearly denoted?
+* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [x] Have you noted which, if any, published evaluation setups are matched by this variant?
diff --git a/lm_eval/tasks/alghafa/copa_ar/copa_ar.yaml b/lm_eval/tasks/alghafa/copa_ar/copa_ar.yaml
@@ -0,0 +1,21 @@
+task: copa_ar
+dataset_path: Hennara/copa_ar
+dataset_name: null
+output_type: multiple_choice
+training_split: null
+validation_split: null
+test_split: test
+doc_to_text: "السؤال: {{query}}\nالجواب:"  # Arabic for "Question: {{query}}\nAnswer:"
+doc_to_choice: "{{[sol1, sol2]}}"
+doc_to_target: label
+should_decontaminate: true
+doc_to_decontamination_query: query
+metric_list:
+  - metric: acc
+    aggregation: mean
+    higher_is_better: true
+  - metric: acc_norm
+    aggregation: mean
+    higher_is_better: true
+metadata:
+  version: 1.0
diff --git a/lm_eval/tasks/piqa_ar/README.md b/lm_eval/tasks/alghafa/piqa_ar/README.md
similarity index 100%
rename from lm_eval/tasks/piqa_ar/README.md
rename to lm_eval/tasks/alghafa/piqa_ar/README.md
diff --git a/lm_eval/tasks/piqa_ar/piqa_ar.yaml b/lm_eval/tasks/alghafa/piqa_ar/piqa_ar.yaml
similarity index 100%
rename from lm_eval/tasks/piqa_ar/piqa_ar.yaml
rename to lm_eval/tasks/alghafa/piqa_ar/piqa_ar.yaml
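The `copa_ar.yaml` config in this commit is declarative, so it may help to see what its fields do to a single dataset row. What follows is a minimal sketch under stated assumptions, not lm-evaluation-harness's implementation. Plain string concatenation stands in for the Jinja templates. The helpers `render`, `score` and `aggregate` are hypothetical names, and the example row and log-likelihood values are made up. `acc_norm` is assumed to divide each choice's log-likelihood by the character length of that choice before taking the argmax.

```python
"""Hypothetical sketch (not lm-eval's code) of how the copa_ar.yaml fields
could turn one row into a two-way multiple-choice item, and how acc /
acc_norm would then be computed. Example data below is made up."""

from typing import Dict, List, Tuple


def render(doc: Dict) -> Tuple[str, List[str], int]:
    """Plain-Python stand-in for the Jinja templates in copa_ar.yaml."""
    # doc_to_text: "السؤال: {{query}}\nالجواب:"  (Arabic for "Question: ...\nAnswer:")
    context = "السؤال: " + doc["query"] + "\nالجواب:"
    # doc_to_choice: "{{[sol1, sol2]}}"
    choices = [doc["sol1"], doc["sol2"]]
    # doc_to_target: label, i.e. the index of the correct choice
    target = int(doc["label"])
    return context, choices, target


def score(lls: List[float], choices: List[str], target: int) -> Dict[str, float]:
    """Per-document acc and acc_norm given one log-likelihood per choice."""
    indices = range(len(choices))
    # acc: argmax over the raw log-likelihoods
    pred = max(indices, key=lambda i: lls[i])
    # acc_norm (assumption): argmax over log-likelihood divided by the
    # character length of each choice string
    pred_norm = max(indices, key=lambda i: lls[i] / len(choices[i]))
    return {"acc": float(pred == target), "acc_norm": float(pred_norm == target)}


def aggregate(per_doc: List[Dict[str, float]]) -> Dict[str, float]:
    """`aggregation: mean` from metric_list: average each metric over documents."""
    return {k: sum(d[k] for d in per_doc) / len(per_doc) for k in ("acc", "acc_norm")}


if __name__ == "__main__":
    # Hypothetical row using the field names the YAML expects.
    doc = {"query": "q", "sol1": "ab", "sol2": "abcdefgh", "label": 1}
    context, choices, target = render(doc)
    # Hypothetical log-likelihoods of each choice given the context.
    result = score([-4.0, -6.0], choices, target)
    print(result)  # {'acc': 0.0, 'acc_norm': 1.0}
    print(aggregate([result, {"acc": 1.0, "acc_norm": 1.0}]))  # {'acc': 0.5, 'acc_norm': 1.0}
```

The made-up numbers are chosen so the two metrics disagree. The shorter choice wins on raw log-likelihood, but the longer, correct choice wins once each score is divided by length. This is why the config reports both metrics. In the harness itself, each choice is scored as a continuation of the context, and as far as we know it is prefixed by a default target delimiter of a single space. The sketch leaves that step out.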