This repository includes the QQ and QH datasets as described in the paper:
Altammami, S., Atwell, E. (2022) 'Challenging the Transformer-based models with a Classical Arabic dataset: Quran and Hadith'. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Marseille, 20–25 June 2022.
Contains a balanced set of 310 related and non-related Quran-verse and Hadith-teaching pairs.
Contains a balanced set of 4072 related and non-related Quran-verse pairs.
Contains the Arabic Tafseer from Aljlyleen and Almuyaser for each pair in the QQ_Ar_training_4072.csv dataset.
Contains five different English translations of the Quran pairs in the QQ_Ar_training_4072.csv dataset.
Contains a balanced dataset of 1024 related and non-related Quran-verse pairs that do not appear in the training dataset QQ_Ar_training_4072.csv.
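
The claim that the held-out pairs do not appear in the training data can be checked with a short script. The following is a minimal sketch, not part of the original release. It assumes that the first two columns of each CSV hold the two verses of a pair (as text or as identifiers) and that each file has a header row; neither assumption is confirmed by this README, so adjust the column positions to the actual layout. The helper names (`pair_keys`, `read_rows`, `overlapping_pairs`) are hypothetical.

```python
import csv
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def pair_keys(rows: Iterable[Sequence[str]]) -> Set[Pair]:
    """Build order-insensitive keys from the first two fields of each row.

    Assumption: columns 0 and 1 hold the two verses of a pair.
    """
    keys: Set[Pair] = set()
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0].strip(), row[1].strip()
        # Sort the two fields so (a, b) and (b, a) count as the same pair.
        keys.add((a, b) if a <= b else (b, a))
    return keys


def read_rows(path: str, has_header: bool = True) -> List[List[str]]:
    """Read a UTF-8 CSV file into a list of rows, optionally dropping the header."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)
        return list(reader)


def overlapping_pairs(
    train_rows: Iterable[Sequence[str]],
    test_rows: Iterable[Sequence[str]],
) -> Set[Pair]:
    """Return the pairs that occur in both sets of rows."""
    return pair_keys(train_rows) & pair_keys(test_rows)
```

To run the check, pass `read_rows("QQ_Ar_training_4072.csv")` as `train_rows` and the rows of the 1024-pair file as `test_rows`; an empty result means no pair is shared. Matching is done on exact strings after trimming whitespace, so differences in diacritics or normalization between the files would not be detected.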