This repository contains the dataset and code for the paper Comparative Study of Multilingual Idioms and Similes in Large Language Models.
Abstract: This study addresses the gap in the literature concerning the comparative performance of LLMs in interpreting different types of figurative language across multiple languages. By evaluating LLMs on two multilingual datasets for simile and idiom interpretation, we explore the effectiveness of various prompt engineering strategies, including chain-of-thought, few-shot, and English translation prompts. We also extend these datasets to Persian by building two new evaluation sets. Our comprehensive assessment covers both closed-source models (GPT-3.5, GPT-4o mini, Gemini 1.5) and open-source models (Llama 3.1, Qwen2), highlighting significant differences in performance across languages and figurative types. Our findings reveal that while prompt engineering methods are generally effective, their success varies by figurative type, language, and model. We also observe that open-source models struggle particularly with simile interpretation in low-resource languages. Additionally, idiom interpretation is nearing saturation for many languages, necessitating more challenging evaluations.
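As a rough illustration of the prompt engineering strategies named in the abstract, the sketch below builds prompts for figurative-sentence interpretation under each strategy. The template wording, function name, and the few-shot example are hypothetical and do not reproduce the paper's exact prompts.

```python
# Hypothetical sketch of the prompt strategies compared in the paper.
# Templates are illustrative only, not the paper's actual prompts.

def build_prompt(sentence: str, strategy: str = "zero-shot") -> str:
    """Build an idiom/simile interpretation prompt under one strategy."""
    base = f"What does the following figurative sentence mean?\n{sentence}\nAnswer:"
    if strategy == "zero-shot":
        return base
    if strategy == "chain-of-thought":
        # Replace the answer cue with a step-by-step reasoning cue.
        return base.replace("Answer:", "Let's think step by step before answering.")
    if strategy == "few-shot":
        # Prepend one solved example (an invented idiom demonstration).
        example = ("Sentence: He has a heart of stone.\n"
                   "Meaning: He is unsympathetic.\n\n")
        return example + base
    if strategy == "translate":
        # English-translation prompting: translate first, then interpret.
        return ("First translate this sentence into English, then explain "
                f"its figurative meaning.\n{sentence}\nAnswer:")
    raise ValueError(f"unknown strategy: {strategy}")
```

In an evaluation loop, each non-English test sentence would be passed through `build_prompt` once per strategy and the model's responses scored per language and figurative type.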
The author emails are {paria.khoshtab, namazifard, mostafa.masoudi, ali.akhgary, samin.mehdizadeh, y.yaghoobzadeh}@ut.ac.ir
Please contact [] first if you have any questions.
Note: the proceedings version of the citation will be added after the conference. Until then, please cite the arXiv preprint:
@misc{khoshtab2024comparativestudymultilingualidioms,
  title={Comparative Study of Multilingual Idioms and Similes in Large Language Models},
  author={Paria Khoshtab and Danial Namazifard and Mostafa Masoudi and Ali Akhgary and Samin Mahdizadeh Sani and Yadollah Yaghoobzadeh},
  year={2024},
  eprint={2410.16461},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.16461},
}