This repository contains the dataset and code for the paper Comparative Study of Multilingual Idioms and Similes in Large Language Models.
Abstract: This study addresses the gap in the literature concerning the comparative performance of LLMs in interpreting different types of figurative language across multiple languages. By evaluating LLMs on two multilingual datasets for simile and idiom interpretation, we explore the effectiveness of various prompt engineering strategies, including chain-of-thought, few-shot, and English translation prompts. We also extend these datasets to Persian by building two new evaluation sets. Our comprehensive assessment covers both closed-source models (GPT-3.5, GPT-4o mini, Gemini 1.5) and open-source models (Llama 3.1, Qwen2), highlighting significant differences in performance across languages and figurative types. Our findings reveal that while prompt engineering methods are generally effective, their success varies by figurative type, language, and model. We also observe that open-source models struggle particularly with simile interpretation in low-resource languages. Additionally, idiom interpretation is nearing saturation for many languages, necessitating more challenging evaluations.
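As a rough illustration of the prompt engineering strategies named in the abstract, the sketch below builds prompts for figurative-sentence interpretation under each strategy. The template wording, function name, and the few-shot example are hypothetical and do not reproduce the paper's exact prompts.

```python
# Hypothetical sketch of the prompt strategies compared in the paper.
# Templates are illustrative only, not the paper's actual prompts.

def build_prompt(sentence: str, strategy: str = "zero-shot") -> str:
    """Build an idiom/simile interpretation prompt under one strategy."""
    base = f"What does the following figurative sentence mean?\n{sentence}\nAnswer:"
    if strategy == "zero-shot":
        return base
    if strategy == "chain-of-thought":
        # Replace the answer cue with a step-by-step reasoning cue.
        return base.replace("Answer:", "Let's think step by step before answering.")
    if strategy == "few-shot":
        # Prepend one solved example (an invented idiom demonstration).
        example = ("Sentence: He has a heart of stone.\n"
                   "Meaning: He is unsympathetic.\n\n")
        return example + base
    if strategy == "translate":
        # English-translation prompting: translate first, then interpret.
        return ("First translate this sentence into English, then explain "
                f"its figurative meaning.\n{sentence}\nAnswer:")
    raise ValueError(f"unknown strategy: {strategy}")
```

In an evaluation loop, each non-English test sentence would be passed through `build_prompt` once per strategy and the model's responses scored per language and figurative type.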
The author emails are {paria.khoshtab, namazifard, mostafa.masoudi, ali.akhgary, samin.mehdizadeh, y.yaghoobzadeh}@ut.ac.ir
Please contact [] first if you have any questions.
Note: the proceedings version of the citation will be added after the conference. Until then, please cite the arXiv preprint:
@misc{khoshtab2024comparativestudymultilingualidioms,
  title={Comparative Study of Multilingual Idioms and Similes in Large Language Models},
  author={Paria Khoshtab and Danial Namazifard and Mostafa Masoudi and Ali Akhgary and Samin Mahdizadeh Sani and Yadollah Yaghoobzadeh},
  year={2024},
  eprint={2410.16461},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.16461},
}