Welcome to the Arabic Emojis Dataset Repository!
This repository is a linguistic resource for NLP researchers working on the Arabic language and its dialects. The included CSV file and the associated package provide an easy way to replace emojis with their corresponding Arabic descriptions. This improves interpretability and keeps emoji representation consistent across Arabic language and dialect datasets, including Moroccan and Tunisian Darija.
The dataset was created by processing the emoji dataset available at https://github.com/datasets/emojis. We wholeheartedly encourage contributions to expand and enrich this resource further.
A Python package is associated with this dataset. It provides a dedicated function for accessing emoji descriptions programmatically.
You can install our package via pip:
pip install arabic-emojipedia
After installation, import the function and pass it an emoji:
from arabic_emojipedia.emoji_description import get_emoji_description
emoji = "🎉"
description = get_emoji_description(emoji)
print(f"Description for {emoji}: {description}")
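To replace every emoji in a sentence rather than look up one at a time, a small wrapper along the following lines may help. This is only a sketch under stated assumptions: `replace_emojis` and `demo_describe` are hypothetical names, the stand-in lookup uses two illustrative sample values, and emoji detection is simplified to single code points in the Unicode "Symbol, other" category, so multi-code-point sequences such as skin-tone variants are not handled.

```python
import unicodedata
from typing import Callable


def replace_emojis(text: str, describe: Callable[[str], str]) -> str:
    """Replace each single-code-point symbol in `text` with describe(symbol).

    Simplification: any character in Unicode category "So" (Symbol, other)
    is treated as an emoji candidate. Runs of whitespace in the result are
    collapsed to single spaces.
    """
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            # Pad with spaces so the description does not fuse with nearby words.
            parts.append(f" {describe(ch)} ")
        else:
            parts.append(ch)
    return " ".join("".join(parts).split())


# Stand-in lookup for illustration only (hypothetical sample values).
_DEMO = {"🎈": "بالون", "✨": "بريق"}


def demo_describe(ch: str) -> str:
    # Return the character unchanged when no description is known.
    return _DEMO.get(ch, ch)


print(replace_emojis("Happy birthday🎈✨", demo_describe))
```

With the package installed, `get_emoji_description` could presumably be passed as `describe`. How it behaves for characters that are not in the dataset is not documented here, so check that before relying on it.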
The CSV file pairs each emoji with its Arabic name, as in this excerpt:
Emoji | Name |
---|---|
🎃 | جاك فانوس |
🎄 | شجرة عيد الميلاد |
🎆 | العاب ناريه |
🎇 | الماسة |
🧨 | مفرقعة نارية |
✨ | بريق |
🎈 | بالون |
🎉 | بوبر الحزب |
The file contains a total of 4,733 emojis.
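The CSV can also be used directly, without the package. The sketch below assumes that the file is comma-separated and that its header row uses the column names `Emoji` and `Name`, as in the excerpt above. The helper names `load_emoji_names` and `substitute` are hypothetical, and an in-memory sample stands in for the real file so that the example runs on its own.

```python
import csv
import io
from typing import Dict, TextIO


def load_emoji_names(csv_file: TextIO) -> Dict[str, str]:
    """Read Emoji/Name rows into a dict (the column names are an assumption)."""
    reader = csv.DictReader(csv_file)
    return {row["Emoji"]: row["Name"] for row in reader if row["Emoji"]}


def substitute(text: str, names: Dict[str, str]) -> str:
    """Replace every known emoji in `text` with its Arabic name."""
    # Longest keys first, so multi-code-point emojis are replaced before
    # any shorter emoji they contain.
    for emoji in sorted(names, key=len, reverse=True):
        text = text.replace(emoji, f" {names[emoji]} ")
    # Collapse the padding spaces introduced above.
    return " ".join(text.split())


# In-memory stand-in for the real CSV file (illustrative sample rows).
sample = io.StringIO("Emoji,Name\n🎈,بالون\n✨,بريق\n")
names = load_emoji_names(sample)
print(substitute("عيد سعيد🎈", names))
```

For the real file, open it with `encoding="utf-8"` and `newline=""` and pass the handle to `load_emoji_names`. Use the actual file name from the repository.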
Feel free to explore and leverage this dataset to enhance your NLP research in Arabic. We look forward to your contributions to make this resource even more comprehensive and valuable.
This project is licensed under the terms of the MIT License. See the LICENSE file for details.