A multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains.

McGill-NLP/Injongo

Injongo Dataset

A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
language = [
    "amh", "ewe", "hau", "ibo", "kin", 
    "lin", "lug", "orm", "sna", "sot", 
    "swa", "twi", "wol", "xho", "yor", "zul"
]
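The codes above appear to be ISO 639-3 language codes. As a small convenience, the sketch below pairs each code with an English language name. LANGUAGE_NAMES is a hypothetical helper for illustration, not something shipped with the package.

```python
# Hypothetical helper: English names for the ISO 639-3 codes listed above.
LANGUAGE_NAMES = {
    "amh": "Amharic",
    "ewe": "Ewe",
    "hau": "Hausa",
    "ibo": "Igbo",
    "kin": "Kinyarwanda",
    "lin": "Lingala",
    "lug": "Luganda",
    "orm": "Oromo",
    "sna": "Shona",
    "sot": "Sesotho",
    "swa": "Swahili",
    "twi": "Twi",
    "wol": "Wolof",
    "xho": "Xhosa",
    "yor": "Yoruba",
    "zul": "Zulu",
}

# Same list as in the README.
language = [
    "amh", "ewe", "hau", "ibo", "kin",
    "lin", "lug", "orm", "sna", "sot",
    "swa", "twi", "wol", "xho", "yor", "zul",
]

# Print one "code: name" line per language.
for code in language:
    print(f"{code}: {LANGUAGE_NAMES[code]}")
```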

Data Location

The Injongo dataset is available from Masakhane-NLU: Conversation AI and Benchmark datasets for African languages.

Raw data: data/output contains the raw dataset in CSV format, including the logical_form and spans annotations.
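A minimal sketch for loading the raw CSV files with Python's standard library. It assumes every file under data/output has a header row with the columns split, domain, intent, text, spans and logical_form (as in the item example); the individual file names are not documented here, so load_rows simply collects every *.csv file. Both helper functions are hypothetical.

```python
import csv
from pathlib import Path

# Column names assumed from the item example in this README.
COLUMNS = ["split", "domain", "intent", "text", "spans", "logical_form"]


def load_rows(data_dir="data/output"):
    """Read every *.csv file under data_dir (recursively) into a list of dicts.

    Assumes each file starts with a header row; a 'source_file' key is added
    so rows can be traced back to the file they came from.
    """
    rows = []
    for path in sorted(Path(data_dir).glob("**/*.csv")):
        # newline="" is what the csv module expects; UTF-8 covers Ge'ez and Latin scripts.
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["source_file"] = path.name
                rows.append(row)
    return rows


def by_split(rows):
    """Group rows by the value of their 'split' column (e.g. 'test')."""
    groups = {}
    for row in rows:
        groups.setdefault(row["split"], []).append(row)
    return groups
```

For example, `{name: len(items) for name, items in by_split(load_rows()).items()}` gives the number of rows per split. Using csv.DictReader (rather than splitting on commas) matters here because the spans field is quoted and contains commas.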

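Item example (header row followed by one test item):

split,domain,intent,text,spans,logical_form
test,banking,balance,በ አባይ ባንክ አካውንት ለሶፋ የሚሆን ገንዘብ አለኝ,"2:9:SL:BANK_NAME,17:19:SL:SHOPPING_ITEM",[IN:balance [SL:BANK_NAME አባይ ባንክ] [SL:SHOPPING_ITEM ሶፋ] ]

The Amharic utterance roughly means: "Do I have money for a sofa in my Abay Bank account?"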
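The two annotation columns can be decoded as sketched below. The format is inferred from the example row, not documented: the sketch assumes each spans entry is start:end:LABEL with character offsets into text (end exclusive, so text[2:9] is አባይ ባንክ), and that logical_form is a bracketed tree of IN: and SL: nodes. parse_spans, parse_logical_form and intent_and_slots are hypothetical helpers.

```python
def parse_spans(spans, text):
    """Turn 'start:end:LABEL,...' into a list of (label, substring) pairs."""
    result = []
    for entry in filter(None, spans.split(",")):  # skip empty entries
        # maxsplit=2 because the label itself contains a colon (e.g. SL:BANK_NAME)
        start, end, label = entry.split(":", 2)
        result.append((label, text[int(start):int(end)]))
    return result


def parse_logical_form(lf):
    """Parse '[IN:x [SL:y words] ]' into nested {'label', 'children'} dicts.

    Assumes brackets never occur inside words.
    """
    tokens = lf.replace("[", " [ ").replace("]", " ] ").split()

    def parse(i):
        # tokens[i] is "[" and tokens[i + 1] is the IN:/SL: label
        node = {"label": tokens[i + 1], "children": []}
        i += 2
        while tokens[i] != "]":
            if tokens[i] == "[":
                child, i = parse(i)
                node["children"].append(child)
            else:
                node["children"].append(tokens[i])
                i += 1
        return node, i + 1  # skip the closing "]"

    tree, _ = parse(0)
    return tree


def node_text(node):
    """Join all word tokens below a node with single spaces."""
    words = [node_text(c) if isinstance(c, dict) else c for c in node["children"]]
    return " ".join(w for w in words if w)


def intent_and_slots(lf):
    """Return (intent, [(slot_label, slot_text), ...]) for the top-level intent."""
    tree = parse_logical_form(lf)
    label = tree["label"]
    intent = label[3:] if label.startswith("IN:") else label
    slots = [(c["label"], node_text(c)) for c in tree["children"]
             if isinstance(c, dict) and c["label"].startswith("SL:")]
    return intent, slots
```

On the example row, parse_spans and intent_and_slots yield the same two slots (SL:BANK_NAME with አባይ ባንክ and SL:SHOPPING_ITEM with ሶፋ), which makes a convenient consistency check between the spans and logical_form columns.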

Package Install

pip install -e .

Additional Dependencies:

Further details of the code can be found in the numbered Jupyter notebooks (*.ipynb).

Environment Variables (.env file)

Set your own API keys in a .env file; the values below are placeholders:

OPENAI_API_KEY=sk-proj-
GEMINI_API_KEY=ABCD
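The README does not say how the code reads these keys; python-dotenv's load_dotenv() is a common choice. The hypothetical stand-in below uses only the standard library: it reads KEY=VALUE lines from a .env file into os.environ without overwriting variables that are already set. It is a sketch, not the package's actual loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Minimal .env reader (hypothetical helper).

    Skips blank lines, '#' comments and lines without '='; strips surrounding
    quotes from values. Existing environment variables are left unchanged.
    Returns the key/value pairs found in the file.
    """
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded = {}
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)  # do not override values already set
        loaded[key] = value
    return loaded
```

After calling load_env_file(), the keys can be read with os.environ.get("OPENAI_API_KEY") and os.environ.get("GEMINI_API_KEY").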
