
ltrc/IL-NER


IL-NER

  • These annotated corpora and models were developed under the Bhashini project, funded by the Ministry of Electronics and Information Technology (MeitY), Government of India. We thank MeitY for funding this work.

  • The dataset and models are licensed under the Creative Commons Attribution 4.0 (CC-BY-4.0) license. The dataset was developed by three partnering institutes: IIIT Hyderabad, CDAC Noida, and IIIT Bhubaneshwar. The size of each split per language is given below.

    Language    Train    Test    Dev
    Hindi       11076    1389    1389
    Urdu         8720    1096    1094
    Odia        12109    1519    1517
    Telugu       2993     384     384
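
As a quick sanity check on the table above, the following sketch totals each language's splits and reports the share that falls into train, test, and dev. The counts are copied from the table; the names `SPLITS` and `split_summary` are illustrative and not part of this repository, and the table does not state the counting unit, so the code only calls them "items".

```python
# Split sizes copied from the table above (order: train, test, dev).
# The unit (e.g. sentences) is not stated in the README, so we stay neutral.
SPLITS = {
    "Hindi": {"train": 11076, "test": 1389, "dev": 1389},
    "Urdu": {"train": 8720, "test": 1096, "dev": 1094},
    "Odia": {"train": 12109, "test": 1519, "dev": 1517},
    "Telugu": {"train": 2993, "test": 384, "dev": 384},
}


def split_summary(splits):
    """Return per-language totals and the fraction of items in each split."""
    summary = {}
    for language, counts in splits.items():
        total = sum(counts.values())
        fractions = {name: count / total for name, count in counts.items()}
        summary[language] = {"total": total, "fractions": fractions}
    return summary


if __name__ == "__main__":
    for language, info in split_summary(SPLITS).items():
        shares = ", ".join(
            f"{name} {fraction:.1%}" for name, fraction in info["fractions"].items()
        )
        print(f"{language}: {info['total']} items ({shares})")
```

Under these counts every language comes out close to an 80/10/10 train/test/dev split.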
    @inproceedings{bahad-etal-2024-fine,
    title = "Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages",
    author = "Bahad, Sankalp  and 
    Mishra, Pruthwik  and
    Krishnamurthy, Parameswari  and
    Sharma, Dipti",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-srw.9",
    doi = "10.18653/v1/2024.naacl-srw.9",
    pages = "75--82",
    }

About

NER for Hindi, Urdu, Odia, and Telugu
