Create a dataset loader for MoNERo #67

hakunanatasha · 2022-01-21T22:40:35Z

From https://www.racai.ro/en/tools/text/

napsternxg · 2022-04-04T22:38:53Z

#self-assign

hakunanatasha · 2022-04-06T16:29:40Z

Hi @napsternxg, can you let us know if you are still working on this so we can update our project board? Please let us know the status by Friday, April 8. No worries if you are not finished but still intend to work on this. Either ping me here at @hakunanatasha or ping the Discord admins (with @admins).

napsternxg · 2022-04-07T06:35:01Z

Hi @hakunanatasha yes I plan to work on this over the weekend.

jason-fries · 2022-04-19T22:00:04Z

Hi @napsternxg
Just a ping on the status of this dataset. Please let us know if you are still working on it and when you plan to submit a PR. Thanks!!

napsternxg · 2022-04-21T11:37:52Z

Hi @jason-fries thanks for the reminder. I have started work on this in my local branch.
Will send a PR early next week.

napsternxg · 2022-04-25T00:14:49Z

Details on the paper:

@inproceedings{mitrofan-etal-2019-monero,
    title = "{M}o{NER}o: a Biomedical Gold Standard Corpus for the {R}omanian Language",
    author = "Mitrofan, Maria  and
      Barbu Mititelu, Verginica  and
      Mitrofan, Grigorina",
    booktitle = "Proceedings of the 18th BioNLP Workshop and Shared Task",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-5008",
    doi = "10.18653/v1/W19-5008",
    pages = "71--79",
}

The corpus is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license, so I have downloaded it and re-uploaded it here in tar.gz format for use in the data loader.

MoNERo.tar.gz

The dataset doesn't include any offset information, so I am going to build the text by joining the tokens with a single space and compute the offsets on the resulting text.
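
A minimal sketch of that offset computation, for reference. It assumes each sentence is available as a list of tokens with one BIO-style tag per token; the function names, the example sentence, and the `DISO` label are illustrative assumptions, not taken from the corpus files or from the PR.

```python
from typing import List, Tuple


def join_tokens_with_offsets(tokens: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join tokens with single spaces; return the text and per-token (start, end) offsets."""
    offsets = []
    cursor = 0
    for token in tokens:
        start = cursor
        end = start + len(token)
        offsets.append((start, end))
        cursor = end + 1  # account for the single space separator
    return " ".join(tokens), offsets


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, int, int, str]]:
    """Merge BIO-tagged tokens into (label, start, end, text) entity spans.

    Assumes tags look like "B-X", "I-X" or "O" (an assumption about the tag scheme).
    """
    text, offsets = join_tokens_with_offsets(tokens)
    spans = []
    current = None  # [label, start, end] of the entity being built
    for (start, end), tag in zip(offsets, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == label:
            current[2] = end  # extend the open entity
            continue
        if current is not None:
            spans.append((current[0], current[1], current[2], text[current[1]:current[2]]))
            current = None
        if prefix in ("B", "I"):
            current = [label, start, end]  # a stray "I-" tag also opens a new entity
    if current is not None:
        spans.append((current[0], current[1], current[2], text[current[1]:current[2]]))
    return spans


# Hypothetical example sentence and tags, for illustration only.
example_tokens = ["Pacientul", "are", "diabet", "zaharat", "."]
example_tags = ["O", "O", "B-DISO", "I-DISO", "O"]
example_text, example_offsets = join_tokens_with_offsets(example_tokens)
example_spans = bio_to_spans(example_tokens, example_tags)
```

Because the text is rebuilt from tokens, the offsets are only valid against this space-joined text, not against the original untokenized source.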

napsternxg · 2022-04-25T02:01:19Z

Added PR: #516

ruisi-su added the CC BY SA Licence, CoNLL Format, NER Task, and Romanian Language labels Jan 27, 2022

jason-fries added this to Biomedical Dataset Hackathon 2022 Jan 31, 2022

github-actions bot assigned qanastek Mar 31, 2022

qanastek removed their assignment Mar 31, 2022

github-actions bot assigned napsternxg Apr 4, 2022

hakunanatasha moved this to In Progress in Biomedical Dataset Hackathon 2022 Apr 8, 2022

napsternxg added a commit to napsternxg/biomedical that referenced this issue Apr 11, 2022

Fixes bigscience-workshop#67 - Add monero

ee234c5

napsternxg mentioned this issue Apr 25, 2022

Closes #67 - Add Monero #516

Merged

8 tasks

phlobo closed this as completed in #516 Dec 9, 2024

phlobo closed this as completed in 0435fcd Dec 9, 2024

github-project-automation bot moved this from PR in Progress to Done in Biomedical Dataset Hackathon 2022 Dec 9, 2024
