Create a dataset loader for MoNERo #67

Closed
hakunanatasha opened this issue Jan 21, 2022 · 7 comments · Fixed by #516
Labels: CC BY SA Licence, CoNLL Format, NER Task, Romanian Language

Comments

@hakunanatasha
Collaborator

From https://www.racai.ro/en/tools/text/

@napsternxg
Contributor

#self-assign

@hakunanatasha
Collaborator Author

Hi @napsternxg, can you let us know if you are still working on this so we can update our project board? Please just let us know the status by Friday, April 8; no worries if you are not finished but still intend to work on this. Please either ping me here at @hakunanatasha or ping the Discord admins (with @admins).

@napsternxg
Contributor

Hi @hakunanatasha yes I plan to work on this over the weekend.

napsternxg added a commit to napsternxg/biomedical that referenced this issue Apr 11, 2022
@jason-fries
Member

Hi @napsternxg
Just a ping on the status of this dataset. Please let us know if you are still working on it and when you plan to submit a PR. Thanks!!

@napsternxg
Contributor

Hi @jason-fries thanks for the reminder. I have started work on this in my local branch.
Will send a PR early next week.

@napsternxg
Contributor

Details on the paper:

@inproceedings{mitrofan-etal-2019-monero,
    title = "{M}o{NER}o: a Biomedical Gold Standard Corpus for the {R}omanian Language",
    author = "Mitrofan, Maria  and
      Barbu Mititelu, Verginica  and
      Mitrofan, Grigorina",
    booktitle = "Proceedings of the 18th BioNLP Workshop and Shared Task",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-5008",
    doi = "10.18653/v1/W19-5008",
    pages = "71--79",
}

The corpus is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), so I have downloaded it and uploaded it here in tar.gz format for use in the data loader.

MoNERo.tar.gz

The dataset doesn't include any offset information, so I am going to build the text by joining the tokens with single spaces and compute character offsets on the resulting text.
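For reference, a minimal sketch of how that offset computation could look. The function name `join_tokens_with_offsets` and the example sentence are hypothetical (neither comes from the corpus or the eventual PR), and it assumes tokens are joined with exactly one space and that offsets are end-exclusive character positions:

```python
from typing import List, Tuple


def join_tokens_with_offsets(tokens: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join tokens with single spaces; return the text and one (start, end) pair per token.

    Hypothetical helper: `end` is exclusive, so text[start:end] == token.
    """
    offsets: List[Tuple[int, int]] = []
    cursor = 0
    for token in tokens:
        start = cursor
        end = start + len(token)
        offsets.append((start, end))
        cursor = end + 1  # step over the single space that follows each token
    return " ".join(tokens), offsets


# Made-up example sentence, not taken from the corpus.
tokens = ["Pacientul", "are", "diabet", "zaharat", "."]
text, offsets = join_tokens_with_offsets(tokens)

# Every offset pair slices back to its original token.
for token, (start, end) in zip(tokens, offsets):
    assert text[start:end] == token
```

Note that the reconstructed text will not match the original document spacing (for example, punctuation ends up preceded by a space), so these offsets are only valid against the joined text itself.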

@napsternxg mentioned this issue Apr 25, 2022
8 tasks
@napsternxg
Contributor

Added PR: #516

@phlobo closed this as completed in 0435fcd Dec 9, 2024
@github-project-automation bot moved this from PR in Progress to Done in Biomedical Dataset Hackathon 2022 Dec 9, 2024
5 participants