OryzaGP

A dataset for named entity recognition (NER) of rice genes

Citation

Please cite with the following reference:

Updating the OryzaGP dataset during BLAH7

The aim of this project is to:

  • update the dataset with new PubMed entries
  • annotate gene/protein entities

Step 1: updating OryzaGP with new PubMed entries

Step 2: creating a new pub dictionary

  • To create or use entity recognition (ER) tools, we need to set up a dictionary of gene/protein entities
  • a first file, pub_dictionnary.txt, was created from the Oryzabase gene dataset
  • a second file, pub_dictionnary_with_rapdb_URI.txt, was created from the same dataset
    • each line contains a label (gene name, symbol, or synonym), a [TAB], and the RAP-DB database URI
  • a third file, pub_dictionnary_with_msu.txt, was created from the same dataset
    • each line contains a label (gene name, symbol, or synonym), a [TAB], and the MSU database URI
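
The label [TAB] URI layout described above can be sketched in Python as follows. This is only an illustration, not the script used to build the files: the record fields (`name`, `symbol`, `synonyms`, `uri`), the function names, and the sample gene with its `example.org` URI are all hypothetical, and the real Oryzabase export may be structured differently.

```python
import io

# Hypothetical gene record; field names and values are placeholders,
# not taken from Oryzabase, RAP-DB, or MSU.
SAMPLE_GENES = [
    {
        "name": "example gene",
        "symbol": "EG1",
        "synonyms": ["EG1", "", "eg-one"],
        "uri": "https://example.org/gene/1",
    },
]


def build_dictionary_rows(genes):
    """Yield one (label, uri) pair per distinct, non-empty label of each gene."""
    for gene in genes:
        seen = set()
        for label in [gene["name"], gene["symbol"], *gene["synonyms"]]:
            label = label.strip()
            # Skip empty entries and labels already emitted for this gene.
            if not label or label in seen:
                continue
            seen.add(label)
            yield label, gene["uri"]


def write_dictionary(genes, handle):
    """Write the rows as 'label<TAB>uri', one per line."""
    for label, uri in build_dictionary_rows(genes):
        handle.write(f"{label}\t{uri}\n")


if __name__ == "__main__":
    buffer = io.StringIO()
    write_dictionary(SAMPLE_GENES, buffer)
    print(buffer.getvalue(), end="")
```

Emitting one line per label, rather than one line per gene, means that a match on any name, symbol, or synonym resolves to the same database URI.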

Step 3: creating PubDictionary Annotators

  • we created two annotators for each pub dictionary (single and batch mode)