
Changes to Biolink instance data export #3

Open
gaurav opened this issue Oct 13, 2021 · 4 comments

@gaurav
Member

gaurav commented Oct 13, 2021

  1. Categories must be biolinkml types like publication
  2. CRF[Publication] --(has_part)--> CDE[Publication? Information Content Entity?] --(keywords/attributes)--> ChEBI, MONDO, etc.
@gaurav gaurav self-assigned this Oct 13, 2021
@gaurav
Member Author

gaurav commented Oct 13, 2021

  1. Can we send the CDE question text to the Biolink NER API (e.g. https://api.monarchinitiative.org/api/nlp/annotate/entities?min_length=4&longest_only=false&include_abbreviation=false&include_acronym=false&include_numbers=false&content=COVID-19) and get back a list of referenced concepts?
    • It probably makes the most sense to first do this without mapping to LOINC at all; just see what we can get from the NER codes.
  2. Can we send the results to the Translator Node Normalization SRI service to get normalized nodes for the CURIEs? (e.g. https://nodenormalization-sri.renci.org/1.2/get_normalized_nodes?curie=MONDO:0005015)
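
A minimal sketch of how the two requests above could be built, using only the Python standard library. The endpoint URLs and query parameters are copied from the example links in this comment; the helper names (`ner_request_url`, `node_norm_request_url`, `fetch_json`) are hypothetical. The shape of each JSON response is not assumed here, so the sketch stops at returning the decoded JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoints copied from the example URLs in this comment.
NER_URL = "https://api.monarchinitiative.org/api/nlp/annotate/entities"
NODE_NORM_URL = "https://nodenormalization-sri.renci.org/1.2/get_normalized_nodes"


def ner_request_url(text):
    """Build the NER request URL for one piece of CDE question text."""
    # Parameter values mirror the example URL; they are not tuned.
    params = {
        "min_length": 4,
        "longest_only": "false",
        "include_abbreviation": "false",
        "include_acronym": "false",
        "include_numbers": "false",
        "content": text,
    }
    return NER_URL + "?" + urlencode(params)


def node_norm_request_url(curie):
    """Build the Node Normalization request URL for a single CURIE."""
    return NODE_NORM_URL + "?" + urlencode({"curie": curie})


def fetch_json(url):
    """Fetch a URL and decode the body as JSON (performs a network call)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `fetch_json(ner_request_url("COVID-19"))` would issue the first request, and any CURIEs found in its result could then be passed one at a time to `node_norm_request_url`.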

@gaurav
Member Author

gaurav commented Oct 13, 2021

Each concept (e.g. a normalized disease identifier) is a node in the knowledge graph, linked to the CDE through a NamedThingToInformationContentEntityAssociation (https://biolink.github.io/biolink-model/docs/NamedThingToInformationContentEntityAssociation.html). Maybe we need a NER/weak association type?

To write a KGX file:

  • Each node has an id and a category (core)
  • Each edge has a subject, an object, and a predicate (core)
  • Format is JSON objects, so you don't need the kgx tool per se, but you'll need that to load/validate them
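
A minimal sketch of writing such a file without the kgx tool, assuming the KGX JSON layout of a top-level object with `nodes` and `edges` lists and a list-valued `category`. The helper names and the `CDE:example` identifier are hypothetical, and the real format may require further fields, so the output should still be checked with kgx.

```python
import json


def make_node(node_id, category):
    """Return a node with the two core fields: id and category."""
    # category is stored as a list, which is an assumption about the KGX layout.
    return {"id": node_id, "category": [category]}


def make_edge(subject, predicate, obj):
    """Return an edge with the three core fields."""
    return {"subject": subject, "predicate": predicate, "object": obj}


def to_kgx_json(nodes, edges):
    """Serialize nodes and edges into one JSON document."""
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)


nodes = [
    make_node("CDE:example", "biolink:Publication"),  # hypothetical CDE identifier
    make_node("MONDO:0005015", "biolink:Disease"),
]
edges = [make_edge("CDE:example", "biolink:mentions", "MONDO:0005015")]
kgx_document = to_kgx_json(nodes, edges)
```

Writing `kgx_document` to a file gives something the kgx tool can then load and validate.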

@gaurav
Member Author

gaurav commented Oct 13, 2021

Example data: https://stars.renci.org/var/kgx_data/v3.0/

@YaphetKG

YaphetKG commented Nov 23, 2021

very minor issues

  • Publication-type nodes have very large attributes (specifically the Summary attributes). These could be minimized somehow, or potentially replaced by links (or some metadata) pointing to the actual data. (If this is not possible, we can potentially invent ways to incorporate this from the graph bulk loader side.)

  • Edges contain the predicate "IAO:0000142", which is a great predicate, but we can further biolinkify it via the service call https://bl-lookup-sri.renci.org/resolve_predicate?predicate=IAO%3A0000142&version=2.2.5 which returns:

{
  "IAO:0000142": {
    "identifier": "biolink:mentions",
    "label": "mentions",
    "inverted": false
  }
}

So we can use the identifier as the predicate and the label as the predicate_label attribute. In the past we have seen cases where the Biolink version of a predicate is too broad, hence the need to retain the original predicate. If that's the case here (although it doesn't seem to be), we can create relation and relation_label attributes on the edge and store the original (non-biolinkified) version of the predicate there.
Biolinkifying the edges allows us to make use of TranQL queries, as TranQL currently doesn't support non-Biolink CURIE types for querying edges.
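
A minimal sketch of the edge rewrite described above, assuming edges are plain dicts with a `predicate` key and that the lookup result has the shape of the response shown earlier in this comment. The function name `biolinkify_edge` is hypothetical, and the `inverted` flag is ignored here.

```python
def biolinkify_edge(edge, lookup, relation_label=None):
    """Return a copy of the edge with a Biolink predicate, keeping the original."""
    original = edge["predicate"]
    resolved = lookup.get(original)
    updated = dict(edge)
    if resolved is None:
        # No mapping for this predicate: leave the edge unchanged.
        return updated
    updated["predicate"] = resolved["identifier"]
    updated["predicate_label"] = resolved["label"]
    # Retain the original predicate in case the Biolink one is too broad.
    updated["relation"] = original
    if relation_label is not None:
        updated["relation_label"] = relation_label
    return updated


# Response copied from the service call shown above.
lookup_response = {
    "IAO:0000142": {
        "identifier": "biolink:mentions",
        "label": "mentions",
        "inverted": False,
    }
}

# CDE:example is a hypothetical identifier used only for illustration.
edge = {"subject": "CDE:example", "predicate": "IAO:0000142", "object": "MONDO:0005015"}
biolinkified = biolinkify_edge(edge, lookup_response)
```

The input edge is copied rather than modified, so the original export stays available for comparison.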
