Changes to Biolink instance data export #3

gaurav · 2021-10-13T18:29:52Z

Categories must be biolinkml types, such as Publication.
CRF[Publication] --(has_part)--> CDE[Publication? Information Content Entity?] --(keywords/attributes)--> ChEBI, MONDO, etc.

gaurav · 2021-10-13T18:37:10Z

Can we send the CDE question text to the Biolink NER API (e.g. https://api.monarchinitiative.org/api/nlp/annotate/entities?min_length=4&longest_only=false&include_abbreviation=false&include_acronym=false&include_numbers=false&content=COVID-19) and get back a list of referenced concepts?
- It probably makes the most sense to first do this without mapping to LOINC at all -- just see what we can get from the NER codes.
Can we send the results to the Translator Node Normalization SRI service to get normalized nodes for the CURIEs? (e.g. https://nodenormalization-sri.renci.org/1.2/get_normalized_nodes?curie=MONDO:0005015)
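
A minimal sketch of how the two request URLs above could be built, using only the endpoints and query parameters that appear in the example links. The function names are hypothetical, and nothing here parses a response, since the response formats are not described in this thread.

```python
from urllib.parse import urlencode

# Endpoints copied from the example links in this comment.
NER_URL = "https://api.monarchinitiative.org/api/nlp/annotate/entities"
NODE_NORM_URL = "https://nodenormalization-sri.renci.org/1.2/get_normalized_nodes"


def ner_request_url(text):
    """Build the NER request URL for a piece of CDE question text (hypothetical helper)."""
    params = {
        "min_length": 4,
        # Booleans are passed as lowercase strings, matching the example URL.
        "longest_only": "false",
        "include_abbreviation": "false",
        "include_acronym": "false",
        "include_numbers": "false",
        "content": text,
    }
    return f"{NER_URL}?{urlencode(params)}"


def node_norm_request_url(curie):
    """Build the Node Normalization request URL for one CURIE (hypothetical helper)."""
    # urlencode percent-encodes the colon in the CURIE (":" becomes "%3A").
    return f"{NODE_NORM_URL}?{urlencode({'curie': curie})}"


if __name__ == "__main__":
    print(ner_request_url("COVID-19"))
    print(node_norm_request_url("MONDO:0005015"))
```

The resulting URLs can be fetched with any HTTP client; the CURIEs extracted from the NER results would then be passed one by one to `node_norm_request_url`.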

gaurav · 2021-10-13T18:43:09Z

Each concept (e.g. a normalized disease identifier) is a node in the knowledge graph, linked through a NamedThingToInformationContentEntityAssociation (https://biolink.github.io/biolink-model/docs/NamedThingToInformationContentEntityAssociation.html); maybe we need a NER/weak association type?

To write a KGX file:

Each node has an id and a category (core)
Each edge has a subject, an object, and a predicate (core)
The format is JSON objects, so you don't need the kgx tool to write the file, but you will need it to load and validate the result
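
The points above can be sketched as follows. This is only an illustration under stated assumptions: the `example:` identifiers are made up, the categories and predicates are guesses based on the CRF/CDE diagram earlier in this thread, and the top-level `nodes`/`edges` layout is assumed from the KGX JSON format rather than confirmed here.

```python
import json


def make_node(node_id, category):
    """A node with the two core properties: id and category."""
    return {"id": node_id, "category": [category]}


def make_edge(subject, predicate, obj):
    """An edge with the three core properties: subject, predicate, object."""
    return {"subject": subject, "predicate": predicate, "object": obj}


def build_graph():
    """Assemble a tiny graph: a CRF that has a CDE, which mentions a disease."""
    nodes = [
        make_node("example:CRF1", "biolink:Publication"),   # hypothetical CRF id
        make_node("example:CDE1", "biolink:Publication"),   # hypothetical CDE id
        make_node("MONDO:0005015", "biolink:Disease"),
    ]
    edges = [
        make_edge("example:CRF1", "biolink:has_part", "example:CDE1"),
        make_edge("example:CDE1", "biolink:mentions", "MONDO:0005015"),
    ]
    # Assumed layout: one JSON object holding a list of nodes and a list of edges.
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    print(json.dumps(build_graph(), indent=2))
```

The output would still need to be checked with the kgx tool before loading.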

gaurav · 2021-10-13T18:46:24Z

Example data: https://stars.renci.org/var/kgx_data/v3.0/

YaphetKG · 2021-11-23T19:19:25Z

very minor issues

Publication-type nodes have very large attributes (specifically the Summary attributes). These could be minimized somehow, or potentially replaced with links (or some metadata) pointing to the actual data. (If this is not possible, we can potentially find ways to accommodate this on the graph bulk loader side.)
Edges contain the predicate "IAO:0000142", which is a great predicate, but we can further biolinkify it via the service call https://bl-lookup-sri.renci.org/resolve_predicate?predicate=IAO%3A0000142&version=2.2.5, which returns:

{
  "IAO:0000142": {
    "identifier": "biolink:mentions",
    "label": "mentions",
    "inverted": false
  }
}

So we can use the identifier as the predicate and the label as the predicate_label attribute. In the past we have seen cases where the Biolink version of a predicate is too broad, hence the need to retain the original predicate. If that's the case here (although it doesn't seem to be), we can create relation and relation_label attributes on the edge and store the original (non-biolinkified) version of the predicate there.
Biolinkifying the edges lets us make use of TranQL queries, as TranQL currently doesn't support non-Biolink CURIE types for querying edges.
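
A minimal sketch of the rewrite described above, applied to a single edge using the lookup response shown earlier. The function name and the `example:` identifier are hypothetical, and swapping subject and object when `inverted` is true is an assumption about what that flag means, not something confirmed in this thread.

```python
def biolinkify_edge(edge, lookup_response):
    """Return a copy of the edge with its predicate replaced by the Biolink one.

    lookup_response is the parsed JSON from the resolve_predicate call,
    keyed by the original predicate CURIE.
    """
    original = edge["predicate"]
    resolved = lookup_response.get(original)
    updated = dict(edge)
    if resolved is None:
        # No mapping found: leave the edge unchanged.
        return updated

    updated["predicate"] = resolved["identifier"]
    updated["predicate_label"] = resolved["label"]
    # Keep the original predicate so no information is lost if the
    # Biolink predicate turns out to be too broad.
    updated["relation"] = original

    if resolved.get("inverted"):
        # Assumption: an inverted mapping means the edge direction must flip.
        updated["subject"], updated["object"] = edge["object"], edge["subject"]
    return updated


if __name__ == "__main__":
    response = {
        "IAO:0000142": {
            "identifier": "biolink:mentions",
            "label": "mentions",
            "inverted": False,
        }
    }
    edge = {
        "subject": "example:CDE1",  # hypothetical identifier
        "predicate": "IAO:0000142",
        "object": "MONDO:0005015",
    }
    print(biolinkify_edge(edge, response))
```

A relation_label attribute could be filled in the same way if a label for the original predicate is available.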

gaurav self-assigned this Oct 13, 2021
