Refactor annotation extension facet / make more parts of annotation extensions searchable #201

Open
cmungall opened this issue May 15, 2015 · 4 comments

Comments

@cmungall
Member

Currently we expose an 'annotation extensions' facet, which is useful for curators but does not directly relate to a meaningful biological question. Also, it assumes that extension fillers are classes (see http://jira.geneontology.org/browse/GO-838).

We will refactor this to expose facets that are biological categories. We will start with 'participant' and 'location' (TBD: how should the hierarchy of facets be handled in the UI?)

The majority of the work here is in the golr loader; the only change in this codebase is a trivial addition/replacement in the yaml. This would be a standard 4-tuple (id,label,closure,closure-label), though in fact only the closure parts would be used.
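As a rough illustration of the 4-tuple, the sketch below derives the four field names for one facet. The `_label`, `_closure` and `_closure_label` suffixes are an assumption extrapolated from names such as `location_closure` and `isa_partof_closure` used in this thread; the helper `closure_field_names` is hypothetical and not part of the loader.

```python
def closure_field_names(facet: str) -> dict[str, str]:
    """Return the four field names for one facet.

    The suffix convention is an assumption, not taken from the real schema.
    """
    return {
        "id": facet,
        "label": f"{facet}_label",
        "closure": f"{facet}_closure",
        "closure_label": f"{facet}_closure_label",
    }


fields = closure_field_names("location")
# Only the two closure fields would actually be populated and faceted on.
used = [fields["closure"], fields["closure_label"]]
```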

Population of closure

Each field is associated with one or more object properties (OPs). For example:

  • participant: has participant
  • location: [located in, part of, occurs in]

Call this specified set P_f. Call the union of P_f and inferred subproperties P_f*.
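A minimal sketch of how P_f* could be derived from P_f, assuming the sub-property hierarchy is available as a simple child-to-parents table. The table below is illustrative only (it is not loaded from RO), and `expand_properties` is a hypothetical helper name.

```python
# Illustrative sub-property table: child property -> direct super-properties.
# These entries are assumptions for the example, not read from an ontology.
SUBPROPERTY_OF = {
    "has input": ["has participant"],
    "has output": ["has participant"],
    "has direct input": ["has input"],
}

# P_f per facet, as listed in the issue text.
FACET_PROPERTIES = {
    "participant": {"has participant"},
    "location": {"located in", "part of", "occurs in"},
}


def expand_properties(p_f, subproperty_of):
    """Return P_f*: P_f plus every transitive sub-property of a member."""
    closure = set(p_f)
    changed = True
    while changed:  # iterate to a fixed point so indirect sub-properties are found
        changed = False
        for child, parents in subproperty_of.items():
            if child not in closure and any(p in closure for p in parents):
                closure.add(child)
                changed = True
    return closure


participant_star = expand_properties(FACET_PROPERTIES["participant"], SUBPROPERTY_OF)
location_star = expand_properties(FACET_PROPERTIES["location"], SUBPROPERTY_OF)
```

With a real ontology the table would come from the reasoner's inferred property hierarchy rather than a hand-written dict.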

For each gene association, we walk the graph

  1. From the assigned GO class
  2. From the extension class, if the extension relation is in P_f*.

Note that in either case, the walking should only follow OPs in P_f*.

The PoorMansReasoning strategy is to use the OGW graph walking code specifying P_f.
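The walk described above can be sketched on a toy graph as follows. This is not the OGW code; it is a plain breadth-first search under two stated assumptions: subClassOf edges are always followed alongside the properties in P_f*, and a class is only collected once the path to it has crossed at least one P_f* edge (the extension relation itself counts as that edge for rule 2). The edge table and all function names are hypothetical.

```python
from collections import deque

SUBCLASS_OF = "subClassOf"


def walk(start, edges, allowed, crossed):
    """Collect classes reachable from `start` via subClassOf and allowed OPs.

    A class is collected only if the path to it has crossed an allowed OP;
    `crossed` says whether that already happened before reaching `start`.
    """
    collected = set()
    seen = {(start, crossed)}
    queue = deque([(start, crossed)])
    while queue:
        node, has_crossed = queue.popleft()
        if has_crossed:
            collected.add(node)
        for relation, target in edges.get(node, []):
            if relation == SUBCLASS_OF:
                state = (target, has_crossed)
            elif relation in allowed:
                state = (target, True)
            else:
                continue  # never follow properties outside P_f*
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return collected


def facet_closure(go_class, extensions, edges, p_f_star):
    """Union of rule 1 (from the GO class) and rule 2 (from extension fillers)."""
    closure = walk(go_class, edges, p_f_star, crossed=False)
    for relation, filler in extensions:
        if relation in p_f_star:
            closure |= walk(filler, edges, p_f_star, crossed=True)
    return closure


# Toy edges, assumed for illustration only.
EDGES = {
    "bar synthesis": [(SUBCLASS_OF, "biosynthetic process")],
    "interneuron": [(SUBCLASS_OF, "neuron")],
    "neuron": [("part of", "nervous system")],
}
LOCATION = {"located in", "part of", "occurs in"}

closure = facet_closure(
    "bar synthesis",
    [("part of", "interneuron"), ("has input", "chemical X")],
    EDGES,
    LOCATION,
)
```

Under these assumptions a direct annotation to a location class would not collect that class itself, which is the open point about the annotation relationship raised under TBD.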

A cleaner approach is to replace steps 1 and 2 with the following:

  1. Translate the combo of the class and extension to an OWL anon class, e.g. C and R some Y, as is done for GAF validation and as specified in the extensions paper. Call this C'
  2. Use the materialized expression reasoner to find all reflexive ancestors of C' over every P in P_f.
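Step 1 of the cleaner approach can be illustrated by rendering C' as a Manchester-style class expression. This is only a string-building sketch with hypothetical helper names; the reasoning in step 2 needs an OWL reasoner and is not shown.

```python
def quote(name):
    """Wrap a label in single quotes if it contains whitespace."""
    return f"'{name}'" if " " in name else name


def to_class_expression(go_class, extensions):
    """Render `C and (R some Y) ...` for one annotation and its extensions."""
    parts = [quote(go_class)]
    parts += [f"({relation} some {quote(filler)})" for relation, filler in extensions]
    return " and ".join(parts)


c_prime = to_class_expression("bar synthesis", [("part_of", "interneuron")])
```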

This is guaranteed complete for EL. The PMR strategy may have edge cases but may be good enough. @hdietze to investigate.

Note that in all cases we treat the filler in the annotation extension as a class. For completeness we should ensure that we include SubClassOf some SO:gene | PR:protein etc to allow the GO-838 query in a seamless way.

Examples for 'location' facet

  • foo1 annotated to 'bar synthesis' (c5) and part_of some interneuron (c16)

location_closure should contain interneuron, neuron, etc. as well as 'nervous system', since the initial relation R is in P_f*, and the classes are in the subClass + partOf path (rule 2).

Note also that we expect the same thing if a precomposed term is used

  • foo1 annotated to 'bar synthesis in interneuron'
  • 'bar synthesis in interneuron' = 'bar synthesis' and part_of some interneuron

So long as go-plus is loaded the path will be the same (rule 1)
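To make the precomposed case concrete, the sketch below shows the graph edges one could derive from a logical definition such as the one above. It assumes the equivalence axiom is unfolded into a subClassOf edge to the genus plus one edge per differentia; `edges_from_definition` is a hypothetical helper, not a go-plus or loader API.

```python
def edges_from_definition(term, genus, differentia):
    """Edges implied by `term EquivalentTo genus and (R some Y) ...`."""
    edges = [("subClassOf", genus)]
    edges += list(differentia)  # each differentia is a (relation, filler) pair
    return {term: edges}


precomposed = edges_from_definition(
    "bar synthesis in interneuron",
    genus="bar synthesis",
    differentia=[("part_of", "interneuron")],
)

# Location relations, written with underscores to match the example above.
location_props = {"located_in", "part_of", "occurs_in"}

# The first hop of a rule 1 walk from the precomposed term over those relations.
first_hops = [
    target
    for relation, target in precomposed["bar synthesis in interneuron"]
    if relation in location_props
]
```

A rule 1 walk from the precomposed term therefore reaches interneuron over part_of, after which it follows the same path as the rule 2 walk from the extension filler.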

TBD

For completeness, the (implicit or explicit) annotation relationship must be considered. E.g. for the location facet, consider a direct annotation to 'axon'; here the location closure blends into the existing isa-partof one.

cc @dosumis

@cmungall cmungall added this to the 2.3 milestone May 15, 2015
@cmungall
Member Author

It should be noted that these fields should provide the obvious anchor points for the AmiGO pages for CL, CHEBI classes etc., which are somewhat pointless at present.

@kltm kltm changed the title from "Refactor annotation extension facet" to "Refactor annotation extension facet / make more parts of annotation extensions searchable" May 15, 2015
@kltm
Member

kltm commented May 15, 2015

From my initial comments:

Currently in AmiGO, annotation extensions are stored as 1) a blob (annotation_extension_json) and 2) annotation extension classes (including class, class_label, the closures, searchables, etc.).

However, this means that you are unable to do things like find the target of an activity (e.g. has_input(UniProtKB:O89046)) by normal search methods. Moreover, even power tools like gannet are unable to do much here since the blob is not searchable. (The only way would be CLI tools, like the bbop libs, and parsing the output yourself.)

The goal here would be to make a satisfying amount of the annotation extension structure available to the standard AmiGO interface.

This also means that we'll need to add more fields to the load.
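As a sketch of what "more fields" could look like, the code below splits a GAF-style extension string into per-relation sets of fillers that could each feed a searchable field. It assumes the usual text form, with `|` separating independent extension groups and `,` joining units within a group; the layout of annotation_extension_json is not shown in this thread, so it is not parsed here. The function names and the second and third identifiers in the example are made up for illustration.

```python
import re

# One extension unit, e.g. has_input(UniProtKB:O89046)
_UNIT = re.compile(r"^\s*([^()\s]+)\((.+)\)\s*$")


def parse_extensions(column):
    """Parse 'rel(ID),rel(ID)|rel(ID)' into a list of groups of (rel, ID)."""
    if not column.strip():
        return []
    groups = []
    for group in column.split("|"):
        pairs = []
        for unit in group.split(","):
            match = _UNIT.match(unit)
            if match is None:
                raise ValueError(f"unparseable extension unit: {unit!r}")
            pairs.append((match.group(1), match.group(2)))
        groups.append(pairs)
    return groups


def searchable_fields(column):
    """Flatten all groups into relation -> set of filler IDs."""
    index = {}
    for group in parse_extensions(column):
        for relation, filler in group:
            index.setdefault(relation, set()).add(filler)
    return index


index = searchable_fields(
    "has_input(UniProtKB:O89046),part_of(CL:0000099)|has_input(UniProtKB:P12345)"
)
```

Flattening loses the grouping, so a query could not tell which fillers co-occurred in one extension; whether that matters depends on how much structure the interface is meant to expose.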

@cmungall
Member Author

@kltm - yes this will require additional closure fields such as participant as specified above.

@cmungall
Member Author

cmungall commented Nov 1, 2015

Note that the specification here can be generalized to include population of the current isa_partof_closure and regulates_closure fields. This provides the natural solution to #267

3 participants