Refactor annotation extension facet / make more parts of annotation extensions searchable #201
Comments
It should be noted that these fields should provide the obvious anchor points for the AmiGO pages for CL classes, CHEBI classes, etc., which are somewhat pointless at present.
From my initial comments: Currently in AmiGO, annotation extensions are stored as 1) a blob (annotation_extension_json) and 2) annotation extension classes (including class, class_label, the closures, searchables, etc.). However, this means that you are unable to do things like find the target of an activity (e.g. has_input(UniProtKB:O89046)) by normal searching methods. Moreover, even power tools like gannet are unable to do much here, since the blob is not searchable. (The only way would be CLI tools, like the bbop libs, and parsing the output yourself.) The goal here would be to make a satisfying amount of the annotation extension structure available to the standard AmiGO interface. This also means that we'll need to add more fields to the load.
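To make the "parsing the output yourself" point concrete, here is a minimal sketch of pulling relation/filler pairs out of an extension string such as has_input(UniProtKB:O89046). It assumes the GAF column 16 convention ('|' separates independent expressions, ',' joins units within one). The function names parse_extension and fillers_for are hypothetical, not part of AmiGO or the bbop libs, and the identifiers other than UniProtKB:O89046 are only illustrative.

```python
import re

# One relation(filler) unit, e.g. "has_input(UniProtKB:O89046)".
_UNIT = re.compile(r"^\s*([A-Za-z_][\w\-]*)\(([^()]+)\)\s*$")


def parse_extension(expr):
    """Parse an extension string into a list of conjunctions.

    '|' separates independent expressions; ',' joins units within one.
    Returns a list of lists of (relation, filler) pairs.
    """
    groups = []
    for part in expr.split("|"):
        units = []
        for unit in part.split(","):
            match = _UNIT.match(unit)
            if match is None:
                raise ValueError(f"cannot parse extension unit: {unit!r}")
            units.append((match.group(1), match.group(2).strip()))
        groups.append(units)
    return groups


def fillers_for(expr, relation):
    """Return every filler that appears with the given relation."""
    return [
        filler
        for group in parse_extension(expr)
        for rel, filler in group
        if rel == relation
    ]


if __name__ == "__main__":
    # Illustrative extension string; only the first unit comes from the issue.
    example = "has_input(UniProtKB:O89046),occurs_in(CL:0000099)|part_of(GO:0007399)"
    print(fillers_for(example, "has_input"))
```

Having to do this client-side for every query is exactly what dedicated, indexed fields would avoid.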
@kltm - yes this will require additional closure fields such as |
Note that the specification here can be generalized to include population of the current |
Currently we expose a facet 'annotation extensions', which is useful for curators but does not directly relate to a meaningful biological question. Also, it assumes classes (see http://jira.geneontology.org/browse/GO-838)
We will refactor this to expose facets that are biological categories. We will start with 'participant' and 'location' (TBD: how should the hierarchy of facets be handled in the UI?)
The majority of the work here is in the golr loader; the only change in this codebase is a trivial addition/replacement in the yaml. This would be a standard 4-tuple (id,label,closure,closure-label), though in fact only the closure parts would be used.
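As a rough illustration of the 4-tuple, the sketch below builds the four fields for one facet from a filler and its closure. The helper facet_fields and the EX: identifiers are hypothetical, and the _label / _closure / _closure_label suffixes are an assumed naming convention; the real names would be whatever the yaml and the loader agree on.

```python
def facet_fields(facet, filler, closure):
    """Build the (id, label, closure, closure-label) fields for one facet.

    facet   -- facet name, e.g. "location" (assumed naming)
    filler  -- (id, label) pair for the class itself
    closure -- iterable of (id, label) pairs for the class and its ancestors
    """
    ordered = sorted(closure)  # sort by id so the two lists stay aligned
    return {
        facet: filler[0],
        f"{facet}_label": filler[1],
        f"{facet}_closure": [class_id for class_id, _ in ordered],
        f"{facet}_closure_label": [label for _, label in ordered],
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    doc = facet_fields(
        "location",
        ("EX:1", "interneuron"),
        {("EX:1", "interneuron"), ("EX:2", "neuron")},
    )
    for key, value in doc.items():
        print(key, value)
```

Only the two closure lists would actually be faceted on; the id and label are carried along for uniformity with the other fields.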
Population of closure
Each field is associated with one or more object properties (OPs). For example:
participant: [has participant]
location: [located in, part of, occurs in]
Call this specified set P_f. Call the union of P_f and its inferred subproperties P_f*.
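A minimal sketch of deriving P_f* from P_f, assuming the subproperty assertions are available as a plain child-to-parents mapping. The function with_subproperties and the toy SUB_PROPERTY_OF table are hypothetical; in practice the inferred subproperties would come from the loaded ontology or a reasoner.

```python
def with_subproperties(p_f, sub_property_of):
    """Return P_f*: P_f plus every transitive subproperty of a member.

    sub_property_of maps each property to the set of its direct
    superproperties (child -> parents).
    """
    # Invert child -> parents into parent -> children.
    children = {}
    for child, parents in sub_property_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    closure = set(p_f)
    stack = list(p_f)
    while stack:
        prop = stack.pop()
        for child in children.get(prop, ()):
            if child not in closure:
                closure.add(child)
                stack.append(child)
    return closure


# Toy subproperty table, for illustration only.
SUB_PROPERTY_OF = {
    "has input": {"has participant"},
    "has output": {"has participant"},
    "occurs in": set(),
}

if __name__ == "__main__":
    print(sorted(with_subproperties({"has participant"}, SUB_PROPERTY_OF)))
```

With this table, a 'participant' field specified only as [has participant] would also pick up extensions that use has input or has output.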
For each gene association, we walk the graph over P_f* (steps 1 and 2). Note that in either case, the walking should only follow OPs in P_f*.
The PoorMansReasoning (PMR) strategy is to use the OGW graph walking code, specifying P_f.
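A minimal sketch of what a property-restricted walk could look like, using a plain breadth-first search in place of the OGW code (whose API is not shown here). The function closure_over, the toy EDGES graph and the 'capable of' edge are hypothetical. Treating subClassOf as always followed is an assumption, based on the "subClass + partOf path" wording in the location example.

```python
from collections import deque


def closure_over(start, edges, allowed):
    """Collect every class reachable from start via allowed relations.

    edges   -- maps a class to a list of (relation, target) pairs
    allowed -- the relations the walk may follow (P_f* plus subClassOf)
    The result includes start itself (reflexive closure).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, target in edges.get(node, ()):
            if relation in allowed and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Toy graph, for illustration only.
EDGES = {
    "interneuron": [("subClassOf", "neuron")],
    "neuron": [
        ("part of", "nervous system"),
        ("capable of", "some process"),  # not in the location set, so skipped
    ],
}

# Location properties from the issue, plus subClassOf (assumed always followed).
LOCATION = {"subClassOf", "located in", "part of", "occurs in"}

if __name__ == "__main__":
    print(sorted(closure_over("interneuron", EDGES, LOCATION)))
```

Starting from a filler of interneuron, the walk collects interneuron, neuron and nervous system, and ignores the edge whose relation is outside the allowed set.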
A cleaner approach is to replace steps 1 and 2 with the following:
1. C and R some Y, as is done for GAF validation and as specified in the extensions paper. Call this C'.
2. C' over every P in P_f.
This is guaranteed complete for EL. The PMR strategy may have edge cases but may be good enough. @hdietze to investigate.
Note that in all cases we treat the filler in the annotation extension as a class. For completeness we should ensure that we include SubClassOf some SO:gene | PR:protein etc to allow the GO-838 query in a seamless way.
Examples for 'location' facet
location_closure should contain interneuron, neuron, etc. as well as 'nervous system', since the initial relation R is in P_f, and the classes are in the subClass + partOf path (rule 2).
Note also that we expect the same thing if a precomposed term is used.
So long as go-plus is loaded the path will be the same (rule 1).
TBD
For completeness, the (implicit or explicit) annotation relationship must be considered. E.g. for the location facet and a direct annotation to 'axon', the location closure blends into the existing isa-partof one.
cc @dosumis