ontotrace nexml validation #8
I think the issue is the polymorphism. It probably doesn't like that the state symbol for the polymorphic condition is |
I’ll try to figure out if I can find where this is getting checked in the Perl. |
@rvosa does the Bio::Phylo NeXML validator handle polymorphic states? It seems to be complaining about a string character symbol. |
@uyedaj if you replace the polymorphism symbols with a number, e.g. "10", it validates. I believe this is an error in the validator. For now you can replace
with
Will this work? Additionally you will want to make sure RNeXML is handling polymorphisms, if you are using that. |
As mentioned at last meeting, based on ropensci/RNeXML#171 I think right now polymorphic characters whose symbols aren't numeric are all going to run into the same issue. What would be good to have while looking at possible solutions is the format that polymorphic character states normally take in matrices. Is it an extra numeric symbol (e.g., |
Most methods can't really handle polymorphic characters... DNA sequence data has dedicated characters for ambiguous bases, while I think the standard for binary characters is generally just to put a '?'. I think a lot of the nexus parsers will have issues if it's anything longer than a single character, so for now it seems like an extra numeric symbol would be a good option. |
@hlapp do you think I should update OntoTrace to choose a random unused digit (for presence/absence, of course this can be 2) to just make this work? The "1 and 0" symbol works really well for display in Phenex, which is one reason I'm reluctant to change. NEXUS has its own scheme to represent polymorphism, and I would expect libraries that read/write both to interconvert between them. |
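Based on @uyedaj's response I was indeed going to suggest that. However, I am also going to argue that making this change is in fact the right thing to do, and not just a kludge to make this work.
Technically, as I verified earlier against the NeXML schema, the type of the symbol attribute for polymorphic_state_set is indeed xs:anySimpleType, so strings are allowed. That the NeXML validator complains about this is thus a problem with the validator, not the data file at hand. However, for StandardState and StandardStateSet the type of the symbol attribute is integer. I would argue that allowing this much room for confusion in implementations is a mistake in the NeXML schema itself. Furthermore, as per @uyedaj's answer, it's also rather inconsistent with common practice in comparative method tools (which I would say runs counter to the original design goals of NeXML, as reflected in the prefix of its name, another reason why arguably this should be regarded as a mistake in the schema).
One of our goals for the API is to facilitate more widespread adoption of our data and capabilities. One aspect of this should be to return data in lossless formats that can be converted as straightforwardly as possible into the input formats needed by existing tool ecosystems. In most cases, the latter means NEXUS. Obviously, NEXUS can't encode well all the information that constitutes our data value propositions, so it isn't lossless, and hence we don't output NEXUS but instead do NeXML or JSON-LD. But one of the beauties of the NeXML format is that changing a PolymorphicStateSet's symbol from "1 and 0" (a multi-character string) to "2" (a single digit integer) incurs no loss of information, because the states in the set are explicitly defined and not meant to be only (or even at all) parseable from the symbol.
So instead of asking every single consumer implementation to have to perform this substitution, we can make it for them, without loss of information in the data that we return.
> The "1 and 0" symbol works really well for display in Phenex, which is one reason I'm reluctant to change.
But to quote from your own response:
> As you might expect I would much prefer for tools that read NeXML to just handle it as designed. :-)
Arguably, it is Phenex that doesn't treat NeXML data as it is designed. Namely, it's Phenex that gives semantic significance to the value of the symbol of a polymorphic state set (namely, expecting it to show the states in the set), rather than, if and where needed, extracting this information from the data as NeXML was designed to support it. |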
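For anyone following along, here is a rough sketch of the substitution being discussed: swapping a non-numeric polymorphic_state_set symbol for the lowest unused integer while leaving the member elements alone. The XML fragment is a simplified, made-up example (not the actual OntoTrace output), and the function name `renumber_polymorphic_symbols` is hypothetical; it is not part of RNeXML, Bio::Phylo, or OntoTrace.

```python
import itertools
import xml.etree.ElementTree as ET

NS = "http://www.nexml.org/2009"

# Simplified, hypothetical <states> block; real files carry more attributes.
SAMPLE = """<states xmlns="http://www.nexml.org/2009" id="states1">
  <state id="s0" symbol="0"/>
  <state id="s1" symbol="1"/>
  <polymorphic_state_set id="s2" symbol="1 and 0">
    <member state="s0"/>
    <member state="s1"/>
  </polymorphic_state_set>
</states>"""


def renumber_polymorphic_symbols(states):
    """Replace non-numeric polymorphic symbols with the lowest unused integer.

    Returns a dict mapping each old symbol to its replacement. The <member>
    children are untouched, so the set's contents are still fully recorded.
    """
    used = {child.get("symbol") for child in states}
    changed = {}
    for pset in states.findall(f"{{{NS}}}polymorphic_state_set"):
        old = pset.get("symbol", "")
        if old.isdigit():
            continue  # already numeric, nothing to do
        new = next(str(n) for n in itertools.count() if str(n) not in used)
        pset.set("symbol", new)
        used.add(new)
        changed[old] = new
    return changed


states = ET.fromstring(SAMPLE)
changed = renumber_polymorphic_symbols(states)
print(changed)  # {'1 and 0': '2'}
```

The point of the sketch is only that the symbol can change without touching the member list, which is where the actual information lives.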
I have to dig in my memory as to why this was the way it was but I think
the general idea was that we wanted "standard" symbols to be single
characters. Polymorphic states would need to use a single character symbol
not yet used elsewhere, and to give us enough headroom for this we allowed
both strings and integers. That's why "1 and 2" validates, though it goes
against the original idea. Does that make sense?
On Wed, Oct 3, 2018 at 11:41 PM Hilmar Lapp ***@***.***> wrote:
> > @hlapp do you think I should update OntoTrace
> > to choose a random unused digit (for presence/absence, of course this can
> > be 2) to just make this work?
>
> Based on @uyedaj's response I was indeed
> going to suggest that. However, I am also going to argue that making this
> change is in fact the *right thing* to do, and not just a kludge to make
> this work.
>
> Technically, as I verified earlier against the NeXML schema, the type of
> the symbol attribute for polymorphic_state_set is indeed xs:anySimpleType,
> so strings are allowed. That the NeXML validator complains about this is
> thus a problem with the validator, not the data file at hand. However, for
> StandardState and StandardStateSet the type of the symbol attribute is
> *integer*. I would argue that allowing this much room for confusion in
> implementations is a mistake in the NeXML schema itself. Furthermore, as
> per @uyedaj's answer, it's also rather
> inconsistent with common practice in comparative method tools (which I
> would say runs counter to the original design goals of NeXML, as reflected
> in the prefix of its name, another reason why arguably this should be
> regarded as a mistake in the schema).
>
> One of our goals for the API is to facilitate more widespread adoption of
> our data and capabilities. One aspect of this should be to return data in
> lossless formats that can be as straightforwardly as possible converted
> into the input formats needed by existing tool ecosystems. In most cases,
> the latter means NEXUS. Obviously, NEXUS can't encode well all the
> information that constitutes our data value propositions, so isn't lossless
> and hence we don't output NEXUS but instead do NeXML or JSON-LD. But one of
> the beauties of the NeXML format is that changing a PolymorphicStateSet's
> symbol from "1 and 0" (a multi-character string) to "2" (a single digit
> integer) incurs no loss of information, because the states in the set are
> explicitly defined and not meant to be only (or even at all) parseable from
> the symbol.
>
> So instead of asking every single consumer implementation to have to
> perform this substitution, we can make it for them, without loss of
> information in the data that we return.
>
> > The "1 and 0" symbol works really well for display in Phenex, which is one
> > reason I'm reluctant to change.
>
> But to quote from your own response:
>
> > As you might expect I would much prefer for tools that read NeXML to just
> > handle it as designed. :-)
>
> Arguably, it is *Phenex* that doesn't treat NeXML data as it is designed.
> Namely, it's *Phenex* that gives semantic significance to the value of
> the symbol of polymorphic state set (namely, expecting it to show the
> states in the set), rather than, if and where needed, extracting this
> information from the data as NeXML was designed to support it.
|
Hey, now I think this is getting a bit twisted. 😆 I shouldn't be dinged for going the extra mile and generating a friendly label while staying within the spec. I still think it's quite bad for readers of NeXML not to recognize polymorphic states. If that's not represented in their model they should do something like treat it as unknown instead of charging ahead. In NEXUS there is no additional state symbol generated anyway; a polymorphism would be represented as If we return polymorphisms as a third state symbol |
I'm not dinging anyone here 😄 However, I think it's important to realize that the
You mean human readers of a NeXML file, machine readers of a NeXML file, or users of a tool that lets them inspect the data in a NeXML file? I think quite firmly that human readers of NeXML files are not a use case, or at least not an important one. XML is not for human readability, and pretending otherwise just leads down bad paths. A machine reader of a NeXML file will need a single-character symbol. For a user of a tool for inspecting a NeXML file, the tool does have the information needed to construct a user-friendly label for a polymorphic state (do you disagree with that?).
I think that's what @uyedaj was saying is what is typically done, right? I.e., most comparative tools can't distinguish (in the sense of computing differently) between polymorphic and uncertain states.
That's a property of the NeXML format, and I agree that tools converting NeXML to NEXUS should probably write out a polymorphic state that way. But what to do depends on the target of the conversion.
Possibly, so maybe it should be a question mark instead, as recommended by @uyedaj and as is the common symbol used for this? |
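To illustrate the claim that a tool has what it needs to build a friendly label on its own, here is a small sketch that derives a display label from the member elements of a polymorphic state set and ignores the stored symbol. The fragment is again a simplified, made-up example, and `polymorphic_labels` is a hypothetical helper, not how Phenex or any existing library does it.

```python
import xml.etree.ElementTree as ET

NS = "http://www.nexml.org/2009"

# Simplified, hypothetical <states> block where the stored symbol is just "2".
SAMPLE = """<states xmlns="http://www.nexml.org/2009" id="states1">
  <state id="s0" symbol="0"/>
  <state id="s1" symbol="1"/>
  <polymorphic_state_set id="s2" symbol="2">
    <member state="s1"/>
    <member state="s0"/>
  </polymorphic_state_set>
</states>"""


def polymorphic_labels(states, joiner=" and "):
    """Map each polymorphic set id to a label built from its member symbols."""
    symbol_by_id = {
        s.get("id"): s.get("symbol") for s in states.findall(f"{{{NS}}}state")
    }
    labels = {}
    for pset in states.findall(f"{{{NS}}}polymorphic_state_set"):
        members = [
            symbol_by_id[m.get("state")]
            for m in pset.findall(f"{{{NS}}}member")
        ]
        labels[pset.get("id")] = joiner.join(members)
    return labels


labels = polymorphic_labels(ET.fromstring(SAMPLE))
print(labels)  # {'s2': '1 and 0'}
```

A converter targeting another format could use the same member lookup and simply pick a different way of joining the symbols.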
Update—@hlapp has updated RNeXML to handle polymorphic states. I will update Phenoscape NeXML generation to use NEXUS-style polymorphism symbols (e.g. |
@balhoff When I run the NeXML files that Wasila sent me (https://github.com/phenoscape/scate/tree/master/data), produced with OntoTrace, through the validator (http://www.nexml.org/phylows/validator), I get the following error (here for Malabarba-1998_Ontotrace.xml). Looks like the issue is the polymorphism again?