This issue covers the quite large topic of how to make TreeLDR modular.
What is modularity in this context?
The input of TreeLDR is a set of RDF datasets that are processed into an inner data model, then given to various (code) generators. For now, the generators are all embedded into the compiler. This is a bad design that I expect will cause many problems in the long run:
As the number of generators grows, the source code will become harder to maintain. The number of dependencies will grow, as will the time needed to refactor parts of the core compiler.
As the size of the input dataset grows, the time spent compiling will grow. This may become a problem with datasets such as schema.org that are too large to be processed every time a generator is called.
Modularity means:
Splitting the core compiler and generators into independent programs.
Pre-processing datasets into a reusable data format.
Pre-processing
The primary task of the TreeLDR compiler is to take the input dataset triples, infer new triples according to the semantics of RDF/RDFS/OWL/TreeLDR, and store the resulting triples into a final data structure for easy access by the generators. The idea here would be to create an intermediate file format storing the triples, including the inferred ones, for later access without having to call the compiler again. This is very similar to the way traditional compilers create an intermediate object file *.o for each compiled file before merging them into the final executable. The intermediate file describes a Model Theoretic Interpretation of the processed dataset.
Composition Problem
The main challenge is to make sure the resulting interpretations are composable with each other. For instance, consider the following schema:
A valid interpretation of this graph can merge the blank nodes like so:
Merging structurally equivalent blank nodes is something TreeLDR does all the time, since most of the time they refer to the same resource and it can greatly reduce the complexity of the final model. However, now consider the following graph:
Imagine we want to update our previous interpretation to include this new knowledge. Because :prop is declared as an owl:FunctionalProperty, according to the OWL semantics I(_:0) = I(_:2) and I(_:1) = I(_:3). But because of the _:2 owl:differentFrom _:3 statement we also have I(_:2) != I(_:3), which in turn means I(_:0) != I(_:1). But we already decided in our previous interpretation that I(_:0) = I(_:1) = blank. We cannot go back: the information is lost, and we would not know which of I(_:0) or I(_:1) is such that <baz, I(_:?)> in EXT(prop).
This shows that we cannot update or compose interpretations this way.
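The loss of information can be made concrete with a tiny union-find sketch (the node names are illustrative; the original graphs are not reproduced here): once two blank nodes are merged into one equivalence class, no later statement can split them again.

```python
# Tiny union-find; merging is the only operation, there is no "split".
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# First dataset: _:0 and _:1 are structurally equivalent, so we merge them.
union("_:0", "_:1")
assert find("_:0") == find("_:1")

# A later dataset implies I(_:0) != I(_:1) (via owl:FunctionalProperty and
# owl:differentFrom), but the merged class no longer records which facts
# came from _:0 and which from _:1, so the split cannot be performed.
```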
Solution to the Composition Problem
So we cannot decide on a single interpretation from just a subset of the processed datasets. One solution is to build two interpretations:
Maximal Interpretation: this is a conservative interpretation where two names (IRIs, blank node identifiers, literals) are never interpreted the same unless the graph explicitly states so (with owl:sameAs, for instance).
Optimal Interpretation: this is a non-conservative interpretation where two names are merged at liberty unless explicitly stated otherwise (with owl:differentFrom, for instance).
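A sketch of how the two interpretations might be computed from one dataset, again assuming triples are string tuples. The greedy merge loop in the optimal interpretation is order-dependent and only meant to make the two notions concrete; all names are hypothetical.

```python
class UnionFind:
    """Equivalence classes over names; merging is the only operation."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def maximal_interpretation(triples):
    """Conservative: merge two names only when owl:sameAs forces it."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == "owl:sameAs":
            uf.union(s, o)
    return uf

def optimal_interpretation(triples, names):
    """Liberal: merge names freely unless owl:differentFrom forbids it."""
    uf = UnionFind()
    distinct = [(s, o) for s, p, o in triples if p == "owl:differentFrom"]
    for a in names:
        for b in names:
            ra, rb = uf.find(a), uf.find(b)
            # Merging the classes of a and b would equate x and y exactly
            # when their classes are the two classes being merged.
            if ra != rb and not any(
                {uf.find(x), uf.find(y)} == {ra, rb} for x, y in distinct
            ):
                uf.union(a, b)
    return uf
```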
We can easily compose interpretation pairs:
Since the maximal interpretation is conservative, if two resources are interpreted the same, they must be interpreted the same. So if they are interpreted the same in one maximal interpretation and not in the other, the resources must be merged.
Since the optimal interpretation is liberal, if two resources are not interpreted the same, they must not be interpreted the same. So if they are interpreted differently in one optimal interpretation and not in the other, the interpretation must be refined. Fortunately, merged resources can be separated by looking at the maximal interpretation.
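A sketch of this composition, assuming both interpretations range over the same set of names and each is represented as a partition mapping every name to its class (a frozenset); helper names are hypothetical:

```python
def compose_maximal(p1, p2):
    """Join: a merge forced by either maximal interpretation is kept."""
    classes = list({*p1.values(), *p2.values()})
    merged = True
    while merged:  # transitive closure: fuse any two overlapping classes
        merged = False
        for i, a in enumerate(classes):
            for b in classes[i + 1:]:
                if a & b:
                    classes.remove(a)
                    classes.remove(b)
                    classes.append(a | b)
                    merged = True
                    break
            if merged:
                break
    return {n: c for c in classes for n in c}

def compose_optimal(p1, p2):
    """Meet: a separation made by either optimal interpretation is kept,
    refining each class down to the members the two agree on."""
    return {n: frozenset(p1[n] & p2[n]) for n in p1}
```

Refining compose_optimal may split classes that one dataset had merged; the corresponding maximal interpretations tell which resources can legitimately be separated.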