This is an extension on top of STAM that allows for interoperability between W3C Web Annotation and STAM.
The Web Annotation Data Model offers a model for describing annotations in which the annotations and their targets (any resource, not just text) can live distributed over the web. The Web Annotation Vocabulary specifies the exact set of RDF classes and predicates used by the data model. Web Annotations are typically serialized in JSON-LD with context http://www.w3.org/ns/anno.jsonld.
There are some limitations to this model, as not everything that W3C Web Annotations support can be expressed in STAM, and vice versa. This STAM extension describes how to map one to the other.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
W3C Web Annotation builds on RDF. This means that for interoperability to work, all the RDF constraints posed in the core STAM specification MUST be adhered to. This mostly entails that all identifiers MUST either be IRIs or be transformable into one. To recap from the section Identifiers in the main specification:
The public identifier of keys MUST be able to be transformed into an IRI as follows:
- It is already an IRI by itself (no transformation necessary).
- It forms an IRI when appended to the public identifier of the annotation data set. If the identifier of the annotation data set does not end in / or #, an extra / SHOULD be inserted as delimiter in the concatenation.
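The two rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `looks_like_iri` is a rough heuristic assumed for this sketch, not a full IRI validator.

```python
from urllib.parse import urlsplit


def looks_like_iri(identifier: str) -> bool:
    """Rough heuristic (an assumption of this sketch): a scheme plus something after it."""
    parts = urlsplit(identifier)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def key_to_iri(set_id: str, key_id: str) -> str:
    """Transform the public identifier of a key into an IRI."""
    # Rule 1: the key is already an IRI by itself.
    if looks_like_iri(key_id):
        return key_id
    # Rule 2: append the key to the set identifier, inserting "/" only when
    # the set identifier does not already end in "/" or "#".
    delimiter = "" if set_id.endswith(("/", "#")) else "/"
    return f"{set_id}{delimiter}{key_id}"
```

For instance, `key_to_iri("http://example.org/my-set", "valuation")` yields `http://example.org/my-set/valuation`, while a set identifier ending in `#` is joined to the key without an extra delimiter.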
This extension does not specify any extensions to the STAM data model itself, as the core model suffices. Instead, it defines some annotation data sets and protocols for conversion.
JSON-LD is the preferred serialization format for Web Annotation, using the
context definition http://www.w3.org/ns/anno.jsonld. This STAM extension
defines how to serialize STAM to valid Web Annotations in JSON-LD. Note that
although STAM JSON looks very similar to JSON-LD, it is not.
For interoperability with Web Annotations, you can reuse a lot of the
vocabulary defined in http://www.w3.org/ns/anno.jsonld from within STAM. We
prescribe that you SHOULD use a STAM Annotation DataSet with that exact URI as
identifier if you want to use any of the terms of that vocabulary. The keys in
the JSON-LD file then correspond one-to-one to the STAM datakeys in the set.
This allows us to use much of the web annotation vocabulary from STAM without
any complicated mappings. We will refer to this as the w3anno set in this
documentation, simply to avoid having to mention the full URI every time.
Let's illustrate this with an example. Consider the following STAM
Annotation in STAM JSON (partial excerpt) in which we express an annotation
using vocabulary from the Web Annotation Data Model (the key creator is
defined there).
{
  "@type": "Annotation",
  "@id": "https://example.org/annotation1",
  "data": [
    {
      "@type": "AnnotationData",
      "set": "http://www.w3.org/ns/anno.jsonld",
      "key": "creator",
      "value": "proycon"
    }
  ],
  "target": {
    ...
  }
}
This translates to the following Web Annotation (JSON-LD):
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/annotation1",
  "creator": "proycon",
  "target": {
    ...
  }
}
You are not limited to the w3anno set. It is merely a convenience if you want to directly reuse terms defined in the Web Annotation JSON-LD context definition, and its use is RECOMMENDED if you know in advance that you want to export to Web Annotations. Aside from these terms, you MAY use any RDF predicate/object as a key/value, as we established mapping rules between public identifiers in STAM and IRIs.
Web Annotations have the notion of a body, which corresponds more or less to
what is called data in STAM JSON, i.e. a collection of STAM AnnotationData
elements. If multiple data items are associated with an annotation, they SHOULD
translate to a single body element. A type of DataSet SHOULD be associated with
the Web Annotation body. (Conversely, when translating Web Annotations to STAM,
multiple bodies SHOULD be translated to multiple STAM annotations.)
In the first example above we showed that creator was mapped directly onto
the annotation (rather than the body); implementations of this extension
SHOULD convert annotation data in the w3anno set
(http://www.w3.org/ns/anno.jsonld) to properties directly on the web
annotation if and only if these properties are commonly expressed on the web
annotation. Any other properties SHOULD go into the body scope. As this cannot
be unambiguously determined, implementations have a fair degree of freedom
in choosing which properties they assign to the annotation as a whole, and
which to a body (e.g. by parametrising this).
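One way to parametrise this choice is sketched below in Python. The function name and the default key list are assumptions made for illustration; the extension itself does not prescribe which w3anno terms count as "commonly expressed on the web annotation".

```python
W3ANNO = "http://www.w3.org/ns/anno.jsonld"

# Assumption: an illustrative default. Implementations are free to choose
# a different list or to let the user supply one.
DEFAULT_ANNOTATION_KEYS = frozenset(
    {"creator", "created", "modified", "motivation", "generator"}
)


def partition_data(data, annotation_keys=DEFAULT_ANNOTATION_KEYS):
    """Split STAM AnnotationData dicts into annotation-level properties and body data."""
    annotation_props = {}
    body_data = []
    for item in data:
        if item["set"] == W3ANNO and item["key"] in annotation_keys:
            # w3anno term the caller wants on the annotation itself.
            annotation_props[item["key"]] = item["value"]
        else:
            # Everything else is left for the body.
            body_data.append(item)
    return annotation_props, body_data
```

Passing an empty `annotation_keys` set sends every data item to the body, which shows how a single parameter can cover the range of behaviour the text allows.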
Consider the next STAM JSON excerpt where we do not use the w3anno set but other sets that map to RDF via the mapping rules:
{
  "@type": "Annotation",
  "@id": "https://example.org/annotation1",
  "data": [
    {
      "@id": "D1",
      "@type": "AnnotationData",
      "set": "http://example.org/my-set",
      "key": "valuation",
      "value": "I like this part!"
    },
    {
      "@id": "D2",
      "@type": "AnnotationData",
      "set": "http://schema.org",
      "key": "contentRating",
      "value": 5
    }
  ],
  "target": {
    ...
  }
}
This translates to the following Web Annotation (JSON-LD):
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/annotation1",
  "body": {
    "type": "DataSet",
    "http://example.org/my-set/valuation": "I like this part!",
    "http://schema.org/contentRating": 5
  },
  "target": {
    ...
  }
}
This example shows a few other things:
- The body has type: DataSet, which is RECOMMENDED.
- The JSON-LD keys will be full IRIs as they are not part of the context. Implementations MAY generate additional context and use aliases.
- The STAM dataset's public identifier and key public identifier have been joined to form a valid RDF URI. This SHOULD be done by concatenating them, adding a / in the middle if and only if the set identifier does not already end in / or #.
- Other ontologies can be used as if they are a STAM Annotation DataSet (such as schema.org in this example). The important criterion is only that the identifiers match, taking the above concatenation rule into consideration.
- The public identifiers of AnnotationData cannot be preserved; they are lost in this conversion.
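As a minimal sketch of the points above, the following Python function collapses a list of STAM AnnotationData dictionaries into a single body. The function name is hypothetical, and the sketch assumes all keys are plain (non-IRI) identifiers and that no two data items share the same set and key.

```python
def build_body(data):
    """Collapse STAM AnnotationData dicts into one Web Annotation body."""
    body = {"type": "DataSet"}  # the RECOMMENDED body type
    for item in data:
        set_id, key = item["set"], item["key"]
        # Insert "/" only if the set identifier does not end in "/" or "#".
        delimiter = "" if set_id.endswith(("/", "#")) else "/"
        # The "@id" of the AnnotationData is deliberately not copied:
        # public identifiers of AnnotationData are lost in this conversion.
        body[f"{set_id}{delimiter}{key}"] = item["value"]
    return body
```

A real implementation would also need a policy for repeated keys, for example collecting their values in a list, since a plain dictionary keeps only the last one.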
Next, we show an example where the data value (STAM DataValue) maps to RDF. First we recap from the main specification:
- Any String value that is a valid IRI SHOULD be interpreted as such in conversion from/to RDF.
STAM JSON excerpt:
{
  "@type": "Annotation",
  "@id": "https://example.org/annotation1",
  "data": [
    {
      "@id": "D1",
      "@type": "AnnotationData",
      "set": "http://www.w3.org/ns/anno.jsonld",
      "key": "creator",
      "value": "https://orcid.org/0000-0002-1046-0006"
    }
  ],
  "target": {
    ...
  }
}
JSON-LD output:
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/annotation1",
  "creator": { "id": "https://orcid.org/0000-0002-1046-0006" },
  "target": {
    ...
  }
}
Compare this to the first example and notice the extra id in the JSON-LD output.
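The difference between the two outputs can be sketched as a small value-conversion step. The function name is hypothetical, and the IRI test is a deliberate simplification assumed for this sketch: it only recognises http and https IRIs with a host part.

```python
from urllib.parse import urlsplit


def value_to_jsonld(value):
    """Wrap string values that look like IRIs as {"id": ...}; pass others through."""
    if isinstance(value, str):
        parts = urlsplit(value)
        # Assumption: only http(s) IRIs are recognised in this sketch.
        if parts.scheme in ("http", "https") and parts.netloc:
            return {"id": value}
    return value
```

Plain strings such as "proycon" and non-string values such as 5 are returned unchanged, so only the ORCID IRI from the example gains the id wrapper.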
Not shown in the excerpts thus far is the target, which serves the same purpose in both STAM and Web Annotations. Let's take a look at a more complete STAM JSON example now that includes the target:
{
  "@type": "Annotation",
  "@id": "https://example.org/annotation1",
  "data": [
    {
      "@type": "AnnotationData",
      "set": "http://www.w3.org/ns/anno.jsonld",
      "key": "creator",
      "value": "proycon"
    },
    {
      "@type": "AnnotationData",
      "set": "http://schema.org",
      "key": "contentRating",
      "value": 5
    }
  ],
  "target": {
    "@type": "TextSelector",
    "resource": "http://example.org/hello.txt",
    "offsets": {
      "begin": {
        "@type": "BeginAlignedCursor",
        "value": 0
      },
      "end": {
        "@type": "BeginAlignedCursor",
        "value": 5
      }
    }
  }
}
This translates to the following Web Annotation (JSON-LD):
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "id": "https://example.org/annotation1",
  "creator": "proycon",
  "body": {
    "type": "DataSet",
    "http://schema.org/contentRating": 5
  },
  "target": {
    "source": "http://example.org/hello.txt",
    "selector": {
      "type": "TextPositionSelector",
      "start": 0,
      "end": 5
    }
  }
}
The underlying coordinate system for STAM's TextSelector and w3anno's
TextPositionSelector is identical (Unicode code points, 0-indexed, end
non-inclusive). STAM's TextSelector, however, offers more expressive power:
any stam:EndAlignedCursor MUST first be resolved to its absolute
(begin-aligned) position.
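Resolving cursors before emitting a TextPositionSelector could look roughly like the Python sketch below. The function names are hypothetical, and the sketch assumes that an EndAlignedCursor holds a zero or negative value counted from the end of the text, with the text length given in Unicode code points (which is what Python's `len()` returns for a `str`).

```python
def resolve_cursor(cursor, text_length):
    """Return the absolute, begin-aligned position of a STAM cursor."""
    if cursor["@type"] == "BeginAlignedCursor":
        return cursor["value"]
    if cursor["@type"] == "EndAlignedCursor":
        # Assumption: the value is 0 or negative, relative to the end of the text.
        return text_length + cursor["value"]
    raise ValueError(f"unknown cursor type: {cursor['@type']}")


def to_text_position_selector(offsets, text_length):
    """Build a Web Annotation TextPositionSelector from STAM offsets."""
    return {
        "type": "TextPositionSelector",
        "start": resolve_cursor(offsets["begin"], text_length),
        "end": resolve_cursor(offsets["end"], text_length),
    }
```

With the text "Hello world" (11 code points), a begin cursor of 0 and an end-aligned cursor of -6 resolve to start 0 and end 5, the same span as in the example above.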
STAM allows formulating annotations relative to the annotations that contain
them; the AnnotationSelector supports selecting text offsets relative to the
target annotation. This can for example be used for expressing words relative
to sentences. This we call higher-order annotation with relative offsets.
An example of this can be found here. This has two annotations in STAM JSON, the first expresses a sentence in a text, and the second expresses a word in that sentence formulated relative to it.
Converting this to Web Annotations, however, poses a problem. If a
stam:AnnotationSelector expresses such offsets, implementations MUST choose
one of two options for conversion:
- Resolve the relative annotations to absolute ones and make the web annotation target the resource directly, rather than the other annotation. This way the resulting solution is simpler, at the cost of losing the relative annotations.
- Translate this to the Web Annotation model in the most direct way possible by having one web annotation target another, and including a TextPositionSelector as if it were a text. This may be stretching the Web Annotation model a bit, as such higher-order annotations are not specified by its specification, but it does not violate the specification either. This way the relative offsets are maintained at the cost of extra complexity.
If an implementation can't determine which choice is most appropriate, it MAY let the user decide, e.g. via a parameter.
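The first option, resolving relative offsets to absolute ones, amounts to simple arithmetic on the parent span. The Python sketch below illustrates it under the assumption that both the parent span and the relative offsets are begin-aligned; the function name and argument layout are hypothetical.

```python
def resolve_relative_target(resource, parent_span, relative_span):
    """Build an absolute Web Annotation target from offsets relative to a parent span."""
    parent_start, parent_end = parent_span
    rel_begin, rel_end = relative_span
    # Relative offsets are counted from the start of the parent annotation.
    start = parent_start + rel_begin
    end = parent_start + rel_end
    if not (parent_start <= start <= end <= parent_end):
        raise ValueError("relative offsets fall outside the parent annotation")
    return {
        "source": resource,
        "selector": {"type": "TextPositionSelector", "start": start, "end": end},
    }
```

For a sentence spanning offsets 10 to 30 and a word at relative offsets 0 to 5, the resulting target selects 10 to 15 on the resource directly; the link to the sentence annotation is no longer present in the output.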
In STAM we have the notion of complex selectors, a selector which selects multiple targets.
This is either a MultiSelector, a DirectionalSelector, or a CompositeSelector.
The Web Annotation Data Model does not describe a clear unequivocal mechanism for referencing multiple targets. The underlying Web Annotation Vocabulary, however, does propose some solutions in section D. These, however, are not normative for the Web Annotation standard. They do make for the best and easiest translation when mapping from/to STAM. We therefore consider them RECOMMENDED:
- oa:Composite can be used for STAM's CompositeSelector.
- oa:Independents can be used for STAM's MultiSelector.
- oa:List can be used for STAM's DirectionalSelector.
Consider this STAM JSON excerpt in which the annotation targets a discontinuous text (two fragments) as one:
"target": {
  "@type": "CompositeSelector",
  "selectors": [
    {
      "@type": "TextSelector",
      "resource": "http://example.org/hello.txt",
      "offsets": {
        "begin": {
          "@type": "BeginAlignedCursor",
          "value": 0
        },
        "end": {
          "@type": "BeginAlignedCursor",
          "value": 5
        }
      }
    },
    {
      "@type": "TextSelector",
      "resource": "http://example.org/hello.txt",
      "offsets": {
        "begin": {
          "@type": "BeginAlignedCursor",
          "value": 10
        },
        "end": {
          "@type": "BeginAlignedCursor",
          "value": 15
        }
      }
    }
  ]
}
This would translate to the following JSON-LD excerpt for Web Annotations. As the mappings we just introduced are not part of the w3anno JSON-LD context we've been using, we will write their IRIs in full in the following example:
{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  ...
  "target": {
    "type": "http://www.w3.org/ns/oa#Composite",
    "items": [
      {
        "source": "http://example.org/hello.txt",
        "selector": {
          "type": "TextPositionSelector",
          "start": 0,
          "end": 5
        }
      },
      {
        "source": "http://example.org/hello.txt",
        "selector": {
          "type": "TextPositionSelector",
          "start": 10,
          "end": 15
        }
      }
    ]
  }
}
Do consider that due to this being non-normative in the Web Annotation specification, it may not be implemented widely.
A MultiSelector MAY also be mapped simply to multiple target elements in
the Web Annotation output. Recall that this means that the annotation applies
to all of the targets equally, individually and independently.
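The recommended mapping of complex selectors can be sketched as a small recursive conversion in Python. The names are hypothetical, and the sketch assumes every TextSelector uses begin-aligned cursors; it takes the alternative of emitting a MultiSelector as multiple target elements as out of scope.

```python
OA = "http://www.w3.org/ns/oa#"

# STAM complex selector type -> non-normative Web Annotation Vocabulary class.
COMPLEX_TYPES = {
    "CompositeSelector": OA + "Composite",
    "MultiSelector": OA + "Independents",
    "DirectionalSelector": OA + "List",
}


def convert_target(selector):
    """Convert a STAM selector dict into a Web Annotation target dict."""
    stam_type = selector["@type"]
    if stam_type == "TextSelector":
        offsets = selector["offsets"]
        return {
            "source": selector["resource"],
            "selector": {
                "type": "TextPositionSelector",
                # Assumption: both cursors are begin-aligned.
                "start": offsets["begin"]["value"],
                "end": offsets["end"]["value"],
            },
        }
    if stam_type in COMPLEX_TYPES:
        return {
            "type": COMPLEX_TYPES[stam_type],
            # Order is preserved, which matters for DirectionalSelector.
            "items": [convert_target(sub) for sub in selector["selectors"]],
        }
    raise ValueError(f"unsupported selector type: {stam_type}")
```

Selectors that have no Web Annotation counterpart raise an error here rather than being silently dropped.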
- A stam:AnnotationDataSet itself cannot be mapped to Web Annotations (out of scope); this also goes for any annotations using stam:DataSetSelector.
- Public identifiers for AnnotationData are lost. This also implies that annotations using stam:AnnotationDataSelector cannot be converted. The same also applies to stam:DataKeySelector. All these, however, are more interesting for STAM's internal model and out of scope for Web Annotations.
Mapping Web Annotation to STAM generally follows the inverse of what we have already seen. However, Web Annotation and RDF, upon which it builds, form a broader model than STAM typically provides, so not all that can be expressed there can be unambiguously expressed in STAM terms.
- As Web Annotations do not have the concept of an annotation data set, implementations SHOULD allow users to associate some annotation data sets prior to the conversion. Any keys then found in the web annotation will be associated with those sets. Any keys that are not found MUST either be simply mapped to a single annotation data set, or to multiple annotation data sets that are created on the fly based on, for instance, a shared RDF namespace.
- Annotations on text in STAM are mediated by a TextSelector. This always translates to a TextPositionSelector in the Web Annotation model (and vice versa). The Web Annotation model supports a wide variety of selectors for different media types, which STAM does not; STAM only references plain text. This extension therefore only supports w3anno TextPositionSelector. Implementations parsing web annotations for STAM, when encountering any other selectors, MUST either produce an error or convert the selector to a STAM TextSelector if deemed possible.
- Implementations MUST implement proper JSON-LD parsing for Web Annotations.
- Implementations SHOULD use a proper RDF triple store as a foundation.
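The rules for assigning keys to annotation data sets and for handling selectors when reading Web Annotations can be illustrated with the Python sketch below. It is not a JSON-LD parser: it assumes the input has already been parsed and expanded, and all function names, as well as the choice to split an IRI at its last # or /, are assumptions made for this example.

```python
def split_iri(iri):
    """Split a predicate IRI into (set identifier, key) at its last '#' or '/'."""
    for delimiter in ("#", "/"):
        head, sep, tail = iri.rpartition(delimiter)
        if sep and tail:
            # Keep a trailing '#' on the set identifier; drop a trailing '/'.
            return (head + sep if delimiter == "#" else head), tail
    return None


def assign_key(term, fallback_set):
    """Map a term to (set, key), creating the set from its namespace when possible."""
    split = split_iri(term)
    # Terms without a namespace go to the single user-supplied fallback set.
    return split if split else (fallback_set, term)


def convert_selector(source, selector):
    """Convert a TextPositionSelector into a STAM TextSelector; reject anything else."""
    if selector.get("type") != "TextPositionSelector":
        raise ValueError(f"unsupported selector: {selector.get('type')}")
    return {
        "@type": "TextSelector",
        "resource": source,
        "offsets": {
            "begin": {"@type": "BeginAlignedCursor", "value": selector["start"]},
            "end": {"@type": "BeginAlignedCursor", "value": selector["end"]},
        },
    }
```

Splitting at the last # or / mirrors the concatenation rule used in the other direction, so a key such as http://schema.org/contentRating ends up in the set http://schema.org again.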