Fix schema.org domain in 'aggregates' section #3

Open
ThomasThelen opened this issue Feb 4, 2021 · 1 comment

@ThomasThelen
Member

ThomasThelen commented Feb 4, 2021

We have a couple of unbound terms (not in a vocab) and schema.org terms with incorrect domains being used in physical files that are aggregated. For example,

    "aggregates": [
        {
            "md5": "81faaedac351f28092bd845a48c6d0a5",
            "size": 170,
            "schema:license": "CC-BY-4.0",
            "mimeType": "text/plain",
            "uri": "../data/LICENSE"
        }
    ],

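This can be read as the RDF triple wt:Tale ore:aggregates ore:AggregatedResource, so schema:license is being applied to an ore:AggregatedResource rather than to a schema.org type in its domain.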

Solution

From the ORE documentation,

Note that asserting that a resource is a member of the class of Aggregated Resources does not imply anything other than that it is aggregated by at least one Aggregation. As such, this class is mostly informative and there is no need to assert that aggregated resources are instances of the ore:AggregatedResource class.

In short, calling this object an ore:AggregatedResource tells us nothing useful beyond the fact that it is aggregated.

Calling downloaded files a plain CreativeWork is probably a stretch. schema.org also has the DataDownload type, which sounds promising but represents an entire dataset.

One clean solution is to create our own type that's a subclass of both schema:CreativeWork and ore:AggregatedResource. This lets us keep using the object mostly as it is used today.

An example of what this looks like as an OWL class,

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://vocabularies.wholetale.org/wt/1.0/wt#physicalFile>
  a owl:Class ;
  rdfs:subClassOf <https://schema.org/CreativeWork>, <http://www.openarchives.org/ore/terms/AggregatedResource> ;
  rdfs:comment "A class that represents a file that physically exists on disk."@en ;
  rdfs:label "Physical File"@en .

Alternative

The alternative is to use something from https://id.loc.gov/ontologies/premis-3-0-0.html

It has support for the notion of a file, license, and cryptographic signature, but schema.org isn't compatible with it. It's also arguably not lightweight. We would still run into the issue of needing to subclass ore:AggregatedResource.

Dealing with 'md5'

If we turn the object into a schema.org class, we can use the suggestion here.
This results in replacing md5 with "schema:identifier": "ni:///md5;81faaedac351f28092bd845a48c6d0a5"
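
A minimal sketch of how that identifier string could be produced, assuming the hex-digest layout shown above is what we want. The function names `md5_identifier` and `identifier_from_hex` are hypothetical and not part of any existing Whole Tale code.

```python
import hashlib


def identifier_from_hex(md5_hex: str) -> str:
    """Wrap an existing hex MD5 digest in the identifier form used in this issue."""
    return f"ni:///md5;{md5_hex.lower()}"


def md5_identifier(data: bytes) -> str:
    """Hash raw file bytes with MD5 and return the identifier string."""
    return identifier_from_hex(hashlib.md5(data).hexdigest())


# Reusing the digest already stored in the manifest entry from this issue.
print(identifier_from_hex("81faaedac351f28092bd845a48c6d0a5"))
```

Reusing the stored digest avoids re-reading the file; `md5_identifier` is only needed when the digest is not already recorded.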

Dealing with 'size'

CreativeWork objects have a property for this, schema:size.

"schema:size": {
   "@type": "schema:QuantitativeValue",
   "schema:value": 170
}

or as a string "schema:size": "170"

Dealing with 'schema:license'

schema:license is a valid property for CreativeWork classes, but its value is expected to be a CreativeWork or a URL, so a bare string such as "CC-BY-4.0" does not fit.
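
As a rough illustration of the URL case only, a check like the following could flag license values that are not URLs. `looks_like_url` is a hypothetical helper, and the Creative Commons URL is just an example value, not a proposal for what the manifest should contain.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Return True when the value parses as an absolute http(s) URL."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# The value currently in the manifest is a bare label, not a URL.
print(looks_like_url("CC-BY-4.0"))
# An example of a value that would pass this check.
print(looks_like_url("https://creativecommons.org/licenses/by/4.0/"))
```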

Dealing with 'mimeType'

This key should be renamed to schema:encodingFormat.

Example

    "aggregates": [
        {
            "@type": "wt:PhysicalFile",
            "schema:identifier": "ni:///md5;81faaedac351f28092bd845a48c6d0a5",
            "schema:size": {
                "@type": "schema:QuantitativeValue",
                "schema:value": 170
            },
            "schema:license": "CC-BY-4.0",
            "schema:encodingFormat": "text/plain",
            "uri": "../data/LICENSE"
        }
    ],
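
The renames in the sections above could be applied mechanically to existing manifest entries. The sketch below is only an illustration under that assumption: `upgrade_aggregate` is a hypothetical name, and it leaves schema:license and uri untouched because the issue does not settle what they should become.

```python
import json


def upgrade_aggregate(entry: dict) -> dict:
    """Rewrite one legacy 'aggregates' entry using the mappings from this issue."""
    upgraded = {"@type": "wt:PhysicalFile"}
    for key, value in entry.items():
        if key == "md5":
            # md5 -> schema:identifier in the ni:///md5;<hex> form
            upgraded["schema:identifier"] = f"ni:///md5;{value}"
        elif key == "size":
            # size -> schema:size as a QuantitativeValue
            upgraded["schema:size"] = {
                "@type": "schema:QuantitativeValue",
                "schema:value": value,
            }
        elif key == "mimeType":
            # mimeType -> schema:encodingFormat
            upgraded["schema:encodingFormat"] = value
        else:
            # Everything else (schema:license, uri, ...) is copied unchanged.
            upgraded[key] = value
    return upgraded


legacy = {
    "md5": "81faaedac351f28092bd845a48c6d0a5",
    "size": 170,
    "schema:license": "CC-BY-4.0",
    "mimeType": "text/plain",
    "uri": "../data/LICENSE",
}
print(json.dumps({"aggregates": [upgrade_aggregate(legacy)]}, indent=4))
```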

Misc

It might make sense to leverage the idea of os:File [here](https://github.com/CIRSS/cpr/blob/master/model/wt.pdf). 

ThomasThelen changed the title from "Fix unbound terms in file aggregations" to "Fix schema.org domain in 'aggregates' section" on Feb 9, 2021

@craig-willis
Collaborator

As discussed, I would just use our own vocabulary (wt) for the unbound terms and define them.
