Update docs, include HealthDCAT-AP
amercader committed Jan 9, 2025
1 parent 73b74b8 commit d334bb3
Showing 10 changed files with 286 additions and 268 deletions.
80 changes: 80 additions & 0 deletions docs/application-profiles.md
@@ -0,0 +1,80 @@
# Application profiles

Besides the [base profiles](profiles.md), the extension includes additional profiles that support
domain-specific application profiles.

!!! Note
    If you are interested in contributing a profile that might be useful for the wider community
    check the documentation on [writing custom profiles](writing-profiles.md) and the
    [contribution guidelines](contributing.md#including-new-profiles).

## HealthDCAT-AP

### Introduction

This extension contains a profile (`euro_health_dcat_ap`) for the proposed
[HealthDCAT-AP](https://healthdcat-ap.github.io/) specification.
This is a health-related extension of the DCAT application profile for sharing information about
Catalogues containing Datasets and Data Services descriptions in Europe (DCAT-AP).

The development of a Health DCAT application profile aims to standardize health metadata within
the scope of the [European Health Data Space](https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space_en)
(EHDS), fostering greater interoperability, findability and accessibility of electronic health
data across the EU.

The goal of this profile is to provide the wider FAIR community and other EU portals with a starting
point for implementing HealthDCAT-AP within their own data catalogs.

!!! Note
    HealthDCAT-AP is still under active development and has not been finalized. Cardinalities,
    certain vocabularies and the namespace have not been officially ratified. These are
    expected to be finalized after the public consultation in Q1 2025.


### Usage

Use the included `euro_health_dcat_ap` profile in your configuration:

```ini
ckanext.dcat.rdf.profiles = euro_health_dcat_ap
```

The HealthDCAT-AP profile is an extension of the DCAT-AP v3 profile and requires ckanext-scheming.
See the [documentation](getting-started.md#schemas) on how to set it up. You can use the included
`health_dcat_ap.yaml` schema file as a starting point to adapt it to your needs:

```ini
scheming.dataset_schemas = ckanext.dcat.schemas:health_dcat_ap.yaml
```
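
As a quick sanity check, the two settings above can be verified with a few lines of Python. This is only a minimal sketch using the standard library: the `[app:main]` section name, the `SAMPLE_INI` text and the `check_health_dcat_config` helper are illustrative assumptions, not part of ckanext-dcat.

```python
import configparser

# Illustrative config text. The two settings come from the Usage section;
# the [app:main] section name is an assumption about the CKAN ini layout.
SAMPLE_INI = """
[app:main]
ckanext.dcat.rdf.profiles = euro_health_dcat_ap
scheming.dataset_schemas = ckanext.dcat.schemas:health_dcat_ap.yaml
"""


def check_health_dcat_config(ini_text, section="app:main"):
    """Return a list of problems; an empty list means both settings are present.

    Hypothetical helper for illustration only.
    """
    # Only "=" is treated as a delimiter so the ":" inside the schema value
    # is never mistaken for a key/value separator.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)

    if not parser.has_section(section):
        return [f"missing section [{section}]"]

    problems = []

    # Both settings are treated as whitespace-separated lists here.
    profiles = parser.get(section, "ckanext.dcat.rdf.profiles", fallback="").split()
    if "euro_health_dcat_ap" not in profiles:
        problems.append("euro_health_dcat_ap is not listed in ckanext.dcat.rdf.profiles")

    schemas = parser.get(section, "scheming.dataset_schemas", fallback="").split()
    if not any(schema.endswith("health_dcat_ap.yaml") for schema in schemas):
        problems.append("health_dcat_ap.yaml is not listed in scheming.dataset_schemas")

    return problems


if __name__ == "__main__":
    print(check_health_dcat_config(SAMPLE_INI))
```

The check only looks at the file name suffix of the schema, so it also accepts a site-specific copy of `health_dcat_ap.yaml` kept in another package.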

This profile currently has no additional settings.

### Limitations and deviations

As HealthDCAT-AP is still a draft, it is bound to change. The standard still contains some
inconsistencies, and certain properties remain unclear. Below is a short summary
of the limitations and implementation decisions made during development of this profile.

1. Cardinalities have not yet been finalized for HealthDCAT-AP. This CKAN schema takes a very
liberal approach and treats all fields as optional (missing fields do not fail validation).
Note that some mandatory fields currently cannot be filled with real data, e.g. the
Health Data Access Body (HDAB) field: the EHDS legislation has not been implemented yet and no HDABs
have been formally appointed.
2. The HealthDCAT-AP namespace is not formally defined yet. For now,
`http://healthdataportal.eu/ns/health#` is used. This will be updated once the final namespace is
standardized.
3. The official examples of the standard use the `dct:description` property to encode the data
purpose. This does not appear to conform to the Data Privacy Vocabulary specification, which
proposes a controlled vocabulary. See [this issue](https://github.com/HealthDCAT-AP-de/healthdcat-ap.de/issues/11)
for the German perspective on this.
4. The distributions proposed by HealthDCAT-AP, *analytics* and *sample*, are not specifically
implemented. URIs are linked, but the resources themselves are not loaded. Because *sample* is
an upstream DCAT-AP property, it can be included once it is supported there.
5. Documentation (*foaf:page*) is implemented as a URI. Some HealthDCAT-AP example data
in the wild uses a blank node for this and adds several properties, but this is
inconsistent with other DCAT implementations.
6. DatasetSeries are not yet supported by CKAN, nor by this profile.
7. The *quality annotation* property has not been implemented because its usage is not completely
defined yet.
8. There is no multilingual support yet.
9. For other properties, any limitations from the DCAT-AP profiles still apply.
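
Because the namespace in item 2 is provisional, code that builds HealthDCAT-AP property URIs may want to keep it in a single place. The following is a minimal sketch under that assumption: `HEALTHDCAT_NS` holds the interim namespace quoted above, while the `health_term` helper and the `hdab` local name are hypothetical and used only for illustration.

```python
# Interim namespace quoted in limitation 2; expected to change once the
# final HealthDCAT-AP namespace is standardized.
HEALTHDCAT_NS = "http://healthdataportal.eu/ns/health#"


def health_term(local_name):
    """Build a full property URI in the interim HealthDCAT-AP namespace.

    Hypothetical helper: keeping the namespace in one constant means a
    future namespace change is a one-line edit.
    """
    if not local_name or any(char in local_name for char in "#/ "):
        raise ValueError(f"not a valid local name: {local_name!r}")
    return HEALTHDCAT_NS + local_name


if __name__ == "__main__":
    # "hdab" is an illustrative local name, not confirmed by this document.
    print(health_term("hdab"))
```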
2 changes: 1 addition & 1 deletion docs/endpoints.md
@@ -42,7 +42,7 @@ RDF representations will be advertised using `<link rel="alternate">` tags on th
</head>
```

Check the [RDF DCAT Serializer](profiles.md#rdf-dcat-serializer) section for more details about how these are generated and how to customize the output using [profiles](profiles.md#profiles).
Check the [RDF DCAT Serializer](writing-profiles.md#rdf-dcat-serializer) section for more details about how these are generated and how to customize the output using [profiles](profiles.md#profiles).


You can specify the profile by using the `profiles=<profile1>,<profile2>` query parameter on the dataset endpoint (as a comma-separated list):
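
As a rough illustration of the query parameter, a small Python helper could build such a URL. This is only a sketch: the `dataset_rdf_url` function, the host, the dataset name and the `.ttl` suffix are placeholders, not values taken from this commit.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def dataset_rdf_url(dataset_url, profiles):
    """Append a comma-separated `profiles` query parameter to a dataset URL.

    Hypothetical helper for illustration; it is not part of ckanext-dcat.
    """
    if not profiles:
        raise ValueError("at least one profile name is required")

    parts = urlsplit(dataset_url)
    # Keep the commas literal so the value stays a readable comma-separated list.
    profiles_query = urlencode({"profiles": ",".join(profiles)}, safe=",")
    # Preserve any query string that is already present on the URL.
    query = f"{parts.query}&{profiles_query}" if parts.query else profiles_query
    return urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    # Placeholder host and dataset name; the profile names appear elsewhere in these docs.
    print(
        dataset_rdf_url(
            "https://ckan.example.org/dataset/my-dataset.ttl",
            ["euro_dcat_ap_2", "euro_dcat_ap_scheming"],
        )
    )
```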
4 changes: 2 additions & 2 deletions docs/getting-started.md
@@ -31,14 +31,14 @@ The extension includes ready to use [ckanext-scheming](https://github.com/ckan/c
that enable DCAT support. These include a schema definition file (located
in [`ckanext/dcat/schemas`](https://github.com/ckan/ckanext-dcat/tree/master/ckanext/dcat/schemas))
plus extra validators and other custom logic that integrates the metadata modifications with the
RDF DCAT [Parsers](profiles.md#rdf-dcat-parser) and [Serializers](profiles.md#rdf-dcat-serializer) and other CKAN features and extensions.
RDF DCAT [Parsers](writing-profiles.md#rdf-dcat-parser) and [Serializers](writing-profiles.md#rdf-dcat-serializer) and other CKAN features and extensions.

There are the following schemas currently included with the extension:

* *dcat_ap_recommended.yaml*: Includes the recommended properties for `dcat:Dataset` and `dcat:Distribution` according to the DCAT AP specification. You can use this schema with the `euro_dcat_ap_2` (+ `euro_dcat_ap_scheming`) and `euro_dcat_ap_3` profiles.
* *dcat_ap_full.yaml*: Includes most of the properties defined for `dcat:Dataset` and `dcat:Distribution` in the [DCAT AP v2.1](https://semiceu.github.io/DCAT-AP/releases/2.1.1/) and [DCAT AP v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) specification. You can use this schema with the `euro_dcat_ap_2` (+ `euro_dcat_ap_scheming`) and `euro_dcat_ap_3` profiles.
* *dcat_ap_multilingual.yaml*: An example schema implementing multilingual metadata in some fields using [ckanext-fluent](https://github.com/ckan/ckanext-fluent). See [Multilingual support](profiles.md#multilingual-support) for more information.

* *dcat_us_full.yaml*: Includes most of the properties defined for `dcat:Dataset` and `dcat:Distribution` in the [DCAT US v3](https://doi-do.github.io/dcat-us/) specification. You can use this schema with the `dcat_us_3` profile.

Most sites will want to use these as a base to create their own custom schema to address their own requirements, perhaps alongside a [custom profile](profiles.md#profiles). Of course site maintainers can add or remove schema fields, as well as change the existing validators.

2 changes: 1 addition & 1 deletion docs/harvester.md
@@ -1,6 +1,6 @@
## RDF DCAT harvester

The [RDF parser](profiles.md#rdf-dcat-parser) described in the previous section has been integrated into a harvester,
The [RDF parser](writing-profiles.md#rdf-dcat-parser) described in the previous section has been integrated into a harvester,
to allow automatic import of datasets from remote sources. To enable the RDF harvester, add the `harvest` and `dcat_rdf_harvester` plugins to your CKAN configuration file (you will also need to install [ckanext-harvest](https://github.com/ckan/ckanext-harvest)):

    ckan.plugins = ... harvest dcat_rdf_harvester
61 changes: 0 additions & 61 deletions docs/healthdcat.md

This file was deleted.

4 changes: 2 additions & 2 deletions docs/index.md
@@ -116,6 +116,6 @@ These are implemented internally using:

* A base [mapping](mapping.md) between DCAT and CKAN datasets and viceversa (compatible with **DCAT-AP** [v1.1](https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-ap-v11), [v2.1](https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe/release/210) and [v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) and **DCAT-US** [v3](https://doi-do.github.io/dcat-us/)).

* An [RDF Parser](profiles.md#rdf-dcat-parser) that allows to read RDF serializations in different formats and extract CKAN dataset dicts, using customizable [profiles](profiles.md#profiles).
* An [RDF Parser](writing-profiles.md#rdf-dcat-parser) that allows to read RDF serializations in different formats and extract CKAN dataset dicts, using customizable [profiles](profiles.md#profiles).

* An [RDF Serializer](profiles.md#rdf-dcat-serializer) that allows to transform CKAN datasets metadata to different semantic formats, also allowing customizable [profiles](profiles.md#profiles).
* An [RDF Serializer](writing-profiles.md#rdf-dcat-serializer) that allows to transform CKAN datasets metadata to different semantic formats, also allowing customizable [profiles](profiles.md#profiles).
4 changes: 2 additions & 2 deletions docs/mapping.md
@@ -4,7 +4,7 @@ The following table provides a generic mapping between the fields of the `dcat:D
their equivalents in the CKAN model. In most cases this mapping is deliberately a loose one. For instance, it does not try to link
the DCAT publisher property with a CKAN dataset author, maintainer or organization, as the link between them is not straight-forward
and may depend on a particular instance needs. When mapping from CKAN metadata to DCAT though, there are in some cases fallback fields
that are used if the default field is not present (see [RDF Serializer](profiles.md#rdf-dcat-serializer) for more details on this).
that are used if the default field is not present (see [RDF Serializer](writing-profiles.md#rdf-dcat-serializer) for more details on this).

This mapping is compatible with **DCAT-AP** [v1.1](https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-ap-v11), [v2.1](https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe/release/210) and [v3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/) and **DCAT-US** [v3](https://doi-do.github.io/dcat-us/). It depends on the active [profile(s)](profiles.md#profiles) and the fields present in your custom [schema](getting-started.md#schemas) which DCAT properties are mapped.

@@ -276,7 +276,7 @@ If no `publisher` or `publisher_*` fields are found, the serializers will fall b
### Spatial coverage


The following formats for `dct:spatial` are supported by the default [parser](profiles.md#rdf-dcat-parser). Note that the default [serializer](profiles.md#rdf-dcat-serializer) will return the single `dct:spatial` instance form by default.
The following formats for `dct:spatial` are supported by the default [parser](writing-profiles.md#rdf-dcat-parser). Note that the default [serializer](writing-profiles.md#rdf-dcat-serializer) will return the single `dct:spatial` instance form by default.

- One `dct:spatial` instance, URI only
