This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

Descriptor definition; clarify JSON/YAML (#50)
* Added descriptor definition

* Added custom properties

* Updated the specs

* Update content/docs/specifications/data-package.md

Co-authored-by: Peter Desmet <[email protected]>

---------

Co-authored-by: Peter Desmet <[email protected]>
roll and peterdesmet authored Apr 10, 2024
1 parent 91e89a3 commit b35c11b
Showing 5 changed files with 56 additions and 67 deletions.
28 changes: 9 additions & 19 deletions content/docs/specifications/data-package.md
@@ -75,30 +75,16 @@ Several example data packages can be found in the [datasets organization on gith

## Descriptor

A Data Package descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. The standard properties that can be included in a descriptor are listed in the [Properties](#properties) section.

When available as a file, a descriptor `MUST` be named `datapackage.json` and `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package).

The descriptor is the central file in a Data Package. It provides:

- General metadata such as the package's title, license, publisher etc
- A list of the data "resources" that make up the package including their location on disk or online and other relevant information (including, possibly, schema information about these data resources in a structured form)

A Data Package descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)). When available as a file it `MUST` be named `datapackage.json` and it `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package).

The descriptor `MUST` contain a `resources` property describing the data resources.

All other properties are considered `metadata` properties. The descriptor `MAY` contain any number of other `metadata` properties. The following sections provide a description of required and optional metadata properties for a Data Package descriptor.

Adherence to the specification does not imply that additional, non-specified properties cannot be used: a descriptor `MAY` include any number of properties in addition to those described as required and optional properties. For example, if you were storing time series data and wanted to list the temporal coverage of the data in the Data Package, you could add a property `temporal` (cf. [Dublin Core](http://dublincore.org/documents/usageguide/qualifiers.shtml#temporal)):

```json
"temporal": {
"name": "19th Century",
"start": "1800-01-01",
"end": "1899-12-31"
}
```

This flexibility enables specific communities to extend Data Packages as appropriate for the data they manage. As an example, the [Tabular Data Package](https://specs.frictionlessdata.io/tabular-data-package/) specification extends Data Package to the case where all the data is tabular and stored in CSV.

Here is an illustrative example of a datapackage JSON file:
An example of a Data Package descriptor:

```json
{
@@ -114,6 +100,10 @@ Here is an illustrative example of a datapackage JSON file:
}
```

:::note[File Names]
A file containing a Data Package descriptor `MAY` have a name other than `datapackage.json` when used internally within a project or system, if supported by the corresponding implementations. A descriptor `SHOULD NOT` be externally published under any name other than `datapackage.json`.
:::
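
As a minimal sketch of the naming rule in the note above, a publishing tool could check the final path segment before releasing a descriptor. The function name `has_publishable_name` is hypothetical and not part of any specification or library:

```python
from pathlib import PurePosixPath


def has_publishable_name(path: str) -> bool:
    """Return True if the last path segment is exactly `datapackage.json`.

    Hypothetical helper: descriptors published externally SHOULD NOT use
    any other file name, while internal files MAY be named differently.
    """
    return PurePosixPath(path).name == "datapackage.json"
```

Because the rule is a `SHOULD NOT`, a tool would more plausibly emit a warning than reject the file outright.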

## Properties

A Data Package descriptor `MUST` have a `resources` property and `SHOULD` have `name`, `id`, `licenses`, and `profile` properties.
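
The required and recommended properties named above could be checked along these lines. This is an illustrative sketch only: `check_package_properties` is a hypothetical helper, it tests for the presence of keys and nothing more, and it assumes the descriptor has already been parsed into a Python `dict`.

```python
# Property names taken from the sentence above; everything else is assumed.
REQUIRED = ("resources",)
RECOMMENDED = ("name", "id", "licenses", "profile")


def check_package_properties(descriptor: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for missing MUST / SHOULD properties."""
    errors = [
        f"missing required property: {name}"
        for name in REQUIRED
        if name not in descriptor
    ]
    warnings = [
        f"missing recommended property: {name}"
        for name in RECOMMENDED
        if name not in descriptor
    ]
    return errors, warnings
```

Separating errors from warnings mirrors the distinction between `MUST` and `SHOULD`: only a missing required property makes the descriptor invalid.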
39 changes: 3 additions & 36 deletions content/docs/specifications/data-resource.md
@@ -21,40 +21,11 @@ A simple format to describe and package a single data resource such as a individ

The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `SHOULD NOT`, `RECOMMENDED`, `MAY`, and `OPTIONAL` in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt)

## Example

A minimal Data Resource looks as follows:

With data accessible via the local filesystem.

```json
{
"name": "resource-name",
"path": "resource-path.csv"
}
```

With data accessible via http.

```json
{
"name": "resource-name",
"path": "http://example.com/resource-path.csv"
}
```
## Descriptor

A minimal Data Resource pointing to some inline data looks as follows.
A Data Resource descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. The standard properties that can be included in a descriptor are listed in the [Properties](#properties) section.

```json
{
"name": "resource-name",
"data": {
"resource-name-data": [{ "a": 1, "b": 2 }]
}
}
```

A comprehensive Data Resource example with all required, recommended and optional properties looks as follows.
An example of a Data Resource descriptor:

```json
{
@@ -73,10 +44,6 @@ A comprehensive Data Resource example with all required, recommended and optiona
}
```

## Descriptor

A Data Resource descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)).

## Properties

Standard properties of the descriptor are described below. A descriptor `MAY` include any number of properties in addition to those described below as required and optional properties.
32 changes: 32 additions & 0 deletions content/docs/specifications/glossary.md
@@ -19,6 +19,38 @@ The key words `MUST`, `MUST NOT`, `REQUIRED`, `SHALL`, `SHALL NOT`, `SHOULD`, `S

## Definitions

### Descriptor

The Data Package Standard uses the concept of a `descriptor` to represent metadata defined according to the core specifications such as Data Package or Table Schema.

On a logical level, a descriptor is represented by a data structure. The data structure `MUST` be a JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On a physical level, a descriptor is represented by a file. The file `MUST` contain a valid JSON `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a file containing a descriptor.

:::note[File Formats]
A descriptor `MAY` be serialized using alternative formats such as YAML or TOML when used internally within a project or system, if supported by the corresponding implementations. A descriptor `SHOULD NOT` be externally published in any format other than JSON.
:::
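
The logical-level requirement can be illustrated with a short sketch. It assumes the descriptor arrives as JSON text; `parse_descriptor` is a hypothetical name, not an API of any Data Package library:

```python
import json


def parse_descriptor(text: str) -> dict:
    """Parse JSON text and ensure the top-level value is a JSON object.

    json.loads raises json.JSONDecodeError (a ValueError subclass) for
    invalid JSON; a valid array, string, or number at the top level is
    rejected explicitly, since a descriptor MUST be a JSON object.
    """
    value = json.loads(text)
    if not isinstance(value, dict):
        raise ValueError("a descriptor must be a JSON object")
    return value
```

A system that stores descriptors internally as YAML or TOML could apply the same top-level check after parsing with its own loader.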

#### Custom Properties

The Data Package specifications define a set of standard properties and allow custom properties to be added. It is `RECOMMENDED` to use the `namespace:property` naming convention for custom properties.

Adherence to a specification does not imply that additional, non-specified properties cannot be used: a descriptor `MAY` include any number of properties in addition to those described as required and optional properties. For example, if you were storing time series data and wanted to list the temporal coverage of the data in the Data Package, you could add a property `dc:temporal` (cf. [Dublin Core](http://dublincore.org/documents/usageguide/qualifiers.shtml#temporal)):

```json
{
"dc:temporal": {
"name": "19th Century",
"start": "1800-01-01",
"end": "1899-12-31"
}
}
```

This flexibility enables specific communities to extend metadata as appropriate for the data they manage. As an example, the [Tabular Data Package](https://specs.frictionlessdata.io/tabular-data-package/) specification extends Data Package to the case where all the data is tabular and stored in CSV.
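
A tool could flag custom properties that do not follow the recommended `namespace:property` convention roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and the set of standard property names must be supplied by the caller because it differs between specifications.

```python
def unnamespaced_custom_properties(descriptor: dict, standard: set[str]) -> list[str]:
    """List custom top-level properties that lack a `namespace:property` name.

    A key counts as namespaced when it has a non-empty part on each side
    of its first colon. Keys listed in `standard` are never reported.
    """
    flagged = []
    for key in descriptor:
        if key in standard:
            continue
        namespace, separator, prop = key.partition(":")
        if not (separator and namespace and prop):
            flagged.append(key)
    return sorted(flagged)
```

Since the convention is only `RECOMMENDED`, the result is better treated as a list of hints than as validation errors.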

### URL or Path

A `URL or Path` is a `string` with the following additional constraints:
16 changes: 9 additions & 7 deletions content/docs/specifications/table-dialect.md
@@ -33,15 +33,17 @@ Table Dialect supersedes [CSV Dialect](https://specs.frictionlessdata.io/csv-dia

## Descriptor

On a logical level, a Table Dialect descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).
A Table Dialect descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. The standard properties that can be included in a descriptor are listed in the [Properties](#properties) section.

On a physical level, a Table Dialect descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.
An example of a Table Dialect descriptor:

The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.

This specification does not define any discoverability mechanisms, so a serialized Table Dialect can only be referenced directly by its URI.

This specification defines a set of standardized properties to be used and allows custom properties to be added. It is `RECOMMENDED` to use `namespace:property` naming convention for custom properties.
```json
{
"header": false,
"delimiter": ";",
"quoteChar": "'"
}
```
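
To show how such a dialect might be consumed, here is a rough sketch that maps the three properties from the example onto Python's standard `csv` module. The mapping and the fallback defaults (header row present, `,` delimiter, `"` quote character) are assumptions made for illustration, and `read_rows` is a hypothetical helper:

```python
import csv
import io


def read_rows(text: str, dialect: dict):
    """Read delimited text according to a (partial) Table Dialect.

    Returns (header, rows). `header` is None when the dialect declares
    that the data has no header row.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=dialect.get("delimiter", ","),   # assumed default
        quotechar=dialect.get("quoteChar", '"'),   # assumed default
    )
    rows = list(reader)
    if dialect.get("header", True):                # assumed default
        return (rows[0] if rows else []), rows[1:]
    return None, rows


dialect = {"header": False, "delimiter": ";", "quoteChar": "'"}
header, rows = read_rows("1;'a;b'\n2;c\n", dialect)
```

With this dialect the quoted value `'a;b'` is kept as a single field, and every line is treated as data because `header` is `false`.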

## Properties

8 changes: 3 additions & 5 deletions content/docs/specifications/table-schema.md
@@ -71,11 +71,9 @@ For example, `constraints` `SHOULD` be tested on the logical representation of d

## Descriptor

A Table Schema is represented by a descriptor. The descriptor `MUST` be a JSON `object` (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)).
A Table Schema descriptor `MUST` be a descriptor as per the [Descriptor](../glossary/#descriptor) definition. The standard properties that can be included in a descriptor are listed in the [Properties](#properties) section.

The descriptor `MAY` have the additional properties set out below and `MAY` contain any number of other properties not defined in this specification.

The following is an illustration of this structure:
An example of a Table Schema descriptor:

```json
{
@@ -92,7 +90,7 @@ The following is an illustration of this structure:
...
],
"missingValues": [ ... ],
"primaryKey": [ ... ]
"primaryKey": [ ... ],
"foreignKeys": [... ]
}
```
