From db23e66f4a74f0ae8b7c0422be58e6608fc352eb Mon Sep 17 00:00:00 2001
From: roll
Date: Wed, 21 Feb 2024 17:56:46 +0000
Subject: [PATCH 01/10] Updated table-schema

---
 content/docs/specifications/table-schema.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md
index 4f21df08..a190827b 100644
--- a/content/docs/specifications/table-schema.md
+++ b/content/docs/specifications/table-schema.md
@@ -67,7 +67,15 @@ For example, `constraints` `SHOULD` be tested on the logical representation of d
 ## Descriptor
-A Table Schema is represented by a descriptor. The descriptor `MUST` be a JSON `object` (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)).
+On logical level, Table Schema descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).
+
+On physical level, Table Schema descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
+
+The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+
+## Metadata
+
+### Fields
 It `MUST` contain a property `fields`. `fields` `MUST` be an array where each entry in the array is a field descriptor (as defined below). The order of elements in `fields` array `SHOULD` be the order of fields in the CSV file. The number of elements in `fields` array `SHOULD` be the same as the number of fields in the CSV file.

From 7b7266bc0e7c261ca766ec8050e89b5222f68a57 Mon Sep 17 00:00:00 2001
From: roll
Date: Wed, 21 Feb 2024 18:19:37 +0000
Subject: [PATCH 02/10] Improved wording

---
 content/docs/specifications/data-package.md  | 12 ++++++++----
 content/docs/specifications/data-resource.md | 10 ++++++++--
 content/docs/specifications/table-dialect.md | 10 +++++++++-
 content/docs/specifications/table-schema.md  |  2 ++
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/content/docs/specifications/data-package.md b/content/docs/specifications/data-package.md
index 000550a4..fc0728ba 100644
--- a/content/docs/specifications/data-package.md
+++ b/content/docs/specifications/data-package.md
@@ -81,15 +81,19 @@ Several example data packages can be found in the [datasets organization on gith
 ### Descriptor
+On logical level, Data Package descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).
+
+On physical level, Data Package descriptor is represented by a file. A data producer `MAY` use any suitable serialization format and `SHOULD` name the file `datapackage.json`. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
+
+The above states that JSON is the only serialization format that `MUST` be used for publishing a Data Package while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+
+This specification does not introduce any discoverability mechanisms making a serialized Data Package be referenced only directly by its URI. It means that techically the name of a Data Package file is irrelevant although it is good practice to use `datapackage.json` as a public convention.
+
 The descriptor is the central file in a Data Package. It provides:
 - General metadata such as the package's title, license, publisher etc
 - A list of the data "resources" that make up the package including their location on disk or online and other relevant information (including, possibly, schema information about these data resources in a structured form)
-A Data Package descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627][]). When available as a file it `MUST` be named `datapackage.json` and it `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package).
-
-[RFC 4627]: http://www.ietf.org/rfc/rfc4627.txt
-
 The descriptor `MUST` contain a `resources` property describing the data resources. All other properties are considered `metadata` properties. The descriptor `MAY` contain any number of other `metadata` properties. The following sections provides a description of required and optional metadata properties for a Data Package descriptor.
diff --git a/content/docs/specifications/data-resource.md b/content/docs/specifications/data-resource.md
index fbad99f8..01f5bfbd 100644
--- a/content/docs/specifications/data-resource.md
+++ b/content/docs/specifications/data-resource.md
@@ -81,9 +81,15 @@ A comprehensive Data Resource example with all required, recommended and optiona
 }
 ```
-### Descriptor
+## Descriptor
-A Data Resource descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627][]).
+On logical level, Data Resource descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).
+
+On physical level, Data Resource descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
+
+The above states that JSON is the only serialization format that `MUST` be used for publishing a Data Resource while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+
+This specification does not introduce any discoverability mechanisms making a serialized Data Resource be referenced only directly by its URI.
 Key properties of the descriptor are described below. A descriptor `MAY` include any number of properties in additional to those described below as required and optional properties.
diff --git a/content/docs/specifications/table-dialect.md b/content/docs/specifications/table-dialect.md
index d6bd2adc..ff3d7e39 100644
--- a/content/docs/specifications/table-dialect.md
+++ b/content/docs/specifications/table-dialect.md
@@ -37,7 +37,15 @@ CSV Dialect is useful for programmes which might have to deal with multiple dial
 Some related work can be found in [this comparison of csv dialect support](https://docs.google.com/spreadsheet/ccc?key=0AmU3V2vcPKrIdEhoU1NQSWtoQmJwcUNCelJtdkx2bFE&usp=sharing), this [example of similar JSON format](http://panda.readthedocs.org/en/latest/api.html#data-uploads), and in Python's [PEP 305](http://www.python.org/dev/peps/pep-0305/).
-## Specification
+## Descriptor
+
+On logical level, Table Dialect descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).
+
+On physical level, Table Dialect descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
+
+The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+
+This specification does not introduce any discoverability mechanisms making a serialized Table Dialect be referenced only directly by its URI.
 A CSV Dialect descriptor, `dialect`, `MUST` be a JSON `object` with the following properties:
diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md
index a190827b..ecfb8e75 100644
--- a/content/docs/specifications/table-schema.md
+++ b/content/docs/specifications/table-schema.md
@@ -73,6 +73,8 @@ On physical level, Table Schema descriptor is represented by a file. A data prod
 The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+This specification does not introduce any discoverability mechanisms making a serialized Table Schema be referenced only directly by its URI.
+
 ## Metadata
 ### Fields

From a6dbfae0c1c02b21180ee869a16baab872585a41 Mon Sep 17 00:00:00 2001
From: roll
Date: Wed, 21 Feb 2024 18:23:30 +0000
Subject: [PATCH 03/10] Updated wording

---
 content/docs/specifications/data-package.md  | 2 +-
 content/docs/specifications/data-resource.md | 2 +-
 content/docs/specifications/table-dialect.md | 2 +-
 content/docs/specifications/table-schema.md  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/content/docs/specifications/data-package.md b/content/docs/specifications/data-package.md
index fc0728ba..60850b3c 100644
--- a/content/docs/specifications/data-package.md
+++ b/content/docs/specifications/data-package.md
@@ -87,7 +87,7 @@ On physical level, Data Package descriptor is represented by a file. A data prod
 The above states that JSON is the only serialization format that `MUST` be used for publishing a Data Package while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not introduce any discoverability mechanisms making a serialized Data Package be referenced only directly by its URI. It means that techically the name of a Data Package file is irrelevant although it is good practice to use `datapackage.json` as a public convention.
+This specification does not define any discoverability mechanisms making a serialized Data Package be referenced only directly by its URI. It means that techically the name of a Data Package file is irrelevant although it is good practice to use `datapackage.json` as a public convention.
 The descriptor is the central file in a Data Package. It provides:
diff --git a/content/docs/specifications/data-resource.md b/content/docs/specifications/data-resource.md
index 01f5bfbd..83d9b71e 100644
--- a/content/docs/specifications/data-resource.md
+++ b/content/docs/specifications/data-resource.md
@@ -89,7 +89,7 @@ On physical level, Data Resource descriptor is represented by a file. A data pro
 The above states that JSON is the only serialization format that `MUST` be used for publishing a Data Resource while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not introduce any discoverability mechanisms making a serialized Data Resource be referenced only directly by its URI.
+This specification does not define any discoverability mechanisms making a serialized Data Resource be referenced only directly by its URI.
 Key properties of the descriptor are described below. A descriptor `MAY` include any number of properties in additional to those described below as required and optional properties.
diff --git a/content/docs/specifications/table-dialect.md b/content/docs/specifications/table-dialect.md
index ff3d7e39..a3ee2d0f 100644
--- a/content/docs/specifications/table-dialect.md
+++ b/content/docs/specifications/table-dialect.md
@@ -45,7 +45,7 @@ On physical level, Table Dialect descriptor is represented by a file. A data pro
 The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not introduce any discoverability mechanisms making a serialized Table Dialect be referenced only directly by its URI.
+This specification does not define any discoverability mechanisms making a serialized Table Dialect be referenced only directly by its URI.
 A CSV Dialect descriptor, `dialect`, `MUST` be a JSON `object` with the following properties:
diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md
index ecfb8e75..973d339d 100644
--- a/content/docs/specifications/table-schema.md
+++ b/content/docs/specifications/table-schema.md
@@ -73,7 +73,7 @@ On physical level, Table Schema descriptor is represented by a file. A data prod
 The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not introduce any discoverability mechanisms making a serialized Table Schema be referenced only directly by its URI.
+This specification does not define any discoverability mechanisms making a serialized Table Schema be referenced only directly by its URI.
 ## Metadata

From 44c09e27cc4ff09ae6a7c8f456cb8ff1affe2bab Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 14 Mar 2024 15:48:20 +0000
Subject: [PATCH 04/10] Update content/docs/specifications/data-package.md

Co-authored-by: Peter Desmet
---
 content/docs/specifications/data-package.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/specifications/data-package.md b/content/docs/specifications/data-package.md
index 60850b3c..d2548c4f 100644
--- a/content/docs/specifications/data-package.md
+++ b/content/docs/specifications/data-package.md
@@ -85,7 +85,7 @@ On logical level, Data Package descriptor is represented by a data structure. Th
 On physical level, Data Package descriptor is represented by a file. A data producer `MAY` use any suitable serialization format and `SHOULD` name the file `datapackage.json`. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
-The above states that JSON is the only serialization format that `MUST` be used for publishing a Data Package while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+JSON is the serialization format that `MUST` be used for publishing a Data Package while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
 This specification does not define any discoverability mechanisms making a serialized Data Package be referenced only directly by its URI. It means that techically the name of a Data Package file is irrelevant although it is good practice to use `datapackage.json` as a public convention.

From f00284ae788691905572ef992c3df91f555b7412 Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 14 Mar 2024 15:51:11 +0000
Subject: [PATCH 05/10] Updated JSON wording for other specs

---
 content/docs/specifications/data-resource.md | 2 +-
 content/docs/specifications/table-dialect.md | 2 +-
 content/docs/specifications/table-schema.md  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/docs/specifications/data-resource.md b/content/docs/specifications/data-resource.md
index 83d9b71e..1e8de870 100644
--- a/content/docs/specifications/data-resource.md
+++ b/content/docs/specifications/data-resource.md
@@ -87,7 +87,7 @@ On logical level, Data Resource descriptor is represented by a data structure. T
 On physical level, Data Resource descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
-The above states that JSON is the only serialization format that `MUST` be used for publishing a Data Resource while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+JSON is the serialization format that `MUST` be used for publishing a Data Resource while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
 This specification does not define any discoverability mechanisms making a serialized Data Resource be referenced only directly by its URI.
diff --git a/content/docs/specifications/table-dialect.md b/content/docs/specifications/table-dialect.md
index a3ee2d0f..e53a2ff5 100644
--- a/content/docs/specifications/table-dialect.md
+++ b/content/docs/specifications/table-dialect.md
@@ -43,7 +43,7 @@ On logical level, Table Dialect descriptor is represented by a data structure. T
 On physical level, Table Dialect descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
-The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+JSON is the serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
 This specification does not define any discoverability mechanisms making a serialized Table Dialect be referenced only directly by its URI.
diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md
index 5b2fdb4e..ce685690 100644
--- a/content/docs/specifications/table-schema.md
+++ b/content/docs/specifications/table-schema.md
@@ -71,7 +71,7 @@ On logical level, Table Schema descriptor is represented by a data structure. Th
 On physical level, Table Schema descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support JSON serialization format and `MAY` support other serialization formats like YAML or TOML.
-The above states that JSON is the only serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
+JSON is the serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
 This specification does not define any discoverability mechanisms making a serialized Table Schema be referenced only directly by its URI.

From 4f5ead33f4ff867e7d5d4c267cf560af0c239123 Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 14 Mar 2024 15:52:30 +0000
Subject: [PATCH 06/10] Update content/docs/specifications/data-package.md

Co-authored-by: Peter Desmet
---
 content/docs/specifications/data-package.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/specifications/data-package.md b/content/docs/specifications/data-package.md
index d2548c4f..75abf364 100644
--- a/content/docs/specifications/data-package.md
+++ b/content/docs/specifications/data-package.md
@@ -87,7 +87,7 @@ On physical level, Data Package descriptor is represented by a file. A data prod
 JSON is the serialization format that `MUST` be used for publishing a Data Package while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms making a serialized Data Package be referenced only directly by its URI. It means that techically the name of a Data Package file is irrelevant although it is good practice to use `datapackage.json` as a public convention.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Data Package. It is good practice and a common convention to name the file `datapackage.json`.
 The descriptor is the central file in a Data Package. It provides:

From 9157b68f96c42defb897c907af3f03ad9347dd66 Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 14 Mar 2024 15:52:57 +0000
Subject: [PATCH 07/10] Update content/docs/specifications/data-resource.md

Co-authored-by: Peter Desmet
---
 content/docs/specifications/data-resource.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/specifications/data-resource.md b/content/docs/specifications/data-resource.md
index 1e8de870..c8c05c87 100644
--- a/content/docs/specifications/data-resource.md
+++ b/content/docs/specifications/data-resource.md
@@ -89,7 +89,7 @@ On physical level, Data Resource descriptor is represented by a file. A data pro
 JSON is the serialization format that `MUST` be used for publishing a Data Resource while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms making a serialized Data Resource be referenced only directly by its URI.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Data Resource.
 Key properties of the descriptor are described below. A descriptor `MAY` include any number of properties in additional to those described below as required and optional properties.

From 229bb866b27d34c9501089c7438103452ddc085a Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 14 Mar 2024 15:53:04 +0000
Subject: [PATCH 08/10] Update content/docs/specifications/table-dialect.md

Co-authored-by: Peter Desmet
---
 content/docs/specifications/table-dialect.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/specifications/table-dialect.md b/content/docs/specifications/table-dialect.md
index e53a2ff5..afec8278 100644
--- a/content/docs/specifications/table-dialect.md
+++ b/content/docs/specifications/table-dialect.md
@@ -45,7 +45,7 @@ On physical level, Table Dialect descriptor is represented by a file. A data pro
 JSON is the serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms making a serialized Table Dialect be referenced only directly by its URI.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Dialect.
 A CSV Dialect descriptor, `dialect`, `MUST` be a JSON `object` with the following properties:

From b9e547696af2547d9825fff76dd6aec5b078d0bb Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 14 Mar 2024 15:53:31 +0000
Subject: [PATCH 09/10] Update content/docs/specifications/table-schema.md

Co-authored-by: Peter Desmet
---
 content/docs/specifications/table-schema.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md
index ce685690..ad79b97a 100644
--- a/content/docs/specifications/table-schema.md
+++ b/content/docs/specifications/table-schema.md
@@ -73,7 +73,7 @@ On physical level, Table Schema descriptor is represented by a file. A data prod
 JSON is the serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms making a serialized Table Schema be referenced only directly by its URI.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Schema.
 ## Metadata

From 610a7885662d908715e2ee6a9ec30c6a27006d00 Mon Sep 17 00:00:00 2001
From: roll
Date: Thu, 28 Mar 2024 09:22:31 +0000
Subject: [PATCH 10/10] Fixed linting

---
 content/docs/specifications/data-resource.md | 2 +-
 content/docs/specifications/table-dialect.md | 2 +-
 content/docs/specifications/table-schema.md  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/docs/specifications/data-resource.md b/content/docs/specifications/data-resource.md
index c8c05c87..e7deab14 100644
--- a/content/docs/specifications/data-resource.md
+++ b/content/docs/specifications/data-resource.md
@@ -89,7 +89,7 @@ On physical level, Data Resource descriptor is represented by a file. A data pro
 JSON is the serialization format that `MUST` be used for publishing a Data Resource while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Data Resource.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Data Resource.
 Key properties of the descriptor are described below. A descriptor `MAY` include any number of properties in additional to those described below as required and optional properties.
diff --git a/content/docs/specifications/table-dialect.md b/content/docs/specifications/table-dialect.md
index afec8278..1144fa8c 100644
--- a/content/docs/specifications/table-dialect.md
+++ b/content/docs/specifications/table-dialect.md
@@ -45,7 +45,7 @@ On physical level, Table Dialect descriptor is represented by a file. A data pro
 JSON is the serialization format that `MUST` be used for publishing a Table Dialect while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Dialect.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Dialect.
 A CSV Dialect descriptor, `dialect`, `MUST` be a JSON `object` with the following properties:
diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md
index 5e43c688..db94ec59 100644
--- a/content/docs/specifications/table-schema.md
+++ b/content/docs/specifications/table-schema.md
@@ -73,7 +73,7 @@ On physical level, Table Schema descriptor is represented by a file. A data prod
 JSON is the serialization format that `MUST` be used for publishing a Table Schema while other serialization formats can be used in projects or systems internally if supported by corresponding implementations.
-This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Schema.
+This specification does not define any discoverability mechanisms. Any URI can be used to directly reference a serialized Table Schema.
 ## Metadata
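
Note on the series: the consumer-side rule these patches converge on (a consumer `MUST` support JSON and `MAY` support other formats such as TOML, and the descriptor `MUST` be a JSON-serializable `object`) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from this series or from any Data Package library: the function name `parse_descriptor` and its `fmt` parameter are hypothetical, and TOML is treated as optional, available only where the standard-library `tomllib` module (Python 3.11+) exists.

```python
import json

try:
    import tomllib  # standard library since Python 3.11; optional here
except ModuleNotFoundError:
    tomllib = None


def parse_descriptor(text: str, fmt: str = "json") -> dict:
    """Parse a serialized descriptor (hypothetical helper, for illustration).

    JSON is always supported; TOML is accepted only when `tomllib` is available.
    """
    if fmt == "json":
        data = json.loads(text)
    elif fmt == "toml" and tomllib is not None:
        data = tomllib.loads(text)
    else:
        raise ValueError(f"unsupported serialization format: {fmt}")

    # On the logical level the descriptor must be an object, not an array or scalar.
    if not isinstance(data, dict):
        raise TypeError("descriptor must be a JSON-serializable object")
    return data


schema = parse_descriptor('{"fields": [{"name": "id", "type": "integer"}]}')
print(schema["fields"][0]["name"])
```

Keeping the format check separate from the object check mirrors the split between the physical level (how the file is serialized) and the logical level (what data structure results).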