From 3459830a8d21c83d4eea1e808c30fb6ca2fb3542 Mon Sep 17 00:00:00 2001
From: Philippe THOMY
Date: Mon, 25 Dec 2023 23:38:17 +0100
Subject: [PATCH 01/20] Update README.md

---
 patterns/README.md | 61 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/patterns/README.md b/patterns/README.md
index 23c88987..75760228 100644
--- a/patterns/README.md
+++ b/patterns/README.md
@@ -1395,3 +1395,64 @@ substantial limitation when creating value labels or categoricals, since system
 missing values can typically be easily distinguished from other missing values
 when exported in CSV format (e.g., "." in Stata or SAS, "NA" in R, or
 "" in Pandas).
+
+## Table Schema: Relationship between Fields
+
+### Overview
+
+La structure des tabular dataset est simple : un ensemble de Fields regroupé dans un tableau.
+
+Pourtant, les données présentes sont souvent complexes et traduisent une interdépendance entre les Fields (voir explications dans l'Internet-Draft [NTV tabular format (NTV-TAB)](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2)).
+
+Prenons l'exemple du dataset suivant:
+
+| country | region         | code  | population   |
+|---------|----------------|-------|--------------|
+| France  | European Union | FR    | 65 eu 449    |
+| Spain   | European Union | ES    | 48           |
+| Estonia | European Union | ES    | 1.3          |
+| Nigeria | Africa         | NI    | 223 afr 1460 |
+
+Le schéma de données pour ce dataset indique dans le Field Descriptor "description" :
+
+* Field "code": "country code alpha-2"
+* Field "population": "region population in 2022 (millions)"
+
+Si maintenant, on regarde les données on constate que ce jeu de données n'est pas consistent car il contient deux erreurs de structure:
+
+* Le code doit être unique pour chaque pays, on ne peut donc avoir ES pour Spain et Estonia,
+* La population de European Union ne peut avoir deux valeurs différentes (449 et 48)
+
+Ces erreurs de structure rendent les données inexploitables et pourtant elles ne sont pas détectées dans la validation du jeu de données (dans la version actuelle de Table Schema, il n'y a pas de descripteurs pour exprimer cette dépendance entre deux champs).
+
+L'objet de cette spécification est donc d'une part d'exprimer ces contraintes de structure dans le schéma de données et d'autre part de définir les contrôles associés à la validation du jeu de données.
+
+### Contexte
+
+Ce sujet a été étudié et traité pour les bases de données et à conduit d'une part à la mise d'une méthodologie de spécification des relations et d'autre part à la mise en oeuvre des bases de données relationnelles consistantes.
+
+La méthodologie repose principalement sur les [Entity–relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model) :
+
+> An entity–relationship model (or ER model) describes interrelated things of interest in a specific domain of knowledge. A basic ER model is composed of entity types (which classify the things of interest) and specifies relationships that can exist between entities (instances of those entity types).
+
+The Entity–relationship model est déclinée selon the conceptual-logical-physical hierarchy.
+
+Relationships sont exprimées de facon litterale par un nom et de façon numérique par une [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)).
+
+### Principles
+
+Cette méthodologie appliquée pour les bases de données peut être appliquée également pour les tabular data dont la structure est similaire à celle des tables des bases de données relationnelles mais dont la représentation des relations est différente (voir [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) utilisés dans les représentations tabulaires).
+
+Cette déclinaison est expliquée dans le [Notebook lié](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb).
+
+Le contrôle de l'application d'une relation pour un jeu de données défini (validation) peut s'effectuer simplement (voir [exemple](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) d'implementation simple).
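The last paragraph of this hunk states (in French) that checking whether a relationship holds for a given dataset "can be done simply". Below is a minimal Python sketch of such a check. It assumes each Field is held as a list of values, all of the same length, and it relies only on counting distinct values and distinct value pairs. The helper names (`is_derived`, `is_coupled`, `is_crossed`) are hypothetical and are not taken from the linked example notebook. The meaning of each link follows the `derived`, `coupled` and `crossed` definitions in the Specification section of PATCH 03/20.

```python
from collections.abc import Hashable, Sequence


def _distinct_counts(a: Sequence[Hashable], b: Sequence[Hashable]) -> tuple[int, int, int]:
    """Count distinct values of a, distinct values of b and distinct (a, b) pairs."""
    if len(a) != len(b):
        raise ValueError("both Fields must have the same number of rows")
    return len(set(a)), len(set(b)), len(set(zip(a, b)))


def is_derived(parent: Sequence[Hashable], child: Sequence[Hashable]) -> bool:
    """Each parent value is associated with a single child value."""
    n_parent, _, n_pairs = _distinct_counts(parent, child)
    return n_pairs == n_parent


def is_coupled(a: Sequence[Hashable], b: Sequence[Hashable]) -> bool:
    """One-to-one: each value of a matches one value of b and vice versa."""
    n_a, n_b, n_pairs = _distinct_counts(a, b)
    return n_pairs == n_a == n_b


def is_crossed(a: Sequence[Hashable], b: Sequence[Hashable]) -> bool:
    """Every distinct value of a appears with every distinct value of b."""
    n_a, n_b, n_pairs = _distinct_counts(a, b)
    return n_pairs == n_a * n_b


# Examples taken from the Specification section of PATCH 03/20.
month = ["jan", "apr", "jun", "feb", "may", "jan"]
quarter = ["T1", "T2", "T2", "T1", "T2", "T1"]
print(is_derived(month, quarter))  # True: each month has one quarter
print(is_derived(quarter, month))  # False: T1 goes with jan and feb

nickname = ["jock", "paulo", "lili", "paulo"]
name = ["john", "paul", "leah", "paul"]
print(is_coupled(nickname, name))  # True

year = [2020, 2020, 2021, 2021]
semester = ["S1", "S2", "S1", "S2"]
print(is_crossed(year, semester))  # True
```

A `coupled` pair also passes `is_derived` in both directions, because a one-to-one association is simply a dependency that holds both ways.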
+
+### Proposed extensions
+
+### Implementations
+
+### Specification
+
+### Suggested implementations
+
+### Notes

From bae197488cceac25e47cf09519e7128c163c67b6 Mon Sep 17 00:00:00 2001
From: Philippe THOMY
Date: Tue, 26 Dec 2023 20:04:24 +0100
Subject: [PATCH 02/20] relationships property

---
 patterns/README.md | 145 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 137 insertions(+), 8 deletions(-)

diff --git a/patterns/README.md b/patterns/README.md
index 75760228..256b13f9 100644
--- a/patterns/README.md
+++ b/patterns/README.md
@@ -1415,21 +1415,21 @@ Prenons l'exemple du dataset suivant:

 Le schéma de données pour ce dataset indique dans le Field Descriptor "description" :

-* Field "code": "country code alpha-2"
-* Field "population": "region population in 2022 (millions)"
+* pour le Field "code": "country code alpha-2"
+* pour le Field "population": "region population in 2022 (millions)"

 Si maintenant, on regarde les données on constate que ce jeu de données n'est pas consistent car il contient deux erreurs de structure:

-* Le code doit être unique pour chaque pays, on ne peut donc avoir ES pour Spain et Estonia,
-* La population de European Union ne peut avoir deux valeurs différentes (449 et 48)
+* Le "code" doit être unique pour chaque pays, on ne peut donc avoir "ES" pour "Spain" et "Estonia",
+* La "population" de "European Union" ne peut avoir deux valeurs différentes (449 et 48)

-Ces erreurs de structure rendent les données inexploitables et pourtant elles ne sont pas détectées dans la validation du jeu de données (dans la version actuelle de Table Schema, il n'y a pas de descripteurs pour exprimer cette dépendance entre deux champs).
+Ces erreurs de structure rendent les données inexploitables et pourtant elles ne sont pas détectées dans la validation du jeu de données (dans la version actuelle de Table Schema, il n'y a pas de Descriptors pour exprimer cette dépendance entre deux champs).

 L'objet de cette spécification est donc d'une part d'exprimer ces contraintes de structure dans le schéma de données et d'autre part de définir les contrôles associés à la validation du jeu de données.

 ### Contexte

-Ce sujet a été étudié et traité pour les bases de données et à conduit d'une part à la mise d'une méthodologie de spécification des relations et d'autre part à la mise en oeuvre des bases de données relationnelles consistantes.
+Ce sujet a été étudié et traité pour les bases de données et a conduit d'une part à la mise d'une méthodologie de spécification des relations et d'autre part à la mise en oeuvre des bases de données relationnelles consistantes.

 La méthodologie repose principalement sur les [Entity–relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model) :

@@ -1437,18 +1437,147 @@ La méthodologie repose principalement sur les [Entity–relationship model](htt

 The Entity–relationship model est déclinée selon the conceptual-logical-physical hierarchy.

-Relationships sont exprimées de facon litterale par un nom et de façon numérique par une [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)).
+The Relationships sont exprimées de facon litterale par un nom et de façon numérique par une [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)).

 ### Principles

+Deux aspects sont à traiter :
+
+* expression des relations:
+
 Cette méthodologie appliquée pour les bases de données peut être appliquée également pour les tabular data dont la structure est similaire à celle des tables des bases de données relationnelles mais dont la représentation des relations est différente (voir [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) utilisés dans les représentations tabulaires).

 Cette déclinaison est expliquée dans le [Notebook lié](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb).

-Le contrôle de l'application d'une relation pour un jeu de données défini (validation) peut s'effectuer simplement (voir [exemple](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) d'implementation simple).
+L'utilisation d'un modèle de données est un moyen simple pour exprimer les relations mais il n'est pas obligatoire. On peut très bien exprimer directement les relations au niveau du schéma de données.
+
+* validité d'un jeu de données:
+
+Le contrôle de la validité d'une relation pour un jeu de données défini est une des fonctions de [l'analyse des structures tabulaires](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). Elle ne nécessite que des fonctions de comptage accessibles pour tout type de langage (voir un [exemple](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) d'implementation simple).

 ### Proposed extensions

+Une relation est définie par les informations suivantes :
+
+* les deux Fields impliqués,
+* la représentation textuelle de la relation,
+* la nature de la relation
+
+Trois propositions d'extension de Table Schema sont proposées :
+
+1 - New Field Descriptor:
+
+Les propriétés associées au Field Descriptor "relationships" pourraient être :
+
+* "field" : name of the other Field impliqué
+* "description" : description string (optional)
+* "link": nature de la relationship
+
+Pros
+
+* No mixing with other descriptors
+* Consistent with a field view
+
+Cons
+
+* why choose one Field from the two ?
+
+Examples :
+
+```python
+{ "fields" : [
+    { "name": "country",
+      "relationships": [
+        { "field" : "code",
+          "description" : "is the country code alpha-2 of",
+          "link" : "coupled" }
+      ]
+    }
+    { "name": "region",
+      "relationships": [
+        { "field" : "population",
+          "description" : "is the population of",
+          "link" : "derived"}
+      ]
+    }
+  ]
+}
+
+2 - New Constrainst Descriptor:
+
+Les propriétés associées au Field Descriptor "relationships" pourraient être :
+
+* "field" : name of the other Field impliqué
+* "description" : description string (optional)
+* "link": nature de la relationship
+
+Pros
+
+* The « constraints » property is consistent with the point
+
+Cons
+
+* The « crossed » link can’t be validate at the data entry
+* Need to add a level in the properties tree
+* why choose one Field from the two ?
+
+Examples :
+
+```python
+{ "fields" : [
+    { "name": "country",
+      "constrainst" : {
+        "relationships": [
+          { "field" : "code",
+            "description" : "is the country code alpha-2 of",
+            "link" : "coupled" }
+        ]
+      }
+    }
+    { "name": "region",
+      "constrainst" : {
+        "relationships": [
+          { "field" : "population",
+            "description" : "is the population of",
+            "link" : "derived"}
+        ]
+      }
+    }
+  ]
+}
+
+3 - New Table Descriptor:
+
+Les propriétés associées au Table Descriptor "relationships" pourraient être :
+
+* "fields" : array with the names of the two Fields impliqué
+* "description" : description string (optional)
+* "link": nature de la relationship
+
+Pros
+
+* No mixing with Fields descriptors
+
+Cons
+
+* Need to add a new Table Descriptor
+
+Examples :
+
+```python
+{ "fields": {...},
+  "relationships": [
+    { "fields" : [ "country", "code"],
+      "description" : "is the country code alpha-2 of",
+      "link" : "coupled"
+    }
+    { "fields" : [ "region", "population"],
+      "description" : "is the population of",
+      "link" : "derived"}
+  ]
+}
+```
+
 ### Implementations

 ### Specification

From 4345418304c58b7053183dd74ed6647eb68bcdc3 Mon Sep 17 00:00:00 2001
From: Philippe THOMY
Date: Wed, 27 Dec 2023 00:09:23 +0100
Subject: [PATCH 03/20] specif

---
 patterns/README.md | 238 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------------
 1 file changed, 136 insertions(+), 102 deletions(-)

diff --git a/patterns/README.md b/patterns/README.md
index 256b13f9..98274ecd 100644
--- a/patterns/README.md
+++ b/patterns/README.md
@@ -1406,12 +1406,12 @@ Pourtant, les données présentes sont souvent complexes et traduisent une inter

 Prenons l'exemple du dataset suivant:

-| country | region         | code  | population   |
-|---------|----------------|-------|--------------|
-| France  | European Union | FR    | 65 eu 449    |
-| Spain   | European Union | ES    | 48           |
-| Estonia | European Union | ES    | 1.3          |
-| Nigeria | Africa         | NI    | 223 afr 1460 |
+| country | region         | code  | population |
+|---------|----------------|-------|------------|
+| France  | European Union | FR    | 449        |
+| Spain   | European Union | ES    | 48         |
+| Estonia | European Union | ES    | 449        |
+| Nigeria | Africa         | NI    | 1460       |

 Le schéma de données pour ce dataset indique dans le Field Descriptor "description" :

@@ -1420,8 +1420,8 @@ Le schéma de données pour ce dataset indique dans le Field Descriptor "descrip

 Si maintenant, on regarde les données on constate que ce jeu de données n'est pas consistent car il contient deux erreurs de structure:

-* Le "code" doit être unique pour chaque pays, on ne peut donc avoir "ES" pour "Spain" et "Estonia",
-* La "population" de "European Union" ne peut avoir deux valeurs différentes (449 et 48)
+* La valeur du Field "code" doit être unique pour chaque pays, on ne peut donc avoir "ES" pour "Spain" et "Estonia",
+* La valeur du Field "population" de "European Union" ne peut avoir deux valeurs différentes (449 et 48)

 Ces erreurs de structure rendent les données inexploitables et pourtant elles ne sont pas détectées dans la validation du jeu de données (dans la version actuelle de Table Schema, il n'y a pas de Descriptors pour exprimer cette dépendance entre deux champs).
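The two structural errors described in this hunk can be found mechanically rather than by inspection. Below is a minimal Python sketch that assumes the PATCH 03/20 table is loaded as a list of row dicts. The `conflicts` helper is hypothetical and is not part of any Frictionless library. It reports every value of one Field that occurs with more than one value of another Field, which is exactly a violation of a `derived` link in that direction. A `coupled` link is checked in both directions.

```python
from collections import defaultdict


def conflicts(rows, source, target):
    """Return {source value: sorted target values} for every source value
    that co-occurs with more than one distinct target value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[source]].add(row[target])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Example table from PATCH 03/20 (population in millions).
rows = [
    {"country": "France", "region": "European Union", "code": "FR", "population": 449},
    {"country": "Spain", "region": "European Union", "code": "ES", "population": 48},
    {"country": "Estonia", "region": "European Union", "code": "ES", "population": 449},
    {"country": "Nigeria", "region": "Africa", "code": "NI", "population": 1460},
]

# "coupled" (country, code): both directions must be conflict-free.
print(conflicts(rows, "country", "code"))  # {}
print(conflicts(rows, "code", "country"))  # {'ES': ['Estonia', 'Spain']}

# "derived" (region -> population): each region must have a single population.
print(conflicts(rows, "region", "population"))  # {'European Union': [48, 449]}
```

Checking only `country` against `code` would miss the first error, because the duplicated value is on the `code` side. That is why the one-to-one check needs both directions.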
@@ -1429,31 +1429,31 @@ L'objet de cette spécification est donc d'une part d'exprimer ces contraintes d ### Contexte -Ce sujet a été étudié et traité pour les bases de données et a conduit d'une part à la mise d'une méthodologie de spécification des relations et d'autre part à la mise en oeuvre des bases de données relationnelles consistantes. +Ce sujet a été étudié et traité pour les bases de données et a conduit d'une part à la définition d'une méthodologie de spécification des relations et d'autre part à la mise en oeuvre des bases de données relationnelles consistantes. La méthodologie repose principalement sur les [Entity–relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model) : -> An entity–relationship model (or ER model) describes interrelated things of interest in a specific domain of knowledge. A basic ER model is composed of entity types (which classify the things of interest) and specifies relationships that can exist between entities (instances of those entity types). +> *An entity–relationship model (or ER model) describes interrelated things of interest in a specific domain of knowledge. A basic ER model is composed of entity types (which classify the things of interest) and specifies relationships that can exist between entities (instances of those entity types).* The Entity–relationship model est déclinée selon the conceptual-logical-physical hierarchy. -The Relationships sont exprimées de facon litterale par un nom et de façon numérique par une [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)). +The Relationships sont exprimées de facon litterale par un nom et de façon structurée par une [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)). 
### Principles Deux aspects sont à traiter : -* expression des relations: +* **expression des relations**: -Cette méthodologie appliquée pour les bases de données peut être appliquée également pour les tabular data dont la structure est similaire à celle des tables des bases de données relationnelles mais dont la représentation des relations est différente (voir [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) utilisés dans les représentations tabulaires). + Cette méthodologie appliquée pour les bases de données peut être appliquée également pour les tabular data dont la structure est similaire à celle des tables des bases de données relationnelles mais dont la représentation des relations est différente (voir [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) utilisés dans les représentations tabulaires). -Cette déclinaison est expliquée dans le [Notebook lié](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). + Cette déclinaison est expliquée dans le [Notebook lié](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). -L'utilisation d'un modèle de données est un moyen simple pour exprimer les relations mais il n'est pas obligatoire. On peut très bien exprimer directement les relations au niveau du schéma de données. + L'utilisation d'un modèle de données est un moyen simple pour exprimer les relations mais il n'est pas obligatoire. On peut très bien exprimer directement les relations au niveau du schéma de données. -* validité d'un jeu de données: +* **validité d'un jeu de données**: -Le contrôle de la validité d'une relation pour un jeu de données défini est une des fonctions de [l'analyse des structures tabulaires](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). 
Elle ne nécessite que des fonctions de comptage accessibles pour tout type de langage (voir un [exemple](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) d'implementation simple). + Le contrôle de la validité d'une relation pour un jeu de données défini est une des fonctions de [l'analyse des structures tabulaires](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). Elle ne nécessite que des fonctions de comptage accessibles pour tout type de langage (voir un [exemple](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) d'implementation simple). ### Proposed extensions @@ -1465,123 +1465,157 @@ Une relation est définie par les informations suivantes : Trois propositions d'extension de Table Schema sont proposées : -1 - New Field Descriptor: +* **New Field Descriptor**: -Les propriétés associées au Field Descriptor "relationships" pourraient être : + Un Field Descriptor "relationships" est ajouté. + Les propriétés associées à ce Descriptor pourraient être : -* "field" : name of the other Field impliqué -* "description" : description string (optional) -* "link": nature de la relationship + * "field" : name of the other Field impliqué + * "description" : description string (optional) + * "link": nature de la relationship -Pros + Pros -* No mixing with other descriptors -* Consistent with a field view + * No mixing with other descriptors + * Consistent with a field view -Cons + Cons -* why choose one Field from the two ? + * why choose one Field from the two ? 
-Examples : + Examples : -```python -{ "fields" : [ - { "name": "country", - "relationships": [ - { "field" : "code", - "description" : "is the country code alpha-2 of", - "link" : "coupled" } - ] - } - { "name": "region", - "relationships": [ - { "field" : "population", - "description" : "is the population of", - "link" : "derived"} - ] - } - ] -} - -2 - New Constrainst Descriptor: - -Les propriétés associées au Field Descriptor "relationships" pourraient être : - -* "field" : name of the other Field impliqué -* "description" : description string (optional) -* "link": nature de la relationship - -Pros - -* The « constraints » property is consistent with the point - -Cons - -* The « crossed » link can’t be validate at the data entry -* Need to add a level in the properties tree -* why choose one Field from the two ? - -Examples : - -```python -{ "fields" : [ - { "name": "country", - "constrainst" : { + ```python + { "fields" : [ + { "name": "country", "relationships": [ { "field" : "code", "description" : "is the country code alpha-2 of", - "link" : "coupled" } + "link" : "coupled" } ] } - } - { "name": "region", - "constrainst" : { + { "name": "region", "relationships": [ { "field" : "population", "description" : "is the population of", "link" : "derived"} ] } - } - ] -} + ] + } + ``` -3 - New Table Descriptor: +* **New Constraint Property**: -Les propriétés associées au Table Descriptor "relationships" pourraient être : + Une Constraint Property "relationships" est ajoutée. 
+ Les propriétés associées à cette Property pourraient être : -* "fields" : array with the names of the two Fields impliqué -* "description" : description string (optional) -* "link": nature de la relationship + * "field" : name of the other Field impliqué + * "description" : description string (optional) + * "link": nature de la relationship -Pros + Pros -* No mixing with Fields descriptors + * The "constraints" property is consistent with the point -Cons + Cons -* Need to add a new Table Descriptor + * This Property is an object (more complex than the other) + * Need to add a level in the properties tree + * why choose one Field from the two ? -Examples : + Examples : -```python -{ "fields": {...}, - "relationships": [ - { "fields" : [ "country", "code"], - "description" : "is the country code alpha-2 of", - "link" : "coupled" - } - { "fields" : [ "region", "population"], - "description" : "is the population of", - "link" : "derived"} - ] -} -``` + ```python + { "fields" : [ + { "name": "country", + "constrainst" : { + "relationships": [ + { "field" : "code", + "description" : "is the country code alpha-2 of", + "link" : "coupled" } + ] + } + } + { "name": "region", + "constrainst" : { + "relationships": [ + { "field" : "population", + "description" : "is the population of", + "link" : "derived"} + ] + } + } + ] + } + ``` -### Implementations +* **New Table Descriptor**: + + Un Table Descriptor "relationships" est ajouté. 
+ Les propriétés associées à ce Descriptor pourraient être : + + * "fields" : array with the names of the two Fields impliqué + * "description" : description string (optional) + * "link": nature de la relationship + + Pros + + * No mixing with Fields descriptors + + Cons + + * Need to add a new Table Descriptor + + Examples : + + ```python + { "fields": [...], + "relationships": [ + { "fields" : [ "country", "code"], + "description" : "is the country code alpha-2 of", + "link" : "coupled" + } + { "fields" : [ "region", "population"], + "description" : "is the population of", + "link" : "derived"} + ] + } + ``` ### Specification +Si la solution 3 est retenue (Table Descriptor), la spécification pourrait être la suivante: + +The Relationships Descriptor MAY be used to define the dependency between fields. +The Relationships Descriptor, if present, MUST be a JSON object and MUST contain two properties : + +* "fields" : Array with the property "name" of the two fields linked +* "link" : the nature of the relationship between them + +The "link" property value MUST be one of the three following : + +* "derived" : + * The values of the child field are dependant on the values of the parent field (i.e. a value in the parent field is associated with a single value in the child field). + * e.g. The "Quarter" field [ "T1", "T2", "T2", "T1", "T2", "T1" ] and the "month" field [ "jan", "apr", "jun", "feb", "may", "jan"] are derived, + * i.e. if a new entry "jun" is added, the corresponding "quarter" value must be "T2". + +* "coupled" : + * The values of one field are associated to the values of the other field. + * e.g. The "Nickname" field [ "jock", "paulo", "lili", "paulo" ] and the "name" field [ "john", "paul", "leah", "paul" ] are coupled, + * i.e. if a new entry "lili" is added, the corresponding "Name" value must be "leah" just as if a new entry "leah" is added, the corresponding "nickname" value must be "lili". 
+ +* "crossed" : + This relationship means that all the different values of one field are associated with all the different values of the other field. + e.g. the "Year" Field [ 2020, 2020, 2021, 2021] and the "Semester" Field [ "S1", "S2", "S1", "S2" ] are crossed + i.e the year 2020 is associated to semesters "s1" and "s2", just as the semester "s1" is associated with years 2020 and 2021 + +The Relationships Descriptor, if present, MAY contain one property : + +* "description" : String with the description of the relationship between the two Fields. + ### Suggested implementations + + ### Notes From 4831132dcf5158243c7315941ff8cbff9c956b8c Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Wed, 27 Dec 2023 17:48:55 +0100 Subject: [PATCH 04/20] relationships pattern --- patterns/README.md | 171 ++++++++++++++++++++++++++------------------- 1 file changed, 99 insertions(+), 72 deletions(-) diff --git a/patterns/README.md b/patterns/README.md index 98274ecd..1e945caa 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -20,6 +20,25 @@ sidebar: auto This document describes various patterns for solving common problems, in ways that are not (yet) specified in any Frictionless Data specification. If we see increased adoption, or wide support, for any pattern, it is a prime candidate for formalising as part of a specification. +## Table of Contents + +1. [Private properties](#private-properties) +2. [Caching of resources](#caching-of-resources) +3. [Compression of resources](#compression-of-resources) +4. [Language support](#language-support) +5. [Translation support](#translation-support) +6. [Table Schema: Foreign Keys to Data Packages](#table-schema-foreign-keys-to-data-packages) +7. [Data Package Version](#data-package-version) +8. [Data Dependencies](#data-dependencies) +9. [Table Schema: metadata properties](#table-schema-metadata-properties) +10. [JSON Data Resources](#json-data-resources) +11. 
[Describing Data Package Catalogs using the Data Package Format](#describing-data-package-catalogs-using-the-data-package-format) +12. [Table Schema: Unique constraints](#table-schema-unique-constraints) +13. [Describing files inside a compressed file such as Zip](#describing-files-inside-a-compressed-file-such-as-zip) +14. [Missing values per field](#missing-values-per-field) +15. [Table Schema: Enum labels and ordering](#table-schema-enum-labels-and-ordering) +16. [Table Schema: Relationship between Fields](#table-schema-relationship-between-fields) + ## Private properties ### Overview @@ -1400,11 +1419,11 @@ values when exported in CSV format (e.g., "." in Stata or SAS, "NA" in R, or ### Overview -La structure des tabular dataset est simple : un ensemble de Fields regroupé dans un tableau. +The structure of tabular datasets is simple: a set of Fields grouped in a table. -Pourtant, les données présentes sont souvent complexes et traduisent une interdépendance entre les Fields (voir explications dans l'Internet-Draft [NTV tabular format (NTV-TAB)](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2)). +However, the data present is often complex and reflects an interdependence between Fields (see explanations in the Internet-Draft [NTV tabular format (NTV-TAB)](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2)). 
-Prenons l'exemple du dataset suivant: +Let's take the example of the following dataset: | country | region | code | population | |---------|----------------|-------|------------| @@ -1413,79 +1432,79 @@ Prenons l'exemple du dataset suivant: | Estonia | European Union | ES | 449 | | Nigeria | Africa | NI | 1460 | -Le schéma de données pour ce dataset indique dans le Field Descriptor "description" : +The data schema for this dataset indicates in the Field Descriptor "description": -* pour le Field "code": "country code alpha-2" -* pour le Field "population": "region population in 2022 (millions)" +* for the "code" Field : "country code alpha-2" +* for the "population" Field: "region population in 2022 (millions)" -Si maintenant, on regarde les données on constate que ce jeu de données n'est pas consistent car il contient deux erreurs de structure: +If we now look at the data we see that this dataset is not consistent because it contains two structural errors: -* La valeur du Field "code" doit être unique pour chaque pays, on ne peut donc avoir "ES" pour "Spain" et "Estonia", -* La valeur du Field "population" de "European Union" ne peut avoir deux valeurs différentes (449 et 48) +* The value of the "code" Field must be unique for each country, we cannot therefore have "ES" for "Spain" and "Estonia", +* The value of the "population" Field of "European Union" cannot have two different values (449 and 48) -Ces erreurs de structure rendent les données inexploitables et pourtant elles ne sont pas détectées dans la validation du jeu de données (dans la version actuelle de Table Schema, il n'y a pas de Descriptors pour exprimer cette dépendance entre deux champs). +These structural errors make the data unusable and yet they are not detected in the validation of the dataset (in the current version of Table Schema, there are no Descriptors to express this dependency between two fields). 
-L'objet de cette spécification est donc d'une part d'exprimer ces contraintes de structure dans le schéma de données et d'autre part de définir les contrôles associés à la validation du jeu de données. +The purpose of this specification is therefore on the one hand to express these structural constraints in the data schema and on the other hand to define the controls associated with the validation of a dataset. -### Contexte +### Context -Ce sujet a été étudié et traité pour les bases de données et a conduit d'une part à la définition d'une méthodologie de spécification des relations et d'autre part à la mise en oeuvre des bases de données relationnelles consistantes. +This subject was studied and treated for databases and led on the one hand to the definition of a methodology for specifying relationships and on the other hand to the implementation of consistent relational databases. -La méthodologie repose principalement sur les [Entity–relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model) : +The methodology is mainly based on the [Entity–relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model): > *An entity–relationship model (or ER model) describes interrelated things of interest in a specific domain of knowledge. A basic ER model is composed of entity types (which classify the things of interest) and specifies relationships that can exist between entities (instances of those entity types).* -The Entity–relationship model est déclinée selon the conceptual-logical-physical hierarchy. +The Entity–relationship model is broken down according to the conceptual-logical-physical hierarchy. -The Relationships sont exprimées de facon litterale par un nom et de façon structurée par une [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)). +The Relationships are expressed literally by a name and in a structured way by a [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)). 
### Principles -Deux aspects sont à traiter : +Two aspects need to be addressed: -* **expression des relations**: +* **relationship expression**: - Cette méthodologie appliquée pour les bases de données peut être appliquée également pour les tabular data dont la structure est similaire à celle des tables des bases de données relationnelles mais dont la représentation des relations est différente (voir [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) utilisés dans les représentations tabulaires). + This methodology applied for databases can also be applied for tabular data whose structure is similar to that of relational database tables but whose representation of relationships is different (see [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) used in tabular representations). - Cette déclinaison est expliquée dans le [Notebook lié](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). + This variation is explained in the [Linked notebook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). - L'utilisation d'un modèle de données est un moyen simple pour exprimer les relations mais il n'est pas obligatoire. On peut très bien exprimer directement les relations au niveau du schéma de données. + Using a data model is a simple way to express relationships but it is not required. We can very well express the relationships directly at the data schema level. -* **validité d'un jeu de données**: +* **validity of a dataset**: - Le contrôle de la validité d'une relation pour un jeu de données défini est une des fonctions de [l'analyse des structures tabulaires](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). 
Elle ne nécessite que des fonctions de comptage accessibles pour tout type de langage (voir un [exemple](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) d'implementation simple). + Checking the validity of a relationship for a defined dataset is one of the functions of [tabular structure analysis](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). It only requires counting functions accessible for any type of language (see a simple implementation [example](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb)). ### Proposed extensions -Une relation est définie par les informations suivantes : +A relationship is defined by the following information: -* les deux Fields impliqués, -* la représentation textuelle de la relation, -* la nature de la relation +* the two Fields involved, +* the textual representation of the relationship, +* the nature of the relationship -Trois propositions d'extension de Table Schema sont proposées : +Three proposals for extending Table Schema are presented: * **New Field Descriptor**: - Un Field Descriptor "relationships" est ajouté. - Les propriétés associées à ce Descriptor pourraient être : + A `relationships` Field Descriptor is added. + The properties associated with this Descriptor could be: - * "field" : name of the other Field impliqué - * "description" : description string (optional) - * "link": nature de la relationship + * `field`: name of the other Field involved + * `description`: description string (optional) + * `link`: nature of the relationship - Pros + Pros: * No mixing with other descriptors * Consistent with a field view - Cons + Cons: - * why choose one Field from the two ? + * why choose one Field from the two? 
- Examples : + Example: - ```python + ```json { "fields" : [ { "name": "country", "relationships": [ @@ -1507,26 +1526,26 @@ Trois propositions d'extension de Table Schema sont proposées : * **New Constraint Property**: - Une Constraint Property "relationships" est ajoutée. - Les propriétés associées à cette Property pourraient être : + A `relationships` Property of `constraint` Descriptor is added. + The properties associated with this Property could be: - * "field" : name of the other Field impliqué - * "description" : description string (optional) - * "link": nature de la relationship + * `field`: name of the other Field involved + * `description`: description string (optional) + * `link`: nature of the relationship - Pros + Pros: - * The "constraints" property is consistent with the point + * The `constraints` Descriptor is consistent with the point - Cons + Cons: - * This Property is an object (more complex than the other) + * This Property is an object (more complex than the other properties) * Need to add a level in the properties tree - * why choose one Field from the two ? + * why choose one Field from the two? - Examples : + Example: - ```python + ```json { "fields" : [ { "name": "country", "constrainst" : { @@ -1552,25 +1571,25 @@ Trois propositions d'extension de Table Schema sont proposées : * **New Table Descriptor**: - Un Table Descriptor "relationships" est ajouté. - Les propriétés associées à ce Descriptor pourraient être : + A `relationships` Table Descriptor is added. 
+ The properties associated with this Descriptor could be: - * "fields" : array with the names of the two Fields impliqué - * "description" : description string (optional) - * "link": nature de la relationship + * `fields`: array with the names of the two Fields involved + * `description`: description string (optional) + * `link`: nature of the relationship - Pros + Pros: * No mixing with Fields descriptors - Cons + Cons: * Need to add a new Table Descriptor - Examples : + Example: - ```python - { "fields": [...], + ```json + { "fields": [ ], "relationships": [ { "fields" : [ "country", "code"], "description" : "is the country code alpha-2 of", @@ -1585,37 +1604,45 @@ Trois propositions d'extension de Table Schema sont proposées : ### Specification -Si la solution 3 est retenue (Table Descriptor), la spécification pourrait être la suivante: +Assuming solution 3 (Table Descriptor), the specification could be as follows: -The Relationships Descriptor MAY be used to define the dependency between fields. -The Relationships Descriptor, if present, MUST be a JSON object and MUST contain two properties : +The `relationships` Descriptor MAY be used to define the dependency between fields. +The `relationships` Descriptor, if present, MUST be an array where each entry in the array is an object and MUST contain two required properties and one optional: -* "fields" : Array with the property "name" of the two fields linked -* "link" : the nature of the relationship between them +* `fields` : Array with the property `name` of the two fields linked (required) +* `link` : String with the nature of the relationship between them (required) +* `description` : String with the description of the relationship between the two Fields (optional) -The "link" property value MUST be one of the three following : +The `link` property value MUST be one of the three following : -* "derived" : +* `derived` : * The values of the child field are dependant on the values of the parent field (i.e. 
a value in the parent field is associated with a single value in the child field). * e.g. The "Quarter" field [ "T1", "T2", "T2", "T1", "T2", "T1" ] and the "month" field [ "jan", "apr", "jun", "feb", "may", "jan"] are derived, * i.e. if a new entry "jun" is added, the corresponding "quarter" value must be "T2". -* "coupled" : +* `coupled` : * The values of one field are associated to the values of the other field. * e.g. The "Nickname" field [ "jock", "paulo", "lili", "paulo" ] and the "name" field [ "john", "paul", "leah", "paul" ] are coupled, * i.e. if a new entry "lili" is added, the corresponding "Name" value must be "leah" just as if a new entry "leah" is added, the corresponding "nickname" value must be "lili". -* "crossed" : +* `crossed` : This relationship means that all the different values of one field are associated with all the different values of the other field. e.g. the "Year" Field [ 2020, 2020, 2021, 2021] and the "Semester" Field [ "S1", "S2", "S1", "S2" ] are crossed i.e the year 2020 is associated to semesters "s1" and "s2", just as the semester "s1" is associated with years 2020 and 2021 -The Relationships Descriptor, if present, MAY contain one property : +### implementations -* "description" : String with the description of the relationship between the two Fields. +The implementation of a new Descriptor is not discussed here (no particular point to address). 
-### Suggested implementations +The control implementation is based on the following principles: +* calculation of the number of different values for the two Fields, +* calculation of the number of different values for the virtual Field composed of tuples of each of the values of the two Fields +* comparison of these three values to deduce the type of relationship +* comparison of the calculated relationship type with that defined in the data schema +The [implementation example](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) presents a calculation function using the standard python library for datasets with few values as well as a function built with NumPy functions. ### Notes + +If the relationships are defined in a data model, the generation of the relationships in the data schema can be automatic. From 5d59e7552dfcfc0cd11f92e555c85d63b1432565 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Fri, 29 Dec 2023 23:48:27 +0100 Subject: [PATCH 05/20] add example (field-relationship) --- patterns/README.md | 47 ++++++++++++++++++++++++++++------------------ 1 file changed, 29 insertions(+), 18 deletions(-) diff --git a/patterns/README.md b/patterns/README.md index 1e945caa..073c5d10 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1448,7 +1448,7 @@ The purpose of this specification is therefore on the one hand to express these ### Context -This subject was studied and treated for databases and led on the one hand to the definition of a methodology for specifying relationships and on the other hand to the implementation of consistent relational databases. +This subject was studied and treated for databases and led to the definition of a methodology for specifying relationships and to the implementation of consistent relational databases. 
The methodology is mainly based on the [Entity–relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model): @@ -1466,19 +1466,19 @@ Two aspects need to be addressed: This methodology applied for databases can also be applied for tabular data whose structure is similar to that of relational database tables but whose representation of relationships is different (see [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) used in tabular representations). - This variation is explained in the [Linked notebook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). + This variation is explained in the [linked notebook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). - Using a data model is a simple way to express relationships but it is not required. We can very well express the relationships directly at the data schema level. + Using a data model is a simple way to express relationships but it is not required. We can express the relationships directly at the data schema level. * **validity of a dataset**: - Checking the validity of a relationship for a defined dataset is one of the functions of [tabular structure analysis](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). It only requires counting functions accessible for any type of language (see a simple implementation [example](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb)). + Checking the validity of a relationship for a defined dataset is one of the functions of [tabular structure analysis](https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf). 
It only requires counting functions accessible for any type of language (see [example of implementation](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb)). ### Proposed extensions A relationship is defined by the following information: -* the two Fields involved, +* the two Fields involved (the order of the Fields is important with the "derived" link), * the textual representation of the relationship, * the nature of the relationship @@ -1489,7 +1489,7 @@ Three proposals for extending Table Schema are presented: A `relationships` Field Descriptor is added. The properties associated with this Descriptor could be: - * `field`: name of the other Field involved + * `parent`: name of the other Field involved * `description`: description string (optional) * `link`: nature of the relationship @@ -1497,10 +1497,11 @@ Three proposals for extending Table Schema are presented: * No mixing with other descriptors * Consistent with a field view + * the direction of the relationship is clear Cons: - * why choose one Field from the two? + * a relationship is not a property of a Field Example: @@ -1508,14 +1509,14 @@ Three proposals for extending Table Schema are presented: { "fields" : [ { "name": "country", "relationships": [ - { "field" : "code", + { "parent" : "code", "description" : "is the country code alpha-2 of", "link" : "coupled" } ] } { "name": "region", "relationships": [ - { "field" : "population", + { "parent" : "population", "description" : "is the population of", "link" : "derived"} ] @@ -1527,21 +1528,23 @@ Three proposals for extending Table Schema are presented: * **New Constraint Property**: A `relationships` Property of `constraint` Descriptor is added. 
+ The properties associated with this Property could be: - * `field`: name of the other Field involved + * `parent`: name of the other Field involved * `description`: description string (optional) * `link`: nature of the relationship Pros: * The `constraints` Descriptor is consistent with the point + * the direction of the relationship is clear Cons: * This Property is an object (more complex than the other properties) * Need to add a level in the properties tree - * why choose one Field from the two? + * a relationship is not a property of a Field Example: @@ -1550,7 +1553,7 @@ Three proposals for extending Table Schema are presented: { "name": "country", "constrainst" : { "relationships": [ - { "field" : "code", + { "parent" : "code", "description" : "is the country code alpha-2 of", "link" : "coupled" } ] @@ -1559,7 +1562,7 @@ Three proposals for extending Table Schema are presented: { "name": "region", "constrainst" : { "relationships": [ - { "field" : "population", + { "parent" : "population", "description" : "is the population of", "link" : "derived"} ] @@ -1585,6 +1588,7 @@ Three proposals for extending Table Schema are presented: Cons: * Need to add a new Table Descriptor + * The order of the Fields in the array is important with the "derived" link Example: @@ -1607,6 +1611,7 @@ Three proposals for extending Table Schema are presented: Assuming solution 3 (Table Descriptor), the specification could be as follows: The `relationships` Descriptor MAY be used to define the dependency between fields. 
+ The `relationships` Descriptor, if present, MUST be an array where each entry in the array is an object and MUST contain two required properties and one optional: * `fields` : Array with the property `name` of the two fields linked (required) @@ -1616,19 +1621,22 @@ The `relationships` Descriptor, if present, MUST be an array where each entry in The `link` property value MUST be one of the three following : * `derived` : - * The values of the child field are dependant on the values of the parent field (i.e. a value in the parent field is associated with a single value in the child field). + + * The values of the child (second array element) field are dependant on the values of the parent (first array element) field (i.e. a value in the parent field is associated with a single value in the child field). * e.g. The "Quarter" field [ "T1", "T2", "T2", "T1", "T2", "T1" ] and the "month" field [ "jan", "apr", "jun", "feb", "may", "jan"] are derived, * i.e. if a new entry "jun" is added, the corresponding "quarter" value must be "T2". * `coupled` : + * The values of one field are associated to the values of the other field. * e.g. The "Nickname" field [ "jock", "paulo", "lili", "paulo" ] and the "name" field [ "john", "paul", "leah", "paul" ] are coupled, * i.e. if a new entry "lili" is added, the corresponding "Name" value must be "leah" just as if a new entry "leah" is added, the corresponding "nickname" value must be "lili". * `crossed` : - This relationship means that all the different values of one field are associated with all the different values of the other field. - e.g. the "Year" Field [ 2020, 2020, 2021, 2021] and the "Semester" Field [ "S1", "S2", "S1", "S2" ] are crossed - i.e the year 2020 is associated to semesters "s1" and "s2", just as the semester "s1" is associated with years 2020 and 2021 + + * This relationship means that all the different values of one field are associated with all the different values of the other field. + * e.g. 
the "Year" Field [ 2020, 2020, 2021, 2021] and the "Semester" Field [ "S1", "S2", "S1", "S2" ] are crossed + * i.e the year 2020 is associated to semesters "s1" and "s2", just as the semester "s1" is associated with years 2020 and 2021 ### implementations @@ -1641,8 +1649,11 @@ The control implementation is based on the following principles: * comparison of these three values to deduce the type of relationship * comparison of the calculated relationship type with that defined in the data schema -The [implementation example](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) presents a calculation function using the standard python library for datasets with few values as well as a function built with NumPy functions. +The [implementation example](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) presents calculation function. +An [analysis tool](https://github.com/loco-philippe/tab-analysis/blob/main/README.md) is also available and accessible from pandas data. ### Notes If the relationships are defined in a data model, the generation of the relationships in the data schema can be automatic. + +The example presented in the [Overview](#overview) and the rule for converting a Data model into a Table schema are detailed in [this NoteBook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example_schema.ipynb). 
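The control principles listed under the implementations heading can be sketched with the Python standard library alone:

- count the distinct values of each Field;
- count the distinct values of the virtual Field made of (parent, child) tuples;
- deduce the relationship type from the three counts;
- compare it with the declared type.

This is a minimal illustration, not the code of the linked notebook or of the analysis tool. The names `distinct_counts`, `infer_link` and `link_holds` are hypothetical. Treating one-to-one data as also satisfying `derived` is an assumption: a one-to-one mapping is a special case of "each parent value has a single child value".

```python
from collections.abc import Hashable, Sequence
from typing import Optional

LINKS = ("derived", "coupled", "crossed")


def distinct_counts(parent: Sequence[Hashable],
                    child: Sequence[Hashable]) -> tuple[int, int, int]:
    """Distinct values in each Field and in the virtual Field of (parent, child) tuples."""
    if len(parent) != len(child):
        raise ValueError("both Fields must have the same number of rows")
    return len(set(parent)), len(set(child)), len(set(zip(parent, child)))


def infer_link(parent: Sequence[Hashable],
               child: Sequence[Hashable]) -> Optional[str]:
    """Deduce the relationship type by comparing the three counts."""
    n_parent, n_child, n_pairs = distinct_counts(parent, child)
    if n_pairs == n_parent == n_child:
        return "coupled"   # one-to-one: a single partner in both directions
    if n_pairs == n_parent:
        return "derived"   # each parent value is paired with a single child value
    if n_pairs == n_parent * n_child:
        return "crossed"   # every combination of values occurs
    return None            # none of the three links, in this parent/child order


def link_holds(parent: Sequence[Hashable],
               child: Sequence[Hashable], declared: str) -> bool:
    """Compare the data with the link declared in the data schema."""
    if declared not in LINKS:
        raise ValueError(f"unknown link: {declared!r}")
    n_parent, n_child, n_pairs = distinct_counts(parent, child)
    if declared == "derived":
        # assumption: one-to-one data also satisfies "derived"
        return n_pairs == n_parent
    if declared == "coupled":
        return n_pairs == n_parent == n_child
    return n_pairs == n_parent * n_child  # "crossed"


# month (parent) and quarter (child) from the "derived" example
infer_link(["jan", "apr", "jun", "feb", "may", "jan"],
           ["T1", "T2", "T2", "T1", "T2", "T1"])          # -> "derived"
# country / code from the Overview table: "ES" is used twice
link_holds(["France", "Spain", "Estonia", "Nigeria"],
           ["FR", "ES", "ES", "NI"], "coupled")           # -> False
```

On the Overview data, the counts are 4 countries, 3 codes and 4 pairs. The data is therefore only `derived`, and a declared `coupled` link is reported as violated. This is the structural error that current Table Schema validation does not catch.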
From aea9ec490c490a2f1bf0fdeebbcadc35caf9ecbf Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Fri, 29 Dec 2023 23:55:24 +0100 Subject: [PATCH 06/20] link to nbviewer to see mermaid data model --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 073c5d10..370b9d04 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1656,4 +1656,4 @@ An [analysis tool](https://github.com/loco-philippe/tab-analysis/blob/main/READM If the relationships are defined in a data model, the generation of the relationships in the data schema can be automatic. -The example presented in the [Overview](#overview) and the rule for converting a Data model into a Table schema are detailed in [this NoteBook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example_schema.ipynb). +The example presented in the [Overview](#overview) and the rule for converting a Data model into a Table schema are detailed in [this NoteBook](https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example_schema.ipynb). From eca844a5729eada752baa5509fa96d1bc75cbfde Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Thu, 4 Jan 2024 00:03:28 +0100 Subject: [PATCH 07/20] add open-data example --- patterns/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/patterns/README.md b/patterns/README.md index 370b9d04..90888137 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1657,3 +1657,5 @@ An [analysis tool](https://github.com/loco-philippe/tab-analysis/blob/main/READM If the relationships are defined in a data model, the generation of the relationships in the data schema can be automatic. 
The example presented in the [Overview](#overview) and the rule for converting a Data model into a Table schema are detailed in [this NoteBook](https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example_schema.ipynb). + +A complete example (60 000 rows, 50 fields) is used to validate the methodology and the tools: [open-data IRVE](https://www.data.gouv.fr/fr/reuses/les-donnees-irve-sont-elles-coherentes/) From 1cb9310093fbc4ff5249b8c5b3d8dd6a8489b075 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Tue, 16 Jan 2024 11:40:20 +0100 Subject: [PATCH 08/20] add links to the example --- patterns/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 90888137..b5e437c7 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1458,6 +1458,8 @@ The Entity–relationship model is broken down according to the conceptual-logic The Relationships are expressed literally by a name and in a structured way by a [cardinality](https://en.wikipedia.org/wiki/Cardinality_(data_modeling)). +The Entity–relationship model for the example presented in the [Overview](#overview) is detailed in [this NoteBook](https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example_schema.ipynb). + ### Principles Two aspects need to be addressed: @@ -1466,7 +1468,7 @@ Two aspects need to be addressed: This methodology applied for databases can also be applied for tabular data whose structure is similar to that of relational database tables but whose representation of relationships is different (see [patterns](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#section-2) used in tabular representations). - This variation is explained in the [linked notebook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb). 
+ This variation is explained in the [linked notebook](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/methodology.ipynb) and presented in the [example](https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example_schema.ipynb). Using a data model is a simple way to express relationships but it is not required. We can express the relationships directly at the data schema level. From 1c19b18d6bfa6e405fde742088f71fcbec6ac941 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Tue, 16 Jan 2024 11:58:33 +0100 Subject: [PATCH 09/20] change of specification examples --- patterns/README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/patterns/README.md b/patterns/README.md index b5e437c7..3930cd3e 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1625,20 +1625,20 @@ The `link` property value MUST be one of the three following : * `derived` : * The values of the child (second array element) field are dependant on the values of the parent (first array element) field (i.e. a value in the parent field is associated with a single value in the child field). - * e.g. The "Quarter" field [ "T1", "T2", "T2", "T1", "T2", "T1" ] and the "month" field [ "jan", "apr", "jun", "feb", "may", "jan"] are derived, - * i.e. if a new entry "jun" is added, the corresponding "quarter" value must be "T2". + * e.g. The "name" field [ "john", "paul", "leah", "paul" ] and the "Nickname" field [ "jock", "paulo", "lili", "paulo" ] are derived, + * i.e. if a new entry "leah" is added, the corresponding "nickname" value must be "lili". * `coupled` : * The values of one field are associated to the values of the other field. - * e.g. The "Nickname" field [ "jock", "paulo", "lili", "paulo" ] and the "name" field [ "john", "paul", "leah", "paul" ] are coupled, - * i.e. 
if a new entry "lili" is added, the corresponding "Name" value must be "leah" just as if a new entry "leah" is added, the corresponding "nickname" value must be "lili". + * e.g. The "Country" field [ "france", "spain", "estonia", "spain" ] and the "code alpha-2" field [ "FR", "ES", "EE", "ES" ] are coupled, + * i.e. if a new entry "estonia" is added, the corresponding "code alpha-2" value must be "EE" just as if a new entry "EE" is added, the corresponding "Country" value must be "estonia". * `crossed` : * This relationship means that all the different values of one field are associated with all the different values of the other field. - * e.g. the "Year" Field [ 2020, 2020, 2021, 2021] and the "Semester" Field [ "S1", "S2", "S1", "S2" ] are crossed - * i.e the year 2020 is associated to semesters "s1" and "s2", just as the semester "s1" is associated with years 2020 and 2021 + * e.g. the "Year" Field [ 2020, 2020, 2021, 2021] and the "Population" Field [ "estonia", "spain", "estonia", "spain" ] are crossed + * i.e the year 2020 is associated to population of "spain" and "estonia", just as the population of "estonia" is associated with years 2020 and 2021 ### implementations From 9fadab6dbed60ecf3dffefde3681e4f0ce771eda Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:48:17 +0100 Subject: [PATCH 10/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 3930cd3e..892e9d41 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1499,7 +1499,7 @@ Three proposals for extending Table Schema are presented: * No mixing with other descriptors * Consistent with a field view - * the direction of the relationship is clear + * The direction of the relationship is clear Cons: From 58dff9032dd0db31d6b5df61c7997ff2b7bb88f5 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:48:46 +0100 Subject: 
[PATCH 11/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 892e9d41..86f8e9d5 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1503,7 +1503,7 @@ Three proposals for extending Table Schema are presented: Cons: - * a relationship is not a property of a Field + * A relationship is not a property of a Field Example: From c1fd4a8869bf7e489af0059de9864bbc78c281c9 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:49:09 +0100 Subject: [PATCH 12/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 86f8e9d5..d8ecb705 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1529,7 +1529,7 @@ Three proposals for extending Table Schema are presented: * **New Constraint Property**: - A `relationships` Property of `constraint` Descriptor is added. + A `relationships` Property of `constraints` Descriptor is added. 
The properties associated with this Property could be: From 4c42335dd5544b08397ab315af9ef54ee980088e Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:51:26 +0100 Subject: [PATCH 13/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index d8ecb705..6acd614c 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1540,7 +1540,7 @@ Three proposals for extending Table Schema are presented: Pros: * The `constraints` Descriptor is consistent with the point - * the direction of the relationship is clear + * The direction of the relationship is clear Cons: From e11baf4c4f57ab812c3fdbc469a6c1e081e9404e Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:51:47 +0100 Subject: [PATCH 14/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 6acd614c..0dc7b5fc 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1553,7 +1553,7 @@ Three proposals for extending Table Schema are presented: ```json { "fields" : [ { "name": "country", - "constrainst" : { + "constraints" : { "relationships": [ { "parent" : "code", "description" : "is the country code alpha-2 of", From 53e8041c8ba53443c4122beea10c6e869c534b31 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:51:59 +0100 Subject: [PATCH 15/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 0dc7b5fc..5eb85e82 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1562,7 +1562,7 @@ Three proposals for extending Table Schema are presented: } } { "name": "region", - "constrainst" : { + "constraints" : { "relationships": [ { "parent" : 
"population", "description" : "is the population of", From e67c6104f80ebb944a469041768d0e11a0e919bc Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:52:14 +0100 Subject: [PATCH 16/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 5eb85e82..0370dff5 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1616,7 +1616,7 @@ The `relationships` Descriptor MAY be used to define the dependency between fiel The `relationships` Descriptor, if present, MUST be an array where each entry in the array is an object and MUST contain two required properties and one optional: -* `fields` : Array with the property `name` of the two fields linked (required) +* `fields`: Array with the property `name` of the two fields linked (required) * `link` : String with the nature of the relationship between them (required) * `description` : String with the description of the relationship between the two Fields (optional) From 48d9f58c54d10d4f4b987aaf8541dd0c3ebad479 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:52:29 +0100 Subject: [PATCH 17/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 0370dff5..664a2ec0 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1640,7 +1640,7 @@ The `link` property value MUST be one of the three following : * e.g. the "Year" Field [ 2020, 2020, 2021, 2021] and the "Population" Field [ "estonia", "spain", "estonia", "spain" ] are crossed * i.e the year 2020 is associated to population of "spain" and "estonia", just as the population of "estonia" is associated with years 2020 and 2021 -### implementations +### Implementations The implementation of a new Descriptor is not discussed here (no particular point to address). 
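The Specification for the Table Descriptor option can be checked structurally before any data is read. Below is a minimal sketch, assuming the schema has already been parsed into Python dicts. `descriptor_errors` is a hypothetical helper, not part of any Table Schema library. Requiring both names to match declared Fields is also an assumption, drawn from "Array with the property `name` of the two fields linked".

```python
LINKS = ("derived", "coupled", "crossed")


def descriptor_errors(schema: dict) -> list[str]:
    """List the ways a schema's `relationships` entry breaks the proposed rules."""
    if "relationships" not in schema:
        return []                                  # the descriptor is optional (MAY)
    relationships = schema["relationships"]
    if not isinstance(relationships, list):
        return ["'relationships' must be an array"]
    # assumption: linked names must match the `name` of a declared Field
    names = [field.get("name") for field in schema.get("fields", [])]
    errors = []
    for i, rel in enumerate(relationships):
        where = f"relationships[{i}]"
        if not isinstance(rel, dict):
            errors.append(f"{where} must be an object")
            continue
        fields = rel.get("fields")                 # required: exactly two names
        if not isinstance(fields, list) or len(fields) != 2:
            errors.append(f"{where}.fields must be an array of two Field names")
        else:
            errors.extend(f"{where}.fields: unknown Field {name!r}"
                          for name in fields if name not in names)
        if rel.get("link") not in LINKS:           # required: one of three values
            errors.append(f"{where}.link must be one of {', '.join(LINKS)}")
        if "description" in rel and not isinstance(rel["description"], str):
            errors.append(f"{where}.description must be a string")
    return errors
```

Consider the `country`/`code` entry with `"link": "coupled"` and the description "is the country code alpha-2 of". For that entry the sketch returns an empty list. An unknown Field name, or a `link` outside the three allowed values, each produces one message.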
From db12844cb07d239fe3c1215ef5ee33d3093e14b2 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Mon, 29 Jan 2024 23:52:45 +0100 Subject: [PATCH 18/20] Update patterns/README.md Co-authored-by: Peter Desmet --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index 664a2ec0..d8f083c9 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1631,7 +1631,7 @@ The `link` property value MUST be one of the three following : * `coupled` : * The values of one field are associated to the values of the other field. - * e.g. The "Country" field [ "france", "spain", "estonia", "spain" ] and the "code alpha-2" field [ "FR", "ES", "EE", "ES" ] are coupled, + * e.g. The "Country" field [ "france", "spain", "estonia", "spain" ] and the "code alpha-2" field [ "FR", "ES", "EE", "ES" ] are coupled, * i.e. if a new entry "estonia" is added, the corresponding "code alpha-2" value must be "EE" just as if a new entry "EE" is added, the corresponding "Country" value must be "estonia". 
* `crossed` : From a595df83af22d0adb6b9ea89421697bf954df864 Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Tue, 30 Jan 2024 00:00:19 +0100 Subject: [PATCH 19/20] Update patterns/README --- patterns/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patterns/README.md b/patterns/README.md index d8f083c9..38ee8bf5 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1539,7 +1539,7 @@ Three proposals for extending Table Schema are presented: Pros: - * The `constraints` Descriptor is consistent with the point + * The `constraints` Descriptor is consistent with the purpose of the proposal * The direction of the relationship is clear Cons: From 5ec9ffec11a69932d118c7734eeb4d33b2b7bb7a Mon Sep 17 00:00:00 2001 From: Philippe THOMY Date: Fri, 23 Feb 2024 10:16:19 +0100 Subject: [PATCH 20/20] Relationship : only one option + add link --- patterns/README.md | 93 +++------------------------------------------- 1 file changed, 6 insertions(+), 87 deletions(-) diff --git a/patterns/README.md b/patterns/README.md index 38ee8bf5..68f82b69 100644 --- a/patterns/README.md +++ b/patterns/README.md @@ -1484,95 +1484,13 @@ A relationship is defined by the following information: * the textual representation of the relationship, * the nature of the relationship -Three proposals for extending Table Schema are presented: +Three proposals for extending Table Schema are being considered: -* **New Field Descriptor**: +* New Field Descriptor +* New Constraint Property +* New Table Descriptor - A `relationships` Field Descriptor is added. 
- The properties associated with this Descriptor could be: - - * `parent`: name of the other Field involved - * `description`: description string (optional) - * `link`: nature of the relationship - - Pros: - - * No mixing with other descriptors - * Consistent with a field view - * The direction of the relationship is clear - - Cons: - - * A relationship is not a property of a Field - - Example: - - ```json - { "fields" : [ - { "name": "country", - "relationships": [ - { "parent" : "code", - "description" : "is the country code alpha-2 of", - "link" : "coupled" } - ] - } - { "name": "region", - "relationships": [ - { "parent" : "population", - "description" : "is the population of", - "link" : "derived"} - ] - } - ] - } - ``` - -* **New Constraint Property**: - - A `relationships` Property of `constraints` Descriptor is added. - - The properties associated with this Property could be: - - * `parent`: name of the other Field involved - * `description`: description string (optional) - * `link`: nature of the relationship - - Pros: - - * The `constraints` Descriptor is consistent with the purpose of the proposal - * The direction of the relationship is clear - - Cons: - - * This Property is an object (more complex than the other properties) - * Need to add a level in the properties tree - * a relationship is not a property of a Field - - Example: - - ```json - { "fields" : [ - { "name": "country", - "constraints" : { - "relationships": [ - { "parent" : "code", - "description" : "is the country code alpha-2 of", - "link" : "coupled" } - ] - } - } - { "name": "region", - "constraints" : { - "relationships": [ - { "parent" : "population", - "description" : "is the population of", - "link" : "derived"} - ] - } - } - ] - } - ``` +After discussion, only the third proposal (a relationship between Fields rather than a property of a single Field) is retained and is presented below: * **New Table Descriptor**: @@ -1653,6 +1571,7 @@ The control implementation is based on the following principles:
[implementation example](https://github.com/loco-philippe/Environmental-Sensing/blob/main/property_relationship/example.ipynb) presents calculation function. An [analysis tool](https://github.com/loco-philippe/tab-analysis/blob/main/README.md) is also available and accessible from pandas data. +An example implementation as a `custom_check` is available [here](https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/property_relationship/relationship_descriptor.ipynb). ### Notes
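The `coupled` link defined in patch 18/20 can be checked directly on column values. Under this link, each value of one field always goes with the same value of the other field, in both directions. The following is a minimal Python sketch under that reading. `is_coupled` is a hypothetical helper. It is not the `custom_check` from the linked notebook and not part of any Table Schema library. It assumes the cell values are hashable and contain no NaN.

```python
from typing import Hashable, Sequence


def is_coupled(parent: Sequence[Hashable], child: Sequence[Hashable]) -> bool:
    """Hypothetical check for the `coupled` link: the distinct values of the
    two fields must be in one-to-one correspondence, row by row."""
    if len(parent) != len(child):
        raise ValueError("both fields must have the same number of values")
    forward: dict = {}   # parent value -> first child value seen with it
    backward: dict = {}  # child value -> first parent value seen with it
    for p, c in zip(parent, child):
        # setdefault stores the first pairing and returns the stored value,
        # so any later row pairing the same value differently breaks coupling
        if forward.setdefault(p, c) != c or backward.setdefault(c, p) != p:
            return False
    return True
```

With the values from the `coupled` example, `is_coupled(["france", "spain", "estonia", "spain"], ["FR", "ES", "EE", "ES"])` returns `True`. Pairing "ES" with both "spain" and "estonia" returns `False`.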