(Note: I consider this proposal "low-priority" for now, because it depends on the acceptance of the categorical field type in #875.)
Sometimes tabular data are produced in a "long" format that combines data of multiple different types into a single field / column. I see this form of data a lot in event-driven sensor data. For example:
| measurementType | measurementValue |
| --- | --- |
| cloudiness | partly cloudy |
| cloudiness | cloudy |
| temperature | 1 |
| wind force | 5 |
| temperature | 10 |
Where:

- If `measurementType = cloudiness`, then `measurementValue` has:
  - `type = categorical`
  - `categories = ["clear", "mostly clear", "partly cloudy", "mostly cloudy", "cloudy", "unknown"]`
- If `measurementType = temperature`, then `measurementValue` has:
  - `type = number`
  - `constraints.min = 0`
  - `constraints.max = 20`
- If `measurementType = wind force`, then `measurementValue` has:
  - `type = categorical`
  - `categories = [0, 1, 2, 3, 4, 5]`

(Example adapted from @peterdesmet's work here)

(Adapted from my comment here at @peterdesmet's request!)
Here, `measurementValue` is not a single type, but actually a union of three types: either a `cloudiness` measurement, a `temperature` measurement, or a `wind force` measurement, each with its own type definition and constraints. More specifically, this is a tagged union (aka discriminated union) compound type, where the type of `measurementValue` depends on the "tag" or "discriminator" found in `measurementType`.

Tagged union types are a well-established, well-understood abstraction already implemented in many programming languages (e.g. Python, Rust, etc.) and semantic data parsing / validation libraries (e.g. Python's pydantic and TypeScript's zod).
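To make the abstraction concrete, here is a minimal sketch of the same three-way tagged union using only the Python standard library. The class names, the `tag` attribute, and the `parse_measurement` helper are invented for illustration; the allowed levels and the 0 to 20 temperature range are the ones listed in the rules above.

```python
from dataclasses import dataclass
from typing import Literal, Union

# Allowed levels, taken from the "Where:" rules in this issue.
CLOUDINESS_LEVELS = ("clear", "mostly clear", "partly cloudy",
                     "mostly cloudy", "cloudy", "unknown")
WIND_FORCE_LEVELS = (0, 1, 2, 3, 4, 5)


@dataclass(frozen=True)
class Cloudiness:
    value: str
    tag: Literal["cloudiness"] = "cloudiness"


@dataclass(frozen=True)
class Temperature:
    value: float
    tag: Literal["temperature"] = "temperature"


@dataclass(frozen=True)
class WindForce:
    value: int
    tag: Literal["wind force"] = "wind force"


# The union itself: a measurement is exactly one of the three variants.
Measurement = Union[Cloudiness, Temperature, WindForce]


def parse_measurement(measurement_type: str, raw: str) -> Measurement:
    """Parse a raw cell according to the variant selected by the tag."""
    if measurement_type == "cloudiness":
        if raw not in CLOUDINESS_LEVELS:
            raise ValueError(f"not a cloudiness level: {raw!r}")
        return Cloudiness(raw)
    if measurement_type == "temperature":
        number = float(raw)  # raises ValueError if the cell is not numeric
        if not 0 <= number <= 20:
            raise ValueError(f"temperature out of range: {number}")
        return Temperature(number)
    if measurement_type == "wind force":
        level = int(raw)  # raises ValueError if the cell is not an integer
        if level not in WIND_FORCE_LEVELS:
            raise ValueError(f"not a wind force level: {level}")
        return WindForce(level)
    raise ValueError(f"unknown measurementType: {measurement_type!r}")


rows = [("cloudiness", "partly cloudy"), ("temperature", "10"), ("wind force", "5")]
parsed = [parse_measurement(tag, raw) for tag, raw in rows]
```

The tag decides which parser and which constraints apply to the value, which is the behavior a union field type would need to reproduce at the schema level.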
Implementing this behavior as a tagged union field type would allow implementations to validate this type of field by parsing its underlying types. Implementations could also perform exhaustiveness checks on the definition (ensuring that all levels in the categorical `measurementType` have corresponding type definitions). It would also make it easier for implementations to pivot the data into wider table formats, because the dependent type definitions would translate into the column types of the resulting wide columns.

Here's an example of what a tagged union field type might look like in frictionless (using the proposed categorical syntax in #875):
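As a rough, non-authoritative sketch of that idea (written as a Python dict so it can be run), the field could carry a `match` property keyed by the levels of `measurementType`. Only `match` and the `union` type name come from this issue; the `tag` key, the `eventID` column, and the helpers `wide_fields` and `pivot_wider` are hypothetical, and `minimum` / `maximum` are the Table Schema constraint names standing in for the min/max bounds listed earlier. The second half shows how the per-level definitions could become the column types of a pivoted wide table.

```python
# Hypothetical descriptor: `tag` is an assumed key naming the discriminator field.
measurement_value_field = {
    "name": "measurementValue",
    "type": "union",
    "tag": "measurementType",
    "match": {
        "cloudiness": {
            "type": "categorical",
            "categories": ["clear", "mostly clear", "partly cloudy",
                           "mostly cloudy", "cloudy", "unknown"],
        },
        "temperature": {
            "type": "number",
            "constraints": {"minimum": 0, "maximum": 20},
        },
        "wind force": {
            "type": "categorical",
            "categories": [0, 1, 2, 3, 4, 5],
        },
    },
}


def wide_fields(union_field):
    """Derive one wide column definition per key of `match`."""
    return [{"name": level, **definition}
            for level, definition in union_field["match"].items()]


def pivot_wider(rows, id_key, union_field):
    """Group long rows by `id_key`, spreading each tag into its own column."""
    tag_key, value_key = union_field["tag"], union_field["name"]
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_key], {id_key: row[id_key]})
        record[row[tag_key]] = row[value_key]
    return list(wide.values())


# `eventID` is an invented identifier column, used only to group rows.
long_rows = [
    {"eventID": "e1", "measurementType": "cloudiness", "measurementValue": "cloudy"},
    {"eventID": "e1", "measurementType": "temperature", "measurementValue": 10},
    {"eventID": "e2", "measurementType": "wind force", "measurementValue": 5},
]
wide_rows = pivot_wider(long_rows, "eventID", measurement_value_field)
```

Each entry returned by `wide_fields` is an ordinary single-type field definition, so the wide table needs no union type at all.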
Note that the field-level validation on this type would ensure that all the levels of the `measurementType` categorical field are represented as keys of the `match` property in the `measurementValue` field. For example, if `temperature` weren't defined as a key in the `match` property, this would trigger a validation error, because `temperature` is one of the levels of the `measurementType` field. As mentioned earlier, this is a common feature of tagged union types.

If there is interest in this type, I can put together a more formal definition of the proposed `union` field's type signature (and RFC language).
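The exhaustiveness rule described above (every level of `measurementType` must appear as a key of `match`) could be checked with something like the sketch below. The function name `check_exhaustive` is hypothetical, and reporting extra keys that are not levels is an additional assumption rather than something this proposal requires.

```python
def check_exhaustive(tag_categories, match):
    """Return (missing, extra): levels lacking a definition, and keys that are not levels."""
    levels, keys = set(tag_categories), set(match)
    return sorted(levels - keys), sorted(keys - levels)


# Levels of the categorical `measurementType` field from the example.
levels = ["cloudiness", "temperature", "wind force"]

# A deliberately incomplete `match`: `temperature` has no definition.
incomplete_match = {
    "cloudiness": {"type": "categorical"},
    "wind force": {"type": "categorical"},
}

missing, extra = check_exhaustive(levels, incomplete_match)
if missing:
    print(f"validation error: no type definition for {missing}")
```

Because the check only compares the categorical levels with the keys of `match`, it can run on the schema alone, before any data rows are read.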