diff --git a/content/docs/specifications/glossary.md b/content/docs/specifications/glossary.md index c514ce3c..5c15a4d2 100644 --- a/content/docs/specifications/glossary.md +++ b/content/docs/specifications/glossary.md @@ -95,3 +95,119 @@ Example of a relative path that this will work both as a relative path on disk a :::caution[Security] `/` (absolute path) and `../` (relative parent path) are forbidden to avoid security vulnerabilities when implementing data package software. These limitations on resource `path` ensure that resource paths only point to files within the data package directory and its subdirectories. This prevents data package software being exploited by a malicious user to gain unintended access to sensitive information. For example, suppose a data package hosting service stores packages on disk and allows access via an API. A malicious user uploads a data package with a resource path like `/etc/passwd`. The user then requests the data for that resource and the server naively opens `/etc/passwd` and returns that data to the caller. ::: + +### Tabular Data + +Tabular data consists of a list of rows. Each row has a list of fields (columns). We usually expect that each row has the same list of fields and thus we can talk about the fields for the table as a whole. + +In the case of tables in spreadsheets or CSV files, we often interpret the first row as a header row, giving the names of the fields. By contrast, in other situations, e.g. tables in SQL databases, the field names are explicitly designated. + +To illustrate, here's a classic spreadsheet table: + +```text +field field + | | + | | + V V + + A | B | C | D <--- Row (Header) + ------------------------------------ + valA | valB | valC | valD <--- Row + ... +``` + +In JSON, a table would be: + +```json +[ + { "A": value, "B": value, ... }, + { "A": value, "B": value, ... }, + ... 
+] +``` + +### Data Representation + +In order to talk about data representation and processing of tabular data from data sources, it is useful to introduce the concepts of the `physical`, `native`, and `logical` representation of data. + +#### Physical Representation + +The `physical` representation of data refers to the representation of data in any form that is used to store data, for example, in a CSV or JSON serialized file on a disk. Usually, the data is stored in some binary format, but, strictly speaking, the Data Package standard does not limit it to binary formats. + +For example, here is a hexadecimal representation of a CSV file encoded as UTF-8 and stored on a disk: + +```text title=table.csv +69 64 7C 6E 61 6D 65 0A 31 7C 61 70 70 6C 65 0A 32 7C 6F 72 61 6E 67 65 +``` + +For reference, here are the file contents decoded to text: + +```text +id|name +1|apple +2|orange +``` + +#### Native Representation + +The `native` representation of data refers to the representation of data in a form that is produced by a format-specific driver in some computational environment. The Data Package Standard itself does not define any data formats and relies on existing data formats (such as CSV, JSON, or SQL) and their corresponding drivers at the implementation level. + +Given the following Data Resource definition: + +```json +{ + "path": "table.csv", + "format": "csv", + "dialect": { + "delimiter": "|" + } +} +``` + +The data from the CSV example above will have the following `native` representation (we use a JavaScript-based environment for illustration): + +```javascript +{id: "1", name: "apple"} +{id: "2", name: "orange"} +``` + +Note that, after being handled by a CSV reader that took the dialect information into account, the data has been transformed from a binary form into a data structure. In a real implementation, it could be a data stream, a data frame, or some other form. 
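The physical-to-native step described above can be sketched in the same JavaScript-based environment. This is an illustration under stated assumptions, not how any particular driver works: `readNative` is a hypothetical helper that decodes the UTF-8 bytes of `table.csv` shown above and splits them using only the dialect's `delimiter`, ignoring quoting, escaping, and every other option a real CSV reader would honour.

```javascript
// Hypothetical "CSV driver" for illustration only: it honours just the
// dialect's `delimiter` and ignores quoting, escaping and other options.
function readNative(bytes, dialect = {}) {
  const delimiter = dialect.delimiter ?? ",";
  // physical -> text: decode the stored bytes as UTF-8
  const text = new TextDecoder("utf-8").decode(bytes);
  const [header, ...lines] = text.split("\n").filter((line) => line !== "");
  const names = header.split(delimiter);
  // text -> native: one object per row, every cell left as a string
  return lines.map((line) => {
    const cells = line.split(delimiter);
    return Object.fromEntries(names.map((name, i) => [name, cells[i]]));
  });
}

// The hexadecimal bytes of table.csv shown above
const hex = "69 64 7C 6E 61 6D 65 0A 31 7C 61 70 70 6C 65 0A 32 7C 6F 72 61 6E 67 65";
const bytes = Uint8Array.from(hex.split(" "), (h) => parseInt(h, 16));

console.log(readNative(bytes, { delimiter: "|" }));
// → [{ id: "1", name: "apple" }, { id: "2", name: "orange" }]
```

Every cell stays a string because CSV has no other native type; turning `"1"` into the integer `1` is left to the native-to-logical step.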
+ +#### Logical Representation + +The `logical` representation of data refers to the "ideal" representation of the data in terms of the Data Package standard types, data structures, and relations, all as defined by the specifications. We could say that the specifications are about the logical representation of data, as well as about ways to handle serialization and deserialization between the `physical` and `logical` representations of data. + +Given the following Data Resource definition: + +```json +{ + "path": "table.csv", + "format": "csv", + "dialect": { + "delimiter": "|" + }, + "schema": { + "fields": [ + { "name": "id", "type": "integer" }, + { "name": "name", "type": "string" } + ] + } +} +``` + +The data from the CSV example above will have the following `logical` representation (we use a JavaScript-based environment for illustration): + +```javascript +{id: 1, name: "apple"} +{id: 2, name: "orange"} +``` + +Note that, after being handled by a post-processor that took the schema information into account, the data has been transformed from a partially typed data structure into a fully typed data structure that complies with the provided Table Schema. + +:::tip[Data Formats] +The example above uses the CSV format, which has only one native data type, i.e. `string`. Other popular data formats like JSON or Parquet have more native data types, which in many cases bring data in `native` and `logical` form closer to each other, or sometimes even make them identical. +::: + +:::note[Implementation Note] +Due to the diversity of data formats and computational environments, there is no clear boundary between the roles of Table Dialect and Table Schema metadata in the `physical-to-native` and `native-to-logical` transformations. It is recommended to make the most of an available data format driver to get `native` data as close as possible to `logical` data, and to do post-processing for all unsupported features.
+::: diff --git a/content/docs/specifications/table-schema.md b/content/docs/specifications/table-schema.md index d8e8d6d2..90bcb44d 100644 --- a/content/docs/specifications/table-schema.md +++ b/content/docs/specifications/table-schema.md @@ -27,47 +27,14 @@ Table Schema is a simple language- and implementation-agnostic way to declare a ## Concepts -### Tabular Data +This specification heavily relies on the following concepts: -Tabular data consists of a set of rows. Each row has a set of fields (columns). We usually expect that each row has the same set of fields and thus we can talk about _the_ fields for the table as a whole. +- [Tabular Data](../glossary/#tabular-data) +- [Data Representation](../glossary/#data-representation) -In case of tables in spreadsheets or CSV files we often interpret the first row as a header row, giving the names of the fields. By contrast, in other situations, e.g. tables in SQL databases, the field names are explicitly designated. +In this document, we will explicitly refer to either the [Native Representation](../glossary/#native-representation) or [Logical Representation](../glossary/#logical-representation) of data in places where it prevents ambiguity for those engaging with the specification, especially implementors. Note that this specification does not deal with the [Physical Representation](../glossary/#physical-representation) of data in any way. -To illustrate, here's a classic spreadsheet table: - -```text -field field - | | - | | - V V - - A | B | C | D <--- Row (Header) - ------------------------------------ - valA | valB | valC | valD <--- Row - ... -``` - -In JSON, a table would be: - -```json -[ - { "A": value, "B": value, ... }, - { "A": value, "B": value, ... }, - ... -] -``` - -### Data Representation - -In order to talk about the representation and processing of tabular data from text-based sources, it is useful to introduce the concepts of the _physical_ and the _logical_ representation of data.
- -The _physical representation_ of data refers to the representation of data as text on disk, for example, in a CSV or JSON file. This representation can have some _type_ information (JSON, where the primitive types that JSON supports can be used) or not (CSV, where all data is represented in string form). - -The _logical representation_ of data refers to the "ideal" representation of the data in terms of primitive types, data structures, and relations, all as defined by the specification. We could say that the specification is about the logical representation of data, as well as about ways in which to handle conversion of a physical representation to a logical one. - -In this document, we'll explicitly refer to either the _physical_ or _logical_ representation in places where it prevents ambiguity for those engaging with the specification, especially implementors. - -For example, `constraints` `SHOULD` be tested on the logical representation of data, whereas a property like `missingValues` applies to the physical representation of the data. +Note that wherever this specification allows a native value to be provided, the most similar JSON type should be used to represent it. If no such type exists (e.g. for a native date value), a string representation of that value should be provided. Such mappings between native types and JSON types, as well as the string representations described above, are file-format-specific and left for implementors to decide (unless explicitly defined in this specification or its appendices). ## Descriptor @@ -127,18 +94,17 @@ A Table Schema descriptor `MAY` contain a property `fieldsMatch` that `MUST` be Many datasets arrive with missing data values, either because a value was not collected or it never existed. Missing values may be indicated simply by the value being empty in other cases a special value may have been used e.g. `-`, `NaN`, `0`, `-9999` etc.
-`missingValues` dictates which string values `MUST` be treated as `null` values. This conversion to `null` is done before any other attempted type-specific string conversion. The default value `[ "" ]` means that empty strings will be converted to null before any other processing takes place. Providing the empty list `[]` means that no conversion to null will be done, on any value. - -`missingValues` `MUST` be an `array` where each entry is a `string`. +The `missingValues` property configures which native values `MUST` be treated as logical `null` values. If provided, the `missingValues` property `MUST` be an `array` of values. -**Why strings**: `missingValues` are strings rather than being the data type of the particular field. This allows for comparison prior to casting and for fields to have missing value which are not of their type, for example a `number` field to have missing values indicated by `-`. +This conversion to `null` is done before any other attempted type-specific conversion. The default value `[ "" ]` means that empty strings will be converted to null before any other processing takes place. Providing the empty list `[]` means that no values will be converted to null. -Examples: +Examples of the `missingValues` property: ```text "missingValues": [""] "missingValues": ["-"] "missingValues": ["NaN", "-"] +"missingValues": [-9999] ``` #### `primaryKey` @@ -425,11 +391,19 @@ Supported formats: - **binary**: A base64 encoded string representing binary data. - **uuid**: A string that is a uuid. +**Native Representation** + +Values `MUST` be represented as strings. ### `number` The field contains numbers of any kind including decimals. -The lexical formatting follows that of decimal in [XMLSchema](https://www.w3.org/TR/xmlschema-2/#decimal): a non-empty finite-length sequence of decimal digits separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed.
Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: '-1.23', '12678967.543233', '+100000.00', '210'. +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings following the rules below. + +Formatting follows that of decimal in [XMLSchema](https://www.w3.org/TR/xmlschema-2/#decimal): a non-empty finite-length sequence of decimal digits separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: '-1.23', '12678967.543233', '+100000.00', '210'. The following special string values are permitted (case need not be respected): @@ -441,60 +415,75 @@ A number `MAY` also have a trailing: - exponent: this `MUST` consist of an E followed by an optional + or - sign followed by one or more decimal digits (0-9) -This lexical formatting `MAY` be modified using these additional properties: +Formatting `MAY` be modified using these additional properties: - **decimalChar**: A string whose value is used to represent a decimal point within the number. The default value is ".". - **groupChar**: A string whose value is used to group digits within the number. This property does not have a default value. A common value is "," e.g. "100,000". -- **bareNumber**: a boolean field with a default of `true`. If `true` the physical contents of this field `MUST` follow the formatting constraints already set out. If `false` the contents of this field may contain leading and/or trailing non-numeric characters (which implementors `MUST` therefore strip). The purpose of `bareNumber` is to allow publishers to publish numeric data that contains trailing characters such as percentages e.g.
`95%` or leading characters such as currencies e.g. `€95` or `EUR 95`. Note that it is entirely up to implementors what, if anything, they do with stripped text. +- **bareNumber**: a boolean field with a default of `true`. If `true` the contents of this field `MUST` follow the formatting constraints already set out. If `false` the contents of this field may contain leading and/or trailing non-numeric characters (which implementors `MUST` therefore strip). The purpose of `bareNumber` is to allow publishers to publish numeric data that contains trailing characters such as percentages e.g. `95%` or leading characters such as currencies e.g. `€95` or `EUR 95`. Note that it is entirely up to implementors what, if anything, they do with stripped text. ### `integer` The field contains integers - that is whole numbers. -Integer values are indicated in the standard way for any valid integer. +**Native Representation** -This lexical formatting `MAY` be modified using these additional properties: +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings following the rules below. + +Integer values are indicated in the standard way for any valid integer. Formatting `MAY` be modified using these additional properties: - **groupChar**: A string whose value is used to group digits within the integer. This property does not have a default value. A common value is "," e.g. "100,000". -- **bareNumber**: a boolean field with a default of `true`. If `true` the physical contents of this field `MUST` follow the formatting constraints already set out. If `false` the contents of this field may contain leading and/or trailing non-numeric characters (which implementors `MUST` therefore strip). The purpose of `bareNumber` is to allow publishers to publish numeric data that contains trailing characters such as percentages e.g. `95%` or leading characters such as currencies e.g. `€95` or `EUR 95`.
Note that it is entirely up to implementors what, if anything, they do with stripped text. +- **bareNumber**: a boolean field with a default of `true`. If `true` the contents of this field `MUST` follow the formatting constraints already set out. If `false` the contents of this field may contain leading and/or trailing non-numeric characters (which implementors `MUST` therefore strip). The purpose of `bareNumber` is to allow publishers to publish numeric data that contains trailing characters such as percentages e.g. `95%` or leading characters such as currencies e.g. `€95` or `EUR 95`. Note that it is entirely up to implementors what, if anything, they do with stripped text. ### `boolean` -The field contains boolean (true/false) data. +The field contains boolean data, i.e. logical `true` or logical `false`. -In the physical representations of data where boolean values are represented with strings, the values set in `trueValues` and `falseValues` are to be cast to their logical representation as booleans. `trueValues` and `falseValues` are arrays which can be customised to user need. The default values for these are in the additional properties section below. +**Native Representation** -The boolean field can be customised with these additional properties: +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as defined by the `trueValues` and `falseValues` properties, which can be customized to the user's needs: -- **trueValues**: `[ "true", "True", "TRUE", "1" ]` -- **falseValues**: `[ "false", "False", "FALSE", "0" ]` +- **trueValues**: An array of native values to be interpreted as logical `true`. The default is `[ "true", "True", "TRUE", "1" ]`. +- **falseValues**: An array of native values to be interpreted as logical `false`. The default is `[ "false", "False", "FALSE", "0" ]`. ### `object` The field contains a valid JSON object.
+**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be strings that are valid serialized JSON objects. ### `array` The field contains a valid JSON array. +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be strings that are valid serialized JSON arrays. ### `list` -The field contains data that is an ordered one-level depth collection of primitive values with a fixed item type. In the lexical representation, the field `MUST` contain a string with values separated by a delimiter which is `,` (comma) by default e.g. `value1,value2`. In comparison to the `array` type, the `list` type is directly modelled on the concept of SQL typed collections. +The field contains data that is an ordered, one-level-deep collection of primitive values with a fixed item type. In comparison to the `array` type, the `list` type is directly modelled on the concept of SQL typed collections. -`format`: no options (other than the default). +The list field can be customised with this additional property: -The list field can be customised with these additional properties: +- **itemType**: specifies the list item type in terms of existing Table Schema types. If present, it `MUST` be one of `string`, `integer`, `boolean`, `number`, `datetime`, `date`, or `time`. If not present, the default is `string`. A data consumer `MUST` process list items as if they were individual values of the corresponding data type. -- **delimiter**: specifies the character sequence which separates lexically represented list items. If not present, the default is `,` (comma). -- **itemType**: specifies the list item type in terms of existent Table Schema types. If present, it `MUST` be one of `string`, `integer`, `boolean`, `number`, `datetme`, `date`, and `time`. If not present, the default is `string`. A data consumer `MUST` process list items as it were individual values of the corresponding data type.
A data consumer `MUST` process list items as it were individual values of the corresponding data type. Note, that on lexical level only default formats are supported, for example, for a list with `itemType` set to `date`, items have to be in default form for dates i.e. `yyyy-mm-dd`. +**Native Representaiton** + +If supported, values `MUST` be natively represented by a data format. If not supported, the field `MUST` contain a string with list items separated by a delimiter which is `,` (comma) by default e.g. `value1,value2`. The list items `MUST` be serialized using a default format of the corresponding `itemType`. The delimiter can be customised with this additional property: + +- **delimiter**: specifies the character sequence which separates list items. If not present, the default is `,` (comma). ### `datetime` -The field contains a date with a time. +The field contains a date with a time and an optional timezone. -Supported formats: +**Native Representaiton** -- **default**: The lexical representation `MUST` be in a form defined by [XML Schema](https://www.w3.org/TR/xmlschema-2/#dateTime) containing required date and time parts, followed by optional milliseconds and timezone parts, for example, `2024-01-26T15:00:00` or `2024-01-26T15:00:00.300-05:00`. +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings in one of the following formats: + +- **default**: values `MUST` be in a form defined by [XML Schema](https://www.w3.org/TR/xmlschema-2/#dateTime) containing required date and time parts, followed by optional milliseconds and timezone parts, for example, `2024-01-26T15:00:00` or `2024-01-26T15:00:00.300-05:00`. - **\**: values in this field can be parsed according to ``. `` `MUST` follow the syntax of [standard Python / C strptime](https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior). 
Values in the this field `SHOULD` be parsable by Python / C standard `strptime` using `<PATTERN>`. Example for `"format": ""%d/%m/%Y %H:%M:%S"` which would correspond to a date with time like: `12/11/2018 09:15:32`. - **any**: Any parsable representation of the value. The implementing library can attempt to parse the datetime via a range of strategies. An example is `dateutil.parser.parse` from the `python-dateutils` library. It is `NOT RECOMMENDED` to use `any` format as it might cause interoperability issues. @@ -502,9 +491,11 @@ Supported formats: The field contains a date without a time. -Supported formats: +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings in one of the following formats: -- **default**: The lexical representation `MUST` be `yyyy-mm-dd` e.g. `2024-01-26` +- **default**: values `MUST` be `yyyy-mm-dd` e.g. `2024-01-26` - **\<PATTERN\>**: The same as for `datetime` - **any**: The same as for `datetime` @@ -512,51 +503,69 @@ Supported formats: The field contains a time without a date. -Supported formats: +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings in one of the following formats: -- **default**: The lexical representation `MUST` be `hh:mm:ss` e.g. `15:00:00` +- **default**: values `MUST` be `hh:mm:ss` e.g. `15:00:00` - **\<PATTERN\>**: The same as for `datetime` - **any**: The same as for `datetime` ### `year` -A calendar year as per [XMLSchema `gYear`](https://www.w3.org/TR/xmlschema-2/#gYear). Usual lexical representation is `YYYY`. There are no format options. +The field contains a calendar year. + +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings as per [XMLSchema `gYear`](https://www.w3.org/TR/xmlschema-2/#gYear).
Usual representation as a string is `YYYY`. ### `yearmonth` -A specific month in a specific year as per [XMLSchema `gYearMonth`](https://www.w3.org/TR/xmlschema-2/#gYearMonth). Usual lexical representation is: `YYYY-MM`. There are no format options. +The field contains a specific month in a specific year. + +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings as per [XMLSchema `gYearMonth`](https://www.w3.org/TR/xmlschema-2/#gYearMonth). Usual representation as a string is `YYYY-MM`. ### `duration` -A duration of time. +The field contains a duration of time. + +**Native Representation** -We follow the definition of [XML Schema duration datatype](http://www.w3.org/TR/xmlschema-2/#duration) directly and that definition is implicitly inlined here. +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings as per [XML Schema `duration`](http://www.w3.org/TR/xmlschema-2/#duration). -To summarize: the lexical representation for duration is the [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601#Durations) extended format PnYnMnDTnHnMnS, where nY represents the number of years, nM the number of months, nD the number of days, 'T' is the date/time separator, nH the number of hours, nM the number of minutes and nS the number of seconds. The number of seconds can include decimal digits to arbitrary precision. Date and time elements including their designator `MAY` be omitted if their value is zero, and lower order elements `MAY` also be omitted for reduced precision. +The duration `MUST` be in the [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601#Durations) extended format `PnYnMnDTnHnMnS`, where `nY` represents the number of years, `nM` the number of months, `nD` the number of days, `T` is the date/time separator, `nH` the number of hours, `nM` the number of minutes and `nS` the number of seconds.
The number of seconds can include decimal digits to arbitrary precision. Date and time elements including their designator `MAY` be omitted if their value is zero, and lower order elements `MAY` also be omitted for reduced precision. ### `geopoint` -The field contains data describing a geographic point. +The field contains data describing a geographic point, i.e. `lon` and `lat` values that are floating-point numbers. -Supported formats: +**Native Representation** + +If supported, values `MUST` be natively represented by a data format. If not supported, values `MUST` be represented as strings in one of the following formats: - **default**: A string of the pattern "lon, lat", where each value is a number, and `lon` is the longitude and `lat` is the latitude (note the space is optional after the `,`). E.g. `"90.50, 45.50"`. - **array**: A JSON array, or a string parsable as a JSON array, of exactly two items, where each item is a number, and the first item is `lon` and the second item is `lat` e.g. `[90.50, 45.50]` -- **object**: A JSON object with exactly two keys, `lat` and `lon` and each value is a number e.g. `{"lon": 90.50, "lat": 45.50}` +- **object**: A JSON object with exactly two keys, `lon` and `lat`, where each value is a number, e.g. `{"lon": 90.50, "lat": 45.50}` ### `geojson` -The field contains a JSON object according to GeoJSON or TopoJSON spec. +The field contains a JSON object according to GeoJSON or TopoJSON specifications. Supported formats: - **default**: A geojson object as per the [GeoJSON spec](http://geojson.org/). -- **topojson**: A topojson object as per the [TopoJSON spec](https://github.com/topojson/topojson-specification/blob/master/README.md) +- **topojson**: A topojson object as per the [TopoJSON spec](https://github.com/topojson/topojson-specification/blob/master/README.md). + +**Native Representation** + +If supported, values `MUST` be natively represented by a data format.
If not supported, values `MUST` be represented as strings that are valid serialized JSON objects. ### `any` -The field contains values of a unspecified or mixed type. A data consumer `MUST NOT` perform any processing on this field's values and `MUST` interpret them as it is in the data source. This data type is directly modelled on the concept of the `any` type of strongly typed object-oriented languages like [TypeScript](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#any). +The field contains values of an unspecified or mixed type. A data consumer `MUST NOT` perform any processing on these values and `MUST` interpret them as they are in the native representation of data. This data type is directly modelled on the concept of the `any` type of strongly typed object-oriented languages like [TypeScript](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#any). For example, having a Table Schema below: @@ -599,11 +608,15 @@ While this JSON data file will have logical values as below: Note, that for the CSV data source the `id` field is interpreted as a string because CSV supports only one data type i.e. string, and for the JSON data source the `id` field is interpreted as an integer because JSON supports a numeric data type and the value was declared as an integer. Also, for the Table Schema above a `type` property for each field can be omitted as it is a default field type. +**Native Representation** + +Values `MUST` be natively represented by a data format. ## Field Constraints The `constraints` property on Table Schema Fields can be used by consumers to list constraints for validating field values. For example, validating the data in a [Tabular Data Resource](https://specs.frictionlessdata.io/tabular-data-package/) against its Table Schema; or as a means to validate data being collected or updated via a data entry interface.
-All constraints `MUST` be tested against the logical representation of data, and the physical representation of constraint values `MAY` be primitive types as possible in JSON, or represented as strings that are castable with the `type` and `format` rules of the field. +All constraints `MUST` be tested against the logical representation of data, and the native representation of constraint values `MAY` use JSON primitive types where possible, or strings that are castable using the `type` and `format` rules of the field. A constraints descriptor `MUST` be a JSON `object` and `MAY` contain one or more of the following properties: @@ -612,7 +625,7 @@ A constraints descriptor `MUST` be a JSON `object` and `MAY` contain one or more - **Type**: boolean - **Fields**: all -Indicates whether this field cannot be `null`. If required is `false` (the default), then `null` is allowed. See the section on `missingValues` for how, in the physical representation of the data, strings can represent `null` values. +Indicates whether this field cannot be `null`. If `required` is `false` (the default), then `null` is allowed. See the section on `missingValues` for how, in the native representation of the data, specific values can represent `null` values. ### `unique` @@ -624,14 +637,14 @@ If `true`, then all values for that field `MUST` be unique within the data file ### `minLength` - **Type**: integer -- **Fields**: collections (string, array, object) +- **Fields**: collections (string, list, array, object) An integer that specifies the minimum length of a value. ### `maxLength` - **Type**: integer -- **Fields**: collections (string, array, object) +- **Fields**: collections (string, list, array, object) An integer that specifies the maximum length of a value.
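The native-to-logical step implied by the reworked `missingValues`, `boolean`, and `list` definitions above can be sketched as a small JavaScript post-processor. `toLogical` and `castValue` are hypothetical names, and the sketch assumes CSV-derived native values (all strings). It covers only `missingValues`, default-format `integer`, `boolean` with `trueValues`/`falseValues`, `list` with `delimiter`/`itemType`, and `string`; `groupChar`, `bareNumber`, dates, and the remaining types are deliberately left out.

```javascript
// Hypothetical native-to-logical post-processor: a sketch, not a reference
// implementation. It assumes native values come from a CSV driver (strings)
// and covers only a few field types.
const DEFAULT_TRUE_VALUES = ["true", "True", "TRUE", "1"];
const DEFAULT_FALSE_VALUES = ["false", "False", "FALSE", "0"];

function castValue(value, field) {
  switch (field.type) {
    case "string":
      return value;
    case "integer":
      // default format only: no groupChar or bareNumber handling
      if (!/^[+-]?\d+$/.test(value)) throw new Error(`not an integer: ${value}`);
      return Number.parseInt(value, 10);
    case "boolean":
      if ((field.trueValues ?? DEFAULT_TRUE_VALUES).includes(value)) return true;
      if ((field.falseValues ?? DEFAULT_FALSE_VALUES).includes(value)) return false;
      throw new Error(`not a boolean: ${value}`);
    case "list": {
      // split on the delimiter, then cast each item as a value of itemType
      const itemField = { type: field.itemType ?? "string" };
      return value
        .split(field.delimiter ?? ",")
        .map((item) => castValue(item, itemField));
    }
    default:
      throw new Error(`type not covered by this sketch: ${field.type}`);
  }
}

function toLogical(row, schema) {
  const missingValues = schema.missingValues ?? [""];
  const logical = {};
  for (const field of schema.fields) {
    const value = row[field.name];
    // missing values are matched first, before any type-specific conversion
    logical[field.name] = missingValues.includes(value) ? null : castValue(value, field);
  }
  return logical;
}

const schema = {
  missingValues: ["", "-"],
  fields: [
    { name: "id", type: "integer" },
    { name: "active", type: "boolean", trueValues: ["yes"], falseValues: ["no"] },
    { name: "scores", type: "list", itemType: "integer", delimiter: ";" },
    { name: "name", type: "string" },
  ],
};

console.log(toLogical({ id: "1", active: "yes", scores: "3;5;8", name: "-" }, schema));
// → { id: 1, active: true, scores: [3, 5, 8], name: null }
```

Checking `missingValues` before casting means a value such as `"-"` never reaches the type-specific code, so it can mark a missing value even in an `integer` or `list` column. Because `Array.prototype.includes` compares type as well as value, a non-string entry such as `-9999` only matches a native number here; treating `-9999` and `"-9999"` as different is a choice of this sketch, not a rule stated above.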
diff --git a/profiles/source/dictionary/schema.yaml b/profiles/source/dictionary/schema.yaml index 9d5a59ad..d9774688 100644 --- a/profiles/source/dictionary/schema.yaml +++ b/profiles/source/dictionary/schema.yaml @@ -214,19 +214,13 @@ tableSchemaForeignKey: tableSchemaTrueValues: type: array minItems: 1 - items: - type: string default: ["true", "True", "TRUE", "1"] tableSchemaFalseValues: type: array minItems: 1 - items: - type: string default: ["false", "False", "FALSE", "0"] tableSchemaMissingValues: type: array - items: - type: string default: - "" description: Values that when encountered in the source, should be considered