This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

Separate concerns between metadata enrichment and data structure in Data Resource using resource.type. Remove Tabular Data Resource from v2 website #51

Merged
merged 9 commits into from
Apr 12, 2024

Member

roll commented Apr 4, 2024


Rationale

Currently, implementations have only one way to identify a resource type: checking whether profile is tabular-data-resource. But a Tabular Data Resource can also have its profile set to a URL, including an extension URL. This makes it practically impossible to communicate a resource's type reliably.

I think it will be more robust and extensible if the metadata enrichment system (profiles/extensions) is not mixed with the logical data structure (text/table/etc.), as the two are mostly orthogonal. For example, we might have a Fiscal Data Resource (extra properties), but it still needs to be a table structurally.

This change could greatly simplify the structure of the Data Resource spec and how it is perceived. At the same time, I don't think any use cases since the v1 release rely on resource.profile for anything other than communicating the resource type. In any case, frictionlessdata/datapackage#42 will still give us a way to extend a resource using profile/$schema, but it won't be coupled to the data structure.
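
To make the ambiguity concrete, here is a minimal sketch of how an implementation might detect a tabular resource before and after this change. The function names, the sample descriptor, and the extension URL are hypothetical and are not taken from any Frictionless library.

```python
def is_tabular_v1(resource: dict) -> bool:
    """v1-style detection: only works when profile is the literal string."""
    return resource.get("profile") == "tabular-data-resource"


def is_tabular_v2(resource: dict) -> bool:
    """Proposed detection: the structural type is carried by resource.type."""
    return resource.get("type") == "table"


# A tabular resource whose profile points at an extension (hypothetical URL).
fiscal = {
    "name": "budget",
    "path": "budget.csv",
    "profile": "https://example.com/fiscal-data-resource.json",
    "type": "table",
}

print(is_tabular_v1(fiscal))  # False: the profile no longer reveals the structure
print(is_tabular_v2(fiscal))  # True: type is independent of the extension
```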

Preview deployment:


cloudflare-workers-and-pages bot commented Apr 4, 2024

Deploying datapackage with Cloudflare Pages

Latest commit: de2b7f5
Status: ✅  Deploy successful!
Preview URL: https://22f4da46.datapackage.pages.dev
Branch Preview URL: https://887-separate-concerns-betwee.datapackage.pages.dev


roll changed the title from "Separate concerns between metadata enrichment and data structure" to "Separate concerns between metadata enrichment and data structure in Data Resource: resource.type" on Apr 4, 2024
roll changed the title from "Separate concerns between metadata enrichment and data structure in Data Resource: resource.type" to "Separate concerns between metadata enrichment and data structure in Data Resource: resource.type. Remove Tabular Data Resource from v2 website" on Apr 4, 2024
roll changed the title from "Separate concerns between metadata enrichment and data structure in Data Resource: resource.type. Remove Tabular Data Resource from v2 website" to "Separate concerns between metadata enrichment and data structure in Data Resource using resource.type. Remove Tabular Data Resource from v2 website" on Apr 4, 2024

#### `schema`

A Tabular Data Resource `SHOULD` have a `schema` property to describe a tabular schema of the resource data. If provided, the `schema` property `MUST` be a [Table Schema](../table-schema) descriptor in a form of an object or [URL-or-Path](../glossary/#url-or-path).
Member

It's probably fine to leave this as SHOULD, but previously a resource with "profile"="tabular-data-resource" (i.e. a "Tabular Data Resource") REQUIRED the schema property.

I just want to make sure this change is intentional.

Member Author

@peterdesmet
I think one of the good consequences of removing spec fragmentation is that we can simplify things. What I read here (https://specs.frictionlessdata.io/tabular-data-resource/#specification) is that schema needs to be a Table Schema, but it doesn't say that schema MUST be present. I also thought that schema was required, but I don't see it in the text.

Personally, I think that not requiring schema is better, as it opens up more usage scenarios based on an incremental enhancement model.
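
As a rough illustration of the SHOULD/MUST wording discussed here, the sketch below treats a missing `schema` on a table resource as a warning and a wrongly typed `schema` as an error. The helper name and the message strings are hypothetical; this is not how any particular implementation reports problems.

```python
def check_table_schema(resource: dict) -> list:
    """Return notes about the `schema` property of a table resource."""
    notes = []
    if resource.get("type") != "table":
        return notes  # the rule only concerns table resources

    schema = resource.get("schema")
    if schema is None:
        # SHOULD: absence is tolerated, so metadata can be added incrementally
        notes.append("warning: table resource has no schema")
    elif not isinstance(schema, (dict, str)):
        # MUST: if provided, an object or a URL-or-Path string
        notes.append("error: schema must be an object or a URL-or-Path string")
    return notes


print(check_table_schema({"name": "a", "type": "table", "path": "a.csv"}))
print(check_table_schema({"name": "a", "type": "table", "schema": "schema.json"}))
print(check_table_schema({"name": "a", "type": "table", "schema": 42}))
```

A resource can therefore start out as a bare table and gain a schema later without ever being invalid.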


Examples:
- `table`: indicates that the resource is tabular as per [Tabular Data](../glossary/#tabular-data) definition. Please read more about [Tabular Resource](#tabular) properties.
Member

Does that glossary item already exist?

Member Author

It needs to wait for #49 to get merged

@@ -161,49 +165,43 @@ Or inline CSV:
Prior to release 1.0.0-beta.18 (Nov 17 2016) there was a `url` property distinct from `path`. In order to support backwards compatibility, implementors `MAY` want to automatically convert a `url` property to a `path` property and issue a warning.
:::

### `profile`
Member

Is profile then not defined at all?

Member Author

If we go with $schema, I think it's only needed as a backward-compatibility note.

Member

peterdesmet commented Apr 8, 2024

For clarification: are we abandoning the profile property in all specs then?

  • In Data Package in favour of $schema, see Versioning and extensions #42.
  • In Data Resource (this PR) in favour of type. Maybe some clarification should be provided on how to extend Data Resource with e.g. $schema (rather than the v1 profile). I see this is now suggested in Versioning and extensions #42. Note we will then have $schema and schema.


ezwelty commented Apr 10, 2024

I like the symmetry of resource.type and field.type and agree that it feels more intuitive. What role could extensions have in defining or requiring resource.type? It seems that most extensions would be written for a specific resource type.

Member Author

roll commented Apr 10, 2024

@peterdesmet

> For clarification: are we abandoning the profile property in all specs then?

I would say that the current proposals just rename profile -> $schema, keeping almost exactly the same semantics for this property. (I would revisit this renaming on the Community Call, but I feel that using $schema gives a lot of benefits.)

At the same time, because $schema/profile, being a URL with versioning, can no longer serve to indicate a resource type, we introduce an additional resource.type property. I agree with @ezwelty that it is very natural to have this symmetry between resource.type and field.type (both resource and field are collection items in the hierarchy). In the future, it also opens the door to adding other resource types, such as json with e.g. jsonSchema, and we probably can't create a new spec for every new resource type.
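
The symmetry can be pictured with a small sketch in which resource.type describes the structure of the whole resource and field.type describes a single field. The descriptor and the `summarize` helper are hypothetical and only handle an inline schema object.

```python
resource = {
    "name": "cities",
    "type": "table",  # structural type of the whole resource
    "path": "cities.csv",
    "schema": {
        "fields": [
            {"name": "id", "type": "integer"},  # type of a single field
            {"name": "name", "type": "string"},
        ]
    },
}


def summarize(resource: dict) -> str:
    """Report resource.type next to each field.type (inline schemas only)."""
    schema = resource.get("schema")
    fields = schema.get("fields", []) if isinstance(schema, dict) else []
    field_types = ", ".join(
        f"{f['name']}: {f.get('type', 'unspecified')}" for f in fields
    )
    return f"{resource['name']} [{resource.get('type', 'unspecified')}] ({field_types})"


print(summarize(resource))  # cities [table] (id: integer, name: string)
```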

Member Author

roll commented Apr 10, 2024

@ezwelty
As we figured out in frictionlessdata/datapackage#823, modern JSON Schema gives us the ability to define almost any data package structure an extension needs. For example, this definition will accept only data packages with exactly 1 resource, which is a table called data (and we can require a specific schema, format, etc. as well):

resources:
  type: array
  items: false
  prefixItems:
    - type: object
      properties:
        name: 
          enum: [data]
        type:
          enum: [table]

Member Author

roll commented Apr 12, 2024

ACCEPTED by WG (6/9)

roll merged commit d6afa4c into main Apr 12, 2024
2 checks passed
roll deleted the 887/separate-concerns-between-types-and-extensions branch April 12, 2024 07:00
Development

Successfully merging this pull request may close these issues.

Separate concerns between metadata extensions and resource types
3 participants