Yaml as well as JSON for Data Package descriptor files #292

Closed
rufuspollock opened this issue Sep 11, 2016 · 18 comments · Fixed by frictionlessdata/datapackage-v2-draft#50

@rufuspollock
Contributor

rufuspollock commented Sep 11, 2016

Idea: allow data package descriptor files to be in yaml as well as json.

Why: yaml is easier to create and read for ordinary people. JSON is easy to get wrong.

Why not: adds complexity for all implementors of tools as they need to support an additional format.

I'm creating this for discussion. Very tentative idea atm.

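To make the comparison concrete, here is the same minimal descriptor in both serialisations (a sketch only; the package and resource names are made up, the properties follow the spec):

JSON:

{
  "name": "example-package",
  "resources": [
    {
      "name": "data",
      "path": "data.csv"
    }
  ]
}

The same descriptor as YAML:

name: example-package
resources:
  - name: data
    path: data.csv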

@pwalsh
Member

pwalsh commented Sep 11, 2016

I'm neutral on this.

We programmers like to think YAML is easier for ordinary folk, but in my experience the importance of whitespace in YAML is actually a killer for ordinary folk - it is just a different type of problem to that of JSON.

However, I personally think YAML is a fine format, and supporting it as a first class citizen seems a reasonable choice, except in the browser where adding additional dependencies can actually matter (after we add YAML, we add TOML, etc., whatever the favoured serialisation format of the day, and suddenly we have bloat).
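To illustrate the whitespace point with a hypothetical snippet (not from the thread): the two fragments below differ only in the indentation of path, yet both are valid YAML. The first nests path inside the resource; the second silently promotes it to the top level instead of raising an error.

resources:
  - name: data
    path: data.csv

resources:
  - name: data
path: data.csv

The second fragment parses to {"resources": [{"name": "data"}], "path": "data.csv"}, so the mistake only surfaces later, e.g. during validation.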

@danfowler
Contributor

Agreeing here that both YAML and JSON are easy to get wrong. Probably worth keeping an eye on the popularity of CSVY as a method of applying JSON Table Schema as YAML frontmatter to a CSV file.

@rufuspollock
Contributor Author

WONTFIX.

OK. If yaml is not really easier I think I'm going to close this as WONTFIX for now. It imposes additional costs on implementors without making it much easier for publishers.

Definitely open to reconsidering if this is raised in future.

@jgmill

jgmill commented Dec 19, 2016

I use YAML internally when writing the metadata for my data package because it makes for more readable code. For the output I then convert it to JSON as this is the standard you guys specified.

I think it makes sense to stick with one format for the standard. I'm neutral on whether that should be YAML or JSON as I'm not familiar with the specific pros and cons.
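A minimal sketch of that YAML-in, JSON-out workflow in Python, assuming PyYAML is installed; the input filename datapackage.yaml is just this sketch's convention, not something the spec defines:

import json
import yaml  # PyYAML

# Load the hand-written YAML descriptor (filename assumed for this sketch).
with open("datapackage.yaml", encoding="utf-8") as src:
    descriptor = yaml.safe_load(src)  # YAML loads into plain dicts/lists/scalars

# Serialise the same structure as the JSON descriptor the standard specifies.
with open("datapackage.json", "w", encoding="utf-8") as dst:
    json.dump(descriptor, dst, indent=2, ensure_ascii=False)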

@danfowler
Contributor

@muehlenpfordt to support your point, @akariv made datapackage-pipelines which allows a pipeline creator to describe the Table Schema in YAML

@danfowler
Contributor

Another point to be made in this closed issue is that datapackage.json is also expressible in this Metatab format.

@rufuspollock
Contributor Author

rufuspollock commented Jul 9, 2018

I'm reopening this as I think YAML support would be really nice and simple. YAML is now really familiar (e.g. from Jekyll etc.) and is a lot easier to write than JSON IME.

@rufuspollock rufuspollock reopened this Jul 9, 2018
@rufuspollock
Contributor Author

@pwalsh @akariv what do you think about going for this as an option going forward?

@micimize
Contributor

micimize commented Aug 23, 2019

I don't think a valid data package should ever lack a datapackage.json, but maybe we could make _datapackage.json a location convention for generated package descriptors, and have tooling for generating them.

This way developers can use whatever source-of-truth (yaml, graphql schema, classes), and use/write generators for their specific use-case. We can always include these sources in packages for documentation purposes.

@rufuspollock
Contributor Author

YAML definitely looks like it is becoming a default for writing human-writable but computer-parsable config; e.g. look at CI tooling. I think it is time we support YAML, perhaps even as the default.

@peterdesmet
Member

My 2 cents: as a developer I like yaml very much as a human readable config format, but I agree with @pwalsh's comment in #292 (comment): the importance of whitespace is not intuitive. Other than familiarity with a format, I'm not sure it offers that much benefit to the publishers, while placing quite a burden on implementors, with potentially more requests coming:

  1. Being able to mix JSON (e.g. datapackage.json) with YAML (e.g. schema.yaml)
  2. Supporting TOML, XML, ...
  3. Being able to write extensions to specs (which are expressed as JSON schemas) in YAML

Since Data Packages are a container format for publishing and archiving data, I think it is good to keep a long term perspective in mind and be restrictive/conservative when it comes to specs and only support JSON.

@roll roll removed New Spec labels Jan 3, 2024
@roll roll changed the title Yaml as well as JSON for Data Package descriptor files (?) Yaml as well as JSON for Data Package descriptor files Jan 3, 2024
@roll roll added this to the v2 milestone Jan 3, 2024
@roll roll assigned roll and unassigned roll Jan 25, 2024
@roll roll assigned roll and unassigned roll Feb 6, 2024
@ezwelty
Contributor

ezwelty commented Apr 10, 2024

I would have argued YAML is easier for a non-programmer to read, but in any case I've now made it a habit of also including a markdown and/or pdf rendering. I do find it easier to maintain in two specific cases: complex pattern constraints and long package/resource/field descriptions that span more than one line.

YAML:

pattern: https?:\/\/.+
description: |-
  Drilling method:

  - mechanical
  - thermal

JSON:

{
  "pattern": "https?:\\/\\/.+",
  "description": "Drilling method:\n\n- mechanical\n- thermal"
}

@rufuspollock
Contributor Author

If I am understanding correctly, this was closed as WONTFIX in frictionlessdata/datapackage-v2-draft#50

I'd just add my 2c that, from my experience over the last half dozen years, using YAML is very attractive. I understand the issue of placing a burden on implementors - perhaps the burden can be lowered if one forbade e.g. "mixing and matching".

So flagging this for consideration in v2.1 or similar - this wouldn't be a breaking change and it could be an opt-in for tools gradually? (or, perhaps we have some method for optional extensions that can be tried out for a time and we see how it goes).

@roll
Member

roll commented Jul 1, 2024

@rufuspollock
In my opinion, one of the most important things about the v2 work was establishing a system in which any change can be proposed by anyone and then voted on by the Working Group. So yes, the current decision was to keep JSON-only, but if there is still demand it totally needs to be re-opened following this process 👍

@roll roll reopened this Jul 1, 2024
@roll
Member

roll commented Jul 1, 2024

BTW please take a look at the current wording regarding YAML - https://datapackage.org/standard/glossary/#descriptor

@roll roll removed this from the v2.0 milestone Jul 2, 2024
@roll roll removed the proposal label Jul 2, 2024
@vkhodygo

vkhodygo commented Jul 19, 2024

A colleague of mine told me about your project recently, and it seems to be a life-saver. However, I'm surprised that YAML is not the default option considering how widespread it is nowadays.

I'd like to add a few comments regarding what's been said here already:

  • It's easier for a human to create it manually. I anticipate remarks about white space and how it is possible to get it wrong. That is partially true; however, localising the error is much easier, at least in small files. Besides, modern IDEs allow you to collapse parts of the code, which makes visual inspection reasonably straightforward. At the same time, with nested JSON a single misplaced bracket can easily send not just a single block but the whole structure into oblivion, and that is subjectively more difficult to fix. One could argue that we should use tabs/spaces to counter this, but JSON then turns into YAML with excessive brackets. Both options require validators of some sort; the actual external code doesn't care much about the representation, since it all turns into a dictionary either way.
  • Most if not all modern languages can read YAML, so you don't need to implement much. Even some obscure ones, for example Nim, do.
  • Mixing JSON with YAML is a bit ambiguous; sticking with one format that's both human- and machine-readable is the best option. Still, nothing's wrong with the approach itself as long as all files are standard-/format-compliant, at least for a transition period.
  • Supporting TOML, XML, or anything of the kind should be out of the question. The former is simpler in terms of semantics and probably less flexible when it comes to non-trivial metadata storage; to be completely honest, I don't have much experience dealing with it, so take this with a pinch of salt. The latter is just a nightmare for human editing, and it uses too much space.
  • It is my understanding that writing extensions to specs is no different conceptually from writing the specs themselves, so it boils down to the same upstream choice of format.

One of the clear advantages of YAML is that it allows the use of tags/labels as well as references, which is a nice feature to have to reduce data duplication and enforce data type compliance.
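As an illustration (a hypothetical fragment, not something the current spec defines): YAML anchors (&) and aliases (*) let a shared definition be written once and referenced elsewhere, although after parsing the result is identical to writing it out in full.

definitions:              # hypothetical helper block, not a spec property
  id_field: &id_field     # anchor the shared field definition
    name: id
    type: integer
    constraints:
      required: true

resources:
  - name: sites
    schema:
      fields:
        - *id_field       # alias: reuse the anchored definition
  - name: samples
    schema:
      fields:
        - *id_field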

Note that this reply is opinionated mostly because I had to manually create metadata files for large datasets basically implementing a trimmed-down version of datapackage, and YAML was my first choice for that.

@khusmann
Contributor

@vkhodygo I agree YAML can be nice for authoring datapackages, but as you say:

Mixing JSON with YAML is a bit ambiguous, sticking with one format that's both human- and machine-readable is the best option.

So the current guidance in data package v2 is:

A descriptor MAY be serialized using alternative formats like YAML or TOML as an internal part of some project or system if supported by corresponding implementations. A descriptor SHOULD NOT be externally published in any other format rather than JSON.

I think this gives us the best of both worlds -- you can enjoy all the benefits of YAML you mention as you build the data package internally, but then when you publish, you simply render to JSON. This way publishers and data consumers can enjoy all the benefits of JSON's simple parseability and unambiguous standard, and it gives us one standard format for easy exchange.

@ezwelty
Contributor

ezwelty commented Aug 8, 2024

I think this gives us the best of both worlds -- you can enjoy all the benefits of YAML you mention as you build the data package internally, but then when you publish, you simply render to JSON

I'm wondering whether this is really the best of both worlds, as I'm faced with exactly this step. First, it requires a custom build step. For a data package maintained on GitHub with a YAML descriptor, this rules out publishing directly to Zenodo using the standard GitHub-Zenodo integration. Second, I would argue that it increases the need for an additional file that contains a more human-friendly rendering of the JSON.

Here is a side-by-side comparison of YAML and JSON for a more complex datapackage. I'd argue the YAML can stand as a basic text-based readme, but that the JSON looks more like machine code?
datapackage.json.pdf
datapackage.yaml.pdf

@frictionlessdata frictionlessdata locked and limited conversation to collaborators Oct 21, 2024
@roll roll converted this issue into discussion #995 Oct 21, 2024
