Discourage usage of unnecessary union types in Table Schema #28
As far as I understand, this is only about union types in primary and foreign keys, not about union types in other parts of the schema. In this case I'd support the change! |
@nichtich |
While it is a cleaner implementation:
Currently hesitating between 👎 and 👀 for me. |
@peterdesmet |
Yeah, I'm coming to this from R, where a character and a character vector (= array) are interchangeable, so it feels very natural. 😄

val <- "a"
val
#> [1] "a"
val <- c("a")
val
#> [1] "a"
val <- c("a", "b")
val
#> [1] "a" "b"

Created on 2024-02-12 with reprex v2.1.0 |
It makes sense, given that the property is already named |
I think it'd be good to distinguish between the different ways union types can be used in the spec. Broadly, I think there are two main categories: distinguishing between features (like @roll mentioned above), and syntactic sugar.

I think a key example of the former category (a union type distinguishing between features) is the

I don't like the behavior in this particular example (as I mention in this comment), but I'm not against it in general. For example, in my comment on

I think "unnecessary" union types are cases in the latter category, that is, "union types for syntactic sugar". There are three main use cases I've seen:
For me, keeping things consistent is paramount. So, considering (1), if we want some arrays to be specifiable by a single value, I think that should be the case for ALL arrays. Otherwise, every time I want to take the shortcut on a single-item array, I need to look up whether it's allowed. Similarly, for (2), I think if we allow physical values to be specified as

I lean towards NOT allowing (1) and (2), because that is clearer, more explicit, and emphasizes the actual type of the field. It also avoids potential ambiguity down the road if we wanted to use the same type signature for a "feature-distinguishing" union type as a way of adding a new feature. An example of this might be "named collections of missing values", where field-level missing values could be declared via a string referencing a named collection rather than an array of physical values. This would not be possible if we were implementing (1).

I think syntactic sugar union types that fall into (3) should be evaluated on a case-by-case basis to weigh their convenience and clarity against the complexity they introduce. Because they are specific special cases, I'm less worried about them interfering with anything else in the spec. |
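The ambiguity concern can be made concrete with a small sketch. Everything named here is hypothetical and not part of the spec: the two reader functions, the `NAMED_COLLECTIONS` lookup, and the collection name. It only illustrates that once a bare string is accepted as shorthand for a one-item array, the same string cannot later be given a second meaning without changing how existing descriptors are read.

```python
from typing import List, Union

# Hypothetical registry of named missing-value collections (not in the spec).
NAMED_COLLECTIONS = {"survey_codes": ["-99", "-98", "NA"]}


def missing_values_as_sugar(value: Union[str, List[str]]) -> List[str]:
    """Reading (1): a bare string is shorthand for a one-item array."""
    return [value] if isinstance(value, str) else list(value)


def missing_values_as_reference(value: Union[str, List[str]]) -> List[str]:
    """Alternative reading: a bare string names a collection of values."""
    if isinstance(value, str):
        return list(NAMED_COLLECTIONS[value])
    return list(value)


descriptor_value = "survey_codes"
print(missing_values_as_sugar(descriptor_value))      # ['survey_codes']
print(missing_values_as_reference(descriptor_value))  # ['-99', '-98', 'NA']
```

Both readings agree on arrays and disagree only on the bare string, which is exactly the slot a future feature would need.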
Co-authored-by: Peter Desmet <[email protected]>
Deploying with
Latest commit: 13c48ca
Status: ✅ Deploy successful!
Preview URL: https://32309e47.datapackage.pages.dev
Branch Preview URL: https://873-reduce-union-types.datapackage.pages.dev
@khusmann Currently
This wording is so complex and referential that, with this PR, I really think we simplify data publishers' lives rather than complicate them =) |
@ezwelty |
Of course, software will have to continue supporting v1 quirks, but the quirks will never go away unless we start to deprecate them... I've been nagged by this many times when writing code to process |
@peterdesmet |
Was reminded of this issue when changing an AWS S3 permission (which are also defined in JSON). There you can mix arrays and strings too:

Two actions:

"Action": [
  "s3:GetObject",
  "s3:ListBucket"
]

One action:

"Action": [
  "s3:GetObject"
]

or

"Action": "s3:GetObject"

I quite like this approach (Data Package has it for e.g. |
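On the consuming side, accepting both forms usually means normalizing the value before using it. A minimal Python sketch follows, assuming the value is either a string or a list of strings; the helper name `as_list` is made up for illustration and does not come from any library.

```python
from typing import List, Union


def as_list(value: Union[str, List[str]]) -> List[str]:
    """Return a list whether the input is a single string or a list."""
    # A str is itself iterable in Python, so it must be checked first;
    # otherwise a loop over the value would walk over its characters.
    if isinstance(value, str):
        return [value]
    return list(value)


one = {"Action": "s3:GetObject"}
two = {"Action": ["s3:GetObject", "s3:ListBucket"]}

for policy in (one, two):
    for action in as_list(policy["Action"]):
        print(action)
```

The convenience for the author of the JSON is paid for by every reader, who has to remember this check wherever the property is consumed.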
@peterdesmet
My hope is that as tooling emerges

So I still suggest we accept it for simplicity at the spec level. Honestly speaking, though, this PR doesn't even change anything because of backward compatibility, so we probably just need to make a call on it (accept or reject) and focus on more important topics 👍 |
APPROVED by WG (6/9) |
Thanks @peterdesmet !
So I think we now have quite consistent terminology:
PS. |
Rationale
I think having only one way of communicating things in cases of primaryKey and foreignKeys will actually benefit both data publishers and data consumers as it reduces confusion. For example, in v1 foreignKey.fields can be a single string, which might be really confusing for data publishers. Of course, a stricter data model is usually better for data consumers
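As a rough illustration of what discouraging the string form could look like in tooling, the sketch below reports which key properties in a Table Schema descriptor still use a bare string. The function name `find_string_keys` and the example descriptor (including the resource name `people`) are hypothetical; the sketch assumes only the v1 property layout of `primaryKey`, `foreignKeys[].fields`, and `foreignKeys[].reference.fields`.

```python
from typing import Any, Dict, List


def find_string_keys(schema: Dict[str, Any]) -> List[str]:
    """List the key properties that use a bare string instead of an array."""
    found = []
    if isinstance(schema.get("primaryKey"), str):
        found.append("primaryKey")
    for i, fk in enumerate(schema.get("foreignKeys", [])):
        if isinstance(fk.get("fields"), str):
            found.append(f"foreignKeys[{i}].fields")
        reference = fk.get("reference", {})
        if isinstance(reference.get("fields"), str):
            found.append(f"foreignKeys[{i}].reference.fields")
    return found


# Hypothetical descriptor mixing the string and array forms.
schema = {
    "fields": [{"name": "id"}, {"name": "parent_id"}],
    "primaryKey": "id",
    "foreignKeys": [
        {
            "fields": "parent_id",
            "reference": {"resource": "people", "fields": ["id"]},
        }
    ],
}
print(find_string_keys(schema))
# ['primaryKey', 'foreignKeys[0].fields']
```

A check like this can warn without rejecting, which fits a change that keeps v1 descriptors valid for backward compatibility.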