Release KDL 2.0.0 #434

Merged
merged 18 commits into from
Dec 22, 2024

Conversation

zkat
Member

@zkat zkat commented Dec 15, 2024

This is it, folks! I think we're ready to finalize 2.0, after something like 3 years of work.

I'm very happy with the language we've defined here, and how nice it feels to use. I want to express my thanks to everyone who has contributed ideas and discussion and work to its specification, all the various implementers who have helped validate it, and all the users of 1.0 who helped give real, experience-based feedback on the previous version of the language, helping us figure out what could be improved.

I'm gonna give folks a few days to review things and make a decision, then I'm hoping to have consensus from at least a few of the major contributors and implementers before publishing this.

Thanks again, everyone. This wouldn't have been possible without you, and I think we've ended up with a real, shining gem.

NOTE: If you're "just" a community member, your opinion is still welcome, although contributors/implementers will likely be prioritized at this point.

@zkat
Member Author

zkat commented Dec 15, 2024

heh. The draft.8 tests killed kdl-rs multiline parsing and I realized the grammar wasn't actually allowing escapes on the closing multiline quoted string line.

so there's tests for that too now.

@zkat
Member Author

zkat commented Dec 15, 2024

@tabatkins @tjol I regret to inform you that the Really Complicated whitespace-only multiline string test was actually wrong (I'm pretty sure).

The prefixes didn't match, and prefixes need to match exactly before any normalization or empty line collapse, unless the line is completely empty.

That is, this should fail, I believe, since line 2 isn't completely empty, but its prefix doesn't match line 3:

"""\n
\t\s\n
\s\s"""

@tjol
Contributor

tjol commented Dec 15, 2024

@zkat I thought we'd established that the rule that empty lines can contain “any” whitespace takes priority over the prefix requirement.

#429 (comment)

@zkat
Member Author

zkat commented Dec 15, 2024

Ohhh. Uggghhhhhhhhhhhh ok I'll roll this back

@eilvelia
Contributor

eilvelia commented Dec 16, 2024

I think the interactions of multiline strings with escapes are pretty weird currently.

When processing a Multi-line String, implementations MUST dedent the string after resolving all whitespace escapes, but before resolving other backslash escapes.

\" should also be interpreted before dedenting (I assume \""" doesn't close the string), since otherwise it wouldn't be possible to find the correct enclosing """ and deduce the prefix to dedent; if you resolve any escapes, you want to interpret \\ as well. However, resolving them cannot just be done in two separate phases, or \\t will be transformed as \\t -> \t -> <tab>. I think instead it should be noted that, to find the whitespace prefix, at least \" and \\ should be analyzed, and then escapes are resolved from the raw form (i.e. during dedenting the escapes are lexically analyzed but not resolved/saved). It would be nice to add a test with \""" inside a multiline string (I can't find one).
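To make the two-phase pitfall concrete, here is a minimal Python sketch. The function names and the `SIMPLE_ESCAPES` table are hypothetical, it covers only a handful of escapes, and it ignores the whitespace escape and `\u{...}` entirely, so it illustrates the ordering problem rather than any implementation's actual behaviour.

```python
# Hypothetical, deliberately incomplete escape table for this sketch.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "s": " ", "\\": "\\", '"': '"'}


def resolve_two_phase(raw: str) -> str:
    """Naive approach: rewrite escaped backslashes first, then tab escapes.

    The backslash produced by the first pass is picked up again by the
    second pass, so the text is unescaped twice.
    """
    step1 = raw.replace("\\\\", "\\")
    return step1.replace("\\t", "\t")


def resolve_single_pass(raw: str) -> str:
    """Scan left to right so each backslash is consumed exactly once."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("dangling backslash")
        nxt = raw[i + 1]
        if nxt not in SIMPLE_ESCAPES:
            raise ValueError(f"escape not handled in this sketch: \\{nxt}")
        out.append(SIMPLE_ESCAPES[nxt])
        i += 2
    return "".join(out)
```

With the three source characters backslash, backslash, `t` as input, the two-phase version yields a tab, while the single-pass version yields a backslash followed by `t`.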

while the following example is allowed

  """
  foo \
bar
  baz
  \   """

IMO having non-literal whitespace in the prefix-defining line also feels pretty weird. Dedented strings are mostly for readability in the source code, and transforming the indentation goes against that. Swift's multiline strings are quite similar, and I think they can be an inspiration for kdl's strings. Particularly, the final line (or any ws prefix) there can consist of literal whitespace only, the newline escape is not allowed as the last \n, and lines must either be completely empty or include the prefix.

The "any whitespace in non-content lines" rule forms an exception to the prefix being identical (or at least missing) in all lines, which also isn't good, I think, and is more difficult to implement (checking the whole line for any whitespace/content instead of only comparing the first characters to the intended prefix). I initially missed this rule when I was updating the OCaml implementation.
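For reference, the exception described here might look roughly like the following Python sketch. The name `dedent_lines` is hypothetical, the input is assumed to be already split into lines with the prefix taken from the closing line, only space and tab are treated as whitespace (KDL recognises more), and whitespace-only lines are assumed to collapse to empty lines.

```python
def dedent_lines(body_lines: list[str], prefix: str) -> list[str]:
    """Strip `prefix` from every content line of a multi-line string body.

    Whitespace-only lines are exempt from the prefix check, which is why
    the whole line has to be inspected rather than just its first
    len(prefix) characters.
    """
    out = []
    for line in body_lines:
        if line.strip(" \t") == "":
            # Exempt: any whitespace is allowed here, prefix or not.
            out.append("")
        elif line.startswith(prefix):
            out.append(line[len(prefix):])
        else:
            raise ValueError(f"line does not start with the prefix: {line!r}")
    return out
```

Under the stricter Swift-like rule, the first branch would only accept a completely empty line, and everything else would go through the `startswith` check.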

@zkat
Member Author

zkat commented Dec 16, 2024

@eilvelia \\t would transform to t. Regular escapes are processed before any dedenting happens, and that includes white space escapes. \" can never be an issue because it’s always interpreted as simply ". \ doesn’t do its “slurping” behavior unless it’s immediately followed by whitespace.

As far as the Swift rules you mention: it does seem like those would simplify the rules for multi line strings. They would be more limiting but these corner cases would be way less confusing to think about by just… banning them altogether (and simplifying parsing)

I’m curious what others think.

@eilvelia
Contributor

@eilvelia \\t would transform to t. Regular escapes are processed before any dedenting happens, and that includes white space escapes. \" can never be an issue because it’s always interpreted as simply ". \ doesn’t do its “slurping” behavior unless it’s immediately followed by whitespace

I think you misunderstood me. \\t would be an issue if one naively transforms \\ to \ and \t to tab in separate phases. You can't transform only the whitespace escape since then foo\\ bar would (unexpectedly) activate the whitespace escape. The issue with \" is that this should be allowed:

"""
\"""
"""

(That would be the most intuitive, also what Swift does and what the grammar and spec currently suggest, as I think.)
If you completely disregard all escapes other than the whitespace one before dedenting (as the quote suggests), the second """ would be parsed as the string end (and then fail with non-ws final line).

@eilvelia
Contributor

Regular escapes are processed before any dedenting happens

Well, per this line, the escapes (other than ws) are resolved after dedenting:

When processing a Multi-line String, implementations MUST dedent the string after resolving all whitespace escapes, but before resolving other backslash escapes.

@zkat
Member Author

zkat commented Dec 16, 2024

@eilvelia what that is intended to mean is that:

"""
\s\s\s\sfoo
    """

is an invalid string (that is, \s isn't resolved by dedenting time, so it doesn't actually "count" as whitespace)

Additionally:

"""
\"""
"""

Is certainly allowed, and certainly is passing my tests right now. The closing """ parsing is done before any of the dedenting stuff even applies. That's just standard delimited string stuff.
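A rough Python sketch of that "standard delimited string stuff", under the assumption that every backslash simply pairs with the character after it during scanning (the name `find_closing_delimiter` is hypothetical, and nothing is resolved or validated here):

```python
def find_closing_delimiter(src: str, start: int) -> int:
    """Return the index of the closing triple quote, scanning from `start`.

    `start` is the index just past the opening delimiter. A backslash and
    the character it escapes are skipped together, so an escaped quote
    can never begin the closing delimiter.
    """
    i = start
    while i < len(src):
        if src[i] == "\\":
            i += 2  # skip the backslash and whatever it escapes
        elif src.startswith('"""', i):
            return i
        else:
            i += 1
    raise ValueError("unterminated multi-line string")
```

For the example above, scanning from just after the opening delimiter skips the escaped quote on the second line and stops at the delimiter on the third line; dedenting would only start once that position is known.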

@eilvelia
Contributor

eilvelia commented Dec 16, 2024

Yes, well, that's what should happen, I think the spec is somewhat vague there. That line in particular seemingly suggests that all escapes except ws are unprocessed in case dedent has not been done yet. IIRC the spec only says how escapes are resolved, not how they are lexically analyzed.

edit: To add a small clarification, scanning and resolving can often be combined into a single step. In kdl, this is possible for single-line strings (and trivially raw multi-line strings) but not for quoted multi-line. Although an implementation that doesn't scan non-ws escapes beforehand should even pass most (all, I think) current tests.

@zkat
Member Author

zkat commented Dec 16, 2024

@eilvelia do you have any thoughts on what kind of rewording would be helpful here?

SPEC.md Outdated
When processing a Multi-line String, implementations MUST dedent the string
_after_ resolving all whitespace escapes, but _before_ resolving other backslash
escapes. Furthermore, a whitespace escape that attempts to escape the final
line's newline and/or whitespace prefix is invalid since the multi-line string
Contributor

@bgotink bgotink Dec 16, 2024

I'm confused by this phrasing. It seems to say that escaped newline and/or whitespace are not allowed as the multi-line string has to be valid after the escaped whitespace is removed, but the text below and two of the new tests actually allow escaped whitespace in the final line.
The examples below seem to imply that an escaped newline and/or whitespace are only not allowed if the multiline string would become invalid, but that's not how I read this sentence.

Note that would also allow escaped newlines in some cases, e.g.

node """
  lorem
  ipsum
  \
    """

which would be equivalent to

node """
  lorem
  ipsum
  """

which would imo be worth adding as a test.

Member Author

@bgotink this test sort of tests that but having a more explicit one would be good.

I believe the bit that says that a trailing \ is invalid actually intended to refer to a case like this:

"""
lorem
ipsum\
"""

which would be invalid because, once the escape consumes the newline, it is equivalent to:

"""
lorem
ipsum"""

This should probably be reworded, assuming these are the semantics we want to keep.

I'm not terribly inclined to change multiline strings any further, though, tbh.
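As a rough illustration of the semantics described in this thread (not the spec's wording), the following Python sketch removes whitespace escapes from the raw body and then checks that the closing line is still whitespace-only. The names are hypothetical, only space, tab, CR and LF count as whitespace here, and other escapes are left untouched.

```python
import re

# Match either an escaped backslash (kept as-is, so its second backslash
# is never mistaken for the start of a whitespace escape) or a backslash
# followed by a run of whitespace (removed).
_WS_ESCAPE = re.compile(r"\\(\\|[ \t\r\n]+)")


def strip_whitespace_escapes(body: str) -> str:
    return _WS_ESCAPE.sub(
        lambda m: m.group(0) if m.group(1) == "\\" else "", body
    )


def closing_line_prefix(body: str) -> str:
    """Return the dedent prefix, or raise if the last line has content.

    `body` is the raw text between the opening and closing delimiters.
    """
    resolved = strip_whitespace_escapes(body)
    last_line = resolved.rsplit("\n", 1)[-1]
    if last_line.strip(" \t") != "":
        raise ValueError("closing line must contain only whitespace")
    return last_line
```

With this sketch, the `ipsum\` body is rejected because the escape joins `ipsum` onto the closing line, while the earlier example with `\` on its own indented line is accepted and yields a two-space prefix.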

Contributor

I'm not terribly inclined to change multiline strings any further,

Oh no, I'm definitely not asking to change multiline strings! I was just confused about the text vs tests.

Member Author

oh no that's fine. I was also mostly responding to the ongoing convo with @eilvelia who brought up a lot of good points, though I'm leaning towards "clarify, don't change"

@zkat
Member Author

zkat commented Dec 19, 2024

I'm pretty happy with where this is at right now. I still need to update kdl-rs with these last couple of changes to make sure everything looks good, but barring any other issues getting brought up, I intend to merge this PR and tag the official 2.0.0 sometime on Saturday.

/cc @larsgw and @tabatkins who I don't think have responded here yet, wanna make sure y'all see this.

@larsgw
Contributor

larsgw commented Dec 19, 2024

I felt like I could not give meaningful feedback without attempting to implement it, but I have unfortunately been very busy with my studies the past few weeks. I don't know if I will before Saturday, but that's fine with me.

@zkat
Member Author

zkat commented Dec 19, 2024

hmmmmmmmmmm

I was thinking about #"""# and how we made it an error now and... I feel weird about it. It's literally the only string that can't be represented by raw strings, at all.

The only way to represent this is to do either:

"\""
//or
#"""
"
"""#

Which feels really strange? I know it's a relatively minor thing, but it leaves a bad taste in my mouth for something to be, actually, unrepresentable in one of our string syntaxes.

@tjol
Contributor

tjol commented Dec 19, 2024

That's not the only string you can't represent with raw strings. Another example is "\r\n" (you can't do this in a single-line raw string because of the newline and you can't do this in a multi-line raw string because of newline normalization.) You also can't represent "\u{feff}" as a raw string (or other strings containing disallowed literal code points). Granted, compared to "\"", the other examples I can think of are a lot ... weirder.

@zkat
Member Author

zkat commented Dec 19, 2024

oh that's a good point. And it does make me feel better. 🤔🤔🤔

@eilvelia
Contributor

Isn't it a little weird that newlines are allowed after /- (the only line-space inside a node)?

@zkat
Member Author

zkat commented Dec 20, 2024

That’s intentional!

@zkat
Member Author

zkat commented Dec 20, 2024

That is: I thought it would be good for /- to mean “slurp up any and all whitespace until the item being commented out”, as opposed to giving it special rules within nodes. It’s more for simplicity (a single /- definition) than thinking this is a thing that’ll be done all the time. We used to have much more strict locations for /- and it turned out to actually complicate grammars and parsers more than it was worth

@eilvelia
Contributor

Well, currently slashdash is a node/children/arg/prop "modifier" (as I described in #401 (comment)) and can only be inside a node (including at the beginning of it), changing its behaviour would be as simple as

multi-line-comment := '/*' commented-block
commented-block := '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
-slashdash := '/-' line-space*
+slashdash := '/-' node-space*

// Whitespace

@zkat
Member Author

zkat commented Dec 20, 2024

I definitely want the following to be legal:

/-
my-node 1 2 3

so I’m not terribly inclined to change this at this stage

@zkat
Member Author

zkat commented Dec 21, 2024

Final heads up: I'm gonna be wrapping up some stuff with kdl-rs this morning, and then releasing, and then I'm gonna merge this and release/announce both at the same time. If you'd like to tag your own implementation today and have me announce it, please lmk!

Looks like we're all set here :)

@eilvelia
Contributor

Could you take a look at two small PRs I've sent (#441 and #442)?

@tjol
Contributor

tjol commented Dec 21, 2024

@zkat ckdl 0.2.1 is out now with opt-in support for KDL 2.0.0 as it stands. I'm just changing the defaults to KDL 2.0 now. Should be done in no time at all. Planning to call that version ckdl-1.0!

@bgotink
Contributor

bgotink commented Dec 21, 2024

I've tagged version 0.2.0 of npm package @bgotink/kdl with KDL v2 support 🎉 (release)

@tjol
Contributor

tjol commented Dec 21, 2024

No huge surprises in the process of changing the defaults, ergo:

ckdl-1.0 is released. This version supports both KDL v1 and v2; hybrid mode is the default for reading, and KDL v2 is the default for writing.

Release / Python package

Feel free to mention in the main announcement.

* Add version marker to the grammer

* Add version marker to the Changelog

* Update SPEC.md

Co-authored-by: eilvelia <[email protected]>

* add a mandatory newline after the version marker

* add mandatory space between version number

---------

Co-authored-by: eilvelia <[email protected]>
@zkat
Member Author

zkat commented Dec 22, 2024

LFGGGGGGG

@zkat zkat merged commit 6ceecd8 into main Dec 22, 2024
1 check passed
@zkat zkat deleted the release-2.0.0 branch December 22, 2024 02:33
@zkat
Member Author

zkat commented Dec 22, 2024

Thank you everyone!! Great job!! We did it!!!

Labels
enhancement New feature or request
7 participants