diff --git a/.gitignore b/.gitignore new file mode 100644 index 000000000..fe4a57d81 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +*~ +compiled/ diff --git a/shrubbery/0000-shrubbery.md b/shrubbery/0000-shrubbery.md new file mode 100644 index 000000000..41d4520fe --- /dev/null +++ b/shrubbery/0000-shrubbery.md @@ -0,0 +1,1279 @@ +- Feature Name: Shrubbery notation +- Start Date: 2019-10-01 +- RFC PR: [racket/racket2-rfcs#122](https://github.com/racket/racket2-rfcs/pull/122) + +# Summary +[summary]: #summary + +Shrubbery notation is similar to S-expression notation, but instead of +generating fully formed trees, it is intended to partially group input +for further enforestation by another parser (e.g., as in Honu). The +notation is line- and indentation-sensitive, and the parsed form of a +shrubbery imposes grouping to ensure that further parsing is +consistent with the shrubbery's lines and indentation. + +# Motivation +[motivation]: #motivation + +S-expression notation imposes a grouping at the lexeme level that is +all but guaranteed to be respected by further parsing via macro +expansion. One consequence of this lexeme-based grouping is that +programs can be pretty-printed and textually traversed in standard +ways. + +A traditional use of S-expression notation, however, insists that +*all* grouping is reflected in the S-expression. Reifying all grouping +at the lexeme level is so onerous that many practical deployments of +S-expressions include deviations from the rule, such as keyword-based +arguments or implicit grouping by position (as in various Clojure +forms). + +Another disadvantage of S-expressions is that many of the parentheses +are redundant after the expression is pretty-printed, because +indentation provides the same grouping information in a more +human-readable way. That observation suggests instead relying on line +breaks and indentation to impart grouping information, as in Python. + +Shrubbery notation explores a point in the design space where the +notation is + + - line- and indentation-sensitive, and + - intended to constrain grouping but not reflect every detail of grouping. + +Deferring complete grouping to another parser relieves a burden on +reader-level notation. At the same time, line- and +indentation-sensitive rules constrain parsing to ensure that line +breaks and indentation in the source are not misleading. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Here are some example shrubberies. Each line either uses old +indentation to continue a nesting level that was started on a previous +line, starts with new indentation and follows a line that ends with +`:`, or starts with new indentation and a `|` on the same line. A `:` +or `|` can also appear in the middle of a line, but that's roughly a +shorthand for starting a new indented line after the `:` or before the +`|`. The complete rules involve more terminology, but that's enough to +get a sense of the examples. + +``` +def identity(x): x + +def fib(n): + cond + | n == 0: 0 + | n == 1: 1 + | else: fib(n-1) + fib(n-2) + +def print_sexp(v): + match v + | empty: display("()") + | cons(a, d): + if is_list(d) + | display("(") + print_sexp(a) + for (v = in_list(d)): + display(" ") + print_sexp(v) + display(")") + | display("(") + print_sexp(a) + display(". ") + print_sexp(d) + display(")") + | v: print_atom(v) +``` + +Forms like `def`, `cond`, and `match` are not specified by +shrubbery notation, since specifying those forms is up to a language +that is built on top of shrubbery notation. Still, shrubbery notation +is meant to accommodate a particular kind of syntax for nested blocks +(via `:` and indentation) and conditional branches (via `|`). + +Identifiers are C-style with alphanumerics and underscores. Operators +are sequences of symbolic characters in the sense of `char-symbolic?`, +roughly. No spaces are needed between operators and non-operators, so +`1+2` and `1 + 2` mean the same thing. Comments are C-style, plus a +`#//` group-comment form. See [lexeme parsing](#lexeme-parsing) for +more information. + +The following tokens are used for grouping, in addition to line breaks +and indentation: + +``` +( ) [ ] { } ; , : | « » \ +``` + +Parentheses, square brackets, and curly braces are used to form groups +in the obvious way. A `;` or `,` acts as a group separator, even +within a single line. A `:` or `|` treats remaining item on the same +line like a new indented line, which forms a subgroup. A guillemet +pair `«` and `»` can be used (probably very rarely) to explicitly +bracket subgroups formed by `:` and `|` without line breaks. A `\` +continues a line, effectively shifting all columns on the next line as +if they appeared immediately after the `\`. + +## Grouping by lines + +The main grouping rule is that sequences on different lines with the +same indentation create separate groups, one for each line. + +``` +this is the first group +this is the second group +``` + +Comments and lines with only whitespace are ignored. They don't count +when this document says “the previous line” or “the next line.” + +## Grouping by _opener_-_closer_ pairs + +An _opener_-_closer_ pair `(` and `)`, `[` and `]`, or `{` and `}` +forms a nested group that can span lines. Within the _opener_-_closer_ +pair, `,` sparates groups. Groups can be on separate lines at the same +indentation, but groups on separate lines still must be separated by +`,`. Parsing retains whether a subgroup is formed by `()`, `[]`, or +`{}`. + +``` +group 1 +[group 2 - subgroup I, + group 2 - subgroup II, + (group 2 - subgroup III - subsubgroup A, + group 2 - subgroup III - subsubgroup B, + {group 2 - subgroup III - subsubgroup C, subsubsubgroup α, + group 2 - subgroup III - subsubgroup C, subsubsubgroup β})] +(group 3 - subgroup I, group 3 - subgroup II, + group 3 - subgroup III) +``` + +The following three forms are not allowed, because they are missing a +`,` between two groups: + +``` +// Not allowed +(1 + 2) +[1 + 2] +{1 + 2} +``` + +A `,` is disallowed if it would create an empty group, except that a +trailing `,` is allowed. + +``` +// Not allowed +(, 1) +(1,, 2) + +// Allowed, but not standard +(1, 2,) +``` + +A trailing `,` is only standard style when the _closer_ that follows is +on its own line. + +``` +list( + red, + green, + blue, + orange, +) +``` + +## Blocking with `:` and indentation + +A sequence of groups has a particular indentation that is determined +by the first group in the sequence. Subsequent groups in a sequence +must start with the same indentation as the first group. + +``` +group 1 +group 2 +// error, because the group is indented incorrectly: + group 3 +``` + +When a line ends with `:` and the next line is more indented, then +it starts a new sequence of groups that form a _block_: + +``` +group: + subgroup 1 + subgroup 2 +``` + +There is no constraint on how much indentation a nested group sequence +must use, as long as the indentation is more than the enclosing group. +Also, a new line is not required after `:`, but then it's as if the +`:` is followed by a newline plus spaces that reach the same column as +the `:`. All four of the following groups are the same, each with one +block that has two nested groups: + + +``` +hello: + world + universe + +hello: + world + universe + +hello: world + universe + +hello: world + universe +``` + +Within an _opener_-_closer_ pair, a nested group sequence can start at +any indentation; it doesn't have to be indented to the right of the +_opener_. + +``` +function( + argument, + more +) +``` + +A block that is started with `:` normally cannot be empty (unless +explicit-grouping `«` and `»` are used as described in a later +section), so the following is ill-formed: + +``` +bad_empty: // empty block disallowed +``` + +However, `:` can be used at the start of a group so that the group +contains only a block. When `:` starts a group that is in the +top-level sequence or within an _opener_-_closer_ pair, the block +created by `:` is allowed to be empty (because that provides a way to +express an empty in a context where it likely to be intentional +instead of confusing). For example, the first of the following three +top-level groups has just a block that contains one group with the +single element `untagged`, the second top-level group has just a +block with zero groups, and the third has a group with one parenthesized +sequence of groups where the middle one has an empty block: + +``` +: untagged + +: + +(1, :, 2) +``` + +## Continuing with indentation and an operator + +When a newly indented line starts with an operator and when the +preceding line does _not_ end with `:`, then the indented line does +not form a block, and it may instead continue the previous line. The +operator-starting line continues only if the previous line was not a +continuing line; however, additional continuing lines can start with +an operator (not necessarily the same one) at the same indentation as +the original continuing line. The following two groups are the same: + +``` +f(1) + 2 + + 3 + 4 + - 5 - 6 + +f(1) + 2 + 3 + 4 - 5 - 6 +``` + +A block is always at the end of its immediately containing group. One +consequence is that an operator-starting line cannot continue a group +that already has a block: + +``` +hello: world + + 3 // bad indentation +``` + +Along those lines, there is no ambiguity when an indented line appears +after `:` and starts with an operator. In that case, the indented line +is part of the block, since it cannot continue the group that contains +the block. For example, the following two groups are the same, each +with a block that has a `+ 3` group: + +``` +hello: + 3 + +hello: + + 3 +``` + +## Blocking with `|` + +A `|` is implicitly shifted half a column right (so, implicitly +nested), and it is implicitly followed by a `:` that conceptually +occupies same column as the `|`. That is, like `:`, a `|` always +creates a nested block. Furthermore, `|` starts an enclosing block +that includes the `|` block plus subsequent `|` blocks that are at the +same indentation. A `|` that starts the enclosing block can appear at +the start of a line with new indentation. The following four groups +are the same: + +``` +hello +| world +| universe + +hello + | world + | universe + +hello | world + | universe + +hello | + world + | + universe +``` + +Each of the four groups has two elements: `hello` and a block. The +block has two groups, each of which is a more nested block. The first +nested block has `world` in a single group, and the second nested +block as `universe` in a single group. + +A `|` cannot be a in a top-level sequence of groups or start a group +immediately within `()`, `[]`, or `{}`, and it cannot appear just +after `:`. Like `:`, the content of a block after `|` cannot be empty +unless explicit-grouping `«` and `»` are used. + +If a `|` appears on the same line as an earlier `|` and is not more +nested inside `()`, `[]`, or `{}`, then the `|` terminates the earlier +`|`'s block and continues its enclosing block with a new `|` group. +The intent and consequence of this rule is that multiple `|`s can be +used on a single line as an alternative to starting each `|` on its +own line, making the following groups the same as the above groups: + +``` +hello | world | universe + +hello +| world | universe +``` + +The implicit shifting of `|` by half a column is consistent with its +visual representation, and it avoids the possibility of a group +sequence that contains a mixture of `|`-started groups and other kinds +of groups. Standard indentation uses no additional space of +indentation before `|` relative to its enclosing block's group. + + +## Separating groups with `;` and `,` + +A `;` separates two groups on the same line. A `;` is allowed in any +context—except between groups immediately within, `()`, `[]`, or `{}`, +where a `,` separates groups. The following three blocks +are the same: + +``` +hello: + world + universe + +hello: + world; universe + +hello: world; universe +``` + +The `;` and `,` separators interact differently with blocks formed by +`:` and `|`. A `,` closes subgroups and blocks as necessary to reach +an enclosing `()`, `[]`, or `{}`, while a `;` separate groups within a +nested group sequence. If `;` would create an empty group, it is +ignored. + +For example, the following two groups are the same, and they have one +parenthesized term that has a single block, and the block has two +groups: + +``` +(hello: world; universe) + +(hello: world + universe) +``` + +The following two groups are also the same, where the group has one +parenthesized term, but that term contains two groups, where the first +group is a block that contains a single group: + + +``` +(hello: world, universe) + +(hello: world, + universe) +``` + +## Line- and column-insensitivity with `«` and `»` + +A block can be delimited explicitly with `«` and `»` to disable the +use of line and column information for parsing between `«` and `»`. A +`«` can be used immediately after `:` or immediately after `|`, in +which case a `»` indicates the end of the block that starts after the +`:` or `|`. Within the block, an explicit `;` must be used to separate +groups. + +A sequence of groups, either at the top level or within a block, can +be written without line and column sensitivity as `;` followed +immediately by `«`, in which case a `»` indicates the end of the +sequence, and groups within the sequence are separated by `;`. When +parsing, the groups within the sequence are spliced into the enclosing +context. The combination of `;` and `«` is intended for entering line- +and column-insensitive mode for a single group or for representing a +sequence of groups that is not within a block. + +Whitespace and block comments are allowed between a `:`, `|`, or `;` +and its `«`, but in a line-sensitive context, the `«` must be on the +same line as its `:`, `|`, or `;`. + +The following five groups are the same: + +``` +hello: + if x + | world + planet + | universe + +hello: if x | world; planet | universe + +hello:« + if x + |« world; + planet » + |« universe »» + +hello:« if x |« world; planet » |« universe »» + +;«hello + : + « + if + x + | + « + world + ; + planet + » + | + « + universe + » + » + » +``` + +Using `«` and `»` can “armor” a shrubbery for transport from one +context to another where its line breaks or indentation might get +mangled. For example, an editor might offer an operation to armor a +range of text in perparation for moving or copying the text, and then +it can be properly indentend in its destination before unmarmoring. +Along similar lines, when writing code as data to be read back later, +it's easy for a printer to insert explicit `«` and `»`. + +In rare cases, a programmer might write `«` and `»` directly. Although +many shrubbery forms can be written with `:`, `|`, and `;` on a single +line, as illustrated above, not all forms can be collapsed to a single +line without extra delimiters. For example, these six groups are all +different: + +``` +outside: + inside: fruit + rind + +// not the same, because `rind` is within `inside:` +outside: inside: fruit; rind + +if true +| if false + | x + | y +| z + +// not the same, because there's one block with five `|` alternatives +if | true | if false | x | y | z + +hello: + if x + | world + | universe + the end + +// not the same, because `the end` is in the second `|`: +hello: if x | world | universe; the end +``` + +Using `«` and `»` can help in those cases: + +``` +outside: + inside: fruit + rind + +outside: inside:« fruit »; rind + +if true +| if false + | x + | y +| z + +if | true |« if false | x | y » | z + +hello: + if x + | world + | universe + the end + +hello: if x | world |« universe »; the end +``` + +Even so, delimiting blocks with `«` and `»` is expected to be rare in +practice, both because programmers are likely to break things across +lines and because a language that uses shrubbery notation is likely to +allow `()` in places where grouping might be needed. For example, +assuming that `if` is an expression form and `()` can wrap an +expression, a nested conditional is probably better written like this: + +``` +if | true | (if false | x | y) | z +``` + +Using `()` in this way does not produce an equivalent shrubbery to `if +| true |« if false | x | y »| z`, but it might represent an equivalent +expression in the language using shrubbery notation. + +To stay consistent with blocks expressed through line breaks and +indentation, a block with `«` and `»` must still appear at the end of +its enclosing group. + +``` +// not allowed, because a block must end a group +inside:« fruit » more +``` + + +## Continuing a line with `\` + +As a last resort, `\` can be used at the end of a line (optionally +followed by whitespace and coments on the line) to continue the next +line as it if were one line continuing with the next line. The itself +`\` does not appear in the parsed form. A that is not at the end of a +line (followed by whitespace and coments) is treated the same as +whitespace. + +Lines contianing only whitespace and (non-term) comments do not count +as “the next line” even for `\` continuations, so any number of +whitespace and comment lines can appear between `\` and the line that +it continues. + + +``` +this is \ + the first group +this \ is \ the \ second \ group + +this is a group \ + with (a, + nested, + list) + +this is a group \ + with (a, + \ + nested, + \ + list) + +this is a group \ + with (a, + \ + /* this a comment on `nested`: */ + nested, + \ + list) + +``` + +## Group comments + +A `#//` comments out a group or `|` alternative. To comment out a +group, `#//` must appear either on its own line before a group or at +the start of a group. To comment out an alternative, `#//` must appear +on its own line before the alternative or just before a `|` that does +*not* start a new line. + +The interaction between `#//` and indentation depends on how it is +used: + + * When `#//` appears completely on its own line (possibly with + whitespace and non-group comments), then its indentation does not + matter. It comments out the next group or alternative—which might + be a single-line group, block, or `|` block. + + * When `#//` appears at the start of a group with more tokens + afterward on the same line, it determines that group's indentation, + and it must obey any constraints on the group's indentation. When + `#//` appears immediately after an opener but with nothing else + afterward on the same line, it determines indentation for the + groups immediately within the opener, and it comments out the first + group. + + * When `#//` appears just before a `|` on the same line, then unlike + the case for groups, it does not affect the the column of the `|` + as used to align alternatives on later lines. (That's because the + half-column alignment of `|` does not fit with the column alignment + of `#`.) Along those lines and to avoid an indentation mismatch, a + `#//` is not allowed to start a line for commenting out a `|` + alternative on the same line. + +A `#//` is not allowed without a group or alternative afterward to +comment out. Multiple `#//`s do not nest (i.e., two `#//`s in a row is +always an error). + +The following three groups all parse the same: + +``` +{ + hello: + val x: f(1, 2 + 3) + match x + | 1: 'one' + | 2: 'two' +} + +{ + hello: + val x: + #// + g(-1) + f( + #// + 0, + 1, + 2 + 3, + #// + 4 + 5) + #// + not included in the code + match x + #// + | 0: no + | 1: 'one' + #// + | 1.5: no + | 2: 'two' + #// + | 3: no, + #// + goodbye: + the enclosing group of the block is commented out +} + +{ + hello: + val x: + #// g(-1) + f(#// 0, 1, 2 + 3, #// 4 + 5) + #// not included in the code + match x #// | 0: no | 1: 'one' #// | 1.5: no + | 2: 'two' #// | 3: no, + #// goodbye: + the enclosing group of the block is commented out +} +``` + +## More examples + +Here are more example shrubberies. These shrubberies are not +necessarily consistent with each other in the sense of sketching a +single language that uses shrubbery notation; they show different +potential ways of using the notation. + + +``` +define pi: 3.14 + +define +| fib(0): 0 +| fib(1): 1 +| fib(n): fib(n-1) + fib(n-2) + +define fib(n): + match n + | 0: 0 + | 1: 1 + | n: fib(n-1) + fib(n-2) + +define fib(n): + match n | 0: 0 + | 1: 1 + | n: (fib(n-1) + + fib(n-2)) + +define fib(n): + match n + | 0: + 0 + | 1: + 1 + | n: + fib(n-1) + fib(n-2) + +define make_adder(n): + lambda (m): + printf("adding to ~a\n", m) + +define fourth(n: integer): + define m: n*n + define v: m*m + printf("~a^4 = ~a\n", n, v) + v + +struct posn(x, y): + property prop_equal_and_hash: + let (hc = lambda (a: posn, hc): + hc(a.x) + hc(a.y), + eql = lambda (a: posn, b: posn, eql): + eql(a.x, b.x) && eql(a.y, b.y)): + values(eql, hc, hc) + +define go(): + define helper(n): + list(n, n) + define more(m): + if m == 0 | "done" + | more(m - 1) + helper(more(9)) + +define curried: + lambda (x): + lambda (y): + lambda (z): + list(x, y, z) + +let (x = 1, + y = 2): + printf("About to add") + x+y + +define show_zip(l, l2): + for (x = in_list(l), + x2 = in_list(l2)): + print(x) + print_string(" ") + print(x2) + newline() + +define show_combos(l, l2): + for (x = in_list(l)): + then (x2 = in_list(l2)): + print(x) + print_string(" ") + print(x2) + newline() +``` + +## Parsed representation + +The parse of a shrubbery can be represented by an S-expression: + + * Each group is represented as a list that starts `'group`, and + the rest of the list are the elements of the group. + + * Atom elements are represented as “themselves” within a group, + including identifers a symbols, except that an operator is + represented as a 2-list that is `'op` followed by the operator name + as a symbol. + + * A group sequence is represented as a list of `'group` lists. + + * An element created by `()` is represented by `'parens` consed + onto a group-sequence list. + + * An element created by `[]` is represented by `'brackets` consed + onto a group-sequence list. + + * An element created by `{}` is represented by `'braces` consed + onto a group-sequence list. + + * A block is represented as either `'block` or `'alts` consed onto a + group-sequence list. The representation uses `'alts` if the content + of the block is a squence of groups started with `|`, and it's + `'block` otherwise. + + * A block created to follow `|` appears immediately in an `'alts` + list. + +Note that a block can only appear immediately in a `'group` or `'alts` +list. Note also that there is no possibility of confusion between +symbol atoms in the input and `'group`, `'block`, etc., at the start +of a list in an S-expression representation, because symbol atoms will +always appear as non-initial items in a `'group` list. + +Here are some example shrubberies with their S-expression parsed +representations: + +``` +define pi: 3.14 + +(group define pi (block (group 3.14))) +``` + +``` +define fourth(n: integer): + define m: n*n + define v: m*m + printf("~a^4 = ~a\n", n, v) + v + +(group define + fourth + (parens (group n (block (group integer)))) + (block + (group define m (block (group n (op *) n))) + (group define v (block (group m (op *) m))) + (group printf + (parens (group "\"~a^4 = ~a\\n\"") (group n) (group v))) + (group v))) +``` + +``` +if x = y +| same +| different + +(group if x (op =) y (alts (block (group same)) + (block (group different)))) +``` + +``` +define fib(n): + match n + | 0: 0 + | 1: 1 + | n: fib(n-1) + fib(n-2) + +(group define + fib + (parens (group n)) + (block + (group match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group n + (block + (group fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))))) +``` + +## Lexeme Parsing +[lexeme-parsing]: #lexeme-parsing + +The tokens used for grouping and indentation are distinct lexemes: + +``` +( ) [ ] { } ; , : | « » \ +``` + +Other lexemes are described by the grammar in the table below, where +an asterisk in the left column indicates the productions that +correspond to lexemes. Only simple forms of numbers are supported +directly (decimal integers, decimal floating point, and hexadecimal +integers, in all cases allowing `_`s between digits), but a `#{`...`}` +escape provides access to the full Racket S-expression number grammar. +Boolean literals are Racket-style, instead of reserving identifiers. +Special floating-point values similarly use a `#` notation. + +Identifiers are formed from Unicode alphanumeric characters plus `_`, +where the initial character must not be a numeric character. An +identifier prefixed with `~` forms a keyword, analogous to prefixing an +identifier with `#:` in Racket. + +Operators are formed from Unicode symbolic and punctuation characters +other than the ones listed above as distinct lexemes (plus a few more, +like `"` and `'`), but `|` or `:` is also +allowed in an operator name as long as it is not by itself. A +multi-character operator cannot end in `+`, `-`, or `.` to avoid +ambiguity in cases like `1+-2` (which is `1` plus `-2`, not `1` and +`2` combined with a `+-` operator), unless the operator contains +only `+`, `-`, or `.` (so `++`, `--`, and `...` are allowed). +Also, multi-character operator cannot end with `/` or contain `//` or +`/*`, because that can create ambiguities with comments. + +Implicit in the grammar is the usual convention of choosing the +largest possible match at the start of a stream. Not reflected in the +grammar is a set of delimiter requirements: numbers, `#true`, and +`#false` must be followed by a delimiter. For example, `1x` is a +lexical error, because the `x` after `1` is not a delimiter. +Non-alphanumeric characters other than `_` and `.` are delimiters. +Finally, the treatment of `+` and `-` as a number prefix versus an +operator is subject to a special rule: they are parsed as operators +when immediately preceded by an alphanumeric character, `_`, `)`, `]`, or `}` +with no whitespace in between. For +example, `1+2` is `1` plus `2`, but `1 +2` is `1` followed by the +number `+2`. + +When a `#{`...`}` escape describes an identifier S-expression, it is +an identifier in the same sense as a shrubbery-notation identifier. +the same holds for numbers, booleans, strings, byte strings, and +keywords. A `#{`...`}` escape must _not_ describe a pair, because +pairs are used to represent a parsed shrubbery, and allowing pairs +would create ambiguous or ill-formed representations. + +A `@` starts an at-expression form similar to the notaton supported by +`#lang at-exp` (which oriented toward S-expressions and +readtable-based). The next subsection explains in more detail, but the +tabel below sketches the shape of `@` forms. + +| | nonterminal | | production | adjustment | +|---|-----------------|-----|-----------------------------------------------------------|--------------------------------| +| * | _identifier_ | is | _alpha_ _alphanum_ * | | +| | | | | | +| | _alpha_ | is | **an alphabetic Unicode character or** `_` | | +| | | | | | +| | _alphanum_ | is | _alpha_ | | +| | | or | **a numeric Unicode character** | | +| | | | | | +| * | _keyword_ | is | `~` _identifier_ | | +| | | | | | +| * | _operator_ | is | _opchar_ * _tailopchar_ | **not** `❘` **or** `:` ... | +| | | or | `.` + | ... **or containing** `//` ... | +| | | or | `+` + | ... **or containing** `/*` | +| | | or | `-` + | | +| | | | | | +| | _opchar_ | is | **a symbolic Unicode character not in** _special_ | | +| | | or | **a punctuation Unicode character not in** _special_ | | +| | | or | **one of** `:`, `❘` | | +| | | | | | +| | _tailopchar_ | is | **anything in** _opchar_ **except** `+`, `-`, `.`, `/` | | +| | | | | | +| | _special_ | is | **one of** `(`, `)`, `[`, `]`, `{`, `}`, `«`, `»` | | +| | | or | **one of** `"`, `;`, `,`, `~`, `#`, `\`, `_`, `@` | | +| | | | | | +| * | _number_ | is | _integer_ | | +| | | or | _float_ | | +| | | or | _hexinteger_ | | +| | | | | | +| | _integer_ | is | _sign_ ? _nonneg_ | | +| | | | | | +| | _sign_ | is | **one of** `+` **or** `-` | | +| | | | | | +| | _nonneg_ | is | _decimal_ _usdecimal_ + | | +| | | | | | +| | _decimal_ | is | `0` through `9` | | +| | | | | | +| | _usdecimal_ | is | _decimal_ | | +| | | or | `_` _decimal_ | | +| | | | | | +| | _float_ | is | _sign_ ? _nonneg_ ? `.` _nonneg_? _exp_ ? | | +| | | or | _sign_ ? _nonneg_ _exp_ | | +| | | or | `#inf` | | +| | | or | `#neginf` | | +| | | or | `#nan` | | +| | | | | | +| | _exp_ | is | `e` _sign_ ? _nonneg_ | | +| | | or | `E` _sign_ ? _nonneg_ | | +| | | | | | +| | _hexinteger_ | is | `0x` _hex_ _ushex_ * | | +| | | | | | +| | _hex_ | is | **one of** `0` **through** `9` | | +| | | or | **one of** `a` **through** `f` | | +| | | or | **one of** `A` **through** `F` | | +| | | | | | +| | _ushex_ | is | _hex_ | | +| | | or | `_` _hex_ | | +| | | | | | +| * | _boolean_ | is | `#true` | | +| | | or | `#false` | | +| | | | | | +| * | _string_ | is | `"` _strelem_ * `"` | | +| | | | | | +| | _strelem_ | is | **element in Racket string** | `\U` ≤ 6 digits | +| | | | | | +| * | _bytestring_ | is | `#"` _bytestrelem_ * `"` | | +| | | | | | +| | _bytestrelem_ | is | **element in Racket byte string** | | +| | | | | | +| * | _sexpression_ | is | `#{` _racket_ `}` | | +| | | | | | +| | _racket_ | is | **any non-pair Racket S-expression** | | +| | | | | | +| * | _comment_ | is | `//` _nonnlchar_ | | +| | | or | `/*` _anychar_ `*/` | nesting allowed | +| | | | | | +| * | _termcomment_ | is | `#//` | | +| | | | | | +| | _nonnlchar_ | | **any character other than newline** | | +| | | | | | +| * | _atexpression_ | is | `@` _command_ ? _arguments_ ? _body_ ? | no space between these parts | +| | | | | | +| | _command_ | is | _identifier_ | | +| | | or | _keyword_ | | +| | | or | _operator_ | | +| | | or | _number_ | | +| | | or | _boolean_ | | +| | | or | _string_ | | +| | | or | _bytestring_ | | +| | | or | _racket_ | | +| | | or | `(` _group_ * `)` | usual comma-separated groups | +| | | or | `«` _group_ `»` | one spliceable group, no block | +| | | | | | +| | _arguments_ | is | `[` _group_ * `]` | usual comma-separated groups | +| | | | | | +| | _body_ | is | `{` _text_ `}` | possible escapes in _text_ | +| | | or | _atopen_ _text_ _atclose_ | _atcloser_ matching _atopen_ | +| | | | | | +| | _atopen_ | is | `❘` _asciisym_ * `{` | | +| | | | | | +| | _atclose_ | is | `}` _asciisym_ * `❘` | reverses and flips paren-like | + +# At-notation using `@` + +An `@` form of the shape + +``` + @«command ...»[arg, ...]{ body }... +``` + +is parsed into the same representation as + +``` + command ...(arg, ..., [parsed_body, ...], ...) +``` + +That is, the command part is left at the front and spliced into its +enclosing group, while the argument and body parts are wrapped with +parentheses to make them like arguments. Each body text is parsed into +a list of string literals and escapes, and multiple body texts can +be provided in multiple `{`...`}`s. + +The command part usually does not have `«»`, and it is instead +usually written as an identifier, operator, or parenthesized term. The +argument and body parts, when present, always use `[]` and `{}`, +respectively. Any of the three kinds parts can be omitted, but when +multiple parts are present, they must have no space between them or +the leading `@`. When the argument and body parts are both +omitted, the command part is simply spliced into its context. + +The conversion to a call-like form, keeping each body in a separate +list, and allowing multiple body arguments are the three main ways +that shrubbery `@` notation differs from `#lang at-exp` notation. The +other differences are the use of `«`...`»` instead of `|`...`|` for +delimiting a command, and the use of `@//` instead of `@;` for +comments. The details are otherwise meant to be the same, and the rest +of this section is mostly a recap. + +A body part is treated as literal text, except where `@` is used in a +body to escape. An unescaped `}` closes a body, except that an +unescaped `{` must be balanced by an unescaped `}`, with both treated +as part of the body text. Instead of `{`, a body-starting opener can +be `|` plus `{` with any number of ASCII punctuation and symbol +characters (other than `{`) in between; the corresponding closer is +then the same sequence in reverse, except that some characters are +flpped: `{` to `}`, `(` to `)`, `)` to `(`, `[` to `]`, `]` to `[`, +`<` to `>`, and `>` to `<`. With an `|`...`{` opener, an escape is +formed by using the opener followed by `@`, while opener–closer pairs +balance within the body text. When multiple body parts are provided, +each can use a different opener and closer. The parsed form of the +body breaks up the body text into lines and `"\n"` as separate string +literals in the parsed list form, with each escape also being its own +element in the list form. Parsed body text also has leading and +trailing whitespace adjusted the same as with `#lang at-exp`. + +After the `@` of an escape in body text, the escape has the same +form as an at-notaton form that starts with `@` as a shubbery. That +is, `@` forms are essentially the same whether starting in shrubbery +mode or body-text mode. + +In body text, there are two additional comment forms that are not +supported in shrubbery mode. A `@//{` starts a block comment that ends +with `}`, and the comment form is not part of the body text. The `@//` +comment form must be prefixed with an opener when its enclosing body +is started with an opener that isn't just `{`, and the `{` after `@//` +can more generally be an `|`...`{` opener with the corresponding +closer. Opener–closer pairs must be balanced in the commented block, +the same as in body text. A `@//` comment form (prefixed with an +opener as needed to form an escape) that is not followed by `{` or an +`|`...`{` opener comments out the rest of the line, including a +comment-terminating newline. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +See [parse.rkt](parse.rkt). Note that if you run with no arguments, +the that program will read from stdin. Supply one or more files to +read from those files instead of stdin. Supply `--recover` to continue +parsing after indentation or _closer_ errors. + +See [demo.shrb](demo.shrb), [interp.shrb](interp.shrb), and +[weird.shrb](weird.shrb) for more examples. + +# Drawbacks +[drawbacks]: #drawbacks + +Shrubbery notation may not be a good choice where precise and complete +grouping is needed, both because its grouping is coarse-grained and +the grouping rules generate lots of extra `group` layers. + +Shrubbery notation does not resolve the question of how infix +expressions parse. There is no precedence at the shrubbery level, for +example, other than the way that a `:` has higher precedence (in a +sense) than `|`. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +The lexeme-level syntax is chosen to be familiar to programmers +generally. The sequence `1+2` is one plus two, not a strangely spelled +identifier. Tokens like `(`, `,`, `{` and `;` are used in familiar +ways. Shrubbery notation provides enough grouping structure that code +navigation and transformation should be useful and straightforward in an +editor. + +Parentheses in shrubbery notation do not disable indentation, unlike +some indentation-sensitive notations. That choice supports a language +in shrubbery notation where parentheses can be added around any +expression — even if the expression is written with indentation +(although the expression may need to be shifted right to preserve +relative indentation, depending on how parentheses are added). + +The inclusion of `|` in shrubbery notation reflects the fact that +conditional forms (such a `if`, `cond`, and `match`) are important and +common. A distinct, pleasant, and uniform pattern for conditionals +deserves direct support in the notation. + +Requiring a preceding `:` or preceding/following `|` for +block-creating indentation is mostly a kind of consistency check to +enable better and earlier errors when indentation goes wrong. It also +allows indentation that starts with an operator to continue a group; +it's possible for bad indentation to inadvertently cause an operator +to be treated as continuing a group, but hopefully that will be rare. +Always requiring a preceding `:` before an indented `|` line would be +consistent, but it adds extras `:`s where `|` already provides one +consistency check. Allowing an optional `:` before `|` would work, but +programmers may then choose differently on omitting or including the +`:`, leading to subtly divergent conventions. + +Explicit block grouping via `«` and `»` is expected to be rare. The +grouping characters were intentionally chosen from the Latin-1 +extension of ASCII to avoid reserving additional ASCII characters. + +Making whitespace and comment lines ignored in all contexts means that +they can be freely added without intefering with grouping. The `\` +continuation operator is somewhat unusual in that it skips blank and +comment lines to continue, as opposed to requiring `\` on every +continuing line; that, too, allows extra blank and comment lines to be +added, even amid continuing lines. + +The interaction of indentation and `\` differs slightly from Python, +which does not count the space for `\` itself or any leading +whitespace on a continuing line toward indentation. Counting the +leading whitespace on a continuing line has the advantage that it can +reach an arbitrary amount of identation within a constrained textual +width. Counting the `\` itself is consistent with ignoring `\` when it +appears within a line, so grouping stays the same whether there's a +newline or the continue line immediately after `\`. The whitespace +role of `\` also means that spaces can be turned into `\` to “harden” +code for transfer via media (such as email) that might mangle +consecutive spaces. + +Using `~` for keywords has a precedent in OCaml. Using `~` for +keywords uses up a character that might otherwise be used for +operators, but keywords seem useful enough to be worth this cost. The +notion of keywords as distinct from identifiers has been liberating +for Racket syntax (particularly since keywords can be kept disintinct +from expressions more generally), and we expect similar benefits for +having keywords in shrubbery notation. + +The `#{....}` escape to S-expressions bridges between shrubbery +notation and Racket identifiers. For example, `#{exact-integer?}` is +an identifier with `-` and `?` as part of the identifier. Shrubbery +notation could be adapted to support Lisp-style identifiers by +requiring more space around operators, but the rule for continuing a +group between `(` and `)` or `[` and `]` currently depends on +distinguishing operators from non-operators. + +For `@`, the choice of treating `@f[arg]{text}` as `f(arg, ["text"])` +instead of `f(arg, "text")` reflects experience with S-expression `@` +notation. Although it seems convenient that, say `@bold{x}` is treated +as `(bold "x")`, the consequence is that a function like `bold` might +be implemented at first to take a single argument; later, a use like +`@bold{Hello @name}` breaks, because two arguments are provided. +Making explicit the list that's inherent in body parsing should help +reduce such mistakes (or bad design choices) for functions that are +meant to be used with `@` notation. + +# Prior art +[prior-art]: #prior-art + +Indentation-sensitive parsing and the use of `:` is obviously informed +by Python. + +Sampling notation's rules relating indentation, lines, `;`, and `:` +are originally based on the [`#lang +something`](https://github.com/tonyg/racket-something) reader, which +also targets an underlying expander that further groups tokens. +Shrubbery notation evolved away from using `{}` for blocks, however, +because `:` was nearly always preferred in experiements with the +notation. For the very rare case that explicit gropuing is needed for +a block, `«` and `»` can be used. Freeing `{}` from use for blocks, +meanwhile, allows its use for set and map notations. + +Shrubbery notation is also based on +[Lexprs](https://github.com/jeapostrophe/racket2-rfcs/blob/lexpr/lexpr/0004-lexpr.md), +particularly its use of `|`. Lexprs uses mandatory `:` and `|` tokens +as a prefix for indentation, and it absorbs an additional line after +an indented section to allow further chaining of the group. Although +`«»` can be used to form multiple subgroups within a shrubbery group, +the notation discourages that style in favor of further nesting (or, +in the case of `if`, in favor of `|` notation like other +conditionals). + +Shrubbery notation is in some sense a follow-up to [sapling +notation](https://github.com/mflatt/racket2-rfcs/blob/sapling/sapling/0005-sapling.md). +The primary difference is that shrubbery notation is +indentation-sensitive, while sapling notation is +indentation-insensitive. Indentation sensitivity and block conventions +in shrubbery notation avoid some delimiters and blank lines that are +needed in sapling notation. + +More generally, shrubbery notation takes inspiration from +S-expressions and alternative S-expression notations. The idea that, +even in an S-expression-like setting, some parsing can be deferred a +later parser has many precedents, including Clojure's choice of where +to put parentheses and notations that use something like `$` to escape +to infix mode. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + + +# Future possibilities +[future-possibilities]: #future-possibilities + +Like other notation designs, this one leaves open exactly the way that +the notation would be used to express a new programming language. The +examples are meant to be suggestive and have influenced many of the +notational choices, though. diff --git a/shrubbery/demo.shrb b/shrubbery/demo.shrb new file mode 100644 index 000000000..5f2911d11 --- /dev/null +++ b/shrubbery/demo.shrb @@ -0,0 +1,296 @@ +let (x = 1, + y = 2): + x+y + +let (x = 1, y = 2): x+y + +define pi: 3.14 + +define fib(n): + log_error("fib called") + cond | n == 0: 0 + | n == 1: 1 + | else: fib(n-1) + fib(n-2) + +define + | fib(0): 0 + | fib(1): 1 + | fib(n): fib(n-1) + fib(n-2) + +define + | fib(0): 0 + | fib(1): 1 + | fib(n): fib(n-1) + + fib(n-2) + +define fib: + lambda (n): + cond + | n == 0: 0 + | n == 1: 1 + | else: fib(n-1) + fib(n-2) + +// Ok to add `:` before `|`. This parses the +// same as the prevous example, but this is not the standard +// style (which is to omit unnecessary colons). +define fib: + lambda (n): + cond + | n == 0: 0 + | n == 1: 1 + | else: fib(n-1) + fib(n-2) + +// Adding parentheses is ok, at least with the obvious handling +// of parentheses by the use of sapling notation, but the +// parentheses are apparent in the sampling parse. +(define fib: + (lambda (n): + (cond + | (n == 0): 0 + | (n == 1): 1 + | else: (fib(n-1) + fib(n-2))))) + +// For maximal noise, you could add parentheses and trailing colons. +// But we won't. + +// START: all of the next `fib` definitions are exactly the same + +define fib(n): + match n + | 0: 0 + | 1: 1 + | n: fib(n-1) + fib(n-2) + +define fib(n): + match n + | 0: 0 + | 1: 1 + | n: fib(n-1) + fib(n-2) + +define fib(n): + match n | 0: 0 + | 1: 1 + | n: fib(n-1) + fib(n-2) + +define fib(n): + match n + | 0: + 0 + | 1: + 1 + | n: + fib(n-1) + fib(n-2) + +define fib(n): + match n « | 0: « 0 » + | 1: « 1 » + | n: « fib(n-1) + fib(n-2) » » + +define fib(n): + match n « | « 0: 0 » | « 1: 1 » | n: fib(n-1) + fib(n-2) » + +define fib(n): match n « | « 0: 0 » | « 1: 1 » | n: fib(n-1) + fib(n-2) » + +define fib(n): « match n « | « 0: 0 » | « 1: 1 » | n: fib(n-1) + fib(n-2) » » + +define fib(n): « match n « | 0: «0» | 1: «1» | n: « fib(n-1) + fib(n-2) » » » + +define fib(n): « match n « | « 0: «0» » | « 1: «1» » | « n: « fib(n-1) + fib(n-2) » » » » + +define fib(n): « match n | « 0: «0» » | « 1: «1» » | « n: « fib(n-1) + fib(n-2) » » » + +// END equivalent `fib` definitions + +define make_adder(n): + lambda (m): + printf("adding to ~a\n", m) + m+n + +define analyze(n): + if n == 0 + | printf("zero\n") + | printf("other\n") + printf("done\n") + +define analyze(n): + if n == 0 + | printf("zero\n") + printf("done saying zero\n") + | printf("other\n") + printf("done saying other\n") + +struct posn(x, y) + +struct color_posn(col): + extends posn + mutable + +struct posn(x mutable, + y = 7): + methods equality: + define equal(a, b): + is_posn(b) => (a.x == b.x && a.y == b.y) + define hash(a): + 17 + define secondary_hash(a): + 19 + +struct posn(x, y): + property prop_equal_and_hash: + let (hc = lambda (a: posn, hc): + hc(a.x) + hc(a.y), + eql = lambda (a: posn, b: posn, eql): + eql(a.x, b.x) && eql(a.y, b.y)): + values(eql, hc, hc) + +struct posn(x, y): + property prop_equal_and_hash: + let (hc = lambda (a: posn, hc): + hc(a.x) + hc(a.y)): + (lambda (a: posn, b: posn, eql): + eql(a.x, b.x) && eql(a.y, b.y), + hc, + hc) + +// Another possibile approach to syntax for `struct`: +struct posn: + fields: + x mutable + y = 7 + methods equality: + define equal(a, b): + is_posn(b) => (a.x == b.x && a.y == b.y) + define hash(a): + 17 + define secondary_hash(a): + 19 + property prop_quality: "high" + +define fourth(n :: Integer): + define m: n*n + define v: m*m + printf("~a^4 = ~a\n", n, v) + v + +define exp(n :: Integer, 'base': base = 2.718281828459045): + if (n == 1) + | base + | base * exp(n-1, 'base': base) + +define positive_p(n): if n > 0 | true | false + +define go(): + define helper(n): + list(n, n) + define more(m): + if m == 0 | "done" + | more(m - 1) + helper(more(9)) + +define approx(x): + match x + | something(v): + printf("got it\n") + v + | nothing: 0 + +// With two `:`s on one line, there's no way to +// add to the first `:` +define approx_thunk(x): + match x + | something(v): lambda (): v + | nothing: lambda (): 0 + +// Enough indentation for `v` means that it continues the +// implicit second `:`, so the `lambda` body has `v`: +define approx_thunk(x): + match x + | something(v): lambda (): + v + | nothing: lambda (): 0 + +define approx_thunk(x): + match x + | something(v): lambda + | (): v + | (n): v+n + | nothing: lambda + | (): 0 + | (n): n + +define curried: + lambda (x): + lambda (y): + lambda (z): + list(x, y, z) + +define curried: lambda (x): + lambda (y): + lambda (z): + list(x, y, z) + +define dictionary: dict: + foo: 17 + bar: string + baz: true + +define colors: + list( + red, + green, + blue, + orange, + ) + +define f(x_something, + y_something_else, + z_also_long_name): + 5 + +define sum(l): + let loop(l = l): + if is_null(l) + | 0 + | first(l) + loop(rest(l)) + +define show_all(l): + for (x = in_list(l)): + print(x) + newline() + +define show_zip(l, l2): + for (x = in_list(l), + x2 = in_list(l2)): + print(x) + print_string(" ") + print(x2) + newline() + +define show_combos_not_same(l, l2): + for (x = in_list(l)): + then (x2 = in_list(l2)): + when !is_equal(x, x2): + print(x) + print_string(" ") + print(x2) + newline() + +define map(f, l): + for list (x = in_list(l)): + f(x) + +define partition(l, pred): + for fold (yes = empty, + no = empty, + result (reverse(yes), reverse(no))): + with (x = in_list(l)): + if pred(x) + | (cons(x, yes), no) + | (yes, cons(x, no)) + +local: + with: + define x: 1 + define y: 2 + in: + x+y diff --git a/shrubbery/indentation.rkt b/shrubbery/indentation.rkt new file mode 100644 index 000000000..3fd6e047a --- /dev/null +++ b/shrubbery/indentation.rkt @@ -0,0 +1,516 @@ +#lang racket/base +(require racket/class + racket/list + "lex.rkt" + "private/edit-help.rkt") + +;; Conventions: +;; pos = arbitary position +;; s, e = range positions +;; start = start of a line +;; delta = amount to add to a line due to `\` on preceding lines +;; col, candidate, limit = virtual column, includes any relevant delta +;; tab = indentation relative to start, does not include delta + +(provide shrubbery-indentation + shrubbery-range-indentation + shrubbery-paren-matches) + +(define NORMAL-INDENT 2) +(define BAR-INDENT 0) + +(define (shrubbery-indentation t pos + #:multi? [multi? #f] + #:always? [always? multi?]) + (define start (line-start t pos)) + (define current-tab (get-current-tab t start)) + (cond + ;; If `always?` is #f, we got here by the Return key; + ;; don't indent when just inserting new lines + [(and (not always?) (= (line-start t (sub1 pos)) (sub1 pos))) + current-tab] + [else + ;; tabbing only makes sense if the target line is not a continuation + ;; or if it continues an empty line + (define delta (or (line-delta t start #:unless-empty? #t) 0)) + (cond + [(eqv? delta 0) + (define (like-enclosing #:as-bar? [as-bar? #f] + #:as-operator? [as-operator? #f] + #:also-zero? [also-zero? #f]) + (indent-like-enclosing-group t start current-tab + #:multi? multi? + #:as-bar? as-bar? + #:as-operator? as-operator? + #:also-zero? also-zero?)) + (case (send t classify-position (+ start current-tab)) + [(parenthesis) + (indent-like-parenthesis t start current-tab)] + [(bar-operator) + (like-enclosing #:as-bar? #t)] + [(comment) + (define group-comment? (is-group-comment? t (+ start current-tab))) + (like-enclosing #:as-bar? (and group-comment? + (bar-after-group-comment? t (+ start current-tab) start)) + #:also-zero? group-comment?)] + [(operator) + (like-enclosing #:as-operator? #t)] + [(string) + ;; check for being inside a string, which might be in `@` content + (define-values (s e) (send t get-token-range (+ start current-tab))) + (cond + [(< s (+ start current-tab)) + current-tab] + [else + (like-enclosing)])] + [else + (like-enclosing)])] + [else + ;; don't change indentation for a continuation line + current-tab])])) + +(define (shrubbery-range-indentation t s e) + (define s-line (send t position-paragraph s)) + (define e-line (let ([line (send t position-paragraph e)]) + (if (and (s . < . e) + (= e (send t paragraph-start-position line))) + (sub1 line) + line))) + (cond + [(= s-line e-line) + ;; use single-line mode + (define pos (send t paragraph-start-position s-line)) + (define amt (shrubbery-indentation t pos #:always? #t)) + (define current-amt (get-current-tab t pos)) + (list (list (max 0 (- current-amt amt)) + (make-string (max 0 (- amt current-amt)) #\space)))] + [else + ;; compute indentation for the first non-empty line; as long as that + ;; involves inserting space or deleting no more space than is + ;; available in all lines, shift all the lines the same + (define lines (get-non-empty-lines t s-line e-line)) + (cond + [(null? lines) '()] + [else + (define (line-position line) (send t paragraph-start-position line)) + (define pos (line-position (car lines))) + (define amt-or-multi-amt (shrubbery-indentation t pos #:multi? #t)) + (define amts (if (list? amt-or-multi-amt) + amt-or-multi-amt + (list amt-or-multi-amt))) + (define current-amt (get-current-tab t pos)) + (or + ;; try each possible shift: + (for/or ([amt (in-list amts)]) + (cond + [(current-amt . < . amt) + ;; insert in all lines + (define ins-str (make-string (- amt current-amt) #\space)) + (for/list ([line (in-range s-line (add1 e-line))]) + (list 0 (if (memv line lines) ins-str "")))] + [(current-amt . > . amt) + (define delta (- current-amt amt)) + (and (for/and ([line (in-list lines)]) + (delta . <= . (get-current-tab t (send t paragraph-start-position line)))) + (for/list ([line (in-range s-line (add1 e-line))]) + (list (if (memv line lines) delta 0) "")))] + [else #f])) + ;; no change + '())])])) + +(define (indent-like-enclosing-group t start current-tab + #:as-bar? [as-bar? #f] + #:as-operator? [as-operator? #f] + #:multi? [multi? #f] + #:also-zero? [also-zero? #f]) + ;; candidates are sorted right (larger tab) to left (smaller tab) + (define (add-zero l) (if also-zero? (add-zero-to-end l) l)) + (define candidates (remove-dups (indentation-candidates t (sub1 start) + #:as-bar? as-bar? + #:as-operator? as-operator?))) + (define delta (line-delta t start)) + (define tabs (add-zero + (for/list ([col (in-list candidates)] + #:when (col . >= . delta)) + (- col delta)))) + ;; if the current state matches a candidate tab, we'll + ;; use the next one (to the left) + (define next-tabs (memv current-tab tabs)) + (cond + [multi? + (if next-tabs + (append next-tabs + (take tabs (- (length tabs) (length next-tabs)))) + tabs)] + [(and next-tabs (pair? (cdr next-tabs))) + (cadr next-tabs)] + [(null? tabs) 0] + [else + ;; default to rightmost: + (car tabs)])) + +(define (indent-like-parenthesis t start current-tab) + (define-values (s e) (send t get-token-range (+ start current-tab))) + (define o-s (send t backward-match e 0)) + (cond + [o-s ; => s is closer + (define o-start (line-start t o-s)) + (define o-delta (line-delta t o-start)) + (define col (col-of o-s o-start o-delta)) + ;; line up with indentation position of opener's group + (define use-col + (or (and (positive? o-s) + (let () + (define paren (send t get-text (sub1 e) e)) + (define next-s (if (equal? paren "}") + (skip-redundant-block-operators t (sub1 o-s) o-start) + (sub1 o-s))) + (get-block-column t next-s col o-start + #:for-outdent? #t))) + col)) + (define s-delta (line-delta t start)) + (if (use-col . > . s-delta) + (- use-col s-delta) + 0)] + [else + ;; didn't find match, so treat like other tokens + (indent-like-enclosing-group t start current-tab)])) + +;; Gets list of candiates with further-right candidates first starting +;; search with the token that contains `pos` (inclusive on the left +;; edge of the token, starts before the line that we want to indent). +;; The search works going backwards to find enclosing groups. +(define (indentation-candidates t pos + #:as-bar? [as-bar? #f] + #:as-operator? [as-operator? #f]) + (let loop ([pos pos] + ;; possible target column, depending on what's we + ;; find by looking further back; for example, this + ;; candidate will be discarded if we find another + ;; identifier just before it + [candidate #f] + ;; filter out possibilities to the right (exclusive) + ;; of the limit + [limit #f] + ;; saw a bar in this group? + [bar-after? as-bar?] + ;; can also indent by an extra step? + [plus-one-more? as-operator?]) + ;; helper + (define (maybe-list col [plus-one-more? #f]) + (cond + [(not col) null] + [(or (not limit) (col . <= . limit)) + (if plus-one-more? + (list (+ col NORMAL-INDENT) col) + (list col))] + [else null])) + ;; helper: loops where the candidate also works as a refined limit + (define (loop* pos new-candidate plus-one-more?) + (loop pos new-candidate (min new-candidate (or limit new-candidate)) #f plus-one-more?)) + (define (keep s) + (define start (line-start t s)) + (define delta (line-delta t start)) + (define new-candidate (col-of (if as-bar? (+ s BAR-INDENT) s) start delta)) + (loop* (sub1 s) + ;; don't forget the old candidate if the new candidate would + ;; be too deeply indented + (if (or (not candidate) (new-candidate . <= . limit)) + new-candidate + candidate) + (and plus-one-more? + (or (eqv? new-candidate 0) + (= start (line-start t (sub1 s))))))) + (cond + [(eqv? limit -1) null] + [(negative? pos) (maybe-list candidate plus-one-more?)] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment continue-operator) + ;; we don't do anything special with continue-operator here, + ;; because we avoid looking at line numbers, anyway, and `line-delta` + ;; is responsible for computing continuation columns + (loop (sub1 s) candidate limit bar-after? plus-one-more?)] + [(block-operator) + ;; a block creates an indentation candidate that's + ;; to the right of the enclosing group's indentation + (define start (line-start t pos)) + (define delta (line-delta t start)) + (define block-col (if (zero? s) + 0 + (get-block-column t (sub1 s) (col-of s start delta) start))) + ;; redundant operators are not valid indentation points: + (define next-s (skip-redundant-block-operators t (sub1 s) start)) + ;; indentation under the block operator is valid + (define next-candidate (col-of (add1 next-s) start delta)) + ;; a `|` cannot appear just after a `:`, so look before that block + (define adj-block-col (if as-bar? (sub1 block-col) block-col)) + ;; look further outside this block, and don't consider anything + ;; that would appear to be nested in the block: + (define outer-candidates (loop next-s next-candidate (min* adj-block-col limit) #f #f)) + (append (cond + [(and bar-after? (not as-bar?)) + null] + [candidate + ;; we already have something after `:`, so + ;; use its indentation + (maybe-list candidate plus-one-more?)] + [else + ;; we haven't found anything in the block, so + ;; indent as the first thing in the block + (maybe-list (+ block-col (if as-bar? BAR-INDENT NORMAL-INDENT)))]) + outer-candidates)] + [else + (cond + [(and candidate + (eqv? candidate (if as-bar? BAR-INDENT 0)) + (or (not limit) (limit . >= . (if as-bar? BAR-INDENT 0)))) + ;; already found minimal column, so stop here + (maybe-list candidate plus-one-more?)] + [else + (case category + [(parenthesis) + (define paren (send t get-text (sub1 e) e)) + (cond + [(opener? paren) + ;; we're inside parentheses, brackets, etc. + (cond + [(and bar-after? (not as-bar?)) + ;; opener followed by a bar: no more candidates + null] + [candidate (maybe-list candidate plus-one-more?)] + [(zero? s) (maybe-list NORMAL-INDENT)] + [else + (define start (line-start t pos)) + (define delta (line-delta t start)) + ;; first position within parens/braces/brackets; if + ;; indentation for the bracket's group is before the bracket, + ;; then "outdent" that far + (define col (col-of s start delta)) + (define next-s (if (equal? paren "{") + ;; outdent past redundant operators: + (skip-redundant-block-operators t (sub1 s) start) + (sub1 s))) + (define block-col (get-block-column t next-s col start + #:for-outdent? #t)) + (if (and block-col (block-col . < . col)) + (maybe-list (+ block-col NORMAL-INDENT)) + (maybe-list (+ col NORMAL-INDENT)))])] + [else + ;; found parenthesized while walking backward + (define r (send t backward-match e 0)) + (cond + [(not r) + ;; matching open not found + (cond + [(zero? s) (maybe-list candidate)] + [else (keep s)])] + [(zero? r) null] + [else (keep r)])])] + [(bar-operator) + (define start (line-start t pos)) + ;; look back to see whether there's another bar on the + ;; same line: + (define-values (another-bar-start limit-pos) + (find-bar-same-line t s start as-bar?)) + (cond + [(or (and another-bar-start as-bar?) + (next-is-block-bracket? t e)) + ;; don't treat the current bar as a source + ;; of indentation: + (loop (sub1 s) #f (min* s limit) #t #f)] + [as-bar? + (define delta (line-delta t start)) + (define col (col-of s start delta)) + (cond + [(not limit-pos) (maybe-list col)] + [else + ;; within the current group but outside the found bar, + ;; a new bar can only line up with outer candidates + ;; beyond the found bar + (define b-start (line-start t limit-pos)) + (define b-delta (line-delta t b-start)) + (define b-col (col-of limit-pos b-start b-delta)) + (append (maybe-list col) + ;; outer candidates: + (loop (sub1 limit-pos) + b-col + (min* (min col (sub1 b-col)) limit) + #t + #f))])] + [else + ;; line up within bar or outside [another] bar + (define bar-column (col-of s start (line-delta t start))) + (append (maybe-list (or candidate (+ bar-column NORMAL-INDENT)) (and candidate plus-one-more?)) + (if (not limit-pos) + null + ;; outer candidates + (loop (sub1 (or another-bar-start s)) #f (min* bar-column limit) #t #f)))])] + [(separator) + (cond + [(equal? ";" (send t get-text (sub1 e) e)) + (loop (sub1 s) candidate limit bar-after? #f)] + [candidate (maybe-list candidate plus-one-more?)] + [else + (define i-pos (get-inside-start t pos)) + (define start (line-start t i-pos)) + (define delta (line-delta t start)) + (maybe-list (col-of i-pos start delta) plus-one-more?)])] + [else + (keep s)])])])]))) + +;; Returns: +;; * the position of a bar before `orig-pos` with the same line +;; start as reflected by `at-start` (but taking continuing lines +;; into account), or #f if there's not one +;; * the position where the bar's enclosing block starts, but only +;; reliably if `as-bar?` and the first result is #f, or #f (all modes) +;; to mean that not outer group is possible +(define (find-bar-same-line t orig-pos at-start as-bar?) + (let loop ([pos (sub1 orig-pos)] [at-start at-start] [limit-pos (sub1 orig-pos)]) + (cond + [(negative? pos) (values #f limit-pos)] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment) + (loop (sub1 s) at-start limit-pos)] + [(continue-operator) + (loop (sub1 s) (line-start t s) limit-pos)] + [else + (define pos-start (line-start t pos)) + (cond + [(pos-start . < . at-start) (values #f limit-pos)] + [else + (case category + [(parenthesis) + (cond + [(opener? (send t get-text (sub1 e) e)) + (values #f #f)] + [else + ;; Found parenthesized while walking backward + (define r (send t backward-match e 0)) + (cond + [(not r) (loop (sub1 s) at-start s)] + [else (loop (sub1 r) (line-start t r) r)])])] + [(bar-operator) + (cond + [(not as-bar?) + ;; keep looking back + (define-values (another-bar-pos limit-pos) (loop (sub1 s) at-start 0)) + (values (or another-bar-pos s) limit-pos)] + [else + ;; don't need to find limit-pos + (values s 0)])] + [else (loop (sub1 s) at-start s)])])])]))) + +;; Skips back to an unmatched opener and returns the +;; position of the first thing after it (not counting +;; whitespace or comments) +(define (get-inside-start t pos) + (let loop ([pos pos] [last-pos #f]) + (cond + [(pos . <= . 0) 0] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment continue-operator) + (loop (sub1 s) last-pos)] + [(parenthesis) + (cond + [(opener? (send t get-text (sub1 e) e)) + (or last-pos e)] + [else + ;; Found parenthesized while walking backward + (define r (send t backward-match e 0)) + (loop (sub1 (or r s)) (or r s))])] + [else (loop (sub1 s) s)])]))) + +;; block operator just at `pos`+1 +(define (skip-redundant-block-operators t pos at-start) + ;; no block operators are redundant, anymore + pos) + +(define (is-group-comment? t pos) + (define-values (s e) (send t get-token-range pos)) + (and (= e (+ s 3)) + (equal? "#//" (send t get-text s e)))) + +(define (bar-after-group-comment? t pos at-start) + (let loop ([pos pos]) + (cond + [(= pos (send t last-position)) #f] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment) (loop e)] + [(bar-operator) (> (line-start t pos) at-start)] + [else #f])]))) + +(define (next-is-block-bracket? t pos) + (let loop ([pos pos]) + (cond + [(= pos (send t last-position)) #f] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment) (loop e)] + [(parenthesis) (equal? (send t get-text s e) "«")] + [else #f])]))) + +(define (get-non-empty-lines t s-line e-line) + (let loop ([line s-line]) + (cond + [(line . > . e-line) '()] + [(only-whitespace-between? t + (send t paragraph-start-position line) + (send t paragraph-end-position line)) + (loop (add1 line))] + [else (cons line (loop (add1 line)))]))) + +;; determine current indentation starting with `start` +(define (get-current-tab t start) + (define e (send t last-position)) + (let loop ([pos start]) + (cond + [(= pos e) 0] + [else + (define str (send t get-text pos (add1 pos))) + (cond + [(whitespace? str) + (+ 1 (loop (add1 pos)))] + [else 0])]))) + +(define (whitespace? str) + (and (= (string-length str) 1) + (char-whitespace? (string-ref str 0)) + (not (equal? str "\n")))) + +(define (remove-dups l) + (cond + [(null? l) null] + [(null? (cdr l)) l] + [(eqv? (car l) (cadr l)) (remove-dups (cdr l))] + [else (cons (car l) (remove-dups (cdr l)))])) + +(define (min* a b) + (min a (or b a))) + +(define (add-zero-to-end l) + (let loop ([l l]) + (cond + [(null? l) (list 0)] + [(eqv? (car l) 0) l] + [else (cons (car l) (loop (cdr l)))]))) + +(define shrubbery-paren-matches + '((|(| |)|) + (|[| |]|) + (|{| |}|) + (« »))) diff --git a/shrubbery/interp.shrb b/shrubbery/interp.shrb new file mode 100644 index 000000000..b99032457 --- /dev/null +++ b/shrubbery/interp.shrb @@ -0,0 +1,119 @@ +/* An interpreter from Utah CS 3520, originally witten in Plait, + just to see how thins work out. The curly-brace S-expressions + may not make sense. Given the usefulness of S-expressions + in the class, it's not clear that departing from S-expressions + would be a good idea there, but it seemed like an interesting + realistic program to try */ + +type Value + | numV(n :: Number) + | closV(arg :: Symbol, + body :: Exp, + env :: Env) + +type Exp + | numE(n :: Number) + | idE(s :: Symbol) + | plusE(l :: Exp, + r :: Exp) + | multE(l :: Exp, + r :: Exp) + | letE(n :: Symbol, + rhs :: Exp, + body :: Exp) + | lamE(n :: Symbol, + body :: Exp) + | appE(fun :: Exp, + arg :: Exp) + +type Binding + | bind(name :: Symbol, + val :: Value) + +type_alias Env :: Listof(Binding) + +define mt_env :: empty + +define extend_env :: cons + +modulet test: + print_only_errors(#true) + +// interp ---------------------------------------- +define interp(a :: Exp, env :: Env) :: Value: + typecase Exp a + | numE(n): numV(n) + | idE(s): lookup(s, env) + | plusE(l, r): num_plus(interp(l, env), interp(r, env)) + | multE(l, r): num_times(interp(l, env), interp(r, env)) + | letE(n, rhs, body): interp(body, + extend_env(bind(n, interp(rhs, env)), + env)) + | lamE(n, body): closV(n, body, env) + | appE(fun, arg): typecase Value interp(fun, env) + | closV(n, body, c_env): + interp(body, + extend_env(bind(n, interp(arg, env)), + c_env)) + | error('interp, "not a function") +modulet test: + test: + interp(parse(`2), mt_env) + numV(2) + + test { interp(parse(`2), mt_env) ; numV(2) } + test { interp(parse(`3), mt_env) ; numV(3) } + + test_exn: + interp(parse(`x), mt_env) + "free variable" + + test: + interp(parse(`x), + extend_env(bind('x, numV(9)), mt_env)) + numV(9) + + test: + interp(parse(`{+ 2 1}), mt_env) + numV(3) + test: + interp(parse(`{* 2 1}), mt_env) + numV(2) + + test: + interp(parse(`{+ {* 2 3} {+ 5 8}}), mt_env) + numV(1) + + test: + interp(parse(`{lambda {x} {+ x x}}), + mt_env) + closV('x, plusE(idE('x), idE('x)), mt_env) + + test: + interp(parse(`{let {[x 5]} + {+ x x}}), + mt_env) + numV(1) + +// num+ and num* ---------------------------------------- +define num_op(op :: Number Number :: Number, l :: Value, r :: Value) :: Value: + cond + | is_numV(l) && is_numV(r): + numV(op(numV_n(l), numV_n(r))) + | else: + error('interp, "not a number") + +define num_plus(l :: Value, r :: Value) :: Value: + num_op(op +, l, r) + +define num_mult(l :: Value, r :: Value) :: Value: + num_op(op *, l, r) + +modulet test: + test: + num_plus(numV(1), numV(2)) + numV(3) + + test: + num_mult(numV(2), numV(3)) + numV(6) diff --git a/shrubbery/lex.rkt b/shrubbery/lex.rkt new file mode 100644 index 000000000..71d5f6fa0 --- /dev/null +++ b/shrubbery/lex.rkt @@ -0,0 +1,760 @@ +#lang racket/base + +(require parser-tools/lex + racket/contract + (prefix-in : parser-tools/lex-sre) + "private/property.rkt") + +(provide lex/status + lex-all + + token-name + ;; 'identifier + ;; 'literal + ;; 'comment + ;; 'whitespace + ;; + ;; 'operator + ;; 'block-operator + ;; 'continue-operator + ;; 'bar-operator + ;; + ;; 'opener + ;; 'closer + ;; 'comma-operator + ;; 'semicolon-operator + ;; + ;; 's-exp + ;; + ;; 'EOF + ;; + ;; 'fail + + token? + token-value + token-e + token-line + token-column + token-srcloc + token-rename + + syntax->token + stx-for-original-property + + current-lexer-source) + +(define-lex-abbrevs + + ;; For case insensitivity + [e (char-set "eE")] + + [digit (:/ "0" "9")] + [digit_ (:or digit (:: digit "_"))] + [digit16 (:/ "af" "AF" "09")] + [digit16_ (:or digit16 (:: digit16 "_"))] + [digit8 (:/ "0" "7")] + + [langchar (:or (:/ "az" "AZ" "09") "+" "-" "_")] + + ;; does not constrain to avoid surrogates: + [unicode (:or (:: "u" (:** 1 4 digit16)) + (:: "U" (:** 1 6 digit16)))] + + [str (:: "\"" (:* string-element ) "\"")] + + [string-element (:or (:~ "\"" "\\") + (:: "\\" unicode) + string-escape)] + + [byte-str (:: "#\"" (:* byte-string-element) "\"")] + [byte-string-element (:or (:- (:/ "\x00" "\xFF") "\"" "\\") + string-escape)] + [string-escape (:or "\\\"" + "\\\\" + "\\a" + "\\b" + "\\t" + "\\n" + "\\v" + "\\f" + "\\r" + "\\e" + "\\'" + (:: "\\" (:** 1 3 digit8)) + (:: "\\x" (:** 1 2 digit16)) + (:: "\\" #\newline))] + + [bad-str (:: (:? "#") "\"" + (:* (:~ "\"" "\\") + (:: "\\" any-char)) + (:? "\\" "\""))] + + [boolean (:or "#true" "#false")] + + [special-number (:: "#" + (:or "#inf" + "#neginf" + "#nan"))] + + [bad-hash (:- (:or (:: "#" (:* non-delims)) + "#/") + boolean + special-number)] + + [exponent-marker e] + [sign (char-set "+-")] + + [script (:: "#!" (:or #\space #\/) (:* (:~ #\newline) (:: #\\ #\newline)))] + + [identifier (:: (:or alphabetic "_") + (:* (:or alphabetic numeric "_")))] + [opchar (:or (:- symbolic (:or "~")) + (:- punctuation (:or "," ";" "#" "\\" "_" "@" "\"" + "(" ")" "[" "]" "{" "}" "«" "»")))] + [operator (:- (:or opchar + (:: (:* opchar) (:- opchar "+" "-" "." "/")) + (:+ ".") + (:+ "+") + (:+ "-")) + "|" ":" + (:: (:* any-char) (:or "//" "/*") (:* any-char)))] + + [keyword (:: "~" identifier)] + [bad-keyword (:: "~")] + + ;; disallows a number that starts +, -, or "." + [number/continuing (:or decimal-number/continuing + hex-number)] + [number (:: (:? sign) + (:or decimal-number + hex-number))] + + [uinteger (:: (:* digit_) digit)] + [uinteger16 (:: (:* digit16_) digit16)] + + [decimal-number/continuing (:or (:: uinteger number-exponent) + (:: uinteger "." (:? uinteger) number-exponent))] + [decimal-number (:or decimal-number/continuing + (:: "." uinteger number-exponent))] + [number-exponent (:or "" (:: exponent-marker (:? sign) uinteger))] + [hex-number (:: "0x" uinteger16)] + + [bad-number/continuing (:- (:: digit (:+ non-number-delims)) + identifier + number/continuing)] + [bad-number (:- (:: (:? sign) digit (:+ non-number-delims)) + identifier + number)] + [bad-comment "*/"] + + [non-number-delims (:or non-delims ".")] + [non-delims (:or alphabetic numeric "_")]) + +(define (ret name lexeme #:raw [raw #f] type paren start-pos end-pos status) + (values (make-token name lexeme start-pos end-pos raw) + type paren (position-offset start-pos) (position-offset end-pos) status)) + +(define stx-for-original-property (read-syntax #f (open-input-string "original"))) +(define current-lexer-source (make-parameter "input")) + +(define (make-token name e start-pos end-pos [raw #f]) + (define offset (position-offset start-pos)) + (define loc (vector (current-lexer-source) + (position-line start-pos) + (position-col start-pos) + offset + (- (position-offset end-pos) + offset))) + (token name (let loop ([e e] [raw raw]) + (let ([e (if (pair? e) + (let p-loop ([e e]) + (cond + [(null? (cdr e)) (list (loop (car e) raw))] + [else (cons (loop (car e) #f) + (p-loop (cdr e)))])) + e)] + [raw (if (pair? e) #f raw)]) + (define stx (datum->syntax #f + e + loc + stx-for-original-property)) + (if (eq? name 'comment) + stx + (syntax-raw-property stx (or raw (if (string? e) e '())))))))) + +(define (read-line-comment name lexeme input-port start-pos + #:status [status 'initial] + #:consume-newline? [consume-newline? #f]) + (let ([comment (apply string (append (string->list lexeme) (read-line/skip-over-specials input-port + consume-newline?)))]) + (define-values (end-line end-col end-offset) (port-next-location input-port)) + (values (make-token name comment start-pos (position end-offset end-line end-col)) + 'comment #f + (position-offset start-pos) + end-offset + status))) + +(define get-next-comment + (lexer + ["/*" (values 1 end-pos lexeme)] + ["*/" (values -1 end-pos lexeme)] + [(:or "/" "*" (:* (:~ "*" "/"))) + (let-values ([(delta end-pos rest-lexeme) (get-next-comment input-port)]) + (values delta end-pos (string-append lexeme rest-lexeme)))] + [(eof) (values 'eof end-pos "")] + [(special) + (get-next-comment input-port)] + [(special-comment) + (get-next-comment input-port)])) + +(define (read-nested-comment num-opens start-pos lexeme input) + (define-values (diff end next-lexeme) (get-next-comment input)) + (cond + [(eq? 'eof diff) (ret 'fail eof 'error #f start-pos end 'initial)] + [else + (define all-lexeme (string-append lexeme next-lexeme)) + (define next-num-opens (+ diff num-opens)) + (cond + [(= 0 next-num-opens) (ret 'comment all-lexeme 'comment #f start-pos end 'initial)] + [else (read-nested-comment next-num-opens start-pos all-lexeme input)])])) + +(define (get-offset i) + (let-values (((x y offset) (port-next-location i))) + offset)) + +(define (read-line/skip-over-specials i consume-newline?) + (let loop () + (define next (peek-char-or-special i)) + (cond + [(eq? next #\newline) + (cond + [consume-newline? + (read-char-or-special i) + (list #\newline)] + [else null])] + [(eof-object? next) + null] + [else + (read-char-or-special i) + (if (char? next) + (cons next (loop)) + (loop))]))) + +(struct s-exp-mode (depth) #:prefab) +(struct in-at (mode opener shrubbery-status openers) #:prefab) +(struct in-escaped (shrubbery-status at-status) #:prefab) + +(define (lex/status in pos status racket-lexer/status) + (let-values ([(tok type paren start end status) + (let loop ([status status]) + (cond + [(s-exp-mode? status) + ;; within `#{}` + (unless racket-lexer/status + (error "shouldn't be in S-expression mode without a Racket lexer")) + (define depth (s-exp-mode-depth status)) + (cond + [(and (zero? depth) + (eqv? #\} (peek-char in))) + ;; go out of S-expression mode by using shrubbery lexer again + (shrubbery-lexer/status in)] + [else + (define-values (tok type paren start end s-exp-status) + (racket-lexer/status in)) + (values tok type paren start end (case s-exp-status + [(open) + (s-exp-mode (add1 depth))] + [(close) + (s-exp-mode (sub1 depth))] + [else status]))])] + [(in-at? status) + ;; within an `@` sequence + (at-lexer in status (lambda (status) (loop status)))] + [(in-escaped? status) + (define-values (t type paren start end sub-status) + (loop (in-escaped-shrubbery-status status))) + (values t type paren start end (struct-copy in-escaped status + [shrubbery-status sub-status]))] + [(eq? status 'continuing) + ;; normal mode, after a form + (shrubbery-lexer-continuing/status in)] + [else + ;; normal mode, at start or after an operator or whitespace + (shrubbery-lexer/status in)]))]) + (cond + [(and (eq? (token-name tok) 'at-content) + (eqv? 0 (string-length (token-e tok)))) + ;; a syntax coloring lexer must not return a token that + ;; consumes no characters, so just drop it by recurring + (lex/status in pos status racket-lexer/status)] + [else + (define backup (cond + ;; If we have "@/{" and we add a "/" after the existing one, + ;; we'll need to back up more: + [(eq? (token-name tok) 'at-opener) 1] + [(and (in-at? status) (eq? (token-name tok) 'operator)) 2] + [else 0])) + (values tok type paren start end backup status)]))) + +(define-syntax-rule (make-lexer/status number bad-number) + (lexer + [(:+ whitespace) + (ret 'whitespace lexeme 'white-space #f start-pos end-pos 'initial)] + [str (ret 'literal (parse-string lexeme) #:raw lexeme 'string #f start-pos end-pos 'datum)] + [byte-str (ret 'literal (parse-byte-string lexeme) #:raw lexeme 'string #f start-pos end-pos 'datum)] + [bad-number + (ret 'fail lexeme 'error #f start-pos end-pos 'continuing)] + [number + (ret 'literal (parse-number lexeme) #:raw lexeme 'constant #f start-pos end-pos 'continuing)] + [special-number + (let ([num (case lexeme + [("#inf") +inf.0] + [("#neginf") -inf.0] + [("#nan") +nan.0])]) + (ret 'literal num #:raw lexeme 'constant #f start-pos end-pos 'continuing))] + [boolean + (ret 'literal (equal? lexeme "#true") #:raw lexeme 'constant #f start-pos end-pos 'continuing)] + ["//" (read-line-comment 'comment lexeme input-port start-pos)] + ["/*" (read-nested-comment 1 start-pos lexeme input-port)] + ["#//" + (ret 'group-comment lexeme 'comment #f start-pos end-pos 'initial)] + [(:: (:or "#lang " "#!") + (:or langchar + (:: langchar (:* (:or langchar "/")) langchar))) + (ret 'comment lexeme 'other #f start-pos end-pos 'initial)] + [(:: (:or "#lang " "#!") (:* (:& any-char (complement whitespace)))) + (ret 'fail lexeme 'error #f start-pos end-pos 'initial)] + [script + (ret 'comment lexeme 'comment #f start-pos end-pos 'initial)] + [(:or "(" "[" "{" "«") + (ret 'opener lexeme 'parenthesis (string->symbol lexeme) start-pos end-pos 'initial)] + [(:or ")" "]" "}" "»") + (ret 'closer lexeme 'parenthesis (string->symbol lexeme) start-pos end-pos 'continuing)] + ["#{" + (ret 's-exp lexeme 'parenthesis '|{| start-pos end-pos (s-exp-mode 0))] + [":" + (ret 'block-operator lexeme 'block-operator #f start-pos end-pos 'initial)] + ["|" + (ret 'bar-operator lexeme 'bar-operator #f start-pos end-pos 'initial)] + ["\\" + (ret 'continue-operator lexeme 'continue-operator #f start-pos end-pos 'initial)] + ["," + (ret 'comma-operator lexeme 'separator #f start-pos end-pos 'initial)] + [";" + (ret 'semicolon-operator lexeme 'separator #f start-pos end-pos 'initial)] + [identifier + (ret 'identifier (string->symbol lexeme) #:raw lexeme 'symbol #f start-pos end-pos 'continuing)] + [operator + (ret 'operator (list 'op (string->symbol lexeme)) #:raw lexeme 'operator #f start-pos end-pos 'initial)] + [keyword + (let ([kw (string->keyword (substring lexeme 1))]) + (ret 'identifier kw #:raw lexeme 'keyword #f start-pos end-pos 'continuing))] + ["@//" + (let ([opener (peek-at-opener input-port)]) + (if opener + (ret 'at-comment lexeme 'at-comment (string->symbol lexeme) start-pos end-pos (in-at 'open opener 'initial '())) + (read-line-comment 'at-comment lexeme input-port start-pos)))] + ["@" + (let ([opener (peek-at-opener input-port)]) + (define mode (if opener 'open 'initial)) + (ret 'at lexeme 'at (string->symbol lexeme) start-pos end-pos (in-at mode opener 'initial '())))] + [(special) + (cond + [(or (number? lexeme) (boolean? lexeme)) + (ret 'literal lexeme 'constant #f start-pos end-pos 'continuing)] + [(string? lexeme) + (ret 'literal lexeme 'string #f start-pos end-pos 'continuing)] + [(keyword? lexeme) + (ret 'literal lexeme 'hash-colon-keyword #f start-pos end-pos 'continuing)] + [else + (ret 'literal lexeme 'no-color #f start-pos end-pos 'continuing)])] + [(special-comment) + (ret 'comment "" 'comment #f start-pos end-pos 'initial)] + [(eof) (ret-eof start-pos end-pos)] + [(:or bad-str bad-keyword bad-hash bad-comment) + (ret 'fail lexeme 'error #f start-pos end-pos 'bad)] + [any-char (extend-error lexeme start-pos end-pos input-port)])) + +(define (ret-eof start-pos end-pos) + (values (make-token 'EOF eof start-pos end-pos) 'eof #f #f #f #f)) + +(define shrubbery-lexer/status (make-lexer/status number bad-number)) +(define shrubbery-lexer-continuing/status (make-lexer/status number/continuing bad-number/continuing)) + +;; after reading `@`, we enter an at-exp state machine for whether +;; we're in the initial part, within `[]`, or within `{}`; we have to +;; perform some parsing here to balance openers and closers; we leave +;; wehite trimming to the parser layer +(define (at-lexer in status recur) + (define in-mode (in-at-mode status)) + (define (get-expected opener/ch ch/closer) + (define (get-all-expected s) + (for ([ch (in-string s)]) + (unless (eqv? ch (read-char in)) + (error "inconsistent input" ch)))) + (define start-pos (next-location-as-pos in)) + (define eof? + (cond + [(string? opener/ch) + (get-all-expected opener/ch) + (unless (eqv? ch/closer (read-char in)) + (error "inconsistent opener input" ch/closer)) + #f] + [else + (define ch (read-char in)) + (cond + [(eof-object? ch) #t] + [else + (unless (eqv? opener/ch ch) + (error "inconsistent closer input" opener/ch)) + (get-all-expected ch/closer) + #f])])) + (define end-pos (next-location-as-pos in)) + (values start-pos end-pos eof?)) + (case in-mode + ;; 'initial mode is right after `@` without immediate `{`, and we + ;; may transition from 'initial mode to 'brackets mode at `[` + [(initial brackets) + ;; recur to parse in shrubbery mode: + (define-values (t type paren start end sub-status) + (recur (in-at-shrubbery-status status))) + ;; to keep the term and possibly exit 'initial or 'brackets mode: + (define (ok status) + (values t type paren start end (cond + [(and (not (s-exp-mode? sub-status)) + (null? (in-at-openers status))) + ;; either `{`, `[`, or back to shrubbery mode + (define opener (peek-at-opener in)) + (cond + [opener + (in-at 'open opener sub-status '())] + [(and (not (eq? in-mode 'brackets)) + (eqv? #\[ (peek-char in))) + (in-at 'brackets #f sub-status '())] + [(in-escaped? sub-status) + (in-escaped-at-status sub-status)] + [else sub-status])] + [else + ;; continue in-at mode + status]))) + ;; converts a token to an error token: + (define (error status) + (values (struct-copy token t [name 'fail]) 'error #f start end status)) + ;; update the shrubbery-level status, then keep the term or error, + ;; trackig nesting depth through the status as we continue: + (let ([status (struct-copy in-at status + [shrubbery-status sub-status])]) + (case (token-name t) + [(opener) (ok (struct-copy in-at status + [openers (cons (token-e t) + (in-at-openers status))]))] + [(closer) + (cond + [(and (pair? (in-at-openers status)) + (closer-for? (token-e t) (car (in-at-openers status)))) + (ok (struct-copy in-at status + [openers (cdr (in-at-openers status))]))] + [else + (error status)])] + [else (ok status)]))] + ;; 'open mode is right `@` when the next character is `{` + [(open) + (define opener (in-at-opener status)) + (define-values (start-pos end-pos eof?) (get-expected opener #\{)) + (ret 'at-opener (string-append opener "{") 'parenthesis '|{| start-pos end-pos + (struct-copy in-at status [mode 'inside] [openers 0]))] + ;; 'inside mode means in `{}` and not currently escaped, and we + ;; transition to 'escape mode on a `@`, and we transition to 'close mode + ;; on a `}` that is not balancing a `{` within `{}` + [(inside) + (define opener (in-at-opener status)) + (define start-pos (next-location-as-pos in)) + (define o (open-output-string)) + (let loop ([depth (in-at-openers status)]) + (define ch (peek-char in)) + (cond + [(eqv? ch #\newline) + ;; convert a newline into a separate string input + (define s (get-output-string o)) + (cond + [(= 0 (string-length s)) + (read-char in) + (define end-pos (next-location-as-pos in)) + (ret 'at-content "\n" 'string #f start-pos end-pos + (struct-copy in-at status [mode 'inside] [openers depth]))] + [else + (define end-pos (next-location-as-pos in)) + (ret 'at-content s 'string #f start-pos end-pos + (struct-copy in-at status [mode 'inside] [openers depth]))])] + [(or (eof-object? ch) + (peek-at-closer in #:opener opener)) + (cond + [(zero? depth) + ;; `lex/status` will handle the case that the content is empty + (define end-pos (next-location-as-pos in)) + (ret 'at-content (get-output-string o) 'string #f start-pos end-pos + (struct-copy in-at status [mode 'close]))] + [else + (if (equal? opener "") + (write-char (read-char in) o) + (write-string (read-string (add1 (bytes-length opener)) in) o)) + (loop (sub1 depth))])] + [(peek-at-prefixed #\@ in #:opener opener) + ;; `lex/status` will handle the case that the content is empty + (define end-pos (next-location-as-pos in)) + (ret 'at-content (get-output-string o) 'string #f start-pos end-pos + (struct-copy in-at status [mode 'escape] [openers depth]))] + [(peek-at-opener in #:opener opener) + (if (equal? opener "") + (write-char (read-char in) o) + (write-string (read-string (add1 (bytes-length opener)) in) o)) + (loop (add1 depth))] + [else + (write-char (read-char in) o) + (loop depth)]))] + ;; 'escape mode means in `{}`, not currently escaped, and expect `@` next + [(escape) + (define opener (in-at-opener status)) + (define-values (start-pos end-pos eof?) (get-expected opener #\@)) + (cond + [(read-at-comment in) + => (lambda (slashes) + (cond + [(peek-at-opener in) + => (lambda (opener) + ;; block comment + (define end-pos (next-location-as-pos in)) + (ret 'at-comment (string-append opener "@" slashes) 'at-comment #f start-pos end-pos + (in-at 'open opener (in-escaped 'initial (struct-copy in-at status [mode 'inside])) '())))] + [else + ;; line comment + (read-line-comment 'comment (string-append opener "@" slashes) in start-pos + #:status (struct-copy in-at status [mode 'inside]) + #:consume-newline? #t)]))] + [else + (define next-opener (peek-at-opener in)) + (define mode (if next-opener 'open 'initial)) + (ret 'at (string-append opener "@") 'at #f start-pos end-pos + (in-at mode next-opener (in-escaped 'initial (struct-copy in-at status [mode 'inside])) '()))])] + ;; 'close mode handles the final `}` of a `{}` + [(close) + (define closer (at-opener->closer (in-at-opener status))) + (define-values (start-pos end-pos eof?) (get-expected #\} closer)) + (cond + [eof? (ret-eof start-pos end-pos)] + [else + (define sub-status (in-at-shrubbery-status status)) + (ret 'at-closer (string-append "}" closer) 'parenthesis '|}| start-pos end-pos + (if (in-escaped? sub-status) + (in-escaped-at-status sub-status) + sub-status))])] + [else (error "unknown at-exp state")])) + +(define (peek-at-opener in #:opener [opener #f]) + (cond + [opener + ;; look for another instance of the current opener + (peek-at-prefixed #\{ in #:opener opener)] + [else + ;; look for a fresh opener + (define ch (peek-char in)) + (cond + [(eqv? ch #\{) ""] + [(eqv? ch #\|) + (let loop ([chars '(#\|)] [offset 1]) + (define ch (peek-char in offset)) + (cond + [(eqv? ch #\{) (list->string (reverse chars))] + [(and ((char->integer ch) . < . 128) + (or (char-symbolic? ch) + (char-punctuation? ch))) + (loop (cons ch chars) (add1 offset))] + [else #f]))] + [else #f])])) + +(define (peek-at-prefixed ch in #:opener opener) + (let loop ([offset 0]) + (cond + [(= offset (string-length opener)) + (if (eqv? ch (peek-char in offset)) + opener + #f)] + [(eqv? (peek-char in offset) (string-ref opener offset)) + (loop (add1 offset))] + [else #f]))) + +(define (peek-at-closer in #:opener [opener #f]) + (define ch (peek-char in)) + (cond + [(eqv? ch #\}) + (let loop ([offset 0]) + (cond + [(= offset (string-length opener)) opener] + [(eqv? (peek-char in (add1 offset)) + (flip-at-bracket (string-ref opener (- (string-length opener) offset 1)))) + (loop (add1 offset))] + [else #f]))] + [else #f])) + +(define (read-at-comment in) + (and (eqv? (peek-char in) #\/) + (eqv? (peek-char in 1) #\/) + (begin + (read-char in) + (read-char in) + "//"))) + +(define (flip-at-bracket ch) + (case ch + [(#\<) #\>] + [(#\>) #\<] + [(#\[) #\]] + [(#\]) #\[] + [(#\() #\)] + [(#\)) #\(] + [else ch])) + +(define (at-opener->closer opener) + (cond + [(eqv? 0 (string-length opener)) ""] + [else + (list->string (reverse (for/list ([ch (in-string opener)]) + (flip-at-bracket ch))))])) + +(define (next-location-as-pos in) + (define-values (line col pos) (port-next-location in)) + (position pos line col)) + +(define (extend-error lexeme start end in) + (define next (peek-char-or-special in)) + (if (or (memq next + `(special + #\" #\, #\' #\` #\( #\) #\[ #\] #\{ #\} #\; + ,eof)) + (char-whitespace? next)) + (ret 'fail lexeme 'error #f start end 'bad) + (let-values (((rest end-pos) (get-chunk in))) + (ret 'fail (string-append lexeme rest) 'error #f start end-pos 'bad)))) + +(define get-chunk + (lexer + [(:+ whitespace) (values lexeme end-pos)])) + +(define (parse-number s) + (if (and ((string-length s) . > . 2) + (eqv? #\x (string-ref s 1))) + (string->number (regexp-replace* #rx"_" (substring s 2) "") 16) + (string->number (regexp-replace* #rx"_" s "")))) + +(define (parse-string s) + (read (open-input-string s))) + +(define (parse-byte-string s) + (read (open-input-string s))) + +(define (parse-char s) + (define str + (read (open-input-string (string-append "\"" + (substring s 1 (sub1 (string-length s))) + "\"")))) + (string-ref str 0)) + +(struct token (name value)) +(struct located-token token (srcloc)) + +(define (token-e t) + (syntax-e (token-value t))) + +(define (token-line t) + (if (located-token? t) + (srcloc-line (located-token-srcloc t)) + (syntax-line (token-value t)))) + +(define (token-column t) + (let ([c (if (located-token? t) + (srcloc-column (located-token-srcloc t)) + (syntax-column (token-value t)))]) + (if (and c (eq? (token-name t) 'bar-operator)) + (+ c 0.5) + c))) + +(define (token-srcloc t) + (cond + [(located-token? t) + (located-token-srcloc t)] + [else + (define s (token-value t)) + (srcloc (syntax-source s) + (syntax-line s) + (syntax-column s) + (syntax-position s) + (syntax-span s))])) + +(define (token-rename t name) + (struct-copy token t [name name])) + +(define (syntax->token name s [srcloc #f]) + (if srcloc + (located-token name s srcloc) + (token name s))) + +;; Runs `lex/status` in a loop, but switches to `finish-s-exp` +;; for an S-expression escape: +(define (lex-all in fail + #:keep-type? [keep-type? #f] + #:source [source (object-name in)]) + (parameterize ([current-lexer-source source]) + (let loop ([status 'initial]) + (define-values (tok type paren start-pos end-pos backup new-status) + (lex/status in (file-position in) status #f)) + (define (wrap r) + (if keep-type? + (vector r type paren) + r)) + (case (token-name tok) + [(EOF) '()] + [(fail) (fail tok "read error")] + [(s-exp) + (cons (wrap (finish-s-exp tok in fail)) (loop 'continuing))] + [else + (cons (wrap tok) (loop new-status))])))) + +(define (finish-s-exp open-tok in fail) + (define v (read-syntax (current-lexer-source) in)) + (when (eof-object? v) + (fail open-tok "expected S-expression after `#{`")) + (define end-pos + (let loop () + (define-values (line col pos) (port-next-location in)) + (define c (read-char in)) + (cond + [(eof-object? c) + (fail v "expected `}` after S-expression")] + [(eqv? c #\}) + (add1 pos)] + [(char-whitespace? c) + (loop)] + [else + (define bad (datum->syntax #f c (list (current-lexer-source) + line + col + pos + 1))) + (fail bad "expected only whitespace or `}` after S-expression")]))) + (define result + (syntax->token (if (identifier? v) 'identifier 'literal) + v + (let ([loc (token-srcloc open-tok)]) + (struct-copy srcloc loc + [span (- end-pos (srcloc-position loc))])))) + (when (pair? (syntax-e v)) + (fail result "S-expression in `#{` and `}` must not be a pair")) + result) + +(define (closer-for? cl op) + (equal? cl (case op + [("(") ")"] + [("[") "]"] + [("{") "}"] + [("«") "»"] + [else #f]))) diff --git a/shrubbery/main.rkt b/shrubbery/main.rkt new file mode 100644 index 000000000..61ee569ad --- /dev/null +++ b/shrubbery/main.rkt @@ -0,0 +1,42 @@ +#lang racket/base + +(module reader racket/base + (require syntax/strip-context + "parse.rkt") + + (provide (rename-out [shrubbery-read read] + [shrubbery-read-syntax read-syntax]) + get-info + get-info-proc) + + (define (shrubbery-read in) + (syntax->datum + (shrubbery-read-syntax #f in))) + + (define (shrubbery-read-syntax src in) + (strip-context + #`(module anything racket/base + '#,(parse-all in)))) + + (define (get-info in mod line col pos) + (lambda (key default) + (get-info-proc key default (lambda (key default) default)))) + + (define (get-info-proc key default make-default) + (case key + [(color-lexer) + (dynamic-require 'shrubbery/syntax-color + 'shrubbery-lexer)] + [(drracket:indentation) + (dynamic-require 'shrubbery/indentation + 'shrubbery-indentation)] + [(drracket:range-indentation) + (dynamic-require 'shrubbery/indentation + 'shrubbery-range-indentation)] + [(drracket:paren-matches) + (dynamic-require 'shrubbery/indentation + 'shrubbery-paren-matches)] + [(drracket:grouping-position) + (dynamic-require 'shrubbery/navigation + 'shrubbery-grouping-position)] + [else (make-default key default)]))) diff --git a/shrubbery/navigation.rkt b/shrubbery/navigation.rkt new file mode 100644 index 000000000..0233bcbca --- /dev/null +++ b/shrubbery/navigation.rkt @@ -0,0 +1,299 @@ +#lang racket/base +(require racket/class + racket/list + "lex.rkt" + "private/edit-help.rkt") + +;; Conventions are the same as in "indentation.rkt" + +(provide shrubbery-grouping-position) + +(define (shrubbery-grouping-position t pos limit-pos direction) + (case direction + [(backward) + (define-values (s e) (skip-whitespace #:and-separators? #t t (sub1 pos) -1)) + (define category (send t classify-position s)) + (case category + [(parenthesis) + (or (send t backward-match e 0) + e)] + [(bar-operator block-operator) e] + [else s])] + [(forward) + (define-values (s e) (skip-whitespace #:and-separators? #t t pos 1)) + (define category (send t classify-position s)) + (case category + [(parenthesis) + (or (send t forward-match s (send t last-position)) + s)] + [(block-operator bar-operator) + (define-values (next-s next-e) (skip-whitespace t e 1)) + (define start (line-start t next-s)) + (define delta (line-delta t start)) + (skip-to-shallower t e (col-of next-s start delta) + #:bar-stop-line (and (eq? category 'bar-operator) + start))] + [else e])] + [(down) + (define-values (s e) (skip-whitespace #:and-separators? #t t pos 1)) + (define category (send t classify-position s)) + (case category + [(parenthesis) + (and (opener? (send t get-text s e)) + e)] + [(block-operator bar-operator) + e] + [else #f])] + [(up) + (define start (line-start t pos)) + (case (send t classify-position pos) + [(bar-operator) + ;; immediately before a `|` is a special case + (start-of-alts t pos)] + [else + (define-values (s e) (skip-whitespace #:and-separators? #t t (sub1 pos) -1 + #:stay-on-line start)) + (define category (send t classify-position s)) + (case category + [(parenthesis) + (if (opener? (send t get-text s e)) + s + (start-of-group #:or-out? #t t e start))] + [(block-operator bar-operator) + (start-of-group #:or-out? #t t e start)] + [else + (start-of-group #:or-out? (eq? s pos) t s start)])])] + [else #f])) + +(define (skip-whitespace t pos dir + #:and-separators? [and-separators? #f] + #:stay-on-line [stay-on-line #f]) + (define end-pos (send t last-position)) + (cond + [(= pos -1) (send t get-token-range 0)] + [(pos . > . end-pos) (send t get-token-range (sub1 end-pos))] + [else + (let loop ([pos pos] [stay-on-line stay-on-line]) + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (define (continue #:ok-to-change-line? [ok-to-change-line? #f]) + (define start (and stay-on-line (line-start t s))) + (cond + [(and stay-on-line + (not ok-to-change-line?) + (not (eqv? start stay-on-line)) + (not (eq? category 'white-space))) + (if (dir . < . 0) + (send t get-token-range e) + (send t get-token-range (sub1 s)))] + [else + (if (dir . < . 0) + (if (zero? s) + (values s e) + (loop (- s 1) start)) + (if (eqv? e end-pos) + (values s e) + (loop e start)))])) + (case category + [(white-space comment) + (continue)] + [(continue-operator) + (continue #:ok-to-change-line? #t)] + [(separator) + (if and-separators? + (continue) + (values s e))] + [else (values s e)]))])) + +;; find the end of a block as a column less than `col`, but +;; if `bar-stop-line` is given, also stop at a bar on that line +(define (skip-to-shallower t pos col + #:bar-stop-line [bar-stop-line #f]) + (define end-pos (send t last-position)) + (let loop ([pos pos] [last-e #f]) + (cond + [(= pos end-pos) end-pos] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment continue-operator) + (loop e last-e)] + [(separator) s] + [else + (define start (line-start t s)) + (define delta (line-delta t start)) + (define new-col (col-of s start delta)) + (cond + [(new-col . < . col) (or last-e s)] + [(and (eq? category 'bar-operator) + (eqv? start bar-stop-line)) + (or last-e s)] + [else + (case category + [(parenthesis) + (define o-e (send t forward-match s end-pos)) + (if o-e + (loop o-e o-e) + s)] + [else (loop e e)])])])]))) + +;; return the start of the group containing `pos`, but if `pos` +;; is already the start of the group and `or-out?1, return the start of the +;; enclosing group +(define (start-of-group t orig-pos at-start + #:or-out? or-out?) + (define (finish last-pos) + (or last-pos + (if or-out? + (start-of-enclosing-block t orig-pos) + orig-pos))) + (let loop ([pos orig-pos] [last-pos #f] [at-start at-start]) + (cond + [(= pos 0) (or last-pos 0)] + [else + (define-values (s e) (send t get-token-range (sub1 pos))) + (define category (send t classify-position s)) + (case category + [(white-space comment) + (define start (line-start t s)) + (if (eqv? start at-start) + (loop s last-pos at-start) + (finish last-pos))] + [(continue-operator) + (loop s last-pos (line-start t s))] + [(block-operator) + (or last-pos (if or-out? s orig-pos))] + [(bar-operator) + (or last-pos (if or-out? s orig-pos))] + [(parenthesis) + (define o-s (send t backward-match e 0)) + (cond + [o-s ; => `s` is a closer + (define start (line-start t e)) + (if (eqv? start at-start) + (loop o-s o-s (line-start t o-s)) + (finish last-pos))] + [else ; `s` is an opener + (or last-pos (if or-out? s orig-pos))])] + [(separator) + (finish last-pos)] + [else + (define start (line-start t s)) + (if (eqv? start at-start) + (loop s s at-start) + (finish last-pos))])]))) + +;; return the start of the block or parens containing the group containg `pos` +(define (start-of-enclosing-block t pos) + (define start (line-start t pos)) + (define col (col-of pos start (line-delta t start))) + (cond + [(zero? col) + ;; use column 0 as a proxy for being in a top-level block; we + ;; don't really want to go up (to the start of the buffer) + ;; from there + pos] + [else + (let loop ([pos pos] [last-pos #f]) + (cond + [(= pos 0) 0] + [else + (define-values (s e) (send t get-token-range (sub1 pos))) + (define category (send t classify-position s)) + (case category + [(white-space comment continue-operator) (loop s last-pos)] + [(parenthesis) + (define o-s (send t backward-match e 0)) + (cond + [o-s ; => `s` is a closer + (loop o-s o-s)] + [else ; `s` is an opener + s])] + [(bar-operator) + ;; needs to be outdented relative to initial `pos` + (define s-start (line-start t pos)) + (define s-col (col-of pos s-start (line-delta t s-start))) + (cond + [(s-col . < . col) last-pos] + [else (loop s last-pos)])] + [(block-operator) + (define s-col (get-block-column t (sub1 s) #f (line-start t s))) + (cond + [(s-col . < . col) (or last-pos s)] + [else (loop s last-pos)])] + [else + (loop s s)])]))])) + +;; give a `pos` that's right before a bar, return the start of the +;; first `|` in the block, or return the start of the enclsing block +;; if `pos` is at the first bar +(define (start-of-alts t pos) + (define at-start (line-start t pos)) + (define col (col-of pos at-start (line-delta t at-start))) + (define (outdented? s) + (define start (line-start t s)) + (define s-col (col-of s start (line-delta t s))) + (s-col . <= . col)) + (define (select last-bar last-block last-pos s) + (or last-bar (and last-pos last-block) last-pos s)) + (let loop ([pos pos] ; checking before + [last-bar #f] ; candidate bar (to move just before) most recently found (right line or column) + [col col] ; column of most recently found `last-bar` + [last-block #f] ; candidate `:`, only counts if `last-pos` is set for outdented + [last-pos #f] ; candiate start of an outdented block + [bar-start at-start] ; line for current block (to detect same-line candidates) + [at-start at-start]) ; line for candidate block start (to detect when moved too early) + (cond + [(= pos 0) 0] + [else + (define-values (s e) (send t get-token-range (sub1 pos))) + (define category (send t classify-position s)) + (case category + [(white-space comment) + (loop s last-bar col last-block last-pos + bar-start at-start)] + [(continue-operator) + (define s-start (line-start t s)) + (loop s last-bar col last-block last-pos + (if (= at-start bar-start) s-start bar-start) s-start)] + [else + (define s-start (line-start t pos)) + (cond + [(and (s-start . < . at-start) + (select last-bar last-block last-pos #f)) + => (lambda (pos) pos)] + [else + (case category + [(parenthesis) + (define o-s (send t backward-match e 0)) + (cond + [o-s ; => `s` is a closer + (loop o-s last-bar col last-block (if (outdented? o-s) o-s last-pos) + bar-start (line-start t o-s))] + [else ; `s` is an opener + (select last-bar last-block last-pos s)])] + [(bar-operator) + (define s-col (col-of s s-start (line-delta t s-start))) + (cond + [(eqv? bar-start s-start) + ;; same line, so earlier bar the in same block + (loop s s s-col #f #f bar-start 0)] + [else + (cond + [(s-col . > . col) + ;; more indented: ignore + (loop s last-bar col last-block last-pos bar-start at-start)] + [(s-col . < . col) + ;; less indented + (select last-bar last-block last-pos s)] + [else + ;; same indented: + (loop s s col #f #f s-start 0)])])] + [(block-operator) + (cond + [(outdented? s) (select last-bar last-block last-pos s)] + [else (loop s last-bar col s #f bar-start s-start)])] + [else + (loop s last-bar col last-block (if (outdented? s) s last-pos) bar-start s-start)])])])]))) + diff --git a/shrubbery/parse.rkt b/shrubbery/parse.rkt new file mode 100644 index 000000000..cfc4857b9 --- /dev/null +++ b/shrubbery/parse.rkt @@ -0,0 +1,1270 @@ +#lang racket/base +(require racket/pretty + "lex.rkt" + "srcloc.rkt" + (submod "print.rkt" for-parse) + "private/property.rkt" + "private/at-space.rkt") + +(provide parse-all) + +;; Parsing state at the group level: +(struct state (line ; current group's line and last consumed token's line + column ; group's column; below ends group, above starts indented + operator-column ; column for operator that continues group on a new line + paren-immed? ; immediately in `()` or `[]`? + bar-closes? ; does `|` always end a group? + bar-closes-line ; `|` (also) ends a group on this line + block-mode ; 'inside, #f, `:` or `|` token, or 'end + delta ; column delta created by `\`, applied to `line` continuation + raw ; reversed whitespace (and comments) to be remembered + at-mode)) ; look for `@` continuation after term: #f, 'initial, or 'no-initial + +(define (make-state #:line line + #:column column + #:operator-column [operator-column #f] + #:paren-immed? [paren-immed? #f] + #:bar-closes? [bar-closes? #f] + #:bar-closes-line [bar-closes-line #f] + #:block-mode [block-mode #f] + #:delta delta + #:raw [raw null]) + (state line + column + operator-column + paren-immed? + bar-closes? + bar-closes-line + block-mode + delta + raw + #f)) + +;; Parsing state for group sequences: top level, in opener-closer, or after `:` +(struct group-state (closer ; expected closer: a string, EOF, or column (maybe 'any as column) + paren-immed? ; immediately in `()` or `[]`? + column ; not #f => required indentation check checking + check-column? ; #f => allow any sufficiently large (based on closer) indentation + bar-closes? ; does `|` always end the sequence of groups? + bar-closes-line ; `|` (also) ends a sequence of groups on this line + block-mode ; 'inside, #f, `:` or `|` token, or 'end + comma-time? ; allow and expect a comma next + sequence-mode ; 'any, 'one, or 'none + last-line ; most recently consumed line + delta ; column delta created by `\`, applies to `last-line` continuation + commenting ; pending group-level `#//` token; exclusive with `tail-commenting` + tail-commenting ; pending group-level `#//` at end (so far) + raw)) ; reversed whitespace (and comments) to be remembered + +(define (make-group-state #:closer closer + #:paren-immed? [paren-immed? #f] + #:column column + #:check-column? [check-column? #t] + #:bar-closes? [bar-closes? #f] + #:bar-closes-line [bar-closes-line #f] + #:block-mode [block-mode #f] + #:sequence-mode [sequence-mode 'any] + #:last-line last-line + #:delta delta + #:commenting [commenting #f] + #:raw [raw null]) + (group-state closer + paren-immed? + column + check-column? + bar-closes? + bar-closes-line + block-mode + #f + sequence-mode + last-line + delta + #f + commenting + raw)) + +(define (closer-column? c) (or (eq? c 'any) (number? c))) + +(define closer-expected? pair?) +(define (closer-expected closer) (if (pair? closer) (car closer) closer)) +(define (closer-expected-opener closer) (and (pair? closer) (cdr closer))) +(define (make-closer-expected str tok) (cons str tok)) + +(define group-tag (syntax-raw-property (datum->syntax #f 'group) '())) +(define top-tag (syntax-raw-property (datum->syntax #f 'top) '())) +(define parens-tag (syntax-raw-property (datum->syntax #f 'parens) '())) + +;; ---------------------------------------- + +;; In the parsed representation of a shrubbery, source locations are +;; not associated with sequences tagged `group` or `top`. For terms +;; tagged with `parens`, `braces`, `block`, `alts`, and `op`, the same +;; source location is associated with the tag and the parentheses that +;; group the tag with its members, and that location spans the content +;; and any delimiters used to form it (such as parentheses or a `:` +;; that starts a block). + +;; Raw text is preserved as syntax properties so that the original +;; program text can be reconstructed. Raw text is represented as a +;; `cons`-based tree of strings to be rendered through an inorder +;; traversal. The relevant syntax properties are 'raw, 'raw-prefix, +;; and 'raw-tail. The 'raw property on an atom is the original text +;; for the atom; whitespace and comments that precede the atom are +;; normally attached as 'raw-prefix. For a compound term, the +;; contained terms must be traversed to gather all the raw text; in +;; addition to 'raw-prefix and 'raw attached to the compound-term tag, +;; a 'raw-tail property holds raw text to be written after all of the +;; compound term's subterms. For example, the closing `)` for a +;; `parens` term will be attached to 'raw-tail on the 'parens tag. The +;; `group` and `top` tags may have non-empty 'raw-prefix and 'raw-tail +;; syntax properties, even though those tags don't have source +;; locations. + +;; Parse all groups in a stream +(define (parse-top-groups l) + (define-values (gs rest-l end-line end-delta end-t tail-commenting tail-raw) + (parse-groups l (make-group-state #:closer eof + #:column #f + #:check-column? #f + #:last-line -1 + #:delta 0))) + (when tail-commenting (fail-no-comment-group tail-commenting)) + (unless (null? rest-l) + (error "had leftover items" rest-l)) + (define top (lists->syntax (cons top-tag (bars-insert-alts gs)))) + (add-raw-tail top tail-raw)) + +;; Parse a sequence of groups (top level, in opener-closer, or after `:`) +;; consuming the closer in the case of opener-closer context. +;; Returns: the list of groups +;; remaining tokens after a closer +;; line of last consumed (possibly closer) +;; delta of last consumed +;; last token consumed (if a closer) +;; pending group-comment token from end +;; pending raw whitespace/comments +(define (parse-groups l sg) + (define (check-no-commenting #:tail-ok? [tail-ok? #f]) + (define commenting (or (group-state-commenting sg) + (and (not tail-ok?) (group-state-tail-commenting sg)))) + (when commenting + (fail-no-comment-group commenting))) + (define (done) + (check-no-commenting #:tail-ok? #t) + (values null l (group-state-last-line sg) (group-state-delta sg) #f + (group-state-tail-commenting sg) + (group-state-raw sg))) + (define (check-column t column) + (when (group-state-check-column? sg) + (unless (eqv? column (group-state-column sg)) + (fail t "wrong indentation")))) + (define closer (group-state-closer sg)) + (cond + [(null? l) + ;; Out of tokens + (when (string? (closer-expected closer)) + (fail (closer-expected-opener closer) (format "expected ~s" (closer-expected closer)))) + (done)] + [else + (define t (car l)) + (define column (column+ (token-column t) (group-state-delta sg))) + (cond + [(eq? (token-name t) 'group-comment) + ;; column doesn't matter + (define line (token-line t)) + (define-values (rest-l last-line delta raw) + (next-of (cdr l) line (group-state-delta sg) (cons t (group-state-raw sg)))) + (cond + [(and line + (line . > . (group-state-last-line sg)) + (next-line? rest-l last-line)) + ;; structure comment is on its own line, so it comments out next group + (check-no-commenting) + (parse-groups rest-l (struct-copy group-state sg + [last-line last-line] + [delta delta] + [tail-commenting t] + [raw raw]))] + [else + (fail t "misplaced group comment") + (parse-groups rest-l (struct-copy group-state sg + [last-line last-line] + [delta delta] + [raw raw]))])] + [(and (closer-column? closer) + column + (column . < . closer)) + ;; Next token is less indented than this group sequence + (done)] + [else + ;; Dispatch on token + (case (token-name t) + [(closer) + (cond + [(closer-column? closer) + (done)] + [else + (unless (equal? (closer-expected closer) (token-e t)) + (fail t (format "expected ~s; closing at ~a" (closer-expected closer) (token-value t)))) + (define raw (cons t (group-state-raw sg))) + (if (eof-object? (closer-expected closer)) + ;; continue after extra closer: + (parse-groups (cdr l) (struct-copy group-state sg + [last-line (token-line t)] + [raw raw])) + ;; stop at closer + (begin + (check-no-commenting) + (values null (cdr l) (token-line t) (group-state-delta sg) t #f raw)))])] + [(whitespace comment continue-operator) + (define-values (next-l last-line delta raw) + (next-of l (group-state-last-line sg) (group-state-delta sg) (group-state-raw sg))) + (parse-groups next-l (struct-copy group-state sg + [last-line last-line] + [delta delta] + [raw raw]))] + [(comma-operator) + (cond + [(closer-column? (group-state-closer sg)) + (done)] + [else + (unless (and (group-state-paren-immed? sg) + (group-state-comma-time? sg)) + (fail t (format "misplaced comma~a" + (if (group-state-paren-immed? sg) + "" + " (not immdiately within parentheses or brackets)")))) + (define-values (rest-l last-line delta raw) + (next-of (cdr l) (token-line t) (group-state-delta sg) (cons t (group-state-raw sg)))) + ;; In top level or immediately in opener-closer: + (parse-groups rest-l (struct-copy group-state sg + [check-column? (next-line? rest-l last-line)] + [last-line last-line] + [comma-time? #f] + [delta delta] + [raw raw]))])] + [(semicolon-operator) + (when (group-state-paren-immed? sg) + (fail t (format "misplaced semicolon (~a)" + "immdiately within parentheses or brackets"))) + (cond + [(eq? (group-state-block-mode sg) 'inside) (done)] + [else + (check-column t column) + (define-values (rest-l last-line delta raw) + (next-of (cdr l) (token-line t) (group-state-delta sg) (cons t (group-state-raw sg)))) + (parse-groups rest-l (struct-copy group-state sg + [check-column? (next-line? rest-l last-line)] + [column (or (group-state-column sg) column)] + [last-line last-line] + [delta delta] + [commenting (group-state-tail-commenting sg)] + [tail-commenting #f] + [block-mode (next-block-mode (group-state-block-mode sg))] + [raw raw]))])] + [else + (when (group-state-comma-time? sg) + (fail t (format "missing comma before new group (~a)" + "within parentheses or braces"))) + (case (token-name t) + [(bar-operator) + (define line (token-line t)) + (cond + [(or (group-state-bar-closes? sg) + (and line + (eqv? line (group-state-bar-closes-line sg)))) + (done)] + [(eq? (group-state-block-mode sg) 'inside) + ;; Bar at the start of a group + (define same-line? (or (not (group-state-last-line sg)) + (= line (group-state-last-line sg)))) + (when (group-state-check-column? sg) + (unless (or same-line? + (= column (group-state-column sg))) + (fail t "wrong indentation"))) + (define pre-raw (group-state-raw sg)) + (define commenting (or (group-state-commenting sg) + (group-state-tail-commenting sg))) + (define-values (g rest-l group-end-line group-end-delta block-tail-commenting block-tail-raw) + (parse-block t (cdr l) + #:line line + #:closer (or (column-next column) 'any) + #:bar-closes? #t + #:bar-closes-line line + #:delta (group-state-delta sg) + #:raw (if commenting + null + pre-raw))) + (define-values (gs rest-rest-l end-line end-delta end-t tail-commenting tail-raw) + (parse-groups rest-l (struct-copy group-state sg + [column (if same-line? + (group-state-column sg) + column)] + [check-column? (next-line? rest-l group-end-line)] + [last-line group-end-line] + [delta group-end-delta] + [comma-time? (group-state-paren-immed? sg)] + [commenting #f] + [tail-commenting block-tail-commenting] + [raw (if commenting + (append block-tail-raw + (cons (syntax-to-raw (datum->syntax #f g)) + pre-raw)) + block-tail-raw)]))) + (values (if commenting + gs + (cons (list group-tag + (add-span-srcloc + t end-t + (cons 'bar g))) + gs)) + rest-rest-l + end-line + end-delta + end-t + tail-commenting + tail-raw)] + [else + (if (and (token? (group-state-block-mode sg)) + (eq? 'block-operator (token-name (group-state-block-mode sg)))) + (fail (group-state-block-mode sg) "unnecessary `:` before `|`") + (fail t "misplaced `|`"))])] + [else + ;; Parse one group, then recur to continue the sequence: + (check-column t column) + (define line (token-line t)) + (define pre-raw (group-state-raw sg)) + (define commenting (or (group-state-commenting sg) + (group-state-tail-commenting sg))) + (when (and (pair? l) + (eq? 'none (group-state-sequence-mode sg)) + (not commenting)) + (fail t "second group not allowed within `@«` and `»`")) + (define-values (g rest-l group-end-line group-delta group-tail-commenting group-tail-raw) + (parse-group l (make-state #:paren-immed? (group-state-paren-immed? sg) + #:line line + #:column column + #:bar-closes? (group-state-bar-closes? sg) + #:bar-closes-line (group-state-bar-closes-line sg) + #:block-mode (group-state-block-mode sg) + #:delta (group-state-delta sg) + #:raw null))) + (define-values (gs rest-rest-l end-line end-delta end-t tail-commenting tail-raw) + (parse-groups rest-l (struct-copy group-state sg + [column (or (group-state-column sg) column)] + [check-column? (next-line? rest-l group-end-line)] + [last-line group-end-line] + [comma-time? (group-state-paren-immed? sg)] + [delta group-delta] + [commenting #f] + [tail-commenting group-tail-commenting] + [raw (if commenting + (append group-tail-raw + (cons (syntax-to-raw (datum->syntax #f g)) + pre-raw)) + group-tail-raw)] + [sequence-mode (if (and (not commenting) + (eq? 'one (group-state-sequence-mode sg))) + 'none + (group-state-sequence-mode sg))]))) + (values (if commenting + gs + (cons (cons (add-pre-raw group-tag + pre-raw) + g) + gs)) + rest-rest-l + end-line + end-delta + end-t + tail-commenting + tail-raw)])])])])) + +;; Parse one group. +;; Returns: the list of items in the group +;; remaining tokens after group +;; line of last consumed +;; delta at last consumed +;; pending group-comment token from end +(define (parse-group l s) + (define (done) + (values null l (state-line s) (state-delta s) #f (state-raw s))) + (cond + [(null? l) (done)] + [else + (define t (car l)) + (define line (token-line t)) + (define (check-block-mode) + (when (eq? (state-block-mode s) 'end) + (fail t "no terms allowed after `»` within a group"))) + (define (check-nested-block-mode t) + (when (eq? (state-block-mode s) 'no) + (fail t "blocks not allowed immediately within `@«` and `»`"))) + ;; Consume a token + (define (keep delta + #:operator-column [operator-column (state-operator-column s)] + #:at-mode [at-mode (state-at-mode s)]) + (check-block-mode) + (define-values (at-adjust new-at-mode at-l at-line at-delta) + (continue-at at-mode #f (cdr l) line delta)) + (define-values (g rest-l end-line end-delta tail-commenting tail-raw) + (parse-group at-l (struct-copy state s + [line at-line] + [delta at-delta] + [raw null] + [block-mode (next-block-mode (state-block-mode s))] + [operator-column operator-column] + [at-mode new-at-mode]))) + (define elem (record-raw (token-value t) #f (state-raw s))) + (values (at-adjust (cons elem g)) rest-l end-line end-delta tail-commenting tail-raw)) + ;; Dispatch + (cond + [(and line (line . > . (state-line s))) + ;; new line + (case (token-name t) + [(whitespace comment) + (parse-group (cdr l) (struct-copy state s + [line line] + [delta 0] + [raw (cons t (state-raw s))]))] + [(closer at-content at-closer) + (done)] + [else + ;; consume any group comments that are on their own line: + (define-values (group-commenting use-t use-l last-line delta raw) + (get-own-line-group-comment t l (state-line s) (state-delta s) (state-raw s))) + (define column (token-column use-t)) + (cond + [(column . > . (state-column s)) + ;; More indented forms a nested block when there's + ;; a preceding `:` (doesn't get here) or starting with `|`; + ;; more indented continues a group when starting with an + ;; operator + (cond + [(eq? 'bar-operator (token-name use-t)) + (when (and (state-operator-column s) + (<= column (state-operator-column s))) + (fail use-t "wrong indentation")) + (check-nested-block-mode use-t) + (parse-block #f use-l + #:block-mode 'inside + #:line (token-line use-t) + #:closer (or (token-column use-t) 'any) + #:bar-closes? #f + #:bar-closes-line #f + #:delta delta + #:raw raw + #:group-commenting group-commenting)] + [(and (eq? 'operator (token-name use-t)) + (or (not (state-operator-column s)) + (= column (state-operator-column s)))) + (when group-commenting (fail group-commenting "misplaced group comment")) + (keep 0 #:operator-column column)] + [(and (eq? 'opener (token-name use-t)) + (equal? "«" (token-e t))) + (when group-commenting (fail group-commenting "misplaced group comment")) + ;; will error as out-of-place + (parse-group l (struct-copy state s + [line line]))] + [else + (fail use-t "wrong indentation (or missing `:` on previous line)")])] + [else + ;; using `done` here means that we leave any group comments in place; + ;; not consuming inspected tokens is normally worrisome, but we'll parse + ;; them at most one more time + (done)])])] + [else + ;; Not a new line + (case (token-name t) + [(closer comma-operator semicolon-operator at-content at-closer) + (done)] + [(identifier number literal operator) + (keep (state-delta s))] + [(block-operator) + (check-block-mode) + (check-nested-block-mode t) + (parse-block t (cdr l) + #:line (token-line t) + #:closer (or (column-half-next (or (state-operator-column s) + (state-column s))) + 'any) + #:delta (state-delta s) + #:raw (state-raw s) + #:bar-closes? (and (state-bar-closes? s) + (not (state-bar-closes-line s))) + #:bar-closes-line (state-bar-closes-line s))] + [(bar-operator) + (define line (token-line t)) + (cond + [(or (state-bar-closes? s) + (and line + (eqv? line (state-bar-closes-line s)))) + (done)] + [else + (check-block-mode) + (check-nested-block-mode t) + (parse-block #f l + #:block-mode 'inside + #:line line + #:closer (or (token-column t) 'any) + #:bar-closes? #f + #:bar-closes-line #f + #:delta (state-delta s) + #:raw (state-raw s))])] + [(opener) + (check-block-mode) + (define-values (closer tag) + (case (token-e t) + [("(") (values ")" 'parens)] + [("{") (values "}" 'braces)] + [("[") (values "]" 'brackets)] + [("«") (if (state-at-mode s) + (values "»" 'at) + (fail t "misplaced `«`"))] + [else (error "unknown opener" t)])) + (define pre-raw (state-raw s)) + (define-values (group-commenting next-l last-line delta raw) + (next-of/commenting (cdr l) line (state-delta s) null)) + (define sub-column + (if (pair? next-l) + (column+ (token-column (car next-l)) (state-delta s)) + (column-next (column+ (token-column t) (state-delta s))))) + (define-values (gs rest-l close-line close-delta end-t never-tail-commenting group-tail-raw) + (parse-groups next-l (make-group-state #:closer (make-closer-expected closer t) + #:paren-immed? #t + #:block-mode (if (eq? tag 'at) 'no #f) + #:column sub-column + #:last-line last-line + #:delta delta + #:commenting group-commenting + #:raw raw + #:sequence-mode (if (eq? tag 'at) 'one 'any)))) + (define-values (at-adjust new-at-mode at-l at-line at-delta) + (continue-at (state-at-mode s) (equal? closer "]") rest-l close-line close-delta)) + (define-values (g rest-rest-l end-line end-delta tail-commenting tail-raw) + (parse-group at-l (struct-copy state s + [line at-line] + [delta at-delta] + [block-mode (next-block-mode (state-block-mode s))] + [raw null] + [at-mode new-at-mode]))) + (define new-g (at-adjust + (cons (add-raw-to-prefix + t pre-raw #:tail group-tail-raw + (add-span-srcloc + t end-t + (cons tag gs))) + g))) + (define-values (result-g result-tail-raw) + (if (eq? tag 'at) + (splice-at t new-g tail-raw) + (values new-g tail-raw))) + (values result-g + rest-rest-l + end-line + end-delta + tail-commenting + result-tail-raw)] + [(whitespace comment continue-operator) + (define-values (next-l line delta raw) + (next-of l (state-line s) (state-delta s) (state-raw s))) + (parse-group next-l (struct-copy state s + [line line] + [delta delta] + [raw raw]))] + [(group-comment) + (fail t "misplaced group comment") + (parse-group (cdr l) s)] + [(at) + (check-block-mode) + (cond + [(null? (cdr l)) + (fail t "missing term after `@`") + (parse-group null s)] + [else + (define next-t (cadr l)) + (case (token-name next-t) + [(opener) + (case (token-e next-t) + [("(" "«") + (parse-group (cdr l) (struct-copy state s + [raw (cons t (state-raw s))] + [at-mode 'initial]))] + [("[") + (keep (state-delta s) #:at-mode 'no-initial)] + [else (error "unexpected" (token-name next-t))])] + [(identifier number literal operator opener) + (parse-group (cdr l) (struct-copy state s + [raw (cons t (state-raw s))] + [at-mode 'initial]))] + [(at-opener) + (keep (state-delta s) #:at-mode 'no-initial)] + [else + (fail next-t "invalid after `@`")])])] + [(at-comment) + (fail t "comments using `@//` are allowed only within an `@` body")] + [else + (error "unexpected" (token-value t))])])])) + +(define (parse-block t l + #:line line + #:closer closer + #:bar-closes? [bar-closes? #f] + #:bar-closes-line [bar-closes-line #f] + #:delta in-delta + #:raw in-raw + #:group-commenting [in-group-commenting #f] + #:block-mode [block-mode t]) + (define-values (opener-t opener-l opener-line opener-delta opener-raw) + (next-of/opener l line in-delta null)) + (define-values (group-commenting next-l last-line delta raw) + (if in-group-commenting + (values in-group-commenting opener-l opener-line opener-delta opener-raw) + (next-of/commenting opener-l opener-line opener-delta opener-raw))) + (cond + [(pair? next-l) + (define next-t (car next-l)) + (define-values (indent-gs rest-l end-line end-delta end-t tail-commenting tail-raw) + (parse-groups next-l + (make-group-state #:closer (if opener-t + (make-closer-expected "»" opener-t) + closer) + #:column (column+ (token-column next-t) delta) + #:last-line last-line + #:bar-closes? (and (not opener-t) bar-closes?) + #:bar-closes-line (and (not opener-t) bar-closes-line) + #:block-mode block-mode + #:delta delta + #:commenting group-commenting + #:raw raw))) + (define used-closer? (or opener-t + (closer-expected? closer))) + (define-values (null-g post-l post-line post-delta post-tail-commenting post-tail-raw) + (if used-closer? + ;; in 'end mode, so errors or returns a null group: + (parse-group rest-l (make-state #:line end-line + #:column +inf.0 + #:bar-closes? bar-closes? + #:bar-closes-line bar-closes-line + #:block-mode 'end + #:delta end-delta + #:raw null)) + (values '() rest-l end-line end-delta tail-commenting (if used-closer? + null + tail-raw)))) + (unless (null? null-g) (error "internal error, parsed more")) + (values (list (add-raw-to-prefix + t in-raw #:tail (and used-closer? tail-raw) + (add-span-srcloc + t end-t #:alt next-t + (tag-as-block indent-gs)))) + post-l + post-line + post-delta + post-tail-commenting + post-tail-raw)] + [else + (when opener-t (fail opener-t (format "expected `»`"))) + (values (list (add-raw-to-prefix + t in-raw + (add-span-srcloc + t #f + (tag-as-block null)))) + next-l + line + delta + group-commenting + raw)])) + +(define (tag-as-block gs) + (cond + [(and (pair? gs) + ;; really only need to check the first one: + (for/and ([g (in-list gs)]) + (and (pair? g) + (tag? 'group (car g)) + (pair? (cdr g)) + (null? (cddr g)) + (let ([b (cadr g)]) + (and (pair? b) + (tag? 'bar (car b)) + ;; the rest should always be true: + (pair? (cdr b)) + (null? (cddr b)) + (pair? (cadr b)) + (tag? 'block (caadr b))))))) + (cons 'alts (for/list ([g (in-list gs)]) + (move-pre-raw + (car g) + (let ([b (cadr g)]) + (cadr b)))))] + [else (cons 'block gs)])) + +(define (bars-insert-alts gs) + (cond + [(and (pair? gs) + ;; same check is in `tag-as-block + (let ([g (car gs)]) + (and (pair? g) + (tag? 'group (car g)) + (pair? (cdr g)) + (null? (cddr g)) + (let ([b (cadr g)]) + (and (pair? b) + (tag? 'bar (car b)) + (car b)))))) + => (lambda (bar-tag) + (list (list group-tag + (add-span-srcloc + bar-tag #f + (tag-as-block gs)))))] + [else gs])) + +(define (tag? sym e) + (or (eq? sym e) + (and (syntax? e) + (eq? sym (syntax-e e))))) + +;; Look for `{` (as 'at-opener) next or a `[` that might be followed +;; by a `{`, and prepare to convert by rearranging info a splice +;; followed by parentheses +(define (continue-at at-mode after-bracket? l line delta) + (define (at-call rator parens g) + (if (eq? at-mode 'no-initial) + (cons (move-pre-raw rator + (add-raw-to-prefix #f (syntax-to-raw rator) parens)) + g) + (list* rator parens g))) + (cond + [(not at-mode) + (values (lambda (g) g) #f l line delta)] + [(and (not after-bracket?) + (pair? l) + (eq? 'opener (token-name (car l))) + (equal? "[" (token-e (car l)))) + (values (lambda (g) + (define a (cadr g)) + (define tag (car a)) + (cond + [(tag? 'brackets tag) + (at-call (car g) + (cons (datum->syntax tag 'parens tag tag) + (cdr a)) + (cddr g))] + [(eq? at-mode 'no-initial) + (add-raw-to-prefix* #f (syntax-to-raw (car g)) + (cdr g))] + [else g])) + at-mode l line delta)] + [(and (pair? l) + (eq? 'at-opener (token-name (car l)))) + ;; process a `{`...`}` body, handling escapes and then trimming whitespace + (define init-t (car l)) + (let loop ([l (cdr l)] [content '()]) + (case (if (null? l) 'at-closer (token-name (car l))) + [(at-closer) + (when (null? l) + (fail init-t "missing closer for `@` content")) + (define-values (prefix-syntaxes new-content post-syntaxes) + (adjust-content-space content group-tag)) + (define c + (list group-tag (add-tail-raw-to-prefix + (list (car l)) + post-syntaxes + (cons (datum->syntax (token-value init-t) + 'brackets + (token-value init-t) + (token-value init-t)) + (add-raw-to-prefix* #f (map syntax-to-raw prefix-syntaxes) + new-content))))) + (values (lambda (g) (cond + [(not after-bracket?) + (at-call (car g) + (list parens-tag c) + (cdr g))] + [else + (define bracket (caar g)) + (define new-g (cons (cons parens-tag + (append + (cdar g) + (list + (move-post-raw-to-prefix bracket c)))) + (cdr g))) + (move-pre-raw bracket + (add-raw-to-prefix* #f (syntax-to-raw bracket) + new-g))])) + 'initial (if (null? l) null (cdr l)) line delta)] + [(at-content) + (loop (cdr l) + ;; mark as 'content instead of 'group for now, so we + ;; can split and trim whitespace after finding all of it + (cons (list 'content (token-value (car l))) + content))] + [(at at-comment) + (define t (car l)) + (define comment? (eq? (token-name t) 'at-comment)) + ;; `parse-group` work will be delimited by 'at-content or 'at-closer + (define-values (g rest-l group-end-line group-delta group-tail-commenting group-tail-raw) + (parse-group (if comment? + (cons (token-rename t 'at) (cdr l)) + l) + (make-state #:line (token-line t) + #:column (token-column t) + #:delta 0 + #:raw null))) + (loop rest-l (cons (if comment? + (list 'comment (cons (token-raw t) (syntax-to-raw g))) + (cons group-tag g)) + content))] + [(comment) + (loop (cdr l) (cons (list 'comment (token-e (car l))) content))] + [else (error "unexpected in at" (token-name (car l)))]))] + [else + (values (lambda (g) g) #f l line delta)])) + +(define (splice-at t g tail-raw) + (define gs (car g)) + (define at (car gs)) + (unless (tag? 'at at) (error "expected at")) + (when (null? (cdr gs)) (fail t "empty group within within `@«` and `»`")) + (unless (null? (cddr gs)) (error "extra groups in at")) + (define rest (cdr g)) + (values + (move-pre-raw* at + (add-raw-to-prefix* #f (syntax-to-raw at) + (append (cdadr gs) + (if (null? rest) + rest + (move-post-raw-to-prefix at rest))))) + (if (null? rest) + (append tail-raw (list (or (syntax-raw-tail-property at) '()))) + tail-raw))) + +;; Like `datum->syntax`, but propagates the source location of +;; a start of a list (if any) to the list itself. That starting +;; item is expected to be a tag that has the span of the whole +;; term already as its location +(define (lists->syntax l) + (cond + [(pair? l) + (define a (car l)) + (define new-l (for/list ([e (in-list l)]) + (lists->syntax e))) + (if (syntax? a) + (datum->syntax* #f new-l a stx-for-original-property) + (datum->syntax* #f new-l))] + [else l])) + +;; Consume whitespace and comments, including continuing backslashes, +;; where lookahead is needed +;; Arguments/returns: +;; list of input tokens (after whitespace, comments, and continues) +;; most recently consumed non-whitesace line (to determine whether +;; next is on same line); on input; the line can be #f, which means +;; "treat as same line"; the result is never #f +;; current line delta (created by continues) +;; accumulated reversed raw whitespace +(define (next-of l last-line delta raw) + (cond + [(null? l) (values null (or last-line 0) delta raw)] + [else + (define t (car l)) + (case (token-name t) + [(whitespace comment) + (next-of (cdr l) last-line delta (cons (car l) raw))] + [(continue-operator) + (define line (token-line t)) + ;; a continue operator not followed only by whitespace and + ;; comments is just treated as whitespace + (define-values (next-l next-raw) + (let loop ([l (cdr l)] [raw (cons t raw)]) + (cond + [(null? l) (values null raw)] + [else (case (token-name (car l)) + [(whitespace comment) (loop (cdr l) (cons (car l) raw))] + [else (values l raw)])]))) + (cond + [(and (pair? next-l) + (eqv? line (token-line (car next-l)))) + ;; like whitespace: + (next-of next-l last-line delta next-raw)] + [else + (next-of next-l + ;; whitespace-only lines don't count, so next continues + ;; on the same line by definition: + #f + (column+ (token-column t) + (+ (if (or (not last-line) (eqv? line last-line)) + delta + 0) + 1)) + next-raw)])] + [else + (define line (token-line t)) + (values l + (or last-line line) + (if (or (not last-line) (eqv? line last-line)) + delta + 0) + raw)])])) + +(define (next-of/commenting l last-line delta raw) + (define-values (rest-l rest-last-line rest-delta rest-raw) + (next-of l last-line delta raw)) + (cond + [(pair? rest-l) + (define-values (group-commenting use-t use-l last-line delta raw) + (get-own-line-group-comment (car rest-l) rest-l rest-last-line rest-delta rest-raw)) + (values group-commenting use-l last-line delta raw)] + [else + (values #f rest-l rest-last-line rest-delta rest-raw)])) + +(define (next-of/opener l last-line delta raw) + (define-values (rest-l rest-last-line rest-delta rest-raw) + (next-of l last-line delta raw)) + (cond + [(pair? rest-l) + (define t (car rest-l)) + (cond + [(and (eq? (token-name t) 'opener) + (equal? (token-e t) "«")) + (values t (cdr rest-l) (token-line t) rest-delta (cons t rest-raw))] + [else + (values #f rest-l rest-last-line rest-delta rest-raw)])] + [else + (values #f rest-l rest-last-line rest-delta rest-raw)])) + +;; t is at the start of l on input and output +(define (get-own-line-group-comment t l line delta raw) + (let loop ([commenting #f] [t t] [l (cdr l)] [line line] [delta delta] [raw raw]) + (case (token-name t) + [(group-comment) + (cond + [((token-line t) . > . line) + (define-values (next-l last-line next-delta next-raw) + (next-of l line delta (cons t raw))) + (cond + [(null? next-l) (fail-no-comment-group t)] + [(next-line? next-l last-line) + (when commenting (fail-no-comment-group commenting)) + (loop t (car next-l) (cdr next-l) last-line next-delta next-raw)] + [else + ;; `t` is still in `next-raw`, but this is going to lead to an error + (values commenting t (cons t next-l) last-line next-delta next-raw)])] + [else + (fail t "misplaced group comment")])] + [else + (values commenting t (cons t l) line delta raw)]))) + +(define (next-line? l last-line) + (and (pair? l) + (let ([line (token-line (car l))]) + (and line + (line . > . last-line))))) + +(define (fail-no-comment-group t) + (fail t "no group for term comment")) + +;; Report an error on failure, but then keep parsing anyway +;; if in recover mode +(define current-recover-mode (make-parameter #f)) +(define (fail tok msg) + (define loc (if (syntax? tok) + (srcloc (syntax-source tok) + (syntax-line tok) + (syntax-column tok) + (syntax-position tok) + (syntax-span tok)) + (token-srcloc tok))) + (cond + [(current-recover-mode) + (log-error "~a: ~a" (srcloc->string loc) msg)] + [else + (raise + (exn:fail:read + (if (error-print-source-location) + (format "~a: ~a" (srcloc->string loc) msg) + msg) + (current-continuation-marks) + (list loc)))])) + +(define (column-next c) + (and c + (if (integer? c) + (add1 c) + (add1 (inexact->exact (floor c)))))) + +(define (column-half-next c) + (if (integer? c) + (+ c 0.5) + (column-next c))) + +(define (column+ c n) + (and c (+ c n))) + +(define (next-block-mode mode) + (if (eq? mode 'no) 'no #f)) + +;; ---------------------------------------- + +(define (add-span-srcloc start-t end-t l #:alt [alt-start-t #f]) + (define (add-srcloc l loc) + (cons (let ([stx (datum->syntax* #f (car l) loc stx-for-original-property)]) + (if (syntax? start-t) + (let* ([stx (syntax-property-copy stx start-t syntax-raw-property)] + [stx (syntax-property-copy stx start-t syntax-raw-prefix-property)]) + stx) + stx)) + (cdr l))) + (define last-t/e (or end-t + ;; when `end-t` is false, we go looking for the + ;; end in `l`; this search walks down the end + ;; of `l`, and it may recur into the last element, + ;; but it should go only a couple of levels that way, + ;; since non-`group` tags have spanning locations + ;; gather from their content + (let loop ([e l]) + (cond + [(syntax? e) e] + [(not (pair? e)) #f] + [(null? (cdr e)) + (define a (car e)) + (if (and (pair? a) + (syntax? (car a))) + ;; found a tag like `block` + (car a) + (loop a))] + [else (loop (cdr e))])))) + (define s-loc (cond + [(not start-t) + (token-srcloc alt-start-t)] + [(syntax? start-t) + (syntax-srcloc start-t)] + [else + (token-srcloc start-t)])) + (define e-loc (and last-t/e + (if (syntax? last-t/e) + (syntax-srcloc last-t/e) + (token-srcloc last-t/e)))) + (define s-position (srcloc-position s-loc)) + (add-srcloc l (srcloc (srcloc-source s-loc) + (srcloc-line s-loc) + (srcloc-column s-loc) + s-position + (if e-loc + (let ([s s-position] + [e (srcloc-position e-loc)] + [sp (srcloc-span e-loc)]) + (and s e sp + (+ (max (- e s) 0) sp))) + (srcloc-span s-loc))))) + +(define (token-raw t) + (define value (token-value t)) + (or (syntax-raw-property value) + (syntax-e value))) + +(define (record-raw stx t pre-raw) + (define stx+raw (if t + (syntax-raw-property stx (raw-cons (token-raw t) + (or (syntax-raw-property stx) '()))) + (if (syntax-raw-property stx) + stx + (syntax-raw-property stx '())))) + (if (null? pre-raw) + stx+raw + (syntax-raw-prefix-property stx+raw (raw-cons (raw-tokens->raw pre-raw) + (or (syntax-raw-prefix-property stx) '()))))) + +(define (add-raw-tail top raw) + (if (null? raw) + top + (datum->syntax* top + (cons (syntax-raw-tail-property (car (syntax-e top)) (raw-tokens->raw raw)) + (cdr (syntax-e top))) + top + top))) + +(define (add-pre-raw stx pre-raw) + (if (null? pre-raw) + stx + (syntax-raw-prefix-property stx (raw-tokens->raw pre-raw)))) + +(define (move-pre-raw from-stx to) + (define pre-raw (and (syntax? from-stx) + (syntax-raw-prefix-property from-stx))) + (cond + [pre-raw + (define a (car to)) + (cons (syntax-raw-prefix-property a (raw-cons pre-raw + (or (syntax-raw-prefix-property a) '()))) + (cdr to))] + [else to])) + +(define (move-pre-raw* from-stx to) + (cond + [(syntax? (car to)) (move-pre-raw from-stx to)] + [else (cons (move-pre-raw* from-stx (car to)) + (cdr to))])) + +(define (move-post-raw from-stx to) + (define post-raw (and (syntax? from-stx) + (syntax-raw-tail-property from-stx))) + (cond + [post-raw + (define a (datum->syntax* #f (car to))) + (cons (syntax-raw-tail-property a (raw-cons (or (syntax-raw-tail-property a) '()) + post-raw)) + (cdr to))] + [else to])) + +(define (move-post-raw-to-prefix from-stx to) + (define post-raw (and (syntax? from-stx) + (syntax-raw-tail-property from-stx))) + (cond + [post-raw + (define a (datum->syntax* #f (car to))) + (cons (syntax-raw-prefix-property a (raw-cons post-raw + (or (syntax-raw-prefix-property a) '()))) + (cdr to))] + [else to])) + +(define (raw-tokens->raw pre-raw) + (for/list ([raw-t (in-list (reverse pre-raw))]) + (if (token? raw-t) + (token-raw raw-t) + raw-t))) + +(define (add-raw-to-prefix t pre-raw l #:tail [post-raw #f]) + (cons (let ([stx (record-raw (car l) t pre-raw)]) + (if post-raw + (syntax-raw-tail-property stx (raw-cons (or (syntax-raw-tail-property stx) '()) + (raw-tokens->raw post-raw))) + stx)) + (cdr l))) + +(define (add-raw-to-prefix* t pre-raw l) + (cond + [(syntax? (car l)) (add-raw-to-prefix t pre-raw l)] + [else (cons (add-raw-to-prefix* t pre-raw (car l)) + (cdr l))])) + +(define (add-tail-raw-to-prefix post-raw post-stxes l) + (cons (syntax-raw-tail-property (datum->syntax* #f (car l)) + (raw-cons (map syntax-to-raw post-stxes) + (raw-tokens->raw post-raw))) + (cdr l))) + +(define (raw-cons a b) + (cond + [(null? a) b] + [(null? b) a] + [else (cons a b)])) + +(define (syntax-property-copy dest src prop) + (define v (prop src)) + (if v + (prop dest v) + dest)) + +;; like `datum->syntax`, but ensures that a sequence of pairs is not +;; too long before there's a syntax pair +(define (datum->syntax* ctx v [src #f] [props #f]) + (datum->syntax + ctx + (let loop ([v v] [depth 0]) + (cond + [(pair? v) + (cond + [(depth . >= . 32) + (datum->syntax ctx + (cons (loop (car v) 0) (loop (cdr v) 0)) + src)] + [else + (cons (loop (car v) 0) (loop (cdr v) (add1 depth)))])] + [else v])) + src + props)) + +;; ---------------------------------------- + +;; check that line-counting is consistent (always on or always off), +;; and when it's off, make sure there are no newlines except maybe at +;; the beginning and/or end +(define (check-line-counting l) + (unless (null? l) + (define (fail-inconsistent t) + (fail t "port did not consistently report lines and columns")) + (cond + [(and (token-line (car l)) + (token-column (car l))) + (for ([t (in-list l)]) + (unless (and (token-line t) + (token-column t)) + (fail-inconsistent t)))] + [else + (let loop ([l l] [saw-non-ws? #f] [newline-t #f]) + (unless (null? l) + (define t (car l)) + (when (or (token-line t) + (token-column t)) + (fail-inconsistent t)) + (case (token-name t) + [(whitespace comment continue-operator) + (loop (cdr l) + saw-non-ws? + (and saw-non-ws? + (or newline-t + (and (regexp-match? #rx"[\r\n]" (syntax-e (token-value t))) + t))))] + [else + (when newline-t + (fail newline-t "port does not count lines, but input includes a newline")) + (loop (cdr l) #t #f)])))]))) + +;; ---------------------------------------- + +(define (parse-all in #:source [source (object-name in)]) + (define l (lex-all in fail #:source source)) + (check-line-counting l) + (if (null? l) + eof + (parse-top-groups l))) + +(module+ main + (require racket/cmdline + "print.rkt") + + (define show-raw? #f) + (define one-line? #f) + + (define (parse-all* in) + (unless one-line? + (port-count-lines! in)) + (define e (parse-all in)) + (unless (eof-object? e) + (cond + [show-raw? + (for ([s (syntax->list e)]) + (printf "#|\n~a\n|#\n" (shrubbery-syntax->string s)) + (pretty-write (syntax->datum s)))] + [else + (pretty-write + (syntax->datum e))]))) + + (command-line + #:once-each + [("--recover") "Continue parsing after an error" + (current-recover-mode #t)] + [("--raw") "Show raw strings when printing" + (set! show-raw? #t)] + [("--one-line") "Disable line counting to assume a single line" + (set! one-line? #t)] + #:args file + (if (null? file) + (parse-all* (current-input-port)) + (for-each (lambda (file) + (call-with-input-file* + file + parse-all*)) + file)))) diff --git a/shrubbery/print.rkt b/shrubbery/print.rkt new file mode 100644 index 000000000..f7d1a500f --- /dev/null +++ b/shrubbery/print.rkt @@ -0,0 +1,97 @@ +#lang racket/base +(require "private/property.rkt") + +;; Printing syntax object using raw-text properties + +(provide shrubbery-syntax->string) + +(module+ for-parse + (provide syntax-to-raw)) + +(define (shrubbery-syntax->string s #:max-length [max-length #f]) + (cond + [(all-raw-available? s) + (define o (open-output-string)) + (define (full?) + (and max-length + ((file-position o) . > . max-length))) + (let loop ([l (syntax-to-raw s)]) + (cond + [(pair? l) + (unless (full?) + (loop (car l)) + (unless (full?) + (loop (cdr l))))] + [(null? l) (void)] + [(string? l) (display l o)] + [else (void)])) + (define orig-str (get-output-string o)) + (define starting-col (extract-starting-column s)) + ;; strip `string-col` spaces from the start of lines after the first one: + (define str (regexp-replace* (string-append "\n" (make-string starting-col #\space)) + orig-str + "\n")) + (if (and max-length + ((string-length str) . > . max-length)) + (string-append (substring str 0 (max 0 (- max-length 3))) + "...") + str)] + [else + (define v (if (syntax? s) (syntax->datum s) s)) + (if max-length + (parameterize ([error-print-width max-length]) + (format "~.s" v)) + (format "~s" v))])) + +(define (syntax-to-raw g) + (let loop ([g g] [tail null] [use-prefix? #f]) + (cond + [(null? g) tail] + [(pair? g) + (define a-stx (car g)) + (define post (and (syntax? a-stx) + (syntax-raw-tail-property a-stx))) + (define a (loop a-stx null use-prefix?)) + (define d (loop (cdr g) + (if post + (if (null? tail) + post + (cons tail post)) + tail) + #t)) + (if (null? a) d (cons a d))] + [(syntax? g) + (define pre (and use-prefix? + (syntax-raw-prefix-property g))) + (define r (syntax-raw-property g)) + (define raw (if (and pre r) + (cons pre r) + (or pre r null))) + (define d (loop (syntax-e g) tail use-prefix?)) + (if (null? raw) d (cons raw d))] + [else tail]))) + +(define (all-raw-available? s) + (cond + [(syntax? s) + (or (syntax-raw-property s) + (let ([e (syntax-e s)]) + (or (and (pair? e) + (all-raw-available? e)) + (null? e))) + #; + (and (log-error "?? ~s" s) + #f))] + [(pair? s) (and (all-raw-available? (car s)) + (all-raw-available? (cdr s)))] + [else #t])) + +(define (extract-starting-column s) + (cond + [(syntax? s) + (or (syntax-column s) + (let ([e (syntax-e s)]) + (and (pair? e) + (extract-starting-column (car e)))) + 0)] + [else 0])) diff --git a/shrubbery/private/at-space.rkt b/shrubbery/private/at-space.rkt new file mode 100644 index 000000000..70588a457 --- /dev/null +++ b/shrubbery/private/at-space.rkt @@ -0,0 +1,176 @@ +#lang racket/base +(require "property.rkt") + +(provide adjust-content-space) + +(define (adjust-content-space rev group-tag) + (let*-values ([(rev) (remove-trailing-spaces rev)] + [(rev end2) (discard-immediate-space rev)] + [(rev end1) (discard-immediate-newline rev #f)]) + (let*-values ([(content) (reverse rev)] + [(content start1) (discard-immediate-space content)] + [(content start2) (discard-immediate-newline content #f)]) + (let* ([min-col (get-min-column content)] + [content (if (eqv? min-col 0) + content + (trim-shared-column content min-col (pair? start2)))]) + (let-values ([(content comments) (convert-content content group-tag)]) + (values (append start1 start2) + content + (append (reverse comments) end1 end2))))))) + +(define rx:whitespace #px"^\\s*$") + +(define (is-newline? c) + (and (eq? 'content (car c)) + (equal? "\n" (syntax-e (cadr c))))) + +;; discard space immediately before closer or after opener +(define (discard-immediate-space lst) + (cond + [(and (pair? lst) + (eq? 'content (caar lst)) + (pair? (cdr lst)) + (let ([prev (cadr lst)]) + (and (eq? 'content (car prev)) + (equal? "\n" (syntax-e (cadr prev))))) + (regexp-match? rx:whitespace (syntax-e (cadar lst)))) + (values (cdr lst) (cdar lst))] + [else + (values lst null)])) + +;; discard newline just before closer, +;; unless a newline is all there will be +(define (discard-immediate-newline lst check-next-ws?) + (cond + [(and (pair? lst) + (is-newline? (car lst)) + (not (null? (cdr lst))) + (not (and check-next-ws? + (null? (cddr lst)) + (let ([prev (cadr lst)]) + (and (eq? 'content (car prev)) + (regexp-match? rx:whitespace (cadr prev))))))) + (values (cdr lst) (cdar lst))] + [else + (values lst null)])) + +(define (remove-trailing-spaces rev) + (define (trailing-space-count s) + (let loop ([i (string-length s)]) + (cond + [(eqv? i 0) 0] + [(char-whitespace? (string-ref s (sub1 i))) (add1 (loop (sub1 i)))] + [else 0]))) + (let loop ([rev rev]) + (cond + [(null? rev) rev] + [(is-newline? (car rev)) + (define next (cdr rev)) + (cons (car rev) + (cond + [(null? next) null] + [(is-newline? (car next)) (loop next)] + [(eq? 'content (caar next)) + (define n (trailing-space-count (syntax-e (cadar next)))) + (cond + [(eqv? n 0) (loop next)] + [else + (define a (cadar next)) + (define s (syntax-e a)) + (cons (list 'content + (datum->syntax a + (substring s 0 (- (string-length s) n)) + a + a)) + (loop (cdr next)))])] + [else (loop next)]))] + [else (cons (car rev) (loop (cdr rev)))]))) + +(define (leading-space-count s) + (let loop ([i 0]) + (cond + [(eqv? i (string-length s)) 0] + [(char-whitespace? (string-ref s i)) (add1 (loop (add1 i)))] + [else 0]))) + +(define (get-min-column lst) + (let loop ([lst lst] [min-col #f] [saw-nl? #t]) + (cond + [(null? lst) min-col] + [(is-newline? (car lst)) + (loop (cdr lst) min-col #t)] + [(and saw-nl? + (eq? 'content (caar lst))) + (define stx (cadar lst)) + (define s (syntax-e stx)) + (define n (+ (leading-space-count s) (or (syntax-column stx) 0))) + (loop (cdr lst) (min n (or min-col n)) #f)] + [else + (loop (cdr lst) min-col #f)]))) + +(define (trim-shared-column lst min-col start-nl?) + (let loop ([lst lst] [saw-nl? start-nl?]) + (cond + [(null? lst) null] + [(is-newline? (car lst)) + (cons (car lst) + (loop (cdr lst) #t))] + [(and saw-nl? + (eq? 'content (caar lst))) + (define a (cadar lst)) + (define s (syntax-e a)) + (define col (or (syntax-column a) 0)) + (define n (leading-space-count s)) + (define add-back-n (max 0 (- n min-col))) + (define trimmed-s (if (eqv? n 0) + s + (substring s n))) + (define trimmed + (cons (list 'content + (datum->syntax a + trimmed-s + a + a)) + (loop (cdr lst) #f))) + (if (eqv? add-back-n 0) + trimmed + (cons (list 'content + (syntax-raw-property (datum->syntax a + (make-string add-back-n #\space) + a) + '())) + trimmed))] + [else (cons (car lst) (loop (cdr lst) #f))]))) + +(define (convert-content lst group-tag) + (let loop ([lst lst] [accum '()] [comments '()]) + (cond + [(null? lst) (values (reverse accum) (reverse comments))] + [else + (define c (car lst)) + (cond + [(eq? 'comment (car c)) + (loop (cdr lst) accum (cons (cadr c) comments))] + [(eq? 'content (car c)) + (loop (cdr lst) + (cons (add-comments (cons group-tag (cdr c)) + comments) + accum) + null)] + [else + (loop (cdr lst) + (cons (add-comments c comments) + accum) + null)])]))) + +(define (add-comments c comments) + (cond + [(null? comments) c] + [else + (define stx (car c)) + (cons (syntax-raw-prefix-property stx + (cons (reverse comments) + (or (syntax-raw-prefix-property stx) '()))) + (cdr c))])) + diff --git a/shrubbery/private/edit-help.rkt b/shrubbery/private/edit-help.rkt new file mode 100644 index 000000000..c2fba84dc --- /dev/null +++ b/shrubbery/private/edit-help.rkt @@ -0,0 +1,103 @@ +#lang racket/base +(require racket/class) + +(provide opener? + line-start + line-delta + col-of + only-whitespace-between? + get-block-column) + +(define (opener? s) + (member s '("(" "{" "[" "«"))) + +(define (line-start t pos) + (send t paragraph-start-position (send t position-paragraph pos #t))) + +(define (line-delta t start #:unless-empty? [unless-empty? #f]) + (let loop ([pos start]) + (cond + [(eqv? pos 0) 0] + [else + (case (send t classify-position (sub1 pos)) + [(white-space comment) + (define-values (s e) (send t get-token-range (sub1 pos))) + (loop s)] + [(continue-operator) + ;; since we've only skipped comments and whitespace, this + ;; continue operator applies + (define-values (s e) (send t get-token-range (sub1 pos))) + (define c-start (line-start t s)) + (define more-delta (line-delta t c-start #:unless-empty? unless-empty?)) + (and more-delta + (if unless-empty? + (not (only-whitespace-between? t c-start s #:or-ws-like? #t)) + #t) + (+ (- e c-start) more-delta))] + [else 0])]))) + +;; skip back over whitespace and comments to find `\` +(define (col-of pos start delta) + (+ (- pos start) delta)) + +(define (only-whitespace-between? t s-pos e-pos + #:or-ws-like? [or-ws-like? #f]) + (let loop ([pos s-pos]) + (and (case (send t classify-position pos) + [(white-space) #t] + [(comment continue-operator) or-ws-like?] + [else #f]) + (let () + (define-values (s e) (send t get-token-range pos)) + (or (e . >= . e-pos) + (loop e)))))) + +;; find the current indentation of the block that starts at or before `pos`, +;; a long as the block continues (not nested) on the line at `at-start`; +;; if `for-outdent?` is true, don't treat leading operators as the start +(define (get-block-column t pos candidate at-start + #:for-outdent? [for-outdent? #t]) + (let loop ([pos pos] [candidate candidate] [at-start at-start]) + (define pos-start (line-start t pos)) + (cond + [(pos-start . < . at-start) + candidate] + [else + (define-values (s e) (send t get-token-range pos)) + (define category (send t classify-position s)) + (case category + [(white-space comment continue-operator) + (if (zero? s) + (or candidate 0) + (loop (sub1 s) candidate at-start))] + [(parenthesis) + (cond + [(opener? (send t get-text (sub1 e) e)) + candidate] + [else + ;; Found parenthesized while walking backward + (define r (send t backward-match e 0)) + (cond + [(not r) + (define start (line-start t pos)) + (define delta (line-delta t start)) + (loop (sub1 s) (col-of s start delta) at-start)] + [(zero? r) (list 0)] + [else + (define start (line-start t r)) + (define delta (line-delta t start)) + (loop (sub1 r) (col-of r start delta) start)])])] + [(block-operator bar-operator separator) candidate] + [else + (cond + [(and #f ;; not sure why special-casing operators seemed like a good idea + for-outdent? + (eq? category 'operator)) + (if (zero? s) + (or candidate 0) + (loop (sub1 s) candidate at-start))] + [(zero? s) 0] + [else + (define start (line-start t pos)) + (define delta (line-delta t start)) + (loop (sub1 s) (col-of s start delta) start)])])]))) diff --git a/shrubbery/private/property.rkt b/shrubbery/private/property.rkt new file mode 100644 index 000000000..fb190570d --- /dev/null +++ b/shrubbery/private/property.rkt @@ -0,0 +1,20 @@ +#lang racket/base + +(provide syntax-raw-property + syntax-raw-prefix-property + syntax-raw-tail-property) + +(define syntax-raw-property + (case-lambda + [(stx) (syntax-property stx 'raw)] + [(stx val) (syntax-property stx 'raw val #t)])) + +(define syntax-raw-prefix-property + (case-lambda + [(stx) (syntax-property stx 'raw-prefix)] + [(stx val) (syntax-property stx 'raw-prefix val #t)])) + +(define syntax-raw-tail-property + (case-lambda + [(stx) (syntax-property stx 'raw-tail)] + [(stx val) (syntax-property stx 'raw-tail val #t)])) diff --git a/shrubbery/srcloc.rkt b/shrubbery/srcloc.rkt new file mode 100644 index 000000000..1e66f06f0 --- /dev/null +++ b/shrubbery/srcloc.rkt @@ -0,0 +1,22 @@ +#lang racket/base +(require (for-syntax racket/base)) + +(provide syntax-srcloc) + +(define-syntax (get stx) + (syntax-case stx () + [(_ syntax-srcloc) + (cond + [(identifier-binding #'syntax-srcloc) + #'(begin)] + [(file-exists? (collection-file-path "syntax-srcloc.rkt" "racket")) + #`(require #,(datum->syntax #'syntax-srcloc 'racket/syntax-srcloc))] + [else + #'(define (syntax-srcloc stx) + (srcloc (syntax-source stx) + (syntax-line stx) + (syntax-column stx) + (syntax-position stx) + (syntax-span stx)))])])) + +(get syntax-srcloc) diff --git a/shrubbery/syntax-color.rkt b/shrubbery/syntax-color.rkt new file mode 100644 index 000000000..db2516314 --- /dev/null +++ b/shrubbery/syntax-color.rkt @@ -0,0 +1,8 @@ +#lang racket/base +(require "lex.rkt" + syntax-color/racket-lexer) + +(provide shrubbery-lexer) + +(define (shrubbery-lexer in pos status) + (lex/status in pos status racket-lexer/status)) diff --git a/shrubbery/tests/indent.rkt b/shrubbery/tests/indent.rkt new file mode 100644 index 000000000..58f705970 --- /dev/null +++ b/shrubbery/tests/indent.rkt @@ -0,0 +1,379 @@ +#lang at-exp racket/base +(require racket/class + "../lex.rkt" + "../indentation.rkt" + (for-syntax racket/base)) + +(define failed? #f) + +(define TOKEN-SLOT 0) +(define TYPE-SLOT 1) + +;; fix 1-based indexing... +(define (srcloc-0position loc) + (sub1 (srcloc-position loc))) + +(define like-text% + (class object% + (init-field content) + + (super-new) + + (define-values (position-paragraphs paragraph-starts) + (let loop ([pos 0] [para 0] [pos-para #hasheqv()] [para-pos #hasheqv((0 . 0))]) + (cond + [(= pos (string-length content)) + (values (hash-set pos-para pos para) para-pos)] + [(char=? #\newline (string-ref content pos)) + (loop (add1 pos) (add1 para) + (hash-set pos-para pos para) + (hash-set para-pos (add1 para) (add1 pos)))] + [else + (loop (add1 pos) para (hash-set pos-para pos para) para-pos)]))) + + (define tokens + (lex-all (let ([p (open-input-string content)]) + (port-count-lines! p) + p) + (lambda args + (error "unexpected lex failure: ~s" args)) + #:keep-type? #t)) + + ;; position -> token + (define mapping + (let loop ([tokens tokens] [pos 0] [mapping #hasheqv()]) + (cond + [(null? tokens) mapping] + [else + (define t+type (car tokens)) + (define t (vector-ref t+type TOKEN-SLOT)) + (define loc (token-srcloc t)) + (unless (= pos (srcloc-0position loc)) + (error 'test "token discontinuity ~s vs. ~v @ ~s" pos (token-e t) (srcloc-0position loc))) + (loop (cdr tokens) + (+ pos (srcloc-span loc)) + (for/fold ([mapping mapping]) ([i (in-range (srcloc-span loc))]) + (hash-set mapping (+ pos i) t+type)))]))) + + (define/public (get-text s e) + (substring content s e)) + + (define/public (classify-position pos) + (define t+type (or (hash-ref mapping pos #f) + (error 'classify-position "lookup failed: ~e" pos))) + (vector-ref t+type TYPE-SLOT)) + + (define/public (get-token-range pos) + (define t+type (or (hash-ref mapping pos #f) + (error 'get-token-range "lookup failed: ~e" pos))) + (define t (vector-ref t+type TOKEN-SLOT)) + (define loc (token-srcloc t)) + (values (srcloc-0position loc) + (+ (srcloc-0position loc) (srcloc-span loc)))) + + (define/public (last-position) + (string-length content)) + + (define/public (position-paragraph pos [eol? #f]) + (or (hash-ref position-paragraphs pos #f) + (error 'position-paragraph "lookup failed: ~e" pos))) + + (define/public (paragraph-start-position para) + (or (hash-ref paragraph-starts para #f) + (error 'paragraph-start-position "lookup failed: ~e" para))) + + (define/public (backward-match pos cutoff) + (let loop ([pos (sub1 pos)] [depth -1] [need-close? #t]) + (cond + [(pos . < . 0) #f] + [else + (define-values (s e) (get-token-range pos)) + (define category (classify-position pos)) + (case category + [(parenthesis) + (case (get-text s e) + [("{" "(" "[" "«") (and (not need-close?) + (if (= depth 0) + s + (loop (sub1 s) (sub1 depth) #f)))] + [("}" ")" "]" "»") (loop (sub1 s) (add1 depth) #f)] + [else (error "unexpected parenthesis-class text")])] + [(whitespace comment) + (loop (sub1 s) depth need-close?)] + [else (if need-close? + s + (loop (sub1 s) depth #f))])]))))) + + +(define (check e line) + (printf "Checking ~s\n" line) + ;; replacing each "^" by two spaces pushes the token too + ;; far to the right, which makes the `#:multi?` result + ;; keep its original sorting right-to-left: + (define clean-e (regexp-replace* #rx"\\!" (regexp-replace* #rx"\\^" e " ") "x")) + (define m (regexp-match-positions #rx"[ ]*(?:\\^|[!])" e)) + (define start-pos (caar m)) + (define expected (reverse + (let loop ([pos start-pos]) + (cond + [(= pos (string-length e)) '()] + [(memv (string-ref e pos) '(#\^ #\!)) + (cons (- pos start-pos) (loop (add1 pos)))] + [else (loop (add1 pos))])))) + (define t (new like-text% [content clean-e])) + (define raw-candidates (shrubbery-indentation t start-pos #:multi? #t)) + (define candidates (if (list? raw-candidates) raw-candidates (list raw-candidates))) + (unless (equal? candidates expected) + (set! failed? #t) + (eprintf (string-append + "failed at line ~a\n" + " expected: ~a\n" + " received: ~a\n") + line + expected + candidates))) + +(define-syntax-rule (check-all e ...) + (begin + (check e (at e)) + ...)) + +(define-syntax (at stx) + (syntax-case stx () + [(_ e) #`#,(syntax-line #'e)])) + +(define e string-append) + +(check-all + + ;; In these tests, the last line of each is the one to be indented, + ;; and the "^"s (which will be replaced by two spaces before applying + ;; the indenter) or "!"s (replaced by an "x") are the expected + ;; indentation candidates. The content after the last "^" or "!", if + ;; any, can affect the indenters results (e.g., the answer tends to + ;; be different if it starts "|"). Whitespace after a final "^" or + ;; "!" will not matter. + + @e{a + ^x} + + @e{a: + ^ ^x} + + @e{apple: + ^ ^x} + + @e{apple: banana: + ^ ^ ^x} + + @e{apple: : + ^ ^ ^x} + + @e{(a: + ^ ^x} + + @e{(: + ^ ^x} + + @e{[a: + ^ ^x} + + @e{[: + ^ ^x} + + @e|{{a: + ^ ^x}| + + @e|{{: + ^ ^x}| + + @e{define pi: + ^ ^x} + + @e{define pi: 3.14 + ^ ^x} + + @e{define fib(n): + cond | n == 0: 0 + ^ ^|} + + @e{define fib(n): + log_error("fib called") + cond | n == 0: 0 + ^ ^|} + + @e{define + | fib(0): 0 + | fib(1): 1 + | fib(n): fib(n-1) // in parens => `+` continues this line + ^ ^ ^ ^+ fib(n-2))} + + @e{define + | fib(0): 0 + | fib(1): 1 + | fib(n): (fib(n-1) // in parens => `+` continues this line + ^ ^+ fib(n-2))} + + @e{define fib: + lambda (n): + cond + | n == 0: 0 + ^ ^| n == 1: 1} + + @e|{define fib(n): + match n { | 0 { 0 } + ^| 1 { 1 }}| + + @e{1 + x: « 3 + ^4 »} + + @e{1 + x: « 3 + ^ ^+ 4 »} + + @e{1 + x: 3 + ^ ^ ^+ 4} + + @e{x: 1 + x: 3 + ^ ^ ^ ^+ 4} + + @e{x: 1 + x | 3 + ^ ^ ^ ^+ 4} + + @e{define analyze(n): + if n == 0 + | printf("zero\n") + ^ ^ ^printf} + + @e{define go(): + define more(m): + if m == 0 | "done" + ^|} + + @e{define go(): + define more(m): + if m == 0 | "done" + ^ ^ ^ ^x} + + + @e{define approx_thunk(x): + match x + | something(v): lambda + | (): v + ^ ^ ^| (n): v+n} + + @e{define colors: + list( + ^red,} + + @e{define colors: + list( + red, + ^green,} + + @e{define colors: + list( + red + ^)} + + @e{define colors: + list(red, + ^green,} + + @e{this is a \ + !ong linear group} + + @e{this is a \ + very long linear group \ + !hat spans multiple lines} + + @e{this is | one: a \ + long result + ^ ^| two} + + @e{this is | one: a \ + long result + | two \ + !lso long} + + @e{this is a group \ with (a, + ^nested} + + @e{this is a group \ + with (a, + ^nested} + + @e{this is a group \ + with (a, + \ + ^nested} + + @e{hello | a | c\ + : + ^ ^ ^d} + + @e{nonsense: + hello | there 4.5 + | more f(8) + next (ok | + ^|} + + @e{nonsense: + hello | there 4.5 + | more f(8) + next (ok | + | stuff: (begin: + more + ^ ^things} + + @e{x + w | z : : + ^ ^ ^ ^y} + + @e{x something | a + y: + w: + ^ ^ ^ ^q} + + @e{z: | x + ^ ^y} + + @e{z: + | x + ^ ^y} + + @e|{|z { + | x + ^y|}| + + @e{define fib(n): + match n + | 0 + ^ ^ ^: 0} + + @e{+ `( + ^x} + + @e{+ `( + x + ^)} + + @e{+ x`( + ^x} + + @e{+ x`( + x + ^)} + + @e{(match v + | cons(bv, av): + define values(is_a_match, ar): + ¿a_pred(av) + if is_a_match + ^ ^| define} + + @e{a |« b» + ^ c} + + ) + +(when failed? + (error "failed")) diff --git a/shrubbery/tests/parse.rkt b/shrubbery/tests/parse.rkt new file mode 100644 index 000000000..2b2fc517a --- /dev/null +++ b/shrubbery/tests/parse.rkt @@ -0,0 +1,1683 @@ +#lang racket/base +(require "../parse.rkt" + "../print.rkt" + "../write.rkt" + racket/pretty) + +(define input1 +#< (a.x == b.x && a.y == b.y) + define hash(a): + 17 + define secondary_hash(a): + 19 + +struct posn(x, y): + property prop_equal_and_hash: + let (hc = lambda (a: posn, hc): + hc(a.x) + hc(a.y), + eql = lambda (a: posn, b: posn, eql): + eql(a.x, b.x) && eql(a.y, b.y)): + values(eql, hc, hc) + +struct posn(x, y): + property prop_equal_and_hash: + let (hc = lambda (a: posn, hc): + hc(a.x) + hc(a.y)): + (lambda (a: posn, b: posn, eql): + eql(a.x, b.x) && eql(a.y, b.y), + hc, + hc) + +// Another possibile approach to syntax for `struct`: +struct posn: + fields: + x mutable + y = 7 + methods equality: + define equal(a, b): + is_posn(b) => (a.x == b.x && a.y == b.y) + define hash(a): + 17 + define secondary_hash(a): + 19 + property prop_quality: "high" + +define fourth(n :: Integer): + define m: n*n + define v: m*m + printf("~a^4 = ~a\n", n, v) + v + +define exp(n :: Integer, ~base: base = 2.718281828459045): + if (n == 1) + | base + | base * exp(n-1, ~base: base) + +define positive_p(n): if n > 0 | true | false + +define go(): + define helper(n): + list(n, n) + define more(m): + if m == 0 | "done" + | more(m - 1) + helper(more(9)) + +define approx(x): + match x + | something(v): + printf("got it\n") + v + | nothing: 0 + +// With two `:`s on one line, there's no way to +// add to the first `:` +define approx_thunk(x): + match x + | something(v): lambda (): v + | nothing: lambda (): 0 + +// Enough indentation for `v` means that it continues the +// implicit second `:`, so the `lambda` body has `v`: +define approx_thunk(x): + match x + | something(v): lambda (): + v + | nothing: lambda (): 0 + +define approx_thunk(x): + match x + | something(v): lambda + | (): v + | (n): v+n + | nothing: lambda + | (): 0 + | (n): n + +define curried: + lambda (x): + lambda (y): + lambda (z): + list(x, y, z) + +define curried: lambda (x): + lambda (y): + lambda (z): + list(x, y, z) + +define dictionary: dict: + foo: 17 + bar: string + baz: true + +define colors: + list( + red, + green, + blue, + orange, + ) + +define f(x_something, + y_something_else, + z_also_long_name): + 5 + +define sum(l): + let loop(l = l): + if is_null(l) + | 0 + | first(l) + loop(rest(l)) + +define show_all(l): + for (x = in_list(l)): + print(x) + newline() + +define show_zip(l, l2): + for (x = in_list(l), + x2 = in_list(l2)): + print(x) + print_string(" ") + print(x2) + newline() + +define show_combos_not_same(l, l2): + for (x = in_list(l)): + then (x2 = in_list(l2)): + when !is_equal(x, x2): + print(x) + print_string(" ") + print(x2) + newline() + +define map(f, l): + for list (x = in_list(l)): + f(x) + +define partition(l, pred): + for fold (yes = empty, + no = empty, + result (reverse(yes), reverse(no))): + with (x = in_list(l)): + if pred(x) + | (cons(x, yes), no) + | (yes, cons(x, no)) + +local: + with: + define x: 1 + define y: 2 + in: + x+y + +if t | if f | a | b | y +if t |« if f | a | b » | y +if t |« tag: if f | a | b » | y +if t |« tag: «if f | a | b » » | y + +x: y: a; b ; c +x: y:« a; b »; c + +if t | x | y; z +if t | x |« y »; z + +x:« + #// + 2 + 3 » + +branch |« x»; +INPUT +) + +(define expected1 + '(top + (group + let + (parens (group x (op =) 1) (group y (op =) 2)) + (block (group x (op +) y))) + (group + let + (parens (group x (op =) 1) (group y (op =) 2)) + (block (group x (op +) y))) + (group define pi (block (group 3.14))) + (group + define + fib + (parens (group n)) + (block + (group log_error (parens (group "fib called"))) + (group + cond + (alts + (block (group n (op ==) 0 (block (group 0)))) + (block (group n (op ==) 1 (block (group 1)))) + (block + (group + else + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + (alts + (block (group fib (parens (group 0)) (block (group 0)))) + (block (group fib (parens (group 1)) (block (group 1)))) + (block + (group + fib + (parens (group n)) + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))) + (group + define + (alts + (block (group fib (parens (group 0)) (block (group 0)))) + (block (group fib (parens (group 1)) (block (group 1)))) + (block + (group + fib + (parens (group n)) + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))) + (group + define + fib + (block + (group + lambda + (parens (group n)) + (block + (group + cond + (alts + (block (group n (op ==) 0 (block (group 0)))) + (block (group n (op ==) 1 (block (group 1)))) + (block + (group + else + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))))) + (group + define + fib + (block + (group + lambda + (parens (group n)) + (block + (group + cond + (alts + (block (group n (op ==) 0 (block (group 0)))) + (block (group n (op ==) 1 (block (group 1)))) + (block + (group + else + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))))) + (group + (parens + (group + define + fib + (block + (group + (parens + (group + lambda + (parens (group n)) + (block + (group + (parens + (group + cond + (alts + (block (group (parens (group n (op ==) 0)) (block (group 0)))) + (block (group (parens (group n (op ==) 1)) (block (group 1)))) + (block + (group + else + (block + (group + (parens + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))))))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0 (block (group 0)))) + (block (group 1 (block (group 1)))) + (block + (group + n + (block + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))))))) + (group + define + make_adder + (parens (group n)) + (block + (group + lambda + (parens (group m)) + (block + (group printf (parens (group "adding to ~a\n") (group m))) + (group m (op +) n))))) + (group + define + analyze + (parens (group n)) + (block + (group + if + n + (op ==) + 0 + (alts + (block (group printf (parens (group "zero\n")))) + (block (group printf (parens (group "other\n")))))) + (group printf (parens (group "done\n"))))) + (group + define + analyze + (parens (group n)) + (block + (group + if + n + (op ==) + 0 + (alts + (block + (group printf (parens (group "zero\n"))) + (group printf (parens (group "done saying zero\n")))) + (block + (group printf (parens (group "other\n"))) + (group printf (parens (group "done saying other\n")))))))) + (group struct posn (parens (group x) (group y))) + (group + struct + color_posn + (parens (group col)) + (block (group extends posn) (group mutable))) + (group + struct + posn + (parens (group x mutable) (group y (op =) 7)) + (block + (group + methods + equality + (block + (group + define + equal + (parens (group a) (group b)) + (block + (group + is_posn + (parens (group b)) + (op =>) + (parens + (group + a + (op |.|) + x + (op ==) + b + (op |.|) + x + (op &&) + a + (op |.|) + y + (op ==) + b + (op |.|) + y))))) + (group define hash (parens (group a)) (block (group 17))) + (group define secondary_hash (parens (group a)) (block (group 19))))))) + (group + struct + posn + (parens (group x) (group y)) + (block + (group + property + prop_equal_and_hash + (block + (group + let + (parens + (group + hc + (op =) + lambda + (parens (group a (block (group posn))) (group hc)) + (block + (group + hc + (parens (group a (op |.|) x)) + (op +) + hc + (parens (group a (op |.|) y))))) + (group + eql + (op =) + lambda + (parens + (group a (block (group posn))) + (group b (block (group posn))) + (group eql)) + (block + (group + eql + (parens (group a (op |.|) x) (group b (op |.|) x)) + (op &&) + eql + (parens (group a (op |.|) y) (group b (op |.|) y)))))) + (block (group values (parens (group eql) (group hc) (group hc))))))))) + (group + struct + posn + (parens (group x) (group y)) + (block + (group + property + prop_equal_and_hash + (block + (group + let + (parens + (group + hc + (op =) + lambda + (parens (group a (block (group posn))) (group hc)) + (block + (group + hc + (parens (group a (op |.|) x)) + (op +) + hc + (parens (group a (op |.|) y)))))) + (block + (group + (parens + (group + lambda + (parens + (group a (block (group posn))) + (group b (block (group posn))) + (group eql)) + (block + (group + eql + (parens (group a (op |.|) x) (group b (op |.|) x)) + (op &&) + eql + (parens (group a (op |.|) y) (group b (op |.|) y))))) + (group hc) + (group hc))))))))) + (group + struct + posn + (block + (group fields (block (group x mutable) (group y (op =) 7))) + (group + methods + equality + (block + (group + define + equal + (parens (group a) (group b)) + (block + (group + is_posn + (parens (group b)) + (op =>) + (parens + (group + a + (op |.|) + x + (op ==) + b + (op |.|) + x + (op &&) + a + (op |.|) + y + (op ==) + b + (op |.|) + y))))) + (group define hash (parens (group a)) (block (group 17))) + (group define secondary_hash (parens (group a)) (block (group 19))))) + (group property prop_quality (block (group "high"))))) + (group + define + fourth + (parens (group n (op ::) Integer)) + (block + (group define m (block (group n (op *) n))) + (group define v (block (group m (op *) m))) + (group printf (parens (group "~a^4 = ~a\n") (group n) (group v))) + (group v))) + (group + define + exp + (parens + (group n (op ::) Integer) + (group #:base (block (group base (op =) 2.718281828459045)))) + (block + (group + if + (parens (group n (op ==) 1)) + (alts + (block (group base)) + (block + (group + base + (op *) + exp + (parens (group n (op -) 1) (group #:base (block (group base)))))))))) + (group + define + positive_p + (parens (group n)) + (block + (group if n (op >) 0 (alts (block (group true)) (block (group false)))))) + (group + define + go + (parens) + (block + (group + define + helper + (parens (group n)) + (block (group list (parens (group n) (group n))))) + (group + define + more + (parens (group m)) + (block + (group + if + m + (op ==) + 0 + (alts + (block (group "done")) + (block (group more (parens (group m (op -) 1)))))))) + (group helper (parens (group more (parens (group 9))))))) + (group + define + approx + (parens (group x)) + (block + (group + match + x + (alts + (block + (group + something + (parens (group v)) + (block (group printf (parens (group "got it\n"))) (group v)))) + (block (group nothing (block (group 0)))))))) + (group + define + approx_thunk + (parens (group x)) + (block + (group + match + x + (alts + (block + (group + something + (parens (group v)) + (block (group lambda (parens) (block (group v)))))) + (block + (group nothing (block (group lambda (parens) (block (group 0)))))))))) + (group + define + approx_thunk + (parens (group x)) + (block + (group + match + x + (alts + (block + (group + something + (parens (group v)) + (block (group lambda (parens) (block (group v)))))) + (block + (group nothing (block (group lambda (parens) (block (group 0)))))))))) + (group + define + approx_thunk + (parens (group x)) + (block + (group + match + x + (alts + (block + (group + something + (parens (group v)) + (block + (group + lambda + (alts + (block (group (parens) (block (group v)))) + (block (group (parens (group n)) (block (group v (op +) n))))))))) + (block + (group + nothing + (block + (group + lambda + (alts + (block (group (parens) (block (group 0)))) + (block (group (parens (group n)) (block (group n))))))))))))) + (group + define + curried + (block + (group + lambda + (parens (group x)) + (block + (group + lambda + (parens (group y)) + (block + (group + lambda + (parens (group z)) + (block (group list (parens (group x) (group y) (group z))))))))))) + (group + define + curried + (block + (group + lambda + (parens (group x)) + (block + (group + lambda + (parens (group y)) + (block + (group + lambda + (parens (group z)) + (block (group list (parens (group x) (group y) (group z))))))))))) + (group + define + dictionary + (block + (group + dict + (block + (group foo (block (group 17))) + (group bar (block (group string))) + (group baz (block (group true))))))) + (group + define + colors + (block + (group + list + (parens (group red) (group green) (group blue) (group orange))))) + (group + define + f + (parens + (group x_something) + (group y_something_else) + (group z_also_long_name)) + (block (group 5))) + (group + define + sum + (parens (group l)) + (block + (group + let + loop + (parens (group l (op =) l)) + (block + (group + if + is_null + (parens (group l)) + (alts + (block (group 0)) + (block + (group + first + (parens (group l)) + (op +) + loop + (parens (group rest (parens (group l)))))))))))) + (group + define + show_all + (parens (group l)) + (block + (group + for + (parens (group x (op =) in_list (parens (group l)))) + (block (group print (parens (group x))) (group newline (parens)))))) + (group + define + show_zip + (parens (group l) (group l2)) + (block + (group + for + (parens + (group x (op =) in_list (parens (group l))) + (group x2 (op =) in_list (parens (group l2)))) + (block + (group print (parens (group x))) + (group print_string (parens (group " "))) + (group print (parens (group x2))) + (group newline (parens)))))) + (group + define + show_combos_not_same + (parens (group l) (group l2)) + (block + (group + for + (parens (group x (op =) in_list (parens (group l)))) + (block + (group + then + (parens (group x2 (op =) in_list (parens (group l2)))) + (block + (group + when + (op !) + is_equal + (parens (group x) (group x2)) + (block + (group print (parens (group x))) + (group print_string (parens (group " "))) + (group print (parens (group x2))) + (group newline (parens)))))))))) + (group + define + map + (parens (group f) (group l)) + (block + (group + for + list + (parens (group x (op =) in_list (parens (group l)))) + (block (group f (parens (group x))))))) + (group + define + partition + (parens (group l) (group pred)) + (block + (group + for + fold + (parens + (group yes (op =) empty) + (group no (op =) empty) + (group + result + (parens + (group reverse (parens (group yes))) + (group reverse (parens (group no)))))) + (block + (group + with + (parens (group x (op =) in_list (parens (group l)))) + (block + (group + if + pred + (parens (group x)) + (alts + (block + (group + (parens (group cons (parens (group x) (group yes))) (group no)))) + (block + (group + (parens + (group yes) + (group cons (parens (group x) (group no)))))))))))))) + (group + local + (block + (group + with + (block + (group define x (block (group 1))) + (group define y (block (group 2))))) + (group in (block (group x (op +) y))))) + (group if t (alts (block (group if f)) (block (group a)) (block (group b)) (block (group y)))) + (group if t (alts (block (group if f (alts (block (group a)) (block (group b))))) (block (group y)))) + (group if t (alts (block (group tag (block (group if f (alts (block (group a)) (block (group b))))))) (block (group y)))) + (group if t (alts (block (group tag (block (group if f (alts (block (group a)) (block (group b))))))) (block (group y)))) + (group x (block (group y (block (group a) (group b) (group c))))) + (group x (block (group y (block (group a) (group b))) (group c))) + (group if t (alts (block (group x)) (block (group y) (group z)))) + (group if t (alts (block (group x)) (block (group y)))) + (group z) + (group x (block (group 3))) + (group branch (alts (block (group x)))))) + + +(define input2 +#< body + more + +x something | a + y: + w: + q + | c + z + & b + +x | indentize + y + z + | indentize: + y + +define fib(n): + match n + | 0 + : 0 + | 1 + : 1 + | n + : fib(n-1) + + fib(n-2) + more + +begin: + dictionary = [ + "foo" : 17, + "bar" : "string", + "baz" : #true + ] + +cond | null: 0 | list(x): 1 + | cons(a, d): f(a) + f(d) +INPUT + ) + +(define expected2 + '(top + (group somthing else (block (group 8))) + (group this is a very long linear group that spans multiple lines) + (group + this + is + (alts + (block (group one (block (group a long result)))) + (block (group two also long)) + (block (group three (block (group result nested here)))))) + (group this is the first group) + (group this is the second group) + (group this is a group with (parens (group a) (group nested) (group list))) + (group this is a group with (parens (group a) (group nested) (group list))) + (group this is a group with (parens (group a) (group nested) (group list))) + (group this is a group with (parens (group a) (group nested) (group list))) + (group hello (alts (block (group a)) (block (group c (block (group d)))))) + (group this (block (group is more) (group foo))) + (group + foo + (alts (block (group more)) (block (group again (block (group sub)))))) + (group something (op +)) + (group more stuff) + (group something (block (group more stuff))) + (group + define + (alts + (block (group fib (parens (group 0)) (op =) 0)) + (block (group fib (parens (group 1)) (op =) 1)) + (block + (group fib (parens (group n)) (op =)) + (group + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))))) + (group + define + (alts + (block (group fib (parens (group 0)) (op =) 0)) + (block (group fib (parens (group 1)) (op =) 1)) + (block + (group + fib + (parens (group n)) + (op =) + fib + (parens (group n (op -) 1)) + (op +) + fib + (parens (group n (op -) 2)))) + (block (group more)))) + (group + nonsense + (block + (group + hello + (alts + (block (group there 4.5)) + (block + (group more f (parens (group 8))) + (group + next + (parens + (group + ok + (alts + (block) + (block + (group + stuff + (block + (group + (parens + (group begin (block (group more) (group things))) + (group separately))) + (group again)))))))) + (group something)))))) + (group q (block (group x) (group y))) + (group z) + (group q (block (group x) (group y))) + (group z) + (group q (block (group w (block (group x) (group y))))) + (group z) + (group x (op +) w (alts (block (group z (block (group (block (group y)))))))) + (group + let + (alts + (block (group x (op =) 8)) + (block + (group + y + (op =) + 9 + (block (group 10) (group let (alts (block (group x (op =) 8))))))) + (block + (group + y + (op =) + 9 + (block + (group 10) + (group show (parens (group x (op +) y))) + (group x (op -) y)))))) + (group + letrec_syntax_and_values + (block + (group + (parens + (group m (op =) (parens (group syntax_rules (op ....)))) + (group n (op =) (parens (group syntax_rules (op ....))))) + (block + (group (parens (group x (op =) 10) (group y (op =) 12))) + (group (op =>) body) + (group more))))) + (group + x + something + (alts + (block (group a) (group y (block (group w (block (group q)))))) + (block (group c) (group z) (group (op &) b)))) + (group + x + (alts + (block (group indentize) (group y) (group z)) + (block (group indentize (block (group y)))))) + (group + define + fib + (parens (group n)) + (block + (group + match + n + (alts + (block (group 0) (group (block (group 0)))) + (block (group 1) (group (block (group 1)))) + (block + (group n) + (group + (block + (group fib (parens (group n (op -) 1))) + (group (op +) fib (parens (group n (op -) 2))) + (group more)))))))) + (group + begin + (block + (group + dictionary + (op =) + (brackets + (group "foo" (block (group 17))) + (group "bar" (block (group "string"))) + (group "baz" (block (group #t))))))) + (group + cond + (alts + (block (group null (block (group 0)))) + (block (group list (parens (group x)) (block (group 1)))) + (block + (group + cons + (parens (group a) (group d)) + (block (group f (parens (group a)) (op +) f (parens (group d)))))))))) + +(define input3 +#<)| +@x|(<[{hello}|(<[@6 there}]>)| + +@x{ 2 + 1 + 4 + 5 } + +@{4 @//{ this is a comment! } + 5 @// line comment + 6} + +The end +INPUT + ) + +(define expected3 + '(top + (group (parens (group a) (group b))) + (group (parens (group a) (group b)) (parens (group 0))) + (group (parens (group 7))) + (group (parens (group (brackets (group "9"))))) + (group (parens (group 7) (group (brackets (group "8, 10 ") (group "more"))))) + (group + 5 + (parens + (group 3) + (group + (brackets + (group "yohoo ") + (group + 9 + (parens + (group a) + (group b) + (group c) + (group (brackets (group "this is plain text") (group "\n") (group "inside braces"))))) + (group "\n") + (group "0"))))) + (group bracketed (parens (group "apple"))) + (group (braces (group bracketed)) (parens (group "apple"))) + (group 1 2 3 (block (group 5))) + (group 1 2 3 (parens (group (brackets (group "data")))) (block (group 5))) + (group x (parens (group (brackets (group "hello}there"))))) + (group x (parens (group (brackets (group "hello}there"))))) + (group x (parens (group (brackets (group "hello}") (group 6) (group " there"))))) + (group + x + (parens + (group + (brackets (group " 2") (group "\n") (group "1") (group "\n") (group " ") (group "4") (group "\n") (group " ") (group "5 "))))) + (group + (parens + (group (brackets (group "4 ") (group "\n") (group "5 ") (group " 6"))))) + (group The end))) + +(define (check input expected) + (let ([in (open-input-string input)]) + (define (out name parsed write) + (define path (build-path (find-system-path 'temp-dir) name)) + (printf "~a\n" path) + (call-with-output-file* + path + #:exists 'truncate + (lambda (o) (write parsed o)))) + (port-count-lines! in) + (define parsed-stx (parse-all in)) + (define parsed (syntax->datum parsed-stx)) + (unless (equal? expected parsed) + (out "expected" expected pretty-write) + (out "parsed" parsed pretty-write) + (error "parse failed")) + (define printed (shrubbery-syntax->string parsed-stx)) + (unless (equal? input printed) + (out "expected" input display) + (out "printed" printed display) + (error "print failed")) + (define (check-reparse count-lines?) + (define reparsed (let ([o (open-output-bytes)]) + (write-shrubbery parsed o) + (or (with-handlers ([exn:fail? (lambda (exn) (eprintf "~a\n" (exn-message exn)) #f)]) + (define in (open-input-bytes (get-output-bytes o))) + (when count-lines? + (port-count-lines! in)) + (syntax->datum (parse-all in))) + (begin + (out "wrote" (get-output-bytes o) displayln) + (error "parse of wrote failed"))))) + (unless (equal? parsed reparsed) + (out "expected" parsed pretty-print) + (out "reparsed" reparsed pretty-print) + (error "print failed"))) + (check-reparse #t) + (check-reparse #f))) + +(define (check-fail input rx) + (let ([in (open-input-string input)]) + (port-count-lines! in) + (unless (with-handlers ([exn:fail? (lambda (exn) (regexp-match? rx (exn-message exn)))]) + (parse-all in) + #f) + (error "failed to fail: ~s" input)))) + +(define (lines s . ss) + (apply string-append s (for/list ([s (in-list ss)]) (string-append "\n" s)))) + +(check input1 expected1) +(check input2 expected2) +(check input3 expected3) + +(check-fail "if t | «tag: if f | a | b» more | y" #rx"no terms allowed after `»`") +(check-fail "x: y:« a; b » more; c" #rx"no terms allowed after `»`") + +(check-fail (lines "x" + " y") + #rx"wrong indentation") +(check-fail (lines "1" + " + 2" + " + 3") + #rx"wrong indentation") +(check-fail (lines "1: 2" + " + 3") + #rx"wrong indentation") +(check-fail (lines "x | y | c" + " + 3") + #rx"wrong indentation") + +(check-fail "(«| a | c»)" #rx"misplaced `|`") +(check-fail "z «|« « | y » » »" #rx"misplaced `|`") +(check-fail "«|« w « | y » » »" #rx"misplaced `|`") diff --git a/shrubbery/weird.shrb b/shrubbery/weird.shrb new file mode 100644 index 000000000..129c9ecb8 --- /dev/null +++ b/shrubbery/weird.shrb @@ -0,0 +1,151 @@ +// A set of examples to see what happens with verious forms, +// where many of them seem nonsensical + +somthing else: 8 + +this is a \ + very long linear group \ + that spans multiple lines + +this is | one: a \ + long result + | two \ + also long + | three: + result \ + nested \ + here + +this is \ + the first group +this \ is \ the \ second \ group + +this is a group \ with (a, + nested, + list) + +this is a group \ + with (a, + nested, + list) + +this is a group \ + with (a, + \ + nested, + \ + list) + +this is a group \ + with (a, + \ + /* this a comment on `nested`: */ + nested, + \ + list) + +hello | a | c\ + : + d + +this: \ + is more + foo + +foo + | more \ + | again: + sub + +something + +more stuff + +something: + more stuff + +define + | fib(0) = 0 + | fib(1) = 1 + | fib(n) = + fib(n-1) + fib(n-2) + +define + | fib(0) = 0 + | fib(1) = 1 + | fib(n) = fib(n-1) + fib(n-2) + | more + +nonsense: + hello | there 4.5 + | more f(8) + next (ok | + | stuff: (begin: + more + things + , + separately) + again) + something + +q: + x; y +z + +q: + x; y; +z + +q: + w: + x; y; +z + +x + w | z : : + y + +let | x = 8 + | y = 9: 10 + let | x = 8 + | y = 9: 10 + show(x + y) + x - y + +letrec_syntax_and_values: + (m = (syntax_rules ....), + n = (syntax_rules ....)): + (x = 10, + y = 12) + => body + more + +x something | a + y: + w: + q + | c + z + & b + +x | indentize + y + z + | indentize: + y + +define fib(n): + match n + | 0 + : 0 + | 1 + : 1 + | n + : fib(n-1) + + fib(n-2) + more + +begin: + dictionary = [ + "foo" : 17, + "bar" : "string", + "baz" : #true + ] diff --git a/shrubbery/write.rkt b/shrubbery/write.rkt new file mode 100644 index 000000000..e8e36478b --- /dev/null +++ b/shrubbery/write.rkt @@ -0,0 +1,92 @@ +#lang racket/base +(require racket/keyword + racket/symbol) + +;; Writing a shubbery represented as an S-expression + +(provide write-shrubbery) + +(define rx:identifier #px"^(?:\\p{L}|_)(?:\\p{L}|\\p{N}|_)*$") + +(define (write-shrubbery v [op (current-output-port)]) + (let loop ([v v] [sole? #t]) + (cond + [(list? v) + (cond + [(null? v) + (error 'write-shubbery "unexpected ~s" v)] + [(eq? 'op (car v)) + (cond + [(and (not sole?) + (memq (cadr v) '(... ¿)) ) + (display "¿(? " op) + (display (cadr v) op) + (display ")" op)] + [else + (display (cadr v) op)])] + [(eq? 'alts (car v)) + (display "" op) + (for/fold ([first? #t]) ([v (in-list (cdr v))]) + (unless first? (display " " op)) + (display "|« " op) + (unless (and (pair? v) (eq? (car v) 'block)) + (error 'write-shubbery "unexpected ~s" v)) + (for/fold ([first? #t]) ([v (in-list (cdr v))]) + (unless first? (display "; " op)) + (loop v #f) + #f) + (display " »" op) + #f)] + [(eq? 'top (car v)) + (for/fold ([first? #t]) ([v (in-list (cdr v))]) + (unless first? (display "; " op)) + (loop v #t) + #f) + (void)] + [else + (define-values (open sep close) + (case (car v) + [(group) (values "" " " "")] + [(block) (values ":« " "; " " »")] + [(parens) (values "(" ", " ")")] + [(brackets) (values "[" ", " "]")] + [(braces) (values "{" ", " "}")] + [else (error 'write-shubbery "unexpected ~s" (car v))])) + (display open op) + (for/fold ([first? #t]) ([v (in-list (cdr v))]) + (unless (or first? + (and (pair? v) (eq? (car v) 'block))) + (display sep op)) + (loop v #f) + #f) + (display close op)])] + [(symbol? v) + (define s (symbol->immutable-string v)) + (cond + [(regexp-match? rx:identifier s) + (display s op)] + [else + (display "#{") + (write v op) + (display "}")])] + [(keyword? v) + (define s (keyword->immutable-string v)) + (cond + [(regexp-match? rx:identifier s) + (display "~" op) + (display s op)] + [else + (display "#{") + (write v op) + (display "}")])] + [(or (string? v) + (bytes? v) + (exact-integer? v) + (flonum? v)) + (write v op)] + [(boolean? v) + (display (if v "#true" "#false") op)] + [else + (display "#{") + (write v op) + (display "}")])))