diff --git a/README.md b/README.md index bb94e9cb..52f6e034 100644 --- a/README.md +++ b/README.md @@ -18,102 +18,6 @@ code style and, potentially, the convenience of using a single formatter tool, across multiple languages over their codebases, each with comparable styles applied. -## Motivation - -The style in which code is written has, historically, been mostly left -to personal choice. Of course, this is subjective by definition and has -led to many wasted hours reviewing formatting choices, rather than the -code itself. Prescribed style guides were an early solution to this, -spawning tools that lint a developer's formatting and ultimately leading -to automatic formatters. The latter were popularised by -[`gofmt`][gofmt], whose developers had [the insight][gofmt-slides] that -"good enough" uniform formatting, imposed on a codebase, largely -resolves these problems. - -Topiary follows this trend by aspiring to be a "universal formatter -engine", which allows developers to not only automatically format their -codebases with a uniform style, but to define that style for new -languages using a [simple DSL][tree-sitter-query]. This allows for the -fast development of formatters, providing a [Tree-sitter -grammar][tree-sitter-parsers] is defined for that language. - -## Design Principles - -Topiary has been created with the following goals in mind: - -* Use [Tree-sitter] for parsing, to avoid writing yet another grammar - for a formatter. - -* Expect idempotency. That is, formatting of already-formatted code - doesn't change anything. - -* For bundled formatting styles to meet the following constraints: - - * Be compatible with attested formatting styles used for that language - in the wild. - - * Be faithful to the author's intent: if code has been written such - that it spans multiple lines, that decision is preserved. - - * Minimise changes between commits such that diffs focus mainly on the - code that's changed, rather than superficial artefacts. 
That is, a - change on one line won't influence others, while the formatting - won't force you to make later, cosmetic changes when you modify your - code. - - * Be well-tested and robust, so that the formatter can be trusted in - large projects. - -* For end users -- i.e., not formatting style authors -- the formatter - should: - - * Prescribe a formatting style that, while customisable, is uniform - and "good enough" for their codebase. - - * Run efficiently. - - * Afford simple integration with other developer tools, such as - editors and language servers. - -## Language Support - - -The formatting styles for these languages come in two levels of maturity: -supported and experimental. - -#### Supported - -These formatting styles cover their target language and fulfil Topiary's -stated design goals. They are exposed, in Topiary, through the -`--language` command line flag, or language detection (based on file -extension). - -* [JSON] -* [Nickel] -* [OCaml] (both implementations and interfaces) -* [OCamllex] -* [TOML] -* [Tree Sitter Queries][tree-sitter-query] - -#### Contributed - -These languages' formatting styles have been generously provided by -external contributors. They are built in, by default, so are exposed in -the same way as supported languages. - -* [CSS] by @lavigneer - -#### Experimental - -These languages' formatting styles are subject to change and/or not yet -considered production-ready. They are _not_ built by default and are -gated behind a feature flag (either `experimental`, for all of them, or -by their individual name). Once included, they can be accessed in -Topiary in the usual way. - -* [Bash] -* [Rust] - ## Getting Started ### Installing @@ -125,1105 +29,6 @@ directory: cargo install --path topiary-cli ``` -Topiary needs to find the language query files (`.scm`) to function properly. By -default, `topiary` looks for a `languages` directory in the current working -directory. 
- -This won't work if you are running Topiary from another directory than this -repository. In order to use Topiary without restriction, **you must set the -environment variable `TOPIARY_LANGUAGE_DIR` to point to the directory where -Topiary's language query files (`.scm`) are located**. By default, you should -set it to `/topiary-queries/queries`, for example: - -```sh -export TOPIARY_LANGUAGE_DIR=/home/me/tools/topiary/topiary-queries/queries -topiary fmt ./projects/helloworld/hello.ml -``` - -`TOPIARY_LANGUAGE_DIR` can alternatively be set at build time. Topiary will pick -the correspond path up and embed it into the `topiary` binary. In that case, you -don't have to worry about making `TOPIARY_LANGUAGE_DIR` available at run-time -anymore. When `TOPIARY_LANGUAGE_DIR` has been set at build time and is set at -run-time as well, the run-time value takes precedence. - -See [`CONTRIBUTING.md`][contributing] for details on setting up a -development environment. - -### Setting up as pre-commit hook - -Topiary integrates seamlessly with [pre-commit-hooks.nix]: add Topiary as input -to your flake and, in [pre-commit-hooks.nix]'s setup, use: - -``` nix -pre-commit-check = nix-pre-commit-hooks.run { - hooks = { - nixfmt.enable = true; ## keep your normal hooks - ... - ## Add the following: - topiary = topiary.lib.${system}.pre-commit-hook; - }; -}; -``` - -[pre-commit-hooks.nix]: https://github.com/cachix/pre-commit-hooks.nix - -### Usage - -The Topiary CLI uses a number of subcommands to delineate functionality. -These can be listed with `topiary --help`; each subcommand then has its -own, dedicated help text. - - - -``` -CLI app for Topiary, the universal code formatter. 
- -Usage: topiary [OPTIONS] - -Commands: - format Format inputs - visualise Visualise the input's Tree-sitter parse tree - config Print the current configuration - prefetch Prefetch all languages in the configuration - completion Generate shell completion script - help Print this message or the help of the given subcommand(s) - -Options: - -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] - -v, --verbose... Logging verbosity (increased per occurrence) - -h, --help Print help - -V, --version Print version -``` - - -#### Format - - - -``` -Format inputs - -Usage: topiary format [OPTIONS] <--language |FILES> - -Arguments: - [FILES]... - Input files and directories (omit to read from stdin) - - Language detection and query selection is automatic, mapped from file extensions defined - in the Topiary configuration. - -Options: - -t, --tolerate-parsing-errors - Consume as much as possible in the presence of parsing errors - - -s, --skip-idempotence - Do not check that formatting twice gives the same output - - -l, --language - Topiary language identifier (for formatting stdin) - - -q, --query - Topiary query file override (when formatting stdin) - - -C, --configuration - Configuration file - - [env: TOPIARY_CONFIG_FILE] - - -v, --verbose... - Logging verbosity (increased per occurrence) - - -h, --help - Print help (see a summary with '-h') -``` - - -When formatting inputs from disk, language selection is detected from -the input files' extensions. To format standard input, you must specify -the `--language` and, optionally, `--query` arguments, omitting any -input files. - -Note: `fmt` is a recognised alias of the `format` subcommand. - -#### Visualise - - - -``` -Visualise the input's Tree-sitter parse tree - -Usage: topiary visualise [OPTIONS] <--language |FILE> - -Arguments: - [FILE] - Input file (omit to read from stdin) - - Language detection and query selection is automatic, mapped from file extensions defined - in the Topiary configuration. 
- -Options: - -f, --format - Visualisation format - - [default: dot] - - Possible values: - - dot: GraphViz DOT serialisation - - json: JSON serialisation - - -l, --language - Topiary language identifier (for formatting stdin) - - -q, --query - Topiary query file override (when formatting stdin) - - -C, --configuration - Configuration file - - [env: TOPIARY_CONFIG_FILE] - - -v, --verbose... - Logging verbosity (increased per occurrence) - - -h, --help - Print help (see a summary with '-h') -``` - - -When visualising inputs from disk, language selection is detected from -the input file's extension. To visualise standard input, you must -specify the `--language` and, optionally, `--query` arguments, omitting -the input file. The visualisation output is written to standard out. - -Note: `vis`, `visualize` and `view` are recognised aliases of the -`visualise` subcommand. - -#### Configuration - - - -``` -Print the current configuration - -Usage: topiary config [OPTIONS] - -Options: - -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] - -v, --verbose... Logging verbosity (increased per occurrence) - -h, --help Print help -``` - - -Note: `cfg` is a recognised alias of the `config` subcommand. - -#### Shell Completion - -Shell completion scripts for Topiary can be generated with the -`completion` subcommand. The output of which can be sourced into your -shell session or profile, as required. - - - -``` -Generate shell completion script - -Usage: topiary completion [OPTIONS] [SHELL] - -Arguments: - [SHELL] Shell (omit to detect from the environment) [possible values: bash, elvish, fish, - powershell, zsh] - -Options: - -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] - -v, --verbose... Logging verbosity (increased per occurrence) - -h, --help Print help -``` - - -For example, in Bash: - -```bash -source <(topiary completion) -``` - -#### Prefetching - -Topiary dynamically downloads, builds, and loads the tree-sitter grammars. 
In -order to ensure offline availability or speed up startup time, the grammars can -be prefetched and compiled. - - - -``` -Prefetch all languages in the configuration - -Usage: topiary prefetch [OPTIONS] - -Options: - -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] - -v, --verbose... Logging verbosity (increased per occurrence) - -h, --help Print help -``` - - -#### Logging - -By default, the Topiary CLI will only output error messages. You can -increase the logging verbosity with a respective number of -`-v`/`--verbose` flags: - -| Verbosity Flag | Logging Level | -| :------------- | :---------------------- | -| None | Errors | -| `-v` | ...and warnings | -| `-vv` | ...and information | -| `-vvv` | ...and debugging output | -| `-vvvv` | ...and tracing output | - -#### Exit Codes - -The Topiary process will exit with a zero exit code upon successful -formatting. Otherwise, the following exit codes are defined: - -| Reason | Code | -| :--------------------------- | ---: | -| Unspecified error | 1 | -| CLI argument parsing error | 2 | -| I/O error | 3 | -| Topiary query error | 4 | -| Source parsing error | 5 | -| Language detection error | 6 | -| Idempotency error | 7 | -| Unspecified formatting error | 8 | -| Multiple errors | 9 | - -When given multiple inputs, Topiary will do its best to process them -all, even in the presence of errors. Should _any_ errors occur, Topiary -will return a non-zero exit code. For more details on the nature of -these errors, run Topiary at the `warn` logging level (with `-v`). - -#### Example - -Once built, the program can be run like this: - -```bash -echo '{"foo":"bar"}' | topiary fmt --language json -``` - -`topiary` can also be built and run from source via either Cargo or Nix, -if you have those installed: - -```bash -echo '{"foo":"bar"}' | cargo run -- fmt --language json -echo '{"foo":"bar"}' | nix run . 
-- fmt --language json -``` - -It will output the following formatted code: - -```json -{ "foo": "bar" } -``` - -## Configuration - -Topiary is configured using `languages.ncl` files. The `.ncl` extension relates -to [Nickel](https://nickel-lang.org/), a configuration language created by -Tweag. There are up to four sources where Topiary checks for such a file. - -### Configuration Sources - -At build time the [languages.ncl](./languages.ncl) in the root of -this repository is embedded into Topiary. This file is parsed at -runtime. The purpose of this `languages.ncl` file is to provide sane -defaults for users of Topiary (both the library and the binary). - -The next two are read by the Topiary binary at runtime and allow the user to -configure Topiary to their needs. The first is intended to be user specific, and -can thus be found in the configuration directory of the OS: - -| OS | Typical Configuration Path | -| :------ | :---------------------------------------------------------------- | -| Unix | `/home/alice/.config/topiary/languages.ncl` | -| Windows | `C:\Users\Alice\AppData\Roaming\Topiary\config\languages.ncl` | -| macOS | `/Users/Alice/Library/Application Support/Topiary/languages.ncl` | - -This file is not automatically created by Topiary. - -The next source is intended to be a project-specific settings file for -Topiary. When running Topiary in some directory, it will ascend the file -tree until it finds a `.topiary` directory. It will then read any `languages.ncl` -file present in that directory. - -Finally, an explicit configuration file may be specified using the -`-C`/`--configuration` command line argument (or the -`TOPIARY_CONFIG_FILE` environment variable). This is intended for -driving Topiary under very specific use-cases. - -The Topiary binary parses these sources in the following order. - -1. The builtin configuration file. -2. The user configuration file in the OS's configuration directory. -3. The project specific Topiary configuration. 
-4. The explicit configuration file specified as a CLI argument. - -### Configuration Options - -The configuration file contains a record of languages. For instance, the one for -Nickel is defined as such: - -```nickel -nickel = { - extensions = ["ncl"], -}, -``` - -The `name` field is used by Topiary to associate the language entry with the -query file and Tree-sitter grammar. This value should be written in lowercase. - -The list of extensions is mandatory for every language, but does not necessarily -need to exist in every configuration file. It is sufficient if, for every -language, there is a single configuration file that defines the list of -extensions for that language. - -A final optional field, called `indent`, exists to define the indentation method -for that language. Topiary defaults to two spaces `" "` if it cannot find the -indent field in any configuration file for a specific language. - -### Overriding -If one of the sources listed above attempts to define a language configuration -already present in the builtin configuration, Topiary will display a Nickel error. - -To understand why, one can read the [Nickel documentation on Merging](https://nickel-lang.org/user-manual/merging). -The short answer is that a priority must be defined. The builtin configuration -has everything defined with priority 0. Any priority above that will replace -any other priority. For example, to override the entire Bash configuration, use the following -Nickel file. - -```nickel -{ - languages = { - bash | priority 1 = { - extensions = [ "sh" ], - indent = " ", - }, - }, -} -``` - -To override only the indentation, use the following Nickel file: -```nickel -{ - languages = { - bash = { - indent | priority 1 = " ", - }, - }, -} -``` - -## Design - -As long as there is a [Tree-sitter grammar][tree-sitter-parsers] defined -for a language, Tree-sitter can parse it and provide a concrete syntax -tree (CST). Tree-sitter will also allow us to run queries against this -tree. 
We can make use of that to define how a language should be -formatted. Here's an example query: - -```scheme -[ - (infix_operator) - "if" - ":" -] @append_space -``` - -This will match any node that the grammar has identified to be an -`infix_operator`, as well as any anonymous node containing `if` or `:`. -The match will be captured with the name `@append_space`. Our formatter -runs through all matches and captures, and when we process any capture -called `@append_space`, we will append a space after the matched node. - -The formatter goes through the CST nodes and detects all that are -spanning more than one line. This is interpreted to be an indication -from the programmer who wrote the input that the node in question should -be formatted as multi-line. Any other nodes will be formatted as -single-line. Whenever a query match has inserted a _softline_, it will -be expanded to a newline if the node is multi-line, or to a space or -nothing if the node is single-line, depending on whether -`@append_spaced_softline` or `@append_empty_softline` was used. - -Before rendering the output, the formatter will do a number of cleanup -operations, such as reducing consecutive spaces and newlines to one, -trimming spaces at end of lines and leading and trailing blanks lines, -and ordering indenting and newline instructions consistently. - -This means that you can for example prepend and append spaces to `if` -and `true`, and we will still output `if true` with just one space -between the words. - -## Supported capture instructions - -This assumes you are already familiar with the [Tree-sitter query -language][tree-sitter-query]. - -### A note on anchors -The behaviour of "anchors" can be counterintuitive. Consider, for instance, the -following query: -```scheme -( - (list_entry) @append_space - . -) -``` -One might assume that this query only matches the final element in the list but -this is not true. 
Since we did not explicitly march a parent node, the engine -will match on every `list_entry`. After all, the when looking only at the nodes -in the query, the `list_entry` is indeed the last node. - -To resolve this issue, match explicitly on the parent node: -```scheme -(list - (list_entry) @append_space - . -) -``` - -Or even implicitly: -```scheme -(_ - (list_entry) @append_space - . -) -``` - -Note that a capture is put after the node it is associated with. If you -want to put a space in front of a node, you do it like this: - -```scheme -(infix_operator) @prepend_space -``` - -This, on the other hand, will not work: - -```scheme -@append_space (infix_operator) -``` - -### `@allow_blank_line_before` - -The matched nodes will be allowed to have a blank line before them, if -specified in the input. For any other nodes, blank lines will be -removed. - -#### Example - -```scheme -; Allow comments and type definitions to have a blank line above them -[ - (comment) - (type_definition) -] @allow_blank_line_before -``` - -### `@append_delimiter` / `@prepend_delimiter` - -The matched nodes will have a delimiter appended to them. The delimiter -must be specified using the predicate `#delimiter!`. - -#### Example - -```scheme -; Put a semicolon delimiter after field declarations, unless they already have -; one, in which case we do nothing. -( - (field_declaration) @append_delimiter - . - ";"* @do_nothing - (#delimiter! ";") -) -``` - -If there is already a semicolon, the `@do_nothing` instruction will be -activated and prevent the other instructions in the query (the -`@append_delimiter`, here) from applying. Otherwise, the `";"*` captures -nothing and in this case the associated instruction (`@do_nothing`) does -not activate. - -Note that `@append_delimiter` is the same as `@append_space` when the -delimiter is set to `" "` (i.e., a space). 
- -### `@append_multiline_delimiter` / `@prepend_multiline_delimiter` - -The matched nodes will have a multi-line-only delimiter appended to -them. It will be printed only in multi-line nodes, and omitted in -single-line nodes. The delimiter must be specified using the predicate -`#delimiter!`. - -#### Example - -```scheme -; Add a semicolon at the end of lists only if they are multi-line, to avoid [1; 2; 3;]. -(list_expression - (#delimiter! ";") - (_) @append_multiline_delimiter - . - ";"? @do_nothing - . - "]" - . -) -``` - -If there is already a semicolon, the `@do_nothing` instruction will be -activated and prevent the other instructions in the query (the -`@append_multiline_delimiter`, here) from applying. Likewise, if the -node is single-line, the delimiter will not be appended either. - -### `@append_empty_softline` / `@prepend_empty_softline` - -The matched nodes will have an empty softline appended or prepended to -them. This will be expanded to a newline for multi-line nodes and to -nothing for single-line nodes. - -#### Example - -```scheme -; Put an empty softline before dots, so that in multi-line constructs we start -; new lines for each dot. -(_ - "." @prepend_empty_softline -) -``` - -### `@append_hardline` / `@prepend_hardline` - -The matched nodes will have a newline appended or prepended to them. - -#### Example - -```scheme -; Consecutive definitions must be separated by line breaks -( - (value_definition) @append_hardline - . - (value_definition) -) -``` - -### `@append_indent_start` / `@prepend_indent_start` - -The matched nodes will trigger indentation before or after them. This -will only apply to lines following, until an indentation end is -signalled. If indentation is started and ended on the same line, nothing -will happen. This is useful, because we get the correct behaviour -whether a node is formatted as single-line or multi-line. It is -important that all indentation starts and ends are balanced. 
- -#### Example - -```scheme -; Start an indented block after these -[ - "begin" - "else" - "then" - "{" -] @append_indent_start -``` - -### `@append_indent_end` / `@prepend_indent_end` - -The matched nodes will trigger that indentation ends before or after -them. - -#### Example - -```scheme -; End the indented block before these -[ - "end" - "}" -] @prepend_indent_end - -; End the indented block after these -[ - (else_clause) - (then_clause) -] @append_indent_end -``` - -### `@append_input_softline` / `@prepend_input_softline` - -The matched nodes will have an input softline appended or prepended to -them. An input softline is a newline if the node has a newline in front -of it in the input document, otherwise it is a space. - -#### Example - -```scheme -; Input softlines before and after all comments. This means that the input -; decides if a comment should have line breaks before or after. But don't put a -; softline directly in front of commas or semicolons. - -(comment) @prepend_input_softline - -( - (comment) @append_input_softline - . - [ "," ";" ]* @do_nothing -) -``` - -### `@append_space` / `@prepend_space` - -The matched nodes will have a space appended or prepended to them. Note -that this is the same as `@append_delimiter` / `@prepend_delimiter`, -with space as delimiter. - -#### Example - -```scheme -[ - (infix_operator) - "if" - ":" -] @append_space -``` - -### `@append_antispace` / `@prepend_antispace` - -It is often the case that tokens need to be juxtaposed with spaces, -except in a few isolated contexts. Rather than writing complicated rules -that enumerate every exception, an "antispace" can be inserted with -`@append_antispace` / `@prepend_antispace`; this will destroy any spaces -(not newlines) from that node, including those added by other formatting -rules. - -#### Example - -```scheme -[ - "," - ";" - ":" - "." 
-] @prepend_antispace -``` - -### `@append_spaced_softline` / `@prepend_spaced_softline` - -The matched nodes will have a spaced softline appended or prepended to -them. This will be expanded to a newline for multi-line nodes and to a -space for single-line nodes. - -#### Example - -```scheme -; Append spaced softlines, unless there is a comment following. -( - [ - "begin" - "else" - "then" - "->" - "{" - ";" - ] @append_spaced_softline - . - (comment)* @do_nothing -) -``` - -### `@delete` - -Remove the matched node from the output. - -#### Example - -```scheme -; Move semicolon after comments. -( - ";" @delete - . - (comment)+ @append_delimiter - (#delimiter! ";") -) -``` - -### `@do_nothing` - -If any of the captures in a query match are `@do_nothing`, then the -match will be ignored. - -#### Example - -```scheme -; Put a semicolon delimiter after field declarations, unless they already have -; one, in which case we do nothing. -( - (field_declaration) @append_delimiter - . - ";"* @do_nothing - (#delimiter! ";") -) -``` - -### `@multi_line_indent_all` - -To be used on comments or other leaf nodes, to indicate that we should indent -all its lines, not just the first. - -#### Example - -```scheme -(#language! ocaml) -(comment) @multi_line_indent_all -``` - -### `@single_line_no_indent` - -The matched node will be printed alone, on a single line, with no indentation. - -#### Example - -```scheme -(#language! 
ocaml) -; line number directives must be alone on their line, and can't be indented -(line_number_directive) @single_line_no_indent -``` - -### Understanding the different newline captures - -| Type | Single-Line Context | Multi-Line Context | -| :-------------- | :------------------ | :----------------- | -| Hardline | Newline | Newline | -| Empty Softline | Nothing | Newline | -| Spaced Softline | Space | Newline | -| Input Softline | Input-Dependent | Input-Dependent | - -"Input softlines" are rendered as newlines whenever the targeted node -follows a newline in the input. Otherwise, they are rendered as spaces. - -#### Example - -Consider the following JSON, which has been hand-formatted to exhibit -every context under which the different newline capture names operate: - -```json -{ - "single-line": [1, 2, 3, 4], - "multi-line": [ - 1, 2, - 3 - , 4 - ] -} -``` - -We'll apply a simplified set of JSON format queries that: -1. Opens (and closes) an indented block for objects; -2. Each key-value pair gets its own line, with the value split onto a - second; -3. Applies the different newline capture name on array delimiters. - -That is, iterating over each `@NEWLINE` type, we apply the following: - -```scheme -(#language! json) - -(object . "{" @append_hardline @append_indent_start) -(object "}" @prepend_hardline @prepend_indent_end .) -(object (pair) @prepend_hardline) -(pair . _ ":" @append_hardline) - -(array "," @NEWLINE) -``` - -The first two formatting rules are just for clarity's sake. 
The last -rule is what's important; the results of which are demonstrated below: - -##### `@append_hardline` - -```json -{ - "single-line": - [1, - 2, - 3, - 4], - "multi-line": - [1, - 2, - 3, - 4] -} -``` - -##### `@prepend_hardline` - -```json -{ - "single-line": - [1 - ,2 - ,3 - ,4], - "multi-line": - [1 - ,2 - ,3 - ,4] -} -``` - -##### `@append_empty_softline` - -```json -{ - "single-line": - [1,2,3,4], - "multi-line": - [1, - 2, - 3, - 4] -} -``` - -##### `@prepend_empty_softline` - -```json -{ - "single-line": - [1,2,3,4], - "multi-line": - [1 - ,2 - ,3 - ,4] -} -``` - -##### `@append_spaced_softline` - -```json -{ - "single-line": - [1, 2, 3, 4], - "multi-line": - [1, - 2, - 3, - 4] -} -``` - -##### `@prepend_spaced_softline` - -```json -{ - "single-line": - [1 ,2 ,3 ,4], - "multi-line": - [1 - ,2 - ,3 - ,4] -} -``` - -##### `@append_input_softline` - -```json -{ - "single-line": - [1, 2, 3, 4], - "multi-line": - [1, 2, - 3, 4] -} -``` - -##### `@prepend_input_softline` - -```json -{ - "single-line": - [1 ,2 ,3 ,4], - "multi-line": - [1 ,2 ,3 - ,4] -} -``` - -### Custom scopes and softlines - -So far, we've expanded softlines into line breaks depending on whether -the CST node they are associated with is multi-line. Sometimes, CST -nodes define scopes that are either too big or too small for our needs. 
-For instance, consider this piece of OCaml code: - -```ocaml -(1,2, -3) -``` - -Its CST is the following: - -``` -{Node parenthesized_expression (0, 0) - (1, 2)} - Named: true - {Node ( (0, 0) - (0, 1)} - Named: false - {Node product_expression (0, 1) - (1, 1)} - Named: true - {Node product_expression (0, 1) - (0, 4)} - Named: true - {Node number (0, 1) - (0, 2)} - Named: true - {Node , (0, 2) - (0, 3)} - Named: false - {Node number (0, 3) - (0, 4)} - Named: true - {Node , (0, 4) - (0, 5)} - Named: false - {Node number (1, 0) - (1, 1)} - Named: true - {Node ) (1, 1) - (1, 2)} - Named: false -``` - -We would want to add a line break after the first comma, but because the -CST structure is nested, the node containing this comma -(`product_expression (0, 1) - (0, 4)`) is *not* multi-line Only the -top-level node `product_expression (0, 1) - (1, 1)` is multi-line. - -To solve this issue, we introduce user-defined scopes and softlines. - -#### `@prepend_begin_scope` / `@append_begin_scope` / `@prepend_end_scope` / `@append_end_scope` - -These tags are used to define custom scopes. In conjunction with the `#scope_id! -` predicate, they define scopes that can span multiple CST nodes, or only part -of one. For instance, this scope matches anything between parenthesis in a -`parenthesized_expression`: - -```scheme -(parenthesized_expression - "(" @append_begin_scope - ")" @prepend_end_scope - (#scope_id! "tuple") -) -``` - -#### Scoped softlines - -We have four predicates that insert softlines in custom scopes, in -conjunction with the `#scope_id!` predicate: - -* `@prepend_empty_scoped_softline` -* `@prepend_spaced_scoped_softline` -* `@append_empty_scoped_softline` -* `@append_spaced_scoped_softline` - -When one of these scoped softlines is used, their behaviour depends on -the innermost encompassing scope with the corresponding `scope_id`. If -that scope is multi-line, the softline expands into a line break. 
In any -other regard, they behave as their non-`scoped` counterparts. - -#### Example - -This Tree-sitter query: - -```scheme -(#language! ocaml) - -(parenthesized_expression - "(" @begin_scope @append_empty_softline @append_indent_start - ")" @end_scope @prepend_empty_softline @prepend_indent_end - (#scope_id! "tuple") -) - -(product_expression - "," @append_spaced_scoped_softline - (#scope_id! "tuple") -) -``` - -...formats this piece of code: - -```ocaml -(1,2, -3) -``` - -...as: - -```ocaml -( - 1, - 2, - 3 -) -``` - -...while the single-lined `(1, 2, 3)` is kept as is. - -If we used `@append_spaced_softline` rather than -`@append_spaced_scoped_softline`, the `1,` would be followed by a space rather -than a newline, because it's inside a single-line `product_expression`. - -### Testing context with predicates - -Sometimes, similarly to what happens with softlines, we want a query to match -only if the context is single-line, or multi-line. Topiary has several -predicates that achieve this result. - -### `#single_line_only!` / `#multi_line_only!` - -These predicates allow the query to trigger only if the matched nodes are in a -single-line (resp. multi-line) context. - -#### Example - -```scheme -; Allow (and enforce) the optional "|" before the first match case -; in OCaml if and only if the context is multi-line -( - "with" - . - "|" @delete - . - (match_case) - (#single_line_only!) -) - -( - "with" - . - "|"? @do_nothing - . - (match_case) @prepend_delimiter - (#delimiter! "| ") - (#multi_line_only!) -) -``` - -### `#single_line_scope_only!` / `#multi_line_scope_only!` - -These predicates allow the query to trigger only if the associated custom scope -containing the matched nodes are is single-line (resp. multi-line). - -#### Example - -```scheme -; Allow (and enforce) the optional "|" before the first match case -; in function expressions in OCaml if and only if the scope is multi-line -(function_expression - (match_case)? @do_nothing - . - "|" @delete - . 
- (match_case) - (#single_line_scope_only! "function_definition") -) -(function_expression - "|"? @do_nothing - . - (match_case) @prepend_delimiter - (#multi_line_scope_only! "function_definition") - (#delimiter! "| ") ; sic -) -``` ## Suggested workflow diff --git a/bin/verify-documented-usage.sh b/bin/verify-documented-usage.sh index 708f7399..f6792de6 100755 --- a/bin/verify-documented-usage.sh +++ b/bin/verify-documented-usage.sh @@ -17,7 +17,7 @@ get-cli-usage() { } get-readme-usage() { - # Get the help text from the README + # Get the help text from the usage file local subcommand="${1-ROOT}" sed --quiet " @@ -26,11 +26,11 @@ get-readme-usage() { /${FENCE}/d # Delete the code fences p # Print anything else } - " README.md + " docs/book/src/cli-usage.md } diff-usage() { - # Generate a diff between the README and CLI help text + # Generate a diff between the usage file and CLI help text local subcommand="${1-ROOT}" diff --text \ @@ -46,13 +46,13 @@ main() { local _subcommand for _subcommand in "${subcommands[@]}"; do if ! _diff=$(diff-usage "${_subcommand}"); then - >&2 echo "Usage is not correctly documented in README.md for the ${_subcommand} subcommand!" + >&2 echo "Usage is not correctly documented in cli-usage.md for the ${_subcommand} subcommand!" 
echo "${_diff}" exit 1 fi done - >&2 echo "Usage is correctly documented in README.md" + >&2 echo "Usage is correctly documented in cli-usage.md" } main diff --git a/docs/book/.gitignore b/docs/book/.gitignore new file mode 100644 index 00000000..7585238e --- /dev/null +++ b/docs/book/.gitignore @@ -0,0 +1 @@ +book diff --git a/docs/book/book.toml b/docs/book/book.toml new file mode 100644 index 00000000..90ec6cfa --- /dev/null +++ b/docs/book/book.toml @@ -0,0 +1,6 @@ +[book] +authors = ["Tweag I/O"] +language = "en" +multilingual = false +src = "src" +title = "Topiary" diff --git a/docs/book/src/SUMMARY.md b/docs/book/src/SUMMARY.md new file mode 100644 index 00000000..828b4ad1 --- /dev/null +++ b/docs/book/src/SUMMARY.md @@ -0,0 +1,14 @@ +# Summary +[Introduction](./introduction.md) + +- [Installation](./installation/main.md) + - [Package managers](./installation/package-managers.md) + - [Building from source](./installation/building-from-source.md) + - [Using with Nix](./installation/using-with-nix.md) +- [CLI Usage](./cli-usage.md) +- [Library Usage](./library-usage.md) +- [Language Support](./usage/language-support.md) +- [Configuration](./configuration/main.md) +- [Guides](./guides/main.md) + - [Adding a language](./guides/adding-a-language.md) + - [Writing queries](./guides/writing-queries.md) diff --git a/docs/book/src/cli-usage.md b/docs/book/src/cli-usage.md new file mode 100644 index 00000000..d083281f --- /dev/null +++ b/docs/book/src/cli-usage.md @@ -0,0 +1,256 @@ +# CLI +### Usage +The Topiary CLI uses a number of subcommands to delineate functionality. +These can be listed with `topiary --help`; each subcommand then has its +own, dedicated help text. + + + +``` +CLI app for Topiary, the universal code formatter. 
+ +Usage: topiary [OPTIONS] + +Commands: + format Format inputs + visualise Visualise the input's Tree-sitter parse tree + config Print the current configuration + prefetch Prefetch all languages in the configuration + completion Generate shell completion script + help Print this message or the help of the given subcommand(s) + +Options: + -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] + -v, --verbose... Logging verbosity (increased per occurrence) + -h, --help Print help + -V, --version Print version +``` + + +#### Format + + + +``` +Format inputs + +Usage: topiary format [OPTIONS] <--language |FILES> + +Arguments: + [FILES]... + Input files and directories (omit to read from stdin) + + Language detection and query selection is automatic, mapped from file extensions defined + in the Topiary configuration. + +Options: + -t, --tolerate-parsing-errors + Consume as much as possible in the presence of parsing errors + + -s, --skip-idempotence + Do not check that formatting twice gives the same output + + -l, --language + Topiary language identifier (for formatting stdin) + + -q, --query + Topiary query file override (when formatting stdin) + + -C, --configuration + Configuration file + + [env: TOPIARY_CONFIG_FILE] + + -v, --verbose... + Logging verbosity (increased per occurrence) + + -h, --help + Print help (see a summary with '-h') +``` + + +When formatting inputs from disk, language selection is detected from +the input files' extensions. To format standard input, you must specify +the `--language` and, optionally, `--query` arguments, omitting any +input files. + +Note: `fmt` is a recognised alias of the `format` subcommand. + +#### Visualise + + + +``` +Visualise the input's Tree-sitter parse tree + +Usage: topiary visualise [OPTIONS] <--language |FILE> + +Arguments: + [FILE] + Input file (omit to read from stdin) + + Language detection and query selection is automatic, mapped from file extensions defined + in the Topiary configuration. 
+ +Options: + -f, --format + Visualisation format + + [default: dot] + + Possible values: + - dot: GraphViz DOT serialisation + - json: JSON serialisation + + -l, --language + Topiary language identifier (for formatting stdin) + + -q, --query + Topiary query file override (when formatting stdin) + + -C, --configuration + Configuration file + + [env: TOPIARY_CONFIG_FILE] + + -v, --verbose... + Logging verbosity (increased per occurrence) + + -h, --help + Print help (see a summary with '-h') +``` + + +When visualising inputs from disk, language selection is detected from +the input file's extension. To visualise standard input, you must +specify the `--language` and, optionally, `--query` arguments, omitting +the input file. The visualisation output is written to standard output. + +Note: `vis`, `visualize` and `view` are recognised aliases of the +`visualise` subcommand. + +#### Configuration + + + +``` +Print the current configuration + +Usage: topiary config [OPTIONS] + +Options: + -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] + -v, --verbose... Logging verbosity (increased per occurrence) + -h, --help Print help +``` + + +Note: `cfg` is a recognised alias of the `config` subcommand. + +#### Shell Completion + +Shell completion scripts for Topiary can be generated with the +`completion` subcommand, whose output can be sourced into your +shell session or profile, as required. + + + +``` +Generate shell completion script + +Usage: topiary completion [OPTIONS] [SHELL] + +Arguments: + [SHELL] Shell (omit to detect from the environment) [possible values: bash, elvish, fish, + powershell, zsh] + +Options: + -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] + -v, --verbose... Logging verbosity (increased per occurrence) + -h, --help Print help +``` + + +For example, in Bash: + +```bash +source <(topiary completion) +``` + +#### Prefetching + +Topiary dynamically downloads, builds, and loads the tree-sitter grammars.
In +order to ensure offline availability or speed up startup time, the grammars can +be prefetched and compiled. + + + +``` +Prefetch all languages in the configuration + +Usage: topiary prefetch [OPTIONS] + +Options: + -C, --configuration Configuration file [env: TOPIARY_CONFIG_FILE] + -v, --verbose... Logging verbosity (increased per occurrence) + -h, --help Print help +``` + + +#### Logging + +By default, the Topiary CLI will only output error messages. You can +increase the logging verbosity with a respective number of +`-v`/`--verbose` flags: + +| Verbosity Flag | Logging Level | +| :------------- | :---------------------- | +| None | Errors | +| `-v` | ...and warnings | +| `-vv` | ...and information | +| `-vvv` | ...and debugging output | +| `-vvvv` | ...and tracing output | + +#### Exit Codes + +The Topiary process will exit with a zero exit code upon successful +formatting. Otherwise, the following exit codes are defined: + +| Reason | Code | +| :--------------------------- | ---: | +| Unspecified error | 1 | +| CLI argument parsing error | 2 | +| I/O error | 3 | +| Topiary query error | 4 | +| Source parsing error | 5 | +| Language detection error | 6 | +| Idempotency error | 7 | +| Unspecified formatting error | 8 | +| Multiple errors | 9 | + +When given multiple inputs, Topiary will do its best to process them +all, even in the presence of errors. Should _any_ errors occur, Topiary +will return a non-zero exit code. For more details on the nature of +these errors, run Topiary at the `warn` logging level (with `-v`). + +#### Example + +Once built, the program can be run like this: + +```bash +echo '{"foo":"bar"}' | topiary fmt --language json +``` + +`topiary` can also be built and run from source via either Cargo or Nix, +if you have those installed: + +```bash +echo '{"foo":"bar"}' | cargo run -- fmt --language json +echo '{"foo":"bar"}' | nix run . 
-- fmt --language json +``` + +It will output the following formatted code: + +```json +{ "foo": "bar" } +``` diff --git a/docs/book/src/configuration/main.md b/docs/book/src/configuration/main.md new file mode 100644 index 00000000..07d4c9fa --- /dev/null +++ b/docs/book/src/configuration/main.md @@ -0,0 +1,95 @@ +# Configuration +Topiary is configured using `languages.ncl` files. The `.ncl` extension relates +to [Nickel](https://nickel-lang.org/), a configuration language created by +Tweag. There are up to four sources where Topiary checks for such a file. + +### Configuration Sources + +At build time, the [languages.ncl](https://github.com/tweag/topiary/blob/main/topiary-config/languages.ncl) file in the `topiary-config` directory of +this repository is embedded into Topiary. This file is parsed at +runtime. The purpose of this `languages.ncl` file is to provide sane +defaults for users of Topiary (both the library and the binary). + +The next two are read by the Topiary binary at runtime and allow the user to +configure Topiary to their needs. The first is intended to be user-specific, and +can thus be found in the configuration directory of the OS: + +| OS | Typical Configuration Path | +| :------ | :---------------------------------------------------------------- | +| Unix | `/home/alice/.config/topiary/languages.ncl` | +| Windows | `C:\Users\Alice\AppData\Roaming\Topiary\config\languages.ncl` | +| macOS | `/Users/Alice/Library/Application Support/Topiary/languages.ncl` | + +This file is not automatically created by Topiary. + +The next source is intended to be a project-specific settings file for +Topiary. When running Topiary in some directory, it will ascend the file +tree until it finds a `.topiary` directory. It will then read any `languages.ncl` +file present in that directory. + +Finally, an explicit configuration file may be specified using the +`-C`/`--configuration` command line argument (or the +`TOPIARY_CONFIG_FILE` environment variable).
This is intended for +driving Topiary under very specific use cases. + +The Topiary binary parses these sources in the following order. + +1. The builtin configuration file. +2. The user configuration file in the OS's configuration directory. +3. The project-specific Topiary configuration. +4. The explicit configuration file specified as a CLI argument. + +### Configuration Options + +The configuration file contains a record of languages. For instance, the one for +Nickel is defined as such: + +```nickel +nickel = { + extensions = ["ncl"], +}, +``` + +The record's field name (here, `nickel`) is used by Topiary to associate the language entry with the +query file and Tree-sitter grammar. This value should be written in lowercase. + +The list of extensions is mandatory for every language, but does not necessarily +need to exist in every configuration file. It is sufficient if, for every +language, there is a single configuration file that defines the list of +extensions for that language. + +A final optional field, called `indent`, exists to define the indentation method +for that language. Topiary defaults to two spaces `"  "` if it cannot find the +indent field in any configuration file for a specific language. + +### Overriding +If one of the sources listed above attempts to define a language configuration +already present in the builtin configuration, Topiary will display a Nickel error. + +To understand why, one can read the [Nickel documentation on Merging](https://nickel-lang.org/user-manual/merging). +The short answer is that a priority must be defined. The builtin configuration +has everything defined with priority 0, and a value annotated with a higher priority +overrides it. For example, to override the entire Bash configuration, use the following +Nickel file.
+ +```nickel +{ + languages = { + bash | priority 1 = { + extensions = [ "sh" ], + indent = " ", + }, + }, +} +``` + +To override only the indentation, use the following Nickel file: +```nickel +{ + languages = { + bash = { + indent | priority 1 = " ", + }, + }, +} +``` diff --git a/docs/book/src/guides/adding-a-language.md b/docs/book/src/guides/adding-a-language.md new file mode 100644 index 00000000..fef2f41c --- /dev/null +++ b/docs/book/src/guides/adding-a-language.md @@ -0,0 +1 @@ +# Adding a language diff --git a/docs/book/src/guides/main.md b/docs/book/src/guides/main.md new file mode 100644 index 00000000..02cb246c --- /dev/null +++ b/docs/book/src/guides/main.md @@ -0,0 +1,2 @@ +# Guides +This section houses contribution guides to Topiary that are a little more involved. diff --git a/docs/book/src/guides/writing-queries.md b/docs/book/src/guides/writing-queries.md new file mode 100644 index 00000000..bee75a6a --- /dev/null +++ b/docs/book/src/guides/writing-queries.md @@ -0,0 +1,702 @@ +# Writing queries +As long as there is a [Tree-sitter grammar][tree-sitter-parsers] defined +for a language, Tree-sitter can parse it and provide a concrete syntax +tree (CST). Tree-sitter will also allow us to run queries against this +tree. We can make use of that to define how a language should be +formatted. Here's an example query: + +```scheme +[ + (infix_operator) + "if" + ":" +] @append_space +``` + +This will match any node that the grammar has identified to be an +`infix_operator`, as well as any anonymous node containing `if` or `:`. +The match will be captured with the name `@append_space`. Our formatter +runs through all matches and captures, and when we process any capture +called `@append_space`, we will append a space after the matched node. + +The formatter goes through the CST nodes and detects all those that +span more than one line.
This is interpreted to be an indication +from the programmer who wrote the input that the node in question should +be formatted as multi-line. Any other nodes will be formatted as +single-line. Whenever a query match has inserted a _softline_, it will +be expanded to a newline if the node is multi-line, or to a space or +nothing if the node is single-line, depending on whether +`@append_spaced_softline` or `@append_empty_softline` was used. + +Before rendering the output, the formatter will do a number of cleanup +operations, such as reducing consecutive spaces and newlines to one, +trimming spaces at the end of lines and leading and trailing blank lines, +and ordering indenting and newline instructions consistently. + +This means that you can, for example, prepend and append spaces to `if` +and `true`, and we will still output `if true` with just one space +between the words. + +## Supported capture instructions + +This assumes you are already familiar with the [Tree-sitter query +language][tree-sitter-query]. + +### A note on anchors +The behaviour of "anchors" can be counterintuitive. Consider, for instance, the +following query: +```scheme +( + (list_entry) @append_space + . +) +``` +One might assume that this query only matches the final element in the list, but +this is not true. Since we did not explicitly match a parent node, the engine +will match on every `list_entry`. After all, when looking only at the nodes +in the query, the `list_entry` is indeed the last node. + +To resolve this issue, match explicitly on the parent node: +```scheme +(list + (list_entry) @append_space + . +) +``` + +Or implicitly, with a wildcard parent: +```scheme +(_ + (list_entry) @append_space + . +) +``` + +Note that a capture is put after the node it is associated with.
If you +want to put a space in front of a node, you do it like this: + +```scheme +(infix_operator) @prepend_space +``` + +This, on the other hand, will not work: + +```scheme +@append_space (infix_operator) +``` + +### `@allow_blank_line_before` + +The matched nodes will be allowed to have a blank line before them, if +specified in the input. For any other nodes, blank lines will be +removed. + +#### Example + +```scheme +; Allow comments and type definitions to have a blank line above them +[ + (comment) + (type_definition) +] @allow_blank_line_before +``` + +### `@append_delimiter` / `@prepend_delimiter` + +The matched nodes will have a delimiter appended or prepended to them. The delimiter +must be specified using the predicate `#delimiter!`. + +#### Example + +```scheme +; Put a semicolon delimiter after field declarations, unless they already have +; one, in which case we do nothing. +( + (field_declaration) @append_delimiter + . + ";"* @do_nothing + (#delimiter! ";") +) +``` + +If there is already a semicolon, the `@do_nothing` instruction will be +activated and prevent the other instructions in the query (the +`@append_delimiter`, here) from applying. Otherwise, the `";"*` captures +nothing and in this case the associated instruction (`@do_nothing`) does +not activate. + +Note that `@append_delimiter` is the same as `@append_space` when the +delimiter is set to `" "` (i.e., a space). + +### `@append_multiline_delimiter` / `@prepend_multiline_delimiter` + +The matched nodes will have a multi-line-only delimiter appended or +prepended to them. It will be printed only in multi-line nodes, and omitted in +single-line nodes. The delimiter must be specified using the predicate +`#delimiter!`. + +#### Example + +```scheme +; Add a semicolon at the end of lists only if they are multi-line, to avoid [1; 2; 3;]. +(list_expression + (#delimiter! ";") + (_) @append_multiline_delimiter + . + ";"? @do_nothing + . + "]" + .
+) +``` + +If there is already a semicolon, the `@do_nothing` instruction will be +activated and prevent the other instructions in the query (the +`@append_multiline_delimiter`, here) from applying. Likewise, if the +node is single-line, the delimiter will not be appended either. + +### `@append_empty_softline` / `@prepend_empty_softline` + +The matched nodes will have an empty softline appended or prepended to +them. This will be expanded to a newline for multi-line nodes and to +nothing for single-line nodes. + +#### Example + +```scheme +; Put an empty softline before dots, so that in multi-line constructs we start +; new lines for each dot. +(_ + "." @prepend_empty_softline +) +``` + +### `@append_hardline` / `@prepend_hardline` + +The matched nodes will have a newline appended or prepended to them. + +#### Example + +```scheme +; Consecutive definitions must be separated by line breaks +( + (value_definition) @append_hardline + . + (value_definition) +) +``` + +### `@append_indent_start` / `@prepend_indent_start` + +The matched nodes will start indentation before or after them. This +only applies to the following lines, until an indentation end is +signalled. If indentation is started and ended on the same line, nothing +will happen. This is useful because we get the correct behaviour +whether a node is formatted as single-line or multi-line. It is +important that all indentation starts and ends are balanced. + +#### Example + +```scheme +; Start an indented block after these +[ + "begin" + "else" + "then" + "{" +] @append_indent_start +``` + +### `@append_indent_end` / `@prepend_indent_end` + +The matched nodes will end indentation before or after +them.
+ +#### Example + +```scheme +; End the indented block before these +[ + "end" + "}" +] @prepend_indent_end + +; End the indented block after these +[ + (else_clause) + (then_clause) +] @append_indent_end +``` + +### `@append_input_softline` / `@prepend_input_softline` + +The matched nodes will have an input softline appended or prepended to +them. An input softline is a newline if the node has a newline in front +of it in the input document, otherwise it is a space. + +#### Example + +```scheme +; Input softlines before and after all comments. This means that the input +; decides if a comment should have line breaks before or after. But don't put a +; softline directly in front of commas or semicolons. + +(comment) @prepend_input_softline + +( + (comment) @append_input_softline + . + [ "," ";" ]* @do_nothing +) +``` + +### `@append_space` / `@prepend_space` + +The matched nodes will have a space appended or prepended to them. Note +that this is the same as `@append_delimiter` / `@prepend_delimiter`, +with space as delimiter. + +#### Example + +```scheme +[ + (infix_operator) + "if" + ":" +] @append_space +``` + +### `@append_antispace` / `@prepend_antispace` + +It is often the case that tokens need to be juxtaposed with spaces, +except in a few isolated contexts. Rather than writing complicated rules +that enumerate every exception, an "antispace" can be inserted with +`@append_antispace` / `@prepend_antispace`; this will destroy any spaces +(not newlines) from that node, including those added by other formatting +rules. + +#### Example + +```scheme +[ + "," + ";" + ":" + "." +] @prepend_antispace +``` + +### `@append_spaced_softline` / `@prepend_spaced_softline` + +The matched nodes will have a spaced softline appended or prepended to +them. This will be expanded to a newline for multi-line nodes and to a +space for single-line nodes. + +#### Example + +```scheme +; Append spaced softlines, unless there is a comment following. 
+( + [ + "begin" + "else" + "then" + "->" + "{" + ";" + ] @append_spaced_softline + . + (comment)* @do_nothing +) +``` + +### `@delete` + +Remove the matched node from the output. + +#### Example + +```scheme +; Move semicolon after comments. +( + ";" @delete + . + (comment)+ @append_delimiter + (#delimiter! ";") +) +``` + +### `@do_nothing` + +If any of the captures in a query match are `@do_nothing`, then the +match will be ignored. + +#### Example + +```scheme +; Put a semicolon delimiter after field declarations, unless they already have +; one, in which case we do nothing. +( + (field_declaration) @append_delimiter + . + ";"* @do_nothing + (#delimiter! ";") +) +``` + +### `@multi_line_indent_all` + +To be used on comments or other leaf nodes, to indicate that we should indent +all its lines, not just the first. + +#### Example + +```scheme +(#language! ocaml) +(comment) @multi_line_indent_all +``` + +### `@single_line_no_indent` + +The matched node will be printed alone, on a single line, with no indentation. + +#### Example + +```scheme +(#language! ocaml) +; line number directives must be alone on their line, and can't be indented +(line_number_directive) @single_line_no_indent +``` + +### Understanding the different newline captures + +| Type | Single-Line Context | Multi-Line Context | +| :-------------- | :------------------ | :----------------- | +| Hardline | Newline | Newline | +| Empty Softline | Nothing | Newline | +| Spaced Softline | Space | Newline | +| Input Softline | Input-Dependent | Input-Dependent | + +"Input softlines" are rendered as newlines whenever the targeted node +follows a newline in the input. Otherwise, they are rendered as spaces. 
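The newline table above can be read as a small decision function. The shell sketch below is only a mental model of that table, not Topiary's implementation. The function name `render_newline` and its argument encoding are hypothetical: `single`/`multi` gives the context of the enclosing node, and `yes`/`no` says whether the input had a newline before the node.

```bash
#!/usr/bin/env bash
# Hypothetical model of how each newline capture type is rendered.
# Usage: render_newline TYPE CONTEXT [INPUT_HAS_NEWLINE]
#   TYPE:              hardline | empty_softline | spaced_softline | input_softline
#   CONTEXT:           single | multi  (is the enclosing node multi-line?)
#   INPUT_HAS_NEWLINE: yes | no        (only consulted by input softlines)
# Prints one of: newline, space, nothing.
render_newline() {
  local type="$1" context="$2" input_has_newline="${3:-no}"
  case "$type" in
    hardline)
      # Hardlines are newlines in every context.
      echo newline ;;
    empty_softline)
      if [[ "$context" == multi ]]; then echo newline; else echo nothing; fi ;;
    spaced_softline)
      if [[ "$context" == multi ]]; then echo newline; else echo space; fi ;;
    input_softline)
      # Input softlines ignore the context and follow the input instead.
      if [[ "$input_has_newline" == yes ]]; then echo newline; else echo space; fi ;;
    *)
      echo "unknown newline type: $type" >&2
      return 1 ;;
  esac
}
```

For instance, `render_newline spaced_softline single` prints `space`. By contrast, `render_newline input_softline single yes` prints `newline`, because input softlines follow the input rather than the context.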
+ +#### Example + +Consider the following JSON, which has been hand-formatted to exhibit +every context under which the different newline capture names operate: + +```json +{ + "single-line": [1, 2, 3, 4], + "multi-line": [ + 1, 2, + 3 + , 4 + ] +} +``` + +We'll apply a simplified set of JSON format queries that: +1. Open (and close) an indented block for objects; +2. Put each key-value pair on its own line, with the value split onto a +second; +3. Apply each of the different newline capture names to array delimiters. + +That is, iterating over each `@NEWLINE` type, we apply the following: + +```scheme +(#language! json) + +(object . "{" @append_hardline @append_indent_start) +(object "}" @prepend_hardline @prepend_indent_end .) +(object (pair) @prepend_hardline) +(pair . _ ":" @append_hardline) + +(array "," @NEWLINE) +``` + +The first two formatting rules are just for clarity's sake. The last +rule is what's important; its results are demonstrated below: + +##### `@append_hardline` + +```json +{ + "single-line": + [1, + 2, + 3, + 4], + "multi-line": + [1, + 2, + 3, + 4] +} +``` + +##### `@prepend_hardline` + +```json +{ + "single-line": + [1 + ,2 + ,3 + ,4], + "multi-line": + [1 + ,2 + ,3 + ,4] +} +``` + +##### `@append_empty_softline` + +```json +{ + "single-line": + [1,2,3,4], + "multi-line": + [1, + 2, + 3, + 4] +} +``` + +##### `@prepend_empty_softline` + +```json +{ + "single-line": + [1,2,3,4], + "multi-line": + [1 + ,2 + ,3 + ,4] +} +``` + +##### `@append_spaced_softline` + +```json +{ + "single-line": + [1, 2, 3, 4], + "multi-line": + [1, + 2, + 3, + 4] +} +``` + +##### `@prepend_spaced_softline` + +```json +{ + "single-line": + [1 ,2 ,3 ,4], + "multi-line": + [1 + ,2 + ,3 + ,4] +} +``` + +##### `@append_input_softline` + +```json +{ + "single-line": + [1, 2, 3, 4], + "multi-line": + [1, 2, + 3, 4] +} +``` + +##### `@prepend_input_softline` + +```json +{ + "single-line": + [1 ,2 ,3 ,4], + "multi-line": + [1 ,2 ,3 + ,4] +} +``` + +### Custom scopes and
softlines + +So far, we've expanded softlines into line breaks depending on whether +the CST node they are associated with is multi-line. Sometimes, CST +nodes define scopes that are either too big or too small for our needs. +For instance, consider this piece of OCaml code: + +```ocaml +(1,2, +3) +``` + +Its CST is the following: + +``` +{Node parenthesized_expression (0, 0) - (1, 2)} - Named: true + {Node ( (0, 0) - (0, 1)} - Named: false + {Node product_expression (0, 1) - (1, 1)} - Named: true + {Node product_expression (0, 1) - (0, 4)} - Named: true + {Node number (0, 1) - (0, 2)} - Named: true + {Node , (0, 2) - (0, 3)} - Named: false + {Node number (0, 3) - (0, 4)} - Named: true + {Node , (0, 4) - (0, 5)} - Named: false + {Node number (1, 0) - (1, 1)} - Named: true + {Node ) (1, 1) - (1, 2)} - Named: false +``` + +We would want to add a line break after the first comma, but because the +CST structure is nested, the node containing this comma +(`product_expression (0, 1) - (0, 4)`) is *not* multi-line. Only the +top-level node `product_expression (0, 1) - (1, 1)` is multi-line. + +To solve this issue, we introduce user-defined scopes and softlines. + +#### `@prepend_begin_scope` / `@append_begin_scope` / `@prepend_end_scope` / `@append_end_scope` + +These captures are used to define custom scopes. In conjunction with the +`#scope_id!` predicate, they define scopes that can span multiple CST nodes, or only part +of one. For instance, this scope matches anything between the parentheses in a +`parenthesized_expression`: + +```scheme +(parenthesized_expression + "(" @append_begin_scope + ")" @prepend_end_scope + (#scope_id!
"tuple") +) +``` + +#### Scoped softlines + +There are four captures that insert softlines in custom scopes, in +conjunction with the `#scope_id!` predicate: + +* `@prepend_empty_scoped_softline` +* `@prepend_spaced_scoped_softline` +* `@append_empty_scoped_softline` +* `@append_spaced_scoped_softline` + +When one of these scoped softlines is used, its behaviour depends on +the innermost encompassing scope with the corresponding `scope_id`. If +that scope is multi-line, the softline expands into a line break. In any +other regard, they behave as their non-`scoped` counterparts. + +#### Example + +This Tree-sitter query: + +```scheme +(#language! ocaml) + +(parenthesized_expression + "(" @begin_scope @append_empty_softline @append_indent_start + ")" @end_scope @prepend_empty_softline @prepend_indent_end + (#scope_id! "tuple") +) + +(product_expression + "," @append_spaced_scoped_softline + (#scope_id! "tuple") +) +``` + +...formats this piece of code: + +```ocaml +(1,2, +3) +``` + +...as: + +```ocaml +( + 1, + 2, + 3 +) +``` + +...while the single-line `(1, 2, 3)` is kept as is. + +If we used `@append_spaced_softline` rather than +`@append_spaced_scoped_softline`, the `1,` would be followed by a space rather +than a newline, because it's inside a single-line `product_expression`. + +### Testing context with predicates + +Sometimes, similarly to what happens with softlines, we want a query to match +only if the context is single-line, or multi-line. Topiary has several +predicates that achieve this result. + +### `#single_line_only!` / `#multi_line_only!` + +These predicates allow the query to trigger only if the matched nodes are in a +single-line (resp. multi-line) context. + +#### Example + +```scheme +; Allow (and enforce) the optional "|" before the first match case +; in OCaml if and only if the context is multi-line +( + "with" + . + "|" @delete + . + (match_case) + (#single_line_only!) +) + +( + "with" + . + "|"? @do_nothing + .
+ (match_case) @prepend_delimiter + (#delimiter! "| ") + (#multi_line_only!) +) +``` + +### `#single_line_scope_only!` / `#multi_line_scope_only!` + +These predicates allow the query to trigger only if the associated custom scope +containing the matched nodes is single-line (resp. multi-line). + +#### Example + +```scheme +; Allow (and enforce) the optional "|" before the first match case +; in function expressions in OCaml if and only if the scope is multi-line +(function_expression + (match_case)? @do_nothing + . + "|" @delete + . + (match_case) + (#single_line_scope_only! "function_definition") +) +(function_expression + "|"? @do_nothing + . + (match_case) @prepend_delimiter + (#multi_line_scope_only! "function_definition") + (#delimiter! "| ") ; sic +) +``` diff --git a/docs/book/src/installation.md b/docs/book/src/installation.md new file mode 100644 index 00000000..25267fe2 --- /dev/null +++ b/docs/book/src/installation.md @@ -0,0 +1 @@ +# Installation diff --git a/docs/book/src/installation/building-from-source.md b/docs/book/src/installation/building-from-source.md new file mode 100644 index 00000000..692c08e4 --- /dev/null +++ b/docs/book/src/installation/building-from-source.md @@ -0,0 +1,24 @@ +# Building from source +Assuming you have the Topiary repository cloned locally, you can build Topiary in two ways. + +## Using Nix +To build Topiary using Nix, run `nix build`; this assumes you have the +`flakes` and `nix-command` experimental features enabled. + +Alternatively, the Topiary flake also has a Topiary package that doesn't fetch +and build the grammars but instead takes them from the nixpkgs pinned by the +flake. To build this version, use `nix build .#topiary-cli-nix`. + +## Using Cargo +Building Topiary using the standard Rust build tools requires not only those +tools, but also some external dependencies. Our flake provides a devshell that +sets all required environment variables and fetches all dependencies.
To enter +this devshell, use `nix develop` or set up [direnv][direnv]. + +If you cannot/do not want to use Nix, you are responsible for getting all +dependencies and setting the required environment variables. You must ensure at +least `pkg-config` and `openssl` are available. + +From there, use `cargo build` to build `topiary-cli`. + +[direnv]: https://direnv.net/ diff --git a/docs/book/src/installation/main.md b/docs/book/src/installation/main.md new file mode 100644 index 00000000..cd19f08b --- /dev/null +++ b/docs/book/src/installation/main.md @@ -0,0 +1,30 @@ +# Installation +Topiary can be installed in a few different ways. For more information on the +different ways, see the following pages: + - [Package managers](./package-managers.md) + - [Building from source](./building-from-source.md) + - [Using with Nix](./using-with-nix.md) + +Topiary needs to find the language query files (`.scm`) to function properly. By +default, `topiary` looks for a `languages` directory in the current working +directory. + +This won't work if you are running Topiary from a directory other than this +repository. In order to use Topiary without restriction, **you must set the +environment variable `TOPIARY_LANGUAGE_DIR` to point to the directory where +Topiary's language query files (`.scm`) are located**. By default, you should +set it to `/topiary-queries/queries`, for example: + +```sh +export TOPIARY_LANGUAGE_DIR=/home/me/tools/topiary/topiary-queries/queries +topiary fmt ./projects/helloworld/hello.ml +``` + +`TOPIARY_LANGUAGE_DIR` can alternatively be set at build time. Topiary will pick +up the corresponding path and embed it into the `topiary` binary. In that case, you +don't have to worry about making `TOPIARY_LANGUAGE_DIR` available at run-time +anymore. When `TOPIARY_LANGUAGE_DIR` has been set at build time and is set at +run-time as well, the run-time value takes precedence. + +See [`CONTRIBUTING.md`][contributing] for details on setting up a +development environment.
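The run-time part of the query-directory lookup can be sketched in shell. This is an illustration only: the function names are hypothetical, and a path embedded at build time is invisible to a shell script, so the sketch does not model it. It covers two cases: a run-time `TOPIARY_LANGUAGE_DIR` wins, and otherwise the lookup falls back to `languages` in the current directory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the run-time lookup for Topiary's query directory.
# A non-empty TOPIARY_LANGUAGE_DIR wins; otherwise fall back to ./languages.
# (A path embedded at build time is not modelled here.)
resolve_language_dir() {
  if [[ -n "${TOPIARY_LANGUAGE_DIR:-}" ]]; then
    printf '%s\n' "$TOPIARY_LANGUAGE_DIR"
  else
    printf '%s\n' "languages"
  fi
}

# Succeeds if the given directory contains at least one .scm query file.
# compgen -G expands the glob and fails when nothing matches.
has_query_files() {
  compgen -G "$1/*.scm" > /dev/null
}
```

For example, a wrapper script could run `has_query_files "$(resolve_language_dir)" || echo "no query files found" >&2` before invoking `topiary fmt`. This catches a misconfigured path before any formatting starts.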
diff --git a/docs/book/src/installation/package-managers.md b/docs/book/src/installation/package-managers.md new file mode 100644 index 00000000..d89722be --- /dev/null +++ b/docs/book/src/installation/package-managers.md @@ -0,0 +1,54 @@ +# Package managers +Topiary has been packaged for some package managers. + +## Nix(pkgs) +To install Topiary with Nix, use whichever method you are familiar with. For instance: + +### `configuration.nix` +```nix +environment.systemPackages = with pkgs; [ + topiary +]; +``` + +### `home.nix` +```nix +home.packages = with pkgs; [ + topiary +]; +``` + +### nix install +#### On NixOS +```bash +# without flakes: +nix-env -iA nixos.topiary +# with flakes: +nix profile install nixpkgs#topiary +``` +#### On non-NixOS +```bash +# without flakes: +nix-env -iA nixpkgs.topiary +# with flakes: +nix profile install nixpkgs#topiary +``` + +### `nix-shell` +To temporarily add `topiary` to your path, use: +```bash +# without flakes: +nix-shell -p topiary +# with flakes: +nix shell nixpkgs#topiary +``` + +## Arch Linux (AUR) +```bash +yay -S topiary +``` + +## Cargo +```bash +cargo install topiary-cli +``` diff --git a/docs/book/src/installation/using-with-nix.md b/docs/book/src/installation/using-with-nix.md new file mode 100644 index 00000000..709efcd1 --- /dev/null +++ b/docs/book/src/installation/using-with-nix.md @@ -0,0 +1,25 @@ +# Using with Nix +Topiary provides a flake with several attributes. The main one is `topiary-cli`, +which produces a version of the CLI that doesn't come with any Tree-sitter +grammars (it downloads and builds them at runtime). However, this version cannot be used in Nix. For that purpose, the +flake also provides the `topiary-cli-nix` package. This package uses the +Tree-sitter grammars from the `nixpkgs` flake input. Note that the Tree-sitter +grammar for OCamllex hasn't been added to nixpkgs yet, and so this build +disables support for that language.
+ +## Git Hooks +Topiary integrates seamlessly with [pre-commit-hooks.nix]: add Topiary as an input +to your flake and, in [pre-commit-hooks.nix]'s setup, use: + +```nix +pre-commit-check = nix-pre-commit-hooks.run { + hooks = { + nixfmt.enable = true; ## keep your normal hooks + ## ... your other hooks + ## Add the following: + topiary = topiary.lib.${system}.pre-commit-hook; + }; +}; +``` + +[pre-commit-hooks.nix]: https://github.com/cachix/pre-commit-hooks.nix diff --git a/docs/book/src/introduction.md b/docs/book/src/introduction.md new file mode 100644 index 00000000..ffa6257f --- /dev/null +++ b/docs/book/src/introduction.md @@ -0,0 +1,69 @@ +# Introduction +Topiary aims to be a uniform formatter for simple languages, as part of +the [Tree-sitter] ecosystem. It is named after the art of clipping or +trimming trees into fantastic shapes. + +Topiary is designed for formatter authors and formatter users. Authors +can create a formatter for a language without having to write their own +formatting engine or even their own parser. Users benefit from uniform +code style and, potentially, the convenience of using a single formatter +tool, across multiple languages over their codebases, each with +comparable styles applied. + +## Motivation + +The style in which code is written has, historically, been mostly left +to personal choice. Of course, this is subjective by definition and has +led to many wasted hours reviewing formatting choices, rather than the +code itself. Prescribed style guides were an early solution to this, +spawning tools that lint a developer's formatting and ultimately leading +to automatic formatters. The latter were popularised by +[`gofmt`][gofmt], whose developers had [the insight][gofmt-slides] that +"good enough" uniform formatting, imposed on a codebase, largely +resolves these problems.
+ +Topiary follows this trend by aspiring to be a "universal formatter +engine", which allows developers to not only automatically format their +codebases with a uniform style, but to define that style for new +languages using a [simple DSL][tree-sitter-query]. This allows for the +fast development of formatters, providing a [Tree-sitter +grammar][tree-sitter-parsers] is defined for that language. + +## Design Principles + +Topiary has been created with the following goals in mind: + +* Use [Tree-sitter] for parsing, to avoid writing yet another grammar + for a formatter. + +* Expect idempotency. That is, formatting of already-formatted code + doesn't change anything. + +* For bundled formatting styles to meet the following constraints: + + * Be compatible with attested formatting styles used for that language + in the wild. + + * Be faithful to the author's intent: if code has been written such + that it spans multiple lines, that decision is preserved. + + * Minimise changes between commits such that diffs focus mainly on the + code that's changed, rather than superficial artefacts. That is, a + change on one line won't influence others, while the formatting + won't force you to make later, cosmetic changes when you modify your + code. + + * Be well-tested and robust, so that the formatter can be trusted in + large projects. + +* For end users -- i.e., not formatting style authors -- the formatter + should: + + * Prescribe a formatting style that, while customisable, is uniform + and "good enough" for their codebase. + + * Run efficiently. + + * Afford simple integration with other developer tools, such as + editors and language servers. 
+ diff --git a/docs/book/src/library-usage.md b/docs/book/src/library-usage.md new file mode 100644 index 00000000..94d4e511 --- /dev/null +++ b/docs/book/src/library-usage.md @@ -0,0 +1,11 @@ +# Library +Topiary is published on [crates.io][topiary-crate], which means that its +documentation can be found on [docs.rs][topiary-docs]. Of main interest is +the `formatter` function, which performs the actual formatting. The example in +the documentation of that function is kept up to date. For a more complete +example, please see the +[client-app example in the Topiary repository][client-app]. + +[client-app]: https://github.com/tweag/topiary/tree/main/examples/client-app +[topiary-crate]: https://crates.io/crates/topiary-core +[topiary-docs]: https://docs.rs/topiary-core/latest/topiary_core/ diff --git a/docs/book/src/usage/language-support.md b/docs/book/src/usage/language-support.md new file mode 100644 index 00000000..a39874dd --- /dev/null +++ b/docs/book/src/usage/language-support.md @@ -0,0 +1,37 @@ +# Language Support + +Topiary's formatting styles fall into three tiers: +supported, contributed and experimental. + +## Supported + +These formatting styles cover their target language and fulfil Topiary's +stated design goals. They are exposed, in Topiary, through the +`--language` command line flag, or language detection (based on file +extension). + +* [JSON] +* [Nickel] +* [OCaml] (both implementations and interfaces) +* [OCamllex] +* [TOML] +* [Tree-sitter Queries][tree-sitter-query] + +## Contributed + +These languages' formatting styles have been generously provided by +external contributors. They are built in by default, so they are exposed in +the same way as supported languages. + +* [CSS] by @lavigneer + +## Experimental + +These languages' formatting styles are subject to change and/or not yet +considered production-ready.
They are _not_ built by default and are +gated behind a feature flag (either `experimental`, for all of them, or +by their individual name). Once included, they can be accessed in +Topiary in the usual way. + +* [Bash] +* [Rust] diff --git a/flake.nix b/flake.nix index 04266e7c..8650934c 100644 --- a/flake.nix +++ b/flake.nix @@ -99,7 +99,7 @@ wasm = pkgs.callPackage ./shell.nix { checks = self.checks.${system}; craneLib = topiaryPkgs.passtru.craneLibWasm; inherit binPkgs; }; }; - ## For easy use in https://github.com/cachix/pre-commit-hooks.nix + ## For easy use in https://github.com/cachix/git-hooks.nix lib.pre-commit-hook = { enable = true; name = "topiary"; diff --git a/shell.nix b/shell.nix index ea905692..dcb9d40b 100644 --- a/shell.nix +++ b/shell.nix @@ -17,6 +17,7 @@ craneLib.devShell cargo-dist cargo-flamegraph rust-analyzer + mdbook pkg-config openssl.dev