From 1a15a2c60732bf2411f69267249e1fc7d93647bd Mon Sep 17 00:00:00 2001
From: Claudio Wunder
Date: Tue, 1 Nov 2022 18:39:40 +0100
Subject: [PATCH] tools: add documentation regarding our api tooling

Introduces a proper imperative description of how the current API
documentation build system works.

Refs: https://github.com/nodejs/next-10/issues/169
---
 doc/contributing/api-documentation.md | 296 ++++++++++++++++++++++++++
 1 file changed, 296 insertions(+)
 create mode 100644 doc/contributing/api-documentation.md

diff --git a/doc/contributing/api-documentation.md b/doc/contributing/api-documentation.md
new file mode 100644
index 00000000000000..04e3eeddbf3c13
--- /dev/null
+++ b/doc/contributing/api-documentation.md
@@ -0,0 +1,296 @@
+# Node.js API Documentation Tooling
+
+The Node.js API documentation is generated by in-house tooling that resides
+within the [tools/doc](https://github.com/nodejs/node/tree/main/tools/doc)
+directory.
+
+The build process (using `make doc`) uses this tooling to parse the Markdown
+files in [doc/api](https://github.com/nodejs/node/tree/main/doc/api) and
+generate the following:
+
+1. Human-readable HTML in `out/doc/api/*.html`
+2. A JSON representation in `out/doc/api/*.json`
+
+These are published to nodejs.org for multiple versions of Node.js. As an
+example, the latest version of the human-readable HTML is published to
+[nodejs.org/en/docs](https://nodejs.org/en/docs/), and the latest version of the
+JSON documentation is published to
+[nodejs.org/api/all.json](https://nodejs.org/api/all.json).
+
+
+
+**The key things to know about the tooling include:**
+
+1. The entry-point is `tools/doc/generate.mjs`.
+2. The tooling supports the CLI arguments listed in the table below.
+3. The tooling processes one file at a time.
+4. The tooling uses a set of dependencies as described in the dependencies
+   section.
+5. The tooling parses the input files and does several transformations to the
+   AST (Abstract Syntax Tree).
+6.
The tooling generates a JSON output that contains the metadata and content of
+   the Markdown file.
+7. The tooling generates an HTML output that contains a human-readable,
+   ready-to-view version of the file.
+
+This documentation explains the existing tooling processes in order to allow
+easier maintenance and evolution of the tooling. It is not meant to be a guide
+on how to write documentation for Node.js.
+
+#### Vocabulary & Good to Know's
+
+* AST means "Abstract Syntax Tree" and it is a data structure that represents
+  the structure of a certain data format. In our case, the AST is a "graph"
+  representation of the contents of the Markdown file.
+* MDN means [Mozilla Developer Network](https://developer.mozilla.org/en-US/)
+  and it is a website that contains documentation for web technologies. We use
+  it as a reference for the structure of the documentation.
+* The
+  [Stability Index](https://nodejs.org/dist/latest/docs/api/documentation.html#stability-index)
+  is used to communicate the stability of a given Node.js module. The stability
+  levels include:
+  * Stability 0: Deprecated. (This module is Deprecated)
+  * Stability 1: Experimental. (This module is Experimental)
+  * Stability 2: Stable. (This module is Stable)
+  * Stability 3: Legacy. (This module is Legacy)
+* Within Remark, YAML snippets (embedded in HTML comments) are considered HTML
+  nodes, because YAML isn't valid Markdown content.
(It doesn't abide by the
+  Markdown spec.)
+* "New Tooling" refers to the (written from-scratch) API build tooling
+  introduced in `nodejs/nodejs.dev` that might replace the current one from
+  `nodejs/node`.
+
+## CLI Arguments
+
+The tooling requires a `filename` argument and supports extra arguments (some
+also required) as shown below:
+
+| Argument | Description | Required | Example |
+| --------------------- | ----------- | -------- | ---------------------------------- |
+| `--node-version=` | The version of Node.js that is being documented. It defaults to `process.version`, which is supplied by Node.js itself. | No | `v19.0.0` |
+| `--output-directory=` | The directory where the output files will be generated. | Yes | `./out/api/` |
+| `--apilinks=` | The file used as an index to specify the source file for each module. | No | `./out/doc/api/apilinks.json` |
+| `--versions-file=` | The file used to specify an index of all previous versions of Node.js. It is used for the version navigation on the API docs page. | No | `./out/previous-doc-versions.json` |
+
+**Note:** Both the `apilinks` and `versions-file` files are generated by
+the Node.js build process (Makefile), and each of them contains a JSON
+object.
+
+### Basic Usage
+
+```bash
+# cd tools/doc
+npm run node-doc-generator ${filename}
+```
+
+**OR**
+
+```bash
+# nodejs/node root directory
+make doc
+```
+
+## Dependencies and how the Tooling works internally
+
+The API tooling uses an AST-like library called
+[unified](https://github.com/unifiedjs/unified) for processing the input file as
+a graph that supports easy modification and update of its nodes.
+
+In addition to `unified` we also use
+[Remark](https://github.com/remarkjs/remark) for manipulating the Markdown part,
+and [Rehype](https://github.com/rehypejs/rehype) to help convert to and from
+Markdown.
+
+### What are the steps of the internal tooling?
+
+The tooling uses the `unified` pipeline-like engine to chain each part of the
+process. (The description below is a simplified version.)
+
+* It starts by reading the front matter section of the Markdown file with
+  [remark-frontmatter](https://www.npmjs.com/package/remark-frontmatter).
+* Then the tooling parses the Markdown by using `remark-parse` and adds
+  support for [GitHub Flavored Markdown](https://github.github.com/gfm/).
+* The tooling proceeds by parsing some of the Markdown nodes and transforming
+  them to HTML.
+* The tooling then generates the JSON output of the file.
+* Finally, it does its final node transformations and generates stringified
+  HTML.
+* It then writes the JSON output to a file, adds extra styling to the HTML,
+  and stores the HTML file.
+
+### What is each file responsible for?
+
+The files listed below are the ones referenced and actually used during the
+build process of the API docs as published on nodejs.org. The
+remaining files from the directory might be used by other steps of the Node.js
+Makefile or might even be deprecated remnants of old processes that need to
+be revisited or removed.
+
+* **`html.mjs`**: Responsible for transforming nodes by decorating them with
+  visual artifacts for the HTML pages;
+  * For example, transforming man or JS doc references into links that
+    correctly refer to the respective external documentation.
+* **`json.mjs`**: Responsible for generating the JSON output of the file;
+  * It is mostly responsible for going through the whole Markdown file and
+    generating a JSON object that represents the metadata of a specific module.
+  * For example, for the FS module, it will generate an object with all its
+    methods, events, and classes, using several regular expressions (regex) to
+    extract the information needed.
+* **`generate.mjs`**: Main entry-point of doc generation for a specific file.
It
+  does end-to-end processing of a documentation file;
+* **`allhtml.mjs`**: A script executed after all files are generated to create a
+  single "all" page containing all the HTML documentation;
+* **`alljson.mjs`**: A script executed after all files are generated to create a
+  single "all" page containing all the JSON entries;
+* **`markdown.mjs`**: Contains a utility to replace Markdown links to work with
+  the website.
+* **`common.mjs`**: Contains a few utility functions that are used by the other
+  files.
+* **`type-parser.mjs`**: Used to replace "type references" (e.g. "String", or
+  "Buffer") with links to the correct internal/external documentation pages
+  (i.e. MDN or other Node.js documentation pages).
+
+**Note:** Other files not mentioned here might be used during the process, but
+they are not relevant to the generation of the API docs themselves. You will
+notice that a lot of the logic within the build process is **specific** to the
+current infrastructure. Examples include adding JavaScript snippets and
+styles, transforming certain Markdown elements into HTML, and adding certain
+HTML classes.
+
+**Note:** Regarding the previous **Note**, it is important to mention that we're
+currently working on an API tooling that is generic and independent of the
+current nodejs.org infrastructure.
+[The new tooling that is functional is available at the nodejs.dev repository](https://github.com/nodejs/nodejs.dev/blob/main/scripts/syncApiDocs.js)
+and uses plain regex (no AST) and [MDX](https://mdxjs.com/).
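To make the role of `type-parser.mjs` more concrete, here is a minimal sketch of replacing type references with documentation links. It is not the actual implementation: the `TYPE_LINKS` table, the `linkTypeReferences` helper, the `{TypeName}` notation, and the generated anchor markup are all assumptions made for illustration.

```javascript
// Hypothetical lookup table; the real tooling maintains its own mapping.
const TYPE_LINKS = new Map([
  ['String', 'https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String'],
  ['Buffer', 'buffer.html#class-buffer'],
]);

// Assumption: type references appear in the text as `{TypeName}`.
function linkTypeReferences(text) {
  return text.replace(/\{([A-Za-z]+)\}/g, (match, typeName) => {
    const url = TYPE_LINKS.get(typeName);
    // Unknown types are left untouched rather than linked to a guess.
    return url ? `<a href="${url}">&lt;${typeName}&gt;</a>` : match;
  });
}

console.log(linkTypeReferences('Returns a {Buffer} or an {Unknown} value.'));
```

Leaving unknown names untouched keeps the sketch conservative: a missing entry in the table yields unlinked text instead of a broken link.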
+
+## The Build Process
+
+The build process that happens in `generate.mjs` follows the steps below:
+
+* Links within the Markdown are replaced directly within the source Markdown
+  (AST) (`markdown.replaceLinks`)
+  * This happens within `markdown.mjs`, which adds suffixes to or otherwise
+    modifies link references within the Markdown
+  * This is necessary for the `https://nodejs.org` infrastructure as all pages
+    are suffixed with `.html`
+* Text (and some YAML) nodes are transformed/modified through
+  `html.preprocessText`
+* JSON output is generated through `json.jsonAPI`
+* The title of the page is inferred through `html.firstHeader`
+* Nodes are transformed into HTML elements through `html.preprocessElements`
+* The HTML Table of Contents (ToC) is generated through `html.buildToc`
+
+### `html.mjs`
+
+This file is responsible for AST node transformations that either update
+Markdown nodes to decorate them with more data or transform them into HTML nodes
+that serve a certain visual purpose. For example, to generate the "Added
+in" label, the Source Links, the Stability Index, or the History table.
+
+**Note:** Methods not listed below are either not relevant or are utility
+methods for string/array/object manipulation (i.e. they are used by the other
+methods mentioned below).
+
+#### `preprocessText`
+
+**New Tooling:** Most of the features within this method are available within
+the new tooling.
+
+This method does two things:
+
+* Replaces the Source Link YAML entry `<-- source_link= -->` with a "Source
+  Link" HTML anchor element.
+* Replaces type references within the Markdown text (e.g. "String", "Buffer")
+  with an HTML anchor element that links to the correct documentation
+  page.
+  * The original node then gets mutated from text to HTML.
+  * It also updates references to Linux man pages to point to their web versions.
+
+#### `firstHeader`
+
+**New Tooling:** All features within this method are available within the new
+tooling.
+
+This method attempts to extract the first heading of the page (recursively) to
+define the "title" of the page.
+
+**Note:** As all API Markdown files start with a heading, this could possibly be
+simplified.
+
+#### `preprocessElements`
+
+**New Tooling:** All features within this method are available within the new
+tooling.
+
+This method is responsible for multiple transformations of the AST nodes. For
+the most part, it transforms source nodes into their respective HTML elements,
+which serve diverse purposes, such as:
+
+* Updating Markdown `code` blocks by adding language highlighting
+  * It also adds the "CJS"/"MJS" switch to nodes that are followed by their
+    CJS/ESM equivalents.
+* Increasing the heading level of each heading
+* Parsing YAML blocks and transforming them into HTML elements (see more at the
+  `parseYAML` method)
+* Converting blockquotes that are prefixed by the word "Stability" into a
+  Stability Index HTML element.
+
+#### `parseYAML`
+
+**New Tooling:** Most of the features within this method are available within
+the new tooling.
+
+This method is responsible for parsing the `<--YAML snippets -->` and
+transforming them into HTML elements.
+
+It follows a certain kind of "schema" that consists of the following
+options:
+
+| YAML Key | Description | Example | Example Result | Available on new tooling |
+| ------------- | ----------- | ------- | --------------------------- | ------------------------ |
+| `added` | References when a certain "module", "class" or "method" was added to Node.js | `added: v0.1.90` | `Added in: v0.1.90` | Yes |
+| `deprecated` | References when a certain "module", "class" or "method" was deprecated in Node.js | `deprecated: v0.1.90` | `Deprecated since: v0.1.90` | Yes |
+| `removed` | References when a certain "module", "class" or "method" was removed from Node.js | `removed: v0.1.90` | `Removed in: v0.1.90` | No |
+| `changes` | Describes all the (historical) changes that happened within a certain "module", "class" or "method" in Node.js | `[{ version: v0.1.90, pr-url: '', description: '' }]` | -- | Yes |
+| `napiVersion` | Describes in which version of the N-API this "module", "class" or "method" is available within Node.js | `napiVersion: 1` | `N-API version: 1` | Yes |
+
+**Note:** The `changes` field gets prepended with the `added`, `deprecated` and
+`removed` fields if they exist. The table only gets generated if a `changes`
+field exists. In the new tooling only `added` is prepended for now.
+
+#### `buildToc`
+
+**New Tooling:** This feature is natively available within the new tooling
+through MDX.
+
+This method generates the Table of Contents based on all the headings of the
+Markdown file.
+
+#### `altDocs`
+
+**New Tooling:** All features within this method are available within the new
+tooling.
+
+This method generates a version picker for the current page to be shown in older
+versions of the API docs.
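Looking back at the `parseYAML` schema, the mapping from YAML keys to rendered labels can be sketched roughly as follows. The label prefixes come from the "Example Result" column of that table; the `LABELS` object, the `describeMetadata` helper, and the shape of the parsed metadata are hypothetical and only illustrate the idea, not how `html.mjs` actually does it.

```javascript
// Label prefixes taken from the "Example Result" column of the parseYAML table.
const LABELS = {
  added: 'Added in',
  deprecated: 'Deprecated since',
  removed: 'Removed in',
  napiVersion: 'N-API version',
};

// `meta` stands in for an already-parsed YAML snippet (hypothetical shape).
function describeMetadata(meta) {
  return Object.keys(LABELS)
    .filter((key) => meta[key] !== undefined)
    .map((key) => `${LABELS[key]}: ${meta[key]}`);
}

console.log(describeMetadata({ added: 'v0.1.90', napiVersion: 1 }));
```

The `changes` key is deliberately left out of the sketch, since the table gives no single-line rendering for it.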
+
+### `json.mjs`
+
+This file is responsible for generating a JSON object that (supposedly) is used
+for IDE IntelliSense or for indexing all the "methods", "classes", "modules",
+"events", "constants" and "globals" available within a certain Markdown file.
+
+It attempts a best-effort extraction of the data by using several regular
+expression patterns (regex).
+
+**Note:** JSON output generation is currently not supported by the new tooling,
+but it is in the pipeline for development.
+
+#### `jsonAPI`
+
+This method traverses all the AST nodes by iterating through each one of them
+and infers the kind of information each node contains through regex. Then it
+mutates the data and appends it to the final JSON object.
+
+For more in-depth information, we recommend referring to the `json.mjs` file,
+as it contains a lot of comments.
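As a rough illustration of the regex-based, best-effort approach described for `jsonAPI`, the sketch below classifies heading text into a few kinds of entries. Everything in it is an assumption made for the example: the `PATTERNS` list, the `collectEntries` helper, and the heading formats are hypothetical, and the patterns actually used in `json.mjs` are more thorough.

```javascript
// Hypothetical patterns; the real ones live in json.mjs.
const PATTERNS = [
  { kind: 'classes', regex: /^Class: `?([^`]+)`?$/ },
  { kind: 'events', regex: /^Event: `?'([^']+)'`?$/ },
  { kind: 'methods', regex: /^`?([\w.]+)\(.*\)`?$/ },
];

// `headings` stands in for heading text pulled from the AST nodes.
function collectEntries(headings) {
  const result = { classes: [], events: [], methods: [] };
  for (const text of headings) {
    for (const { kind, regex } of PATTERNS) {
      const found = regex.exec(text);
      if (found !== null) {
        result[kind].push(found[1]);
        break; // The first matching pattern wins.
      }
    }
    // Headings that match no pattern are skipped (best-effort extraction).
  }
  return result;
}

console.log(collectEntries([
  'Class: `Buffer`',
  "Event: `'close'`",
  '`fs.readFile(path, callback)`',
  'Overview',
]));
```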