diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 964be60..9c72409 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -21,4 +21,8 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
+      - uses: taiki-e/install-action@v2
+        with:
+          tool: mdbook
+          fallback: none
       - run: ./ci.sh
diff --git a/ci.sh b/ci.sh
index 915d949..700148f 100755
--- a/ci.sh
+++ b/ci.sh
@@ -13,3 +13,8 @@ cd ./source/river
 cargo run -p river -- --config-toml ./assets/example-config.toml --validate-configs
 cargo run -p river -- --config-toml ./assets/test-config.toml --validate-configs
 cargo run -p river -- --config-kdl ./assets/test-config.kdl --validate-configs
+cd ../../
+
+# ensure the user manual can be built
+cd user-manual
+mdbook build
diff --git a/user-manual/.gitignore b/user-manual/.gitignore
new file mode 100644
index 0000000..7585238
--- /dev/null
+++ b/user-manual/.gitignore
@@ -0,0 +1 @@
+book
diff --git a/user-manual/book.toml b/user-manual/book.toml
new file mode 100644
index 0000000..10d12f3
--- /dev/null
+++ b/user-manual/book.toml
@@ -0,0 +1,6 @@
+[book]
+authors = ["James Munns"]
+language = "en"
+multilingual = false
+src = "src"
+title = "River User Manual"
diff --git a/user-manual/src/SUMMARY.md b/user-manual/src/SUMMARY.md
new file mode 100644
index 0000000..41a5dd6
--- /dev/null
+++ b/user-manual/src/SUMMARY.md
@@ -0,0 +1,11 @@
+# Summary
+
+- [Introduction](./intro.md)
+- [Installation](./install.md)
+- [Core Concepts](./concepts/mod.md)
+- [Configuration](./config/mod.md)
+  - [Command Line Interface](./config/cli.md)
+  - [Environment Variables](./config/env.md)
+  - [Configuration File (KDL)](./config/kdl.md)
+  - [Configuration File (TOML)](./config/toml.md)
+- [Hot Reloading](./reloading.md)
diff --git a/user-manual/src/concepts/mod.md b/user-manual/src/concepts/mod.md
new file mode 100644
index 0000000..2a34516
--- /dev/null
+++ b/user-manual/src/concepts/mod.md
@@ -0,0 +1,106 @@
+# Core Concepts
+
+River is a Reverse Proxy application.
+
+It is intended to handle connections from **Downstream** clients, forward
+**Requests** to **Upstream** servers, and then forward **Responses** from
+the **Upstream** servers back to the **Downstream** clients.
+
+```text
+┌────────────┐          ┌─────────────┐         ┌────────────┐
+│ Downstream │       ┌ ─│─   Proxy  ┌ ┼ ─       │  Upstream  │
+│   Client   │─────────▶│ │           │──┼─────▶│   Server   │
+└────────────┘       │  └───────────┼─┘         └────────────┘
+                      ─ ─ ┘          ─ ─ ┘
+                        ▲              ▲
+                     ┌──┘              └──┐
+                     │                    │
+                ┌ ─ ─ ─ ─ ┐          ┌ ─ ─ ─ ─ ─
+                 Listeners            Connectors│
+                └ ─ ─ ─ ─ ┘          └ ─ ─ ─ ─ ─
+```
+
+For the purpose of this guide, we define **Requests** as messages sent
+from the downstream client to the upstream server, and define **Responses**
+as messages sent from the upstream server to the downstream client.
+
+River is capable of handling connections, requests, and responses from
+numerous downstream clients and upstream servers simultaneously.
+
+When proxying between a downstream client and upstream server, River
+may modify or block requests or responses. Examples of modification
+include the removal or addition of HTTP headers of requests or responses,
+to add internal metadata, or to remove sensitive information. Examples
+of blocking include the rejection of requests for authentication or
+rate limiting purposes.
+
+## Services
+
+River is oriented around the concept of **Services**. **Services** are
+composed of three major elements:
+
+* **Listeners** - the sockets used to accept incoming connections from
+  downstream clients
+* **Connectors** - the listing of potential upstream servers that requests
+  may be forwarded to
+* **Path Control Options** - the modification or filtering settings used
+  when processing requests or responses.
+
+Services are configured independently from each other. This allows a single
+instance of the River application to handle the proxying of multiple different
+kinds of traffic, and to apply different rules when proxying these different
+kinds of traffic.
+
+Each service also creates its own pool of worker threads, in order to allow for
+the operating system to provide equal time and resources to each Service,
+preventing one highly loaded Service from starving other Services of resources
+such as memory and CPU time.
+
+## Listeners
+
+Listeners are responsible for accepting incoming connections and requests
+from downstream clients. Each listener is a single listening socket, for
+example listening to IPv4 traffic on address `192.168.10.2:443`.
+
+Listeners may optionally support the establishment and termination of TLS.
+They may be configured with a TLS certificate and [SNI], allowing them
+to securely accept traffic sent to a certain domain name, such as
+`https://example.com`.
+
+[SNI]: https://www.cloudflare.com/en-gb/learning/ssl/what-is-sni/
+
+Unlike some other reverse proxy applications, in River, a given listener
+is "owned" by a single service. This means that multiple services may not
+be listening to the same address and port. Traffic received by a given
+Listener will always be processed by the same Service for the duration
+of time that the River application is running.
+
+Listeners are configured "statically": they are set in the configuration
+file loaded at the start of the River application, and are constant for
+the time that the River application is running.
+
+## Connectors
+
+Connectors are responsible for the communication between the Service and
+the upstream server(s).
+
+Connectors manage a few important tasks:
+
+* Allowing for Service Discovery, changing the set of potential upstream servers over time
+* Allowing for Health Checks, selectively enabling and disabling which upstream servers
+  are eligible for proxying
+* Load balancing of proxied requests across multiple upstream servers
+* Optionally establishing secure TLS connections to upstream servers
+* Maintaining reusable connections to upstream servers, to reduce the cost of connection
+  and proxying
+
+Similar to Listeners, each Service maintains its own unique set of Connectors. However,
+Services may have overlapping sets of upstream servers, each of them considering an
+upstream server in the list of proxy-able servers in their own connectors. This allows
+multiple services to proxy to the same upstream servers, but pooled connections and
+other aspects managed by Connectors are not shared across Services.
+
+## Path Control
+
+Path Control allows for configurable filtering and modification of requests and
+responses at multiple stages of the proxying process.
diff --git a/user-manual/src/config/cli.md b/user-manual/src/config/cli.md
new file mode 100644
index 0000000..985a92e
--- /dev/null
+++ b/user-manual/src/config/cli.md
@@ -0,0 +1,79 @@
+# Command Line Interface
+
+```text
+River: A reverse proxy from Prossimo
+
+Usage: river [OPTIONS]
+
+Options:
+      --validate-configs
+          Validate all configuration data and exit
+      --config-toml
+          Path to the configuration file in TOML format
+      --config-kdl
+          Path to the configuration file in KDL format
+      --threads-per-service
+          Number of threads used in the worker pool for EACH service
+      --daemonize
+          Should the server be daemonized after starting?
+      --upgrade
+          Should the server take over an existing server?
+      --upgrade-socket
+          Path to upgrade socket
+      --pidfile
+          Path to the pidfile, used for upgrade
+  -h, --help
+          Print help
+```
+
+## `--validate-configs`
+
+Running River with this option will validate the configuration, and immediately exit
+without starting any Services. A non-zero return code will be given when the configuration
+fails validation.
+
+## `--config-toml `
+
+Running River with this option will instruct River to load the configuration file from
+the provided path. Cannot be used with `--config-kdl`.
+
+## `--config-kdl `
+
+Running River with this option will instruct River to load the configuration file from
+the provided path. Cannot be used with `--config-toml`.
+
+## `--threads-per-service `
+
+Running River with this option will instruct River to use the given number of worker
+threads per service.
+
+## `--daemonize`
+
+Running River with this option will cause River to fork after the creation of all
+Services. The application will return once all Services have been started.
+
+If this option is not provided, the River application will run until it is commanded
+to stop or a fatal error occurs.
+
+## `--upgrade`
+
+Running River with this option will cause River to take over an existing River
+server's open connections. See [Hot Reloading] for more information about this.
+
+[Hot Reloading]: ../reloading.md
+
+## `--upgrade-socket `
+
+Running River with this option will instruct River to look at the provided socket
+path for receiving active Listeners from the currently running instance.
+
+This must be an absolute path. This option only works on Linux.
+
+See [Hot Reloading] for more information about this.
+
+## `--pidfile `
+
+Running River with this option will set the path for the created pidfile when
+the server is configured to daemonize.
+
+This must be an absolute path.
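The `--validate-configs` behaviour described in this file lends itself to a pre-flight check in deployment scripts. The following is a minimal sketch, assuming a POSIX shell. `RIVER_BIN` and the `check_config` helper are hypothetical names introduced here for illustration; only the `--config-kdl` and `--validate-configs` flags and the non-zero return code on failure come from the text of this file.

```shell
#!/bin/sh
# Hypothetical: name or path of the River binary, overridable by the caller.
RIVER_BIN="${RIVER_BIN:-river}"

# Hypothetical helper: validate a KDL configuration file without starting
# any Services. Relies on the documented non-zero return code on failure.
check_config() {
    # $1: path to the KDL configuration file
    if "$RIVER_BIN" --config-kdl "$1" --validate-configs; then
        echo "config ok: $1"
    else
        echo "config invalid: $1" >&2
        return 1
    fi
}
```

A deployment script could call `check_config ./assets/test-config.kdl` and stop before launching River if it returns non-zero.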
diff --git a/user-manual/src/config/env.md b/user-manual/src/config/env.md
new file mode 100644
index 0000000..b3577e9
--- /dev/null
+++ b/user-manual/src/config/env.md
@@ -0,0 +1,3 @@
+# Environment Variables
+
+TODO: We don't use any environment variables yet
diff --git a/user-manual/src/config/kdl.md b/user-manual/src/config/kdl.md
new file mode 100644
index 0000000..0c4ce9b
--- /dev/null
+++ b/user-manual/src/config/kdl.md
@@ -0,0 +1,201 @@
+# Configuration File (KDL)
+
+The primary configuration file format used by River is the
+[KDL Configuration Language](https://kdl.dev/).
+
+KDL is a language for describing structured data.
+
+There are currently two major sections used by River:
+
+## The `system` section
+
+Here is an example `system` configuration block:
+
+```kdl
+system {
+    threads-per-service 8
+    daemonize false
+    pid-file "/tmp/river.pidfile"
+
+    // Path to upgrade socket
+    //
+    // NOTE: `upgrade` is NOT exposed in the config file, it MUST be set on the CLI
+    // NOTE: This has issues if you use relative paths. See issue https://github.com/memorysafety/river/issues/50
+    // NOTE: The upgrade command is only supported on Linux
+    upgrade-socket "/tmp/river-upgrade.sock"
+}
+```
+
+### `system.threads-per-service INT`
+
+This field configures the number of threads spawned by each service. This configuration
+applies to all services.
+
+A positive, non-zero integer is provided as `INT`.
+
+This field is optional, and defaults to `8`.
+
+### `system.daemonize BOOL`
+
+This field configures whether River should daemonize.
+
+The value `true` or `false` is provided as `BOOL`.
+
+This field is optional, and defaults to `false`.
+
+If this field is set as `true`, then `system.pid-file` must also be set.
+
+### `system.pid-file PATH`
+
+This field configures the path to the created pidfile when River is configured
+to daemonize.
+
+A UTF-8 absolute path is provided as `PATH`.
+
+This field is optional if `system.daemonize` is `false`, and required if
+`system.daemonize` is `true`.
+
+### `system.upgrade-socket PATH`
+
+This field configures the path to the upgrade socket when River is configured
+to take over an existing instance.
+
+A UTF-8 absolute path is provided as `PATH`.
+
+This field is optional if the `--upgrade` flag is provided via CLI, and required if
+`--upgrade` is not set.
+
+## The `services` section
+
+Here is an example `services` block:
+
+```kdl
+services {
+    Example1 {
+        listeners {
+            "0.0.0.0:8080"
+            "0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key"
+        }
+        connectors {
+            load-balance {
+                selection "Ketama" key="UriPath"
+                discovery "Static"
+                health-check "None"
+            }
+            "91.107.223.4:443" tls-sni="onevariable.com"
+        }
+        path-control {
+            upstream-request {
+                filter kind="remove-header-key-regex" pattern=".*(secret|SECRET).*"
+                filter kind="upsert-header" key="x-proxy-friend" value="river"
+            }
+            upstream-response {
+                filter kind="remove-header-key-regex" pattern=".*ETag.*"
+                filter kind="upsert-header" key="x-with-love-from" value="river"
+            }
+        }
+    }
+    Example3 {
+        listeners {
+            "0.0.0.0:9000"
+            "0.0.0.0:9443" cert-path="./assets/test.crt" key-path="./assets/test.key"
+        }
+        file-server {
+            // The base path is what will be used as the "root" of the file server
+            //
+            // All files within the root will be available
+            base-path "."
+        }
+    }
+}
+```
+
+Each block represents a single service, with the name of the service serving as
+the name of the block.
+
+### `services.$NAME`
+
+The `$NAME` field is a UTF-8 string, used as the name of the service. If the name
+does not contain spaces, it is not necessary to surround the name in quotes.
+
+Examples:
+
+* `Example1` - Valid, "Example1"
+* `"Example2"` - Valid, "Example2"
+* `"Server One"` - Valid, "Server One"
+* `Server Two` - Invalid (missing quotation marks)
+
+### `services.$NAME.listeners`
+
+This section contains one or more Listeners.
+This section is required.
+Listeners are specified in the form:
+
+`"SOCKETADDR" [cert-path="PATH" key-path="PATH"]`
+
+`SOCKETADDR` is a UTF-8 string that is parsed into an IPv4 or IPv6 address and port.
+
+If the listener should accept TLS connections, the certificate and key paths are
+specified in the form `cert-path="PATH" key-path="PATH"`, where `PATH` is a UTF-8
+path to the relevant files. If these are not provided, connections will be accepted
+without TLS.
+
+### `services.$NAME.connectors`
+
+This section contains one or more Connectors.
+This section is required.
+Connectors are specified in the form:
+
+`"SOCKETADDR" [tls-sni="DOMAIN"]`
+
+`SOCKETADDR` is a UTF-8 string that is parsed into an IPv4 or IPv6 address and port.
+
+If the connector should use TLS for connections to the upstream server, the TLS-SNI
+is specified in the form `tls-sni="DOMAIN"`, where `DOMAIN` is a domain name. If this
+is not provided, connections to upstream servers will be made without TLS.
+
+### `services.$NAME.connectors.load-balance`
+
+This section defines how load balancing properties are configured for the
+connectors in this set.
+
+This section is optional.
+
+### `services.$NAME.connectors.load-balance.selection`
+
+This defines how the upstream server is selected.
+
+Options are:
+
+* `selection "RoundRobin"`
+    * Servers are selected in a Round Robin fashion, giving equal distribution
+* `selection "Random"`
+    * Servers are selected on a random basis, giving a statistically equal distribution
+* `selection "FNV" key="KEYKIND"`
+    * FNV hashing is used based on the provided `KEYKIND`
+* `selection "Ketama" key="KEYKIND"`
+    * Stable Ketama hashing is used based on the provided `KEYKIND`
+
+Where `KEYKIND` is one of the following:
+
+* `UriPath` - The URI path is hashed
+* `SourceAddrAndUriPath` - The Source address and URI path are hashed
+
+### `services.$NAME.path-control`
+
+This section contains the configuration for path control filters.
+
+### `services.$NAME.file-server`
+
+This section is only allowed when `connectors` and `path-control` are not present.
+
+This is used when serving static files, rather than proxying connections.
+
+### `services.$NAME.file-server.base-path`
+
+This is the base path used for serving files. ALL files within this directory
+(and any children) will be available for serving.
+
+This is specified in the form `base-path "PATH"`, where `PATH` is a valid UTF-8 path.
+
+This section is required.
diff --git a/user-manual/src/config/mod.md b/user-manual/src/config/mod.md
new file mode 100644
index 0000000..5acb52f
--- /dev/null
+++ b/user-manual/src/config/mod.md
@@ -0,0 +1,56 @@
+# Configuration
+
+River has three sources of configuration:
+
+1. Command Line Options
+2. Environment Variable Options
+3. Configuration File Options
+
+When configuration options are available in multiple sources, priority is given in
+the order specified above.
+
+## Configuration File Options
+
+The majority of configuration options are provided via configuration file, allowing
+users of River to provide files as part of a regular deployment process. Currently,
+all configuration of Services (and their Listener, Connector, and Path Control
+options) is provided via configuration file.
+
+At the current moment, two configuration file formats are supported:
+
+* [KDL] - the current preferred format
+* TOML - likely to be removed soon
+
+[KDL]: https://kdl.dev/
+
+For more information about the available configuration parameters, see the
+[The KDL Configuration Format] section.
+
+[The KDL Configuration Format]: ./kdl.md
+
+## Environment Variable Options
+
+At the moment, there are no options configurable via environment variables.
+
+In the future, environment variables will be used for configuration of
+"secrets", such as passwords used for basic authentication, or bearer tokens
+used for accessing management pages.
+
+It is not expected that River will make all configuration options available
+through environment variables, as highly structured configuration (e.g. for
+Services) via environment variable requires complex and hard to reason about
+logic to parse and implement.
+
+## Command Line Options
+
+A limited number of options are available via command line. These options
+are intended to provide information such as the path to the configuration
+file.
+
+It is not expected that River will make all configuration options available
+through CLI.
+
+For more information about options that are available via Command Line
+Interface, please refer to [The CLI Interface Format].
+
+[The CLI Interface Format]: ./cli.md
diff --git a/user-manual/src/config/toml.md b/user-manual/src/config/toml.md
new file mode 100644
index 0000000..09737c3
--- /dev/null
+++ b/user-manual/src/config/toml.md
@@ -0,0 +1,3 @@
+# Configuration File (TOML)
+
+TODO: We're probably going to retire TOML configuration file support.
diff --git a/user-manual/src/install.md b/user-manual/src/install.md
new file mode 100644
index 0000000..a109f4a
--- /dev/null
+++ b/user-manual/src/install.md
@@ -0,0 +1,16 @@
+# Installation
+
+Pre-compiled versions of River and installation instructions are provided on
+the [Releases] page [on GitHub].
+
+[Releases]: https://github.com/memorysafety/river/releases
+[on GitHub]: https://github.com/memorysafety/river
+
+Currently, builds are provided for:
+
+* x86-64 Linux (GNU libc)
+* x86-64 Linux (MUSL libc)
+* aarch64 macOS (M-series devices)
+
+The primary target is currently **x86-64 Linux (GNU libc)**. Other platforms may
+not support all features, and are supported on a best-effort basis.
diff --git a/user-manual/src/intro.md b/user-manual/src/intro.md
new file mode 100644
index 0000000..e1de1f0
--- /dev/null
+++ b/user-manual/src/intro.md
@@ -0,0 +1,17 @@
+# Introduction
+
+This is the user/operator facing manual for the River reverse proxy application.
+
+River is a reverse proxy application under development, utilizing the `pingora` reverse proxy engine
+from Cloudflare. It is written in the Rust language. It is configurable, allowing for options
+including routing, filtering, and modification of proxied requests.
+
+River acts as a binary distribution of the `pingora` engine - providing a typical application
+interface for configuration and customization for operators.
+
+The source code and issue tracker for River can be found [on GitHub].
+
+[on GitHub]: https://github.com/memorysafety/river
+
+For developer facing documentation, including project roadmap and feature requirements for the
+1.0 release, please refer to the `docs/` folder [on GitHub].
diff --git a/user-manual/src/reloading.md b/user-manual/src/reloading.md
new file mode 100644
index 0000000..56de6bb
--- /dev/null
+++ b/user-manual/src/reloading.md
@@ -0,0 +1,56 @@
+# Hot Reloading
+
+River does not support changing most settings while the server is running.
+In order to change the settings of a running instance of River, it is necessary to
+launch a new instance of River.
+
+However, River does support "Hot Reloading" - the ability for a new instance of
+River to take over the responsibilities of a currently executing server.
+
+From a high level view, this process looks like:
+
+1. The existing instance of River is running
+2. A new instance of River is started, configured with "upgrade" enabled via the command line.
+   The new instance does not yet begin execution, and is waiting for a hand-over of Listeners
+   from the existing instance
+3. A SIGQUIT signal is sent to the FIRST River instance, which causes it to stop accepting
+   new connections, and to transfer all active Listener file descriptors to the
+   SECOND River instance
+4. The SECOND River instance begins listening to all Listeners, and operating normally
+5. The FIRST River instance continues handling any currently active downstream connections,
+   until either all connections have closed, or until a timeout period is reached. If
+   the timeout is reached, all open connections are closed ungracefully.
+6. At the end of the timeout period, the FIRST River instance exits.
+
+In most cases, this allows seamless hand over from the OLD instance of River to the NEW
+instance of River, without any interruption of service. As long as no connections are
+longer-lived than the timeout period, then this hand-over will not be observable from
+downstream clients.
+
+Once the SIGQUIT signal is sent, all new incoming connections will be handled by the
+new instance of River. Existing connections will continue to be serviced by the old
+instance until their connection has been closed.
+
+There are a couple of moving pieces that are necessary for this process to occur:
+
+## pidfile
+
+When River is configured to be daemonized, it will create a pidfile containing its
+process ID at the configured location.
+
+This file can be used to determine the process ID to which SIGQUIT must be sent.
+
+When the second instance has taken over, the pidfile of the original instance
+will be replaced with the pidfile of the new instance.
+
+In general, both instances of River should be configured with the same
+pidfile path.
+
+## upgrade socket
+
+Listening socket file descriptors are transferred from one instance to
+another over a dedicated socket, referred to here as the upgrade socket.
+
+This transfer begins when the SIGQUIT signal is sent to the first process.
+
+Both instances of River MUST be configured with the same upgrade socket path.
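The hand-over described in this file can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not a supported procedure: the `read_pid` and `new_instance_cmd` helpers, the `/etc/river/new.kdl` path, and the choice to read the old PID before starting the new instance are all introduced here for illustration. The flags (`--config-kdl`, `--daemonize`, `--upgrade`, `--upgrade-socket`, `--pidfile`), the SIGQUIT signal, and the example paths come from this manual.

```shell
#!/bin/sh
# Illustrative paths, matching the KDL example in this manual.
# Both instances must use the same upgrade socket path.
PIDFILE="/tmp/river.pidfile"
UPGRADE_SOCK="/tmp/river-upgrade.sock"

# Hypothetical helper: print the PID stored in a pidfile, or fail if the
# file is missing or does not contain a plain decimal number.
read_pid() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$pid"
}

# Hypothetical helper: print (not run) the command line for the new instance.
new_instance_cmd() {
    # $1: path to the new KDL configuration file
    printf 'river --config-kdl %s --daemonize --upgrade --upgrade-socket %s --pidfile %s\n' \
        "$1" "$UPGRADE_SOCK" "$PIDFILE"
}

# Hand-over order (commented out so the sketch has no side effects):
#   old_pid=$(read_pid "$PIDFILE")                 # read before it is replaced
#   eval "$(new_instance_cmd /etc/river/new.kdl)"  # second instance waits
#   kill -QUIT "$old_pid"                          # first instance hands over
```

Reading the old PID first is a conservative choice, since the pidfile is replaced once the second instance has taken over.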