Some final edits and release notes
Fixes #30
Frostman committed Oct 24, 2024
1 parent ccefb54 commit d44eff2
Showing 9 changed files with 194 additions and 119 deletions.
21 changes: 13 additions & 8 deletions docs/getting-started/download.md
@@ -14,32 +14,37 @@ docker login ghcr.io

## Downloading the software

The main entry point for the software is the Hedgehog Fabricator CLI named `hhfab`. All software is published into the
OCI registry [GitHub Package](https://ghcr.io) including binaries, container images, or Helm charts.
Download the latest stable `hhfab` binary from the [GitHub Package](https://ghcr.io) using the following command:
The main entry point for the software is the Hedgehog Fabricator CLI named `hhfab`.

Currently `hhfab` is supported on Linux x86/arm64 (tested on Ubuntu 22.04) and macOS x86/arm64 for building
installers/upgraders. It may work on Windows WSL2 (with Ubuntu), but it's not tested. For running VLAB only Linux x86
is currently supported.

All software is published into the OCI registry [GitHub Package](https://ghcr.io), including binaries, container images, and Helm charts.
Download the latest stable `hhfab` binary from the [GitHub Package](https://ghcr.io) using the following command (it requires ORAS to be installed, see below):

```bash
curl -fsSL https://i.hhdev.io/hhfab | bash
```

Or download a specific version using the following command:
Or download a specific version (e.g. beta-1) using the following command:

```bash
curl -fsSL https://i.hhdev.io/hhfab | VERSION=alpha-X bash
curl -fsSL https://i.hhdev.io/hhfab | VERSION=beta-1 bash
```

Use the `VERSION` environment variable to specify the version of the software to download. By default, the latest
Use the `VERSION` environment variable to specify the version of the software to download. By default, the latest stable
release is downloaded. You can pick a specific release series (e.g. `alpha-2`) or a specific release.
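
One detail of the pipe form is easy to miss: `VERSION=beta-1` is placed in front of `bash`, so it is set for the shell that runs the script, not for `curl`. The sketch below illustrates this with a one-line stand-in script. `fake_installer` and its `latest stable` fallback text are made up for the illustration and are not taken from the real download script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the downloaded script: it only reports which
# version it would fetch, falling back to a default when VERSION is unset.
fake_installer='echo "would install: ${VERSION:-latest stable}"'

# VERSION removed from the environment: the stand-in uses its fallback.
printf '%s\n' "$fake_installer" | env -u VERSION bash

# VERSION set in front of `bash`: only that bash process sees it.
printf '%s\n' "$fake_installer" | VERSION=beta-1 bash
```

The same reasoning explains why `VERSION=beta-1 curl ... | bash` would likely not have the intended effect: the variable would reach `curl` instead of the script.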

### Installing ORAS

The download script requires [ORAS](https://oras.land/) to be installed. ORAS is used to download the binary from the
OCI registry and can be installed using the following command:

```bash
curl -fsSL https://i.hhdev.io/oras | bash
```

Currently only Linux x86 is supported for running `hhfab`.

## Next steps

* [Concepts](../concepts/overview.md)
31 changes: 29 additions & 2 deletions docs/install-upgrade/build-wiring.md
@@ -8,7 +8,7 @@
A wiring diagram is a YAML file that is a digital representation of your network. You can find more YAML level details in the User Guide section [switch features and port naming](../user-guide/profiles.md) and the [api](../reference/api.md). It's mandatory for all switches to reference a `SwitchProfile` in the `spec.profile` of the `Switch` object. Only port naming defined by switch profiles could be used in the wiring diagram, NOS (or any other) port names aren't supported.

In the meantime, to have a look at a working wiring diagram for Hedgehog Fabric, run the sample generator that produces
VLAB-compatible wiring diagrams:
working wiring diagrams:

```console
ubuntu@sl-dev:~$ hhfab sample -h
@@ -28,6 +28,33 @@ OPTIONS:
--help, -h show help
```

Or you can generate a wiring diagram for a VLAB environment with flags to customize the number of switches, links, servers, etc.:

```console
ubuntu@sl-dev:~$ hhfab vlab gen --help
NAME:
hhfab vlab generate - generate VLAB wiring diagram

USAGE:
hhfab vlab generate [command options]

OPTIONS:
--bundled-servers value number of bundled servers to generate for switches (only for one of the second switch in the redundancy group or orphan switch) (default: 1)
--eslag-leaf-groups value eslag leaf groups (comma separated list of number of ESLAG switches in each group, should be 2-4 per group, e.g. 2,4,2 for 3 groups with 2, 4 and 2 switches)
--eslag-servers value number of ESLAG servers to generate for ESLAG switches (default: 2)
--fabric-links-count value number of fabric links if fabric mode is spine-leaf (default: 0)
--help, -h show help
--mclag-leafs-count value number of mclag leafs (should be even) (default: 0)
--mclag-peer-links value number of mclag peer links for each mclag leaf (default: 0)
--mclag-servers value number of MCLAG servers to generate for MCLAG switches (default: 2)
--mclag-session-links value number of mclag session links for each mclag leaf (default: 0)
--no-switches do not generate any switches (default: false)
--orphan-leafs-count value number of orphan leafs (default: 0)
--spines-count value number of spines if fabric mode is spine-leaf (default: 0)
--unbundled-servers value number of unbundled servers to generate for switches (only for one of the first switch in the redundancy group or orphan switch) (default: 1)
--vpc-loopbacks value number of vpc loopbacks for each switch (default: 0)
```
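
The help text above states two constraints in prose: the MCLAG leaf count should be even, and each ESLAG group should contain 2-4 switches. A minimal sketch of checking those two rules before calling the generator follows. `check_topology` is a hypothetical helper written for this example, not part of `hhfab`, and `hhfab` may well perform its own validation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_topology MCLAG_LEAFS ESLAG_GROUPS
# Hypothetical pre-check mirroring the constraints quoted in the help text.
# ESLAG_GROUPS is a comma separated list such as "2,4,2" (may be empty).
check_topology() {
  local mclag_leafs="$1" eslag_groups="$2" n
  if (( mclag_leafs % 2 != 0 )); then
    echo "mclag leafs count should be even, got ${mclag_leafs}" >&2
    return 1
  fi
  # Split the list on commas; IFS is local, so the caller is unaffected.
  local IFS=','
  for n in $eslag_groups; do
    if (( n < 2 || n > 4 )); then
      echo "each eslag group should have 2-4 switches, got ${n}" >&2
      return 1
    fi
  done
}
```

A possible use, with flag names taken from the help output above: `check_topology 2 "2,4,2" && hhfab vlab gen --mclag-leafs-count 2 --eslag-leaf-groups 2,4,2`.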

### Sample Switch Configuration
```yaml
apiVersion: wiring.githedgehog.com/v1beta1
@@ -64,7 +91,7 @@ A connection represents the physical wires in your data center. They connect swi
#### Server Connections
A server connection is a connection used to connect servers to the fabric. The fabric will configure the server-facing port according to the type of the connection (MLAG, Bundle, etc).The configuration of the actual server needs to be done by the server administrator. The server name is not validated by the fabric and is used as metadata to identify the connection. A server connection can be one of:
A server connection is a connection used to connect servers to the fabric. The fabric will configure the server-facing port according to the type of the connection (MLAG, Bundle, etc). The configuration of the actual server needs to be done by the server administrator. The server port names are not validated by the fabric and are used as metadata to identify the connection. A server connection can be one of:
- *Unbundled* - A single cable connecting switch to server.
- *Bundled* - Two or more cables going to a single switch, a LAG or similar.
130 changes: 68 additions & 62 deletions docs/install-upgrade/config.md
@@ -11,12 +11,21 @@ For a VLAB user, the typical workflow with hhfab is:

1. `hhfab init --dev`
1. `hhfab vlab gen`
1. `hhfab vlab up --kill-stale`
1. `hhfab vlab up`

The above workflow will get a user up and running with a spine-leaf VLAB. The `--kill-stale` option is supplied as its harmless on the first run and stops a lot of problems from happening with an successive run.
The above workflow will get a user up and running with a spine-leaf VLAB.

### HHFAB for Physical Machines

It's possible to start from scratch:

1. `hhfab init` (see different flags to customize initial configuration)
1. Adjust the `fab.yaml` file to your needs
1. `hhfab validate`
1. `hhfab build`

Or import existing config and wiring files:

1. `hhfab init -c fab.yaml -w wiring-file.yaml -w extra-wiring-file.yaml`
1. `hhfab validate`
1. `hhfab build`
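
The import workflow above can be chained so that `build` only runs when `init` and `validate` succeed. This is a minimal sketch, assuming the three file names from the list above exist in the current directory. The `HHFAB` variable and the `build_installer` function are conveniences invented for this example (they allow a dry run with a stub), not features of `hhfab`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Command to run; overridable so the sequence can be exercised with a stub.
# (Hypothetical convenience for this sketch only.)
HHFAB="${HHFAB:-hhfab}"

# Run init, validate and build in order, stopping at the first failure.
build_installer() {
  "$HHFAB" init -c fab.yaml -w wiring-file.yaml -w extra-wiring-file.yaml &&
    "$HHFAB" validate &&
    "$HHFAB" build
}
```

Calling `build_installer` then returns a non-zero status as soon as any step fails, which makes it straightforward to use from CI or another script.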
@@ -25,65 +34,9 @@ After the above workflow a user will have a .img file suitable for installing th

## Fab.yaml

The fabric YAML object has 4 objects:

- `mode` - either `spine-leaf` or `collapsed-core`
- `includeONIE` - defaults to `true`
- `defaultSwitchUsers` - the admin and operator credentials for SONiC.
- `defaultAlloyConfig` - the configuration details for telemetry of switch information

### Forward switch metrics and logs

There is an option to enable Grafana Alloy on all switches to forward metrics and logs to the configured targets using
Prometheus Remote-Write API and Loki API. If those APIs are available from Control Node(s), but not from the switches,
it's possible to enable HTTP Proxy on Control Node(s) that will be used by Grafana Alloy running on the switches to
access the configured targets. It could be done by passing `--control-proxy=true` to `hhfab init`.

Metrics includes port speeds, counters, errors, operational status, transceivers, fans, power supplies, temperature
sensors, BGP neighbors, LLDP neighbors, and more. Logs include agent logs.
### Configure control node and switch users

Configuring the exporters and targets is currently only possible by editing the `fab.yaml` configuration file. An example configuration is provided below:

```yaml
spec:
config:
...
defaultAlloyConfig:
agentScrapeIntervalSeconds: 120
unixScrapeIntervalSeconds: 120
unixExporterEnabled: true
lokiTargets:
grafana_cloud: # target name, multiple targets can be configured
basicAuth: # optional
password: "<password>"
username: "<username>"
labels: # labels to be added to all logs
env: env-1
url: https://logs-prod-021.grafana.net/loki/api/v1/push
useControlProxy: true # if the Loki API is not available from the switches directly, use the Control Node as a proxy
prometheusTargets:
grafana_cloud: # target name, multiple targets can be configured
basicAuth: # optional
password: "<password>"
username: "<username>"
labels: # labels to be added to all metrics
env: env-1
sendIntervalSeconds: 120
url: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/push
useControlProxy: true # if the Loki API is not available from the switches directly, use the Control Node as a proxy
unixExporterCollectors: # list of node-exporter collectors to enable, https://grafana.com/docs/alloy/latest/reference/components/prometheus.exporter.unix/#collectors-list
- cpu
- filesystem
- loadavg
- meminfo
collectSyslogEnabled: true # collect /var/log/syslog on switches and forward to the lokiTargets
```
For additional options, see the `AlloyConfig` [struct in Fabric repo](https://github.com/githedgehog/fabric/blob/master/api/meta/alloy.go).

### Configure switch users

Configuring switch users is done either passing `--default-password-hash` to `hhfab init` or editing the resulting `fab.yaml` file emitted by `hhfab init`. You can specify users to be configured on the switches in the following format:
Configuring control node and switch users is done either by passing `--default-password-hash` to `hhfab init` or by editing the resulting `fab.yaml` file emitted by `hhfab init`. You can specify users to be configured on the control node(s) and switches in the following format:

```yaml
spec:
@@ -96,8 +49,8 @@ spec:

fabric:
mode: spine-leaf # "spine-leaf" or "collapsed-core"
defaultSwitchUsers:

defaultSwitchUsers:
admin: # at least one user with name 'admin' and role 'admin'
role: admin
#password: "$5$8nAYPGcl4..." # password hash
@@ -110,6 +63,9 @@ spec:
# - "ssh-ed25519 AAAAC3Nza..."

```

The control node user is always named `core`.

The `operator` role grants read-only access to the `sonic-cli` command on the switches. To avoid conflicts, do not use the following usernames: `operator`, `hhagent`, `netops`.
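
The commented `password` value in the example starts with `$5$`, which is the prefix used by SHA-256 crypt hashes. Assuming that is the format expected here, such a hash can be produced with `openssl passwd -5` (available in OpenSSL 1.1.1 and newer). `hash_password` below is a hypothetical helper name used only for this sketch.

```bash
#!/usr/bin/env bash
set -euo pipefail

# hash_password PASSWORD [SALT]
# Prints a SHA-256 crypt hash ($5$...). The password is passed on stdin so
# it does not show up in the process list. Without SALT, openssl picks a
# random one, so the output differs on every call.
hash_password() {
  local password="$1" salt="${2:-}"
  if [ -n "$salt" ]; then
    printf '%s\n' "$password" | openssl passwd -5 -salt "$salt" -stdin
  else
    printf '%s\n' "$password" | openssl passwd -5 -stdin
  fi
}
```

The result could then be pasted into the `password` field, or passed on the command line as in `hhfab init --default-password-hash "$(hash_password 'my-password')"`.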

### NTP and DHCP
@@ -136,7 +92,57 @@ spec:
```
The **management** interface is for the control node to manage the fabric switches, *not* end-user management of the control node. For end-user management of the control node specify the **external** interface name.

### Forward switch metrics and logs

There is an option to enable Grafana Alloy on all switches to forward metrics and logs to the configured targets using
Prometheus Remote-Write API and Loki API. If those APIs are available from Control Node(s), but not from the switches,
it's possible to enable HTTP Proxy on Control Node(s) that will be used by Grafana Alloy running on the switches to
access the configured targets. It could be done by passing `--control-proxy=true` to `hhfab init`.

Metrics include port speeds, counters, errors, operational status, transceivers, fans, power supplies, temperature
sensors, BGP neighbors, LLDP neighbors, and more. Logs include agent logs.

Configuring the exporters and targets is currently only possible by editing the `fab.yaml` configuration file. An example configuration is provided below:

```yaml
spec:
config:
...
defaultAlloyConfig:
agentScrapeIntervalSeconds: 120
unixScrapeIntervalSeconds: 120
unixExporterEnabled: true
lokiTargets:
grafana_cloud: # target name, multiple targets can be configured
basicAuth: # optional
password: "<password>"
username: "<username>"
labels: # labels to be added to all logs
env: env-1
url: https://logs-prod-021.grafana.net/loki/api/v1/push
useControlProxy: true # if the Loki API is not available from the switches directly, use the Control Node as a proxy
prometheusTargets:
grafana_cloud: # target name, multiple targets can be configured
basicAuth: # optional
password: "<password>"
username: "<username>"
labels: # labels to be added to all metrics
env: env-1
sendIntervalSeconds: 120
url: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/push
useControlProxy: true # if the Prometheus Remote-Write API is not available from the switches directly, use the Control Node as a proxy
unixExporterCollectors: # list of node-exporter collectors to enable, https://grafana.com/docs/alloy/latest/reference/components/prometheus.exporter.unix/#collectors-list
- cpu
- filesystem
- loadavg
- meminfo
collectSyslogEnabled: true # collect /var/log/syslog on switches and forward to the lokiTargets
```
For additional options, see the `AlloyConfig` [struct in Fabric repo](https://github.com/githedgehog/fabric/blob/master/api/meta/alloy.go).

## Complete Example File

```yaml
apiVersion: fabricator.githedgehog.com/v1beta1
kind: Fabricator
14 changes: 7 additions & 7 deletions docs/install-upgrade/overview.md
@@ -5,17 +5,17 @@

## Prerequisites

* A machine with access to the Internet to use Fabricator and build installer
* An 8 GB USB flash drive, if you are not using virtual media
* A machine with access to the Internet, at least 8 GB RAM, and 25 GB of disk space to use Fabricator and build the installer
* A 16 GB USB flash drive, if you are not using virtual media
* A machine to function as the Fabric Control Node that meets the [System Requirements](./requirements.md), with IPMI access to it to install
the OS.
* A management switch with at least 1 10GbE port
* A management switch with at least 1 10GbE port is recommended
* Enough [Supported Switches](./supported-devices.md) for your Fabric

## Overview of Install Process

This section is dedicated to the Hedgehog Fabric installation on bare-metal control node(s) and switches, their
preparation and configuration. To install the vlab see [Vlab Overview](../vlab/overview.md).
preparation and configuration. To install the VLAB see [VLAB Overview](../vlab/overview.md).

Download and install `hhfab` following instructions from the [Download](../getting-started/download.md) section.

@@ -32,7 +32,7 @@ The main steps to install Fabric are:
1. Connect management switch to Fabric control node
1. Connect 1GbE Management port of switches to management switch
1. Prepare supported switches
1. Ensure switch serial numbers and / or management interface mac addresses are recorded in wiring diagram
1. Ensure switch serial numbers and / or first management interface MAC addresses are recorded in wiring diagram
1. Boot them into ONIE Install Mode to have them automatically provisioned

## Build Control Node configuration and Installer
@@ -63,7 +63,7 @@ There are utilities that assist this process such as [etcher](https://etcher.bal

## Install Control Node

This control node should be given a static IP address. Either a lease or statically assigned.
This control node should be given a static IP address, either a lease or statically assigned.

1. Configure the server to use UEFI boot **without** secure boot

@@ -93,7 +93,7 @@ The control node is dual-homed. It has a 10GbE interface that connects to the ma

Now that the install has finished, you can start interacting with the Fabric using `kubectl`, `kubectl fabric` and `k9s`, all pre-installed as part of the Control Node installer.

At this stage, the fabric hands out DHCP addresses to the switches via the management network. Optionally, you can monitor this process by going through the following steps:
At this stage, the fabric hands out DHCP addresses to the switches via the management network. Optionally, you can monitor this process by going through the following steps:
- enter `k9s` at the command prompt
- use the arrow keys to select the pod named `fabric-boot`
- the logs of the pod will be displayed showing the DHCP lease process