Skip to content

Commit

Permalink
Cardano Tracer documentation - postprocessing
Browse files Browse the repository at this point in the history
  • Loading branch information
mgmeier committed Feb 5, 2025
1 parent 93d6e17 commit de5d5c2
Showing 1 changed file with 34 additions and 39 deletions.
73 changes: 34 additions & 39 deletions docs/get-started/cardano-node/new-tracing-system/cardano-tracer.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,17 +35,17 @@ keywords: [Tracing, cardano-tracer, trace-dispatch, new tracing system, monitori

Previously, the node handled all the logging by itself. It provides two web-servers for application monitoring: Prometheus and EKG.

`cardano-tracer` is a result of _moving_ all the logging/monitoring-related stuff from the node to a separate service. As a result, the node became smaller, faster, and simpler.
`cardano-tracer` is a result of _moving_ all the logging/monitoring-related stuff from the node to a separate service. As a result, the node becomes smaller, faster, and simpler (once the current system is deprecated).

### Overview

You can think of Cardano node as a **producer** of logging and monitoring information, and `cardano-tracer` as a **consumer** of this information. After a network connection between them is established, `cardano-tracer` periodically asks for such information, and the node replies with it.

There are 3 such kinds of information:

1. **Trace object**, contain logging data. `cardano-tracer` periodically queries for new trace objects, receives them and stores them in the log files and/or in Linux `systemd`'s journal.
2. **EKG metric**, contain system metrics. [Consult the EKG documentation](https://hackage.haskell.org/package/ekg-core) for more info. `cardano-tracer` periodically queries for new EKG metrics, receives them and displays them using monitoring tools.
3. **Data points**, contain arbitrary information about the node. `cardano-tracer` does not poll periodically for new data points, only by _explicit_ request when it needs it.
1. **Trace object**, contains logging data. `cardano-tracer` periodically queries for new trace objects, receives them and stores them in the log files and/or in Linux `systemd`'s journal.
2. **EKG metric**, contains system metrics. [Consult the EKG documentation](https://hackage.haskell.org/package/ekg-core) for more info. `cardano-tracer` periodically queries for new EKG metrics, receives them and displays them using monitoring tools.
3. **Data point**, contains arbitrary information about the node. `cardano-tracer` does not poll periodically for new data points, only by _explicit_ request when it needs it.

`cardano-tracer` can work as an aggregator as well: _one_ `cardano-tracer` process can receive the information from _multiple_ nodes.

Expand Down Expand Up @@ -106,7 +106,7 @@ Backends can be a combination of `Forwarder`, `EKGBackend`, and one of `Stdout M
}
```

For explanations of the trace forwarder option refer to the following document:
The `TraceOptionForwarder` values can be given on the node's CLI instead, as is the case in the Quickstart example. For further node-side configuration explanations, refer to:

[New Tracing Quickstart](docs/get-started/cardano-node/new-tracing-system/quick-start.md)

Expand Down Expand Up @@ -279,9 +279,9 @@ As you see, the tag in `network` field is `ConnectTo` now, which means that `car

It is **highly recommended** to use `AcceptAt` for easier maintainance. Use `ConnectTo` only if you really need it.

`AcceptTo` and `ConnectTo` are mirrored by the reciprocal option on the node `TracerSocketPathAccept`/`TracerSocketPathAccept`. If you choose one on the node, you choose the opposite on the tracer. This only makes a difference to which entity initiates the handshake; after the handshake the configuration is identical.
`AcceptTo` and `ConnectTo` are mirrored by the reciprocal option on the node `--tracer-socket-path-connect` / `--tracer-socket-path-accept`. If you choose one on the node, you choose the opposite on the tracer. This only makes a difference to which entity initiates the handshake; after the handshake the configuration is identical.

Suppose you have 3 working nodes, and they are connected to the same `cardano-tracer`. And then you want to connect 4-th node to it. If `cardano-tracer` is configured using `AcceptAt`, you shouldn't change its configuration - you just connect your 4-th node to it. But if `cardano-tracer` is configured using `ConnectTo`, you should add path to 4-th socket in its configuration file and then restart `cardano-tracer` process.
Suppose you have 3 working nodes, and they are connected to the same `cardano-tracer`. And then you want to connect a 4th node to it. If `cardano-tracer` is configured using `AcceptAt`, you don't need to change its configuration - you just connect the additional node to it. But if `cardano-tracer` is configured using `ConnectTo`, you'll need to add a 4th socket path to its configuration file and restart the `cardano-tracer` process.

### Network Magic

Expand All @@ -291,11 +291,11 @@ The value from the example above, `764824073`, is taken from the Shelley genesis

### Requests

The optional field `loRequestNum` specifies the number of log items that will be requested from the node. For example, if `loRequestNum` is `10`, `cardano-tracer` will constantly ask 10 log items in one request. This value is useful for reducing the network traffic: it is possible to ask 50 log items in one request or ask them in 50 requests one at a time. `loRequestNum` is the maximum number of requests, if there are fewer log items they will be returned immediately. For example, if `cardano-tracer` asks 50 log items but the node has only 40 log items _in this moment of time_, these 40 items will be returned, there is no waiting for additional 10 items.
The optional field `loRequestNum` specifies the number of log items that will be requested from the node. For example, if `loRequestNum` is `10`, `cardano-tracer` will periodically ask 10 log items in one request. This value is useful for fine-tuning network traffic: it is possible to ask 50 log items in one request, or ask them in 50 requests one at a time. `loRequestNum` is the *maximum* number of log items. For example, if `cardano-tracer` asks 50 log items but the node has only 40 log items _in this moment of time_, these 40 items will be returned, the request won't block to wait for additional 10 items.

The optional field `ekgRequestFreq` specifies the period of how often EKG metrics will be requested, in seconds. For example, if `ekgRequestFreq` is `1`, `cardano-tracer` will ask for new EKG metrics every second. There is no limit as `loRequestNum`, so every request returns _all_ the metrics the node has _in this moment of time_.
The optional field `ekgRequestFreq` specifies the period of how often EKG metrics will be requested, in seconds. For example, if `ekgRequestFreq` is `10`, `cardano-tracer` will ask for new EKG metrics every ten seconds. There is no limit as `loRequestNum`, so every request returns _all_ the metrics the node has _in this moment of time_.

There are default values for `loRequestNum` and `ekgRequestFreq`, so if you are not sure - remove these fields from your configuration file to use default values.
The reliable default values are `loRequestNum: 100` and `ekgRequestFreq: 1`, which will be used when these fields are left out of your configuration file.

### Logging

Expand Down Expand Up @@ -362,11 +362,12 @@ An optional field `rotation` describes parameters for log rotation. If you skip

The field `rpFrequencySecs` specifies rotation period, in seconds. In this example, `rpFrequencySecs` is `30`, which means that rotation check will be performed every 30 seconds.

The field `rpLogLimitBytes` specifies the maximum size of the log file, in bytes. In this example, `rpLogLimitBytes` is `50000`, which means that once the size of the current log file is 50 KB, the new log file will be created.
The field `rpLogLimitBytes` specifies the maximum size of the log file, in bytes. In this example, `rpLogLimitBytes` is `50000`, which means that once the size of the current log file is 50 KB, a new log file will be created.

The field `rpKeepFilesNum` specifies the number of the log files that will be kept. In this example, `rpKeepFilesNum` is `3`, which means that 3 _last_ log files will always be kept.
The field `rpKeepFilesNum` specifies the number of log files that will be kept. In this example, `rpKeepFilesNum` is `3`, which means that 3 _last_ log files will always be kept.

The fields `rpMaxAgeMinutes`, `rpMaxAgeHours` specify the lifetime of the log file, in minutes, or hours. If both fields are specified, `rpMaxAgeMinutes` takes precedence. In this example, `rpMaxAgeHours` is `1`, which means that each log file will be kept for 1 hour only. After that, the log file is considered outdated. N _last_ log files (specified by `rpKeepFilesNum`) will be kept even if they are outdated. All other outdated files will be deleted by `cardano-tracer`.

The fields `rpMaxAgeMinutes`, `rpMaxAgeHours` specify the lifetime of the log file, in minutes, or hours. In this example, `rpMaxAgeHours` is `1`, which means that each log file will be kept for 1 hour only. After that, the log file is treated as outdated and will be deleted. N _last_ log files (specified by `rpKeepFilesNum`) will be kept even if they are outdated. If both fields are specified, `rpMaxAgeMinutes` takes precedence.

### Prometheus

Expand All @@ -379,14 +380,16 @@ The optional field `hasPrometheus` specifies the host and port of the web page w
}
```

Here the web page is available at `http://127.0.0.1:3000`. If you skip this field, the web page will not be available.
Here the web page is available at `http://127.0.0.1:3000`. If you skip this field, no Prometheus endpoint will be started.

After you open `http://127.0.0.1:3000` in your browser, you will see the list of identifiers of connected nodes (or the warning message, if there are no connected nodes yet), for example:

```
* KindStar_Just 3001
* KindStar_3001
```

This identifier corresponds to the `TraceOptionNodeName` in the node config, or the fallback `<hostname>_<node port>` if no such value is provided.

Each identifier is a hyperlink to the page where you will see the **current** list of metrics received from the corresponding node, in such a format:

```
Expand All @@ -406,19 +409,11 @@ rts_gc_num_bytes_usage_samples 4
remainingKESPeriods_int 62
# TYPE rts_gc_bytes_copied counter
rts_gc_bytes_copied 17114384
# TYPE nodeCannotForge_int gauge
```

### EKG Monitoring

At top-level route `/` EKG gives a list of connected nodes.
That page from the example can of course be directly accessed by `http://127.0.0.1:3000/kindstar-3001`.

The responses are either human-readable names (HTML) with clickable
links, or JSON mapping from connected node names to relative URLs,
depending on desired content type (`Accept:` header of the request).

The routes dynamically depend on the connected nodes, the node names
are [sluggified](https://hackage.haskell.org/package/slugify).
### EKG Monitoring

The optional field `hasEKG` specifies the host and port of the web
page with EKG metrics. For example:
Expand All @@ -430,38 +425,38 @@ page with EKG metrics. For example:
}
```

With this example, the list of clickable identifiers of connected
nodes will be available at `http://127.0.0.1:3100`, such as:
Just as with Prometheus, the root path `/` on EKG shows a list of connected nodes. The response is either human-readable names (HTML) with clickable
links, or JSON mapping from connected node names to relative URLs,
depending on desired content type (`Accept:` header of the request).

With a specified node name, on the node configuration:
The URL routes dynamically depend on the connected nodes _in this moment of time_; the node names
are [sluggified](https://hackage.haskell.org/package/slugify).

For a node with a specified name in its configuration:

```
{
TraceOptionNodeName: "foo-node"
}
```

Another node with

Another connection that does not specify a node name is left with a
fallback name which consists of the system's hostname and the node's
port number the links get rendered the following way:
and another connection that does not specify a node name, the list of clickable identifiers of connected
nodes will be available at `http://127.0.0.1:3100` as:

```
* foo-node
* KindStar_Just 3001
* KindStar_3001
```
Just as with Prometheus, the fallback for `TraceOptionNodeName` is `<hostname>_<node port>`.

Clicking an identifier will take you to its monitoring page. Clicking
on `foo-node` (`http://localhost:3100/foo-node`) and `KindStar_Just
3001` (`127.0.0.1:3100/kindstar-just-3001`) takes you to the
respective monitoring metrics.
on `foo-node` (`http://localhost:3100/foo-node`) and `KindStar_3001` (`127.0.0.1:3100/kindstar-3001`) takes you to the
respective metrics monitoring.

Sending a HTTP GET request with a JSON Accept header gives the metrics
of an identifier as JSON. `jq '.'` pretty-prints the JSON object.

```
$ curl --silent -H 'Accept: application/json' '127.0.0.1:3100/kindstar-just-3001' | jq '.'
$ curl --silent -H 'Accept: application/json' '127.0.0.1:3100/kindstar-3001' | jq '.'
{
"Mem": {
"resident_int": {
Expand Down

0 comments on commit de5d5c2

Please sign in to comment.