Add initial tracing docs #1731

Open
wants to merge 2 commits into
base: main
157 changes: 92 additions & 65 deletions content/docs/reference/tracing.mdx
@@ -16,109 +16,136 @@ import TabItem from '@theme/TabItem';

## Summary

Tracing tracks the progression of a single user request as it is handled by Pomerium.
Pomerium has comprehensive support for OpenTelemetry tracing, allowing detailed introspection into requests and authorization flows. You can use tracing to debug errors and latency issues in your applications.
Contributor

The first time we say OpenTelemetry, let's say "OpenTelemetry (OTel)" so later the OTel reference is clear.


Each unit of work is called a Span in a trace. Spans include metadata about the work, including the time spent in the step (latency), status, time events, attributes, links. You can use tracing to debug errors and latency issues in your applications, including in downstream connections.

## How to configure
## Configuration

<Tabs>
<TabItem value="Core" label="Core">

#### Shared Tracing Settings
### Environment Variables

The recommended way to configure tracing is by using the standard OpenTelemetry environment variables:

- [SDK environment variables](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#general-sdk-configuration)
- [OTLP exporter environment variables](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/)

The main variables used to configure tracing in Pomerium are the following:

| Name | Description | Default |
| :-- | :-- | :-- |
| [`OTEL_TRACES_EXPORTER`](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#exporter-selection) | Trace exporter to be used. <br/> Valid values are `"otlp"` or `"none"` | `"none"` |
| [`OTEL_EXPORTER_OTLP_ENDPOINT`](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#otel_exporter_otlp_endpoint) or <br/> [`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#otel_exporter_otlp_traces_endpoint) | See [Endpoint Configuration](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#endpoint-configuration). | |
Contributor

For the description, include a short description or example values and then the external link that you have now is fine below that as "For more details, see the OTEL documentation.", but we shouldn't send people external for basic info.

| [`OTEL_EXPORTER_OTLP_PROTOCOL`](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#otel_exporter_otlp_protocol) or <br/> [`OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#otel_exporter_otlp_traces_protocol) | See [Protocol Configuration](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#protocol-configuration). <br/> Valid values are `"grpc"` or `"http/protobuf"`. <br/>If unset, Pomerium attempts to determine the protocol from the endpoint port number (the standard ports are 4317 for gRPC and 4318 for HTTP); otherwise it defaults to `"http/protobuf"`. | (auto) |
Contributor

Put the external link below our own information, so users quickly get what they need, and if they need more, they keep reading and click the link.

| [`OTEL_TRACES_SAMPLER_ARG`](https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_traces_sampler_arg) | Sampling probability, a number in the \[0..1\] range, e.g. `1.0` (sample all traces) or `0.25` (sample 25% of traces) | `1.0` |
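
As a rough illustration of the table above, the following shell snippet sets the four variables for a collector listening on the local machine. The endpoint `http://localhost:4317` and the `0.25` sample rate are example values chosen for this sketch, not defaults.

```bash
# Example values only: adjust the endpoint and sample rate for your environment.
export OTEL_TRACES_EXPORTER=otlp                          # enable the OTLP exporter (default is "none")
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317  # assumed local collector address
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc                   # optional; 4317 is the standard gRPC port
export OTEL_TRACES_SAMPLER_ARG=0.25                       # sample 25% of traces
```

A Pomerium process started from the same shell would then read these settings from its environment.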

### Config file

Tracing can also be configured using the Pomerium config file if desired:

| Config Key | Equivalent Environment Variable |
| :-- | :-- |
| `tracing_provider` | [`OTEL_TRACES_EXPORTER`](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#exporter-selection) |
| `tracing_otlp_endpoint` | [`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#otel_exporter_otlp_traces_endpoint) |
| `tracing_otlp_protocol` | [`OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`](https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/#otel_exporter_otlp_traces_protocol) |
| `tracing_sample_rate` | [`OTEL_TRACES_SAMPLER_ARG`](https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_traces_sampler_arg) |
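
A config-file equivalent of the environment variables in the table above might look like the sketch below, written as a shell heredoc so it can be pasted into a terminal. The file name `tracing-example.yaml` is hypothetical, and the values assume that each key accepts the same values as its equivalent environment variable.

```bash
# Write an example tracing fragment to a scratch file (hypothetical file name).
cat > tracing-example.yaml <<'EOF'
tracing_provider: otlp
tracing_otlp_endpoint: http://localhost:4317
tracing_otlp_protocol: grpc
tracing_sample_rate: 1.0
EOF
```

These keys would be merged into an existing Pomerium config file rather than used on their own.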

| Config Key | Description | Required |
| :-- | :-- | --- |
| tracing_provider | The name of the tracing provider. (e.g. Jaeger, Zipkin) | ✅ |
| tracing_sample_rate | Percentage of requests to sample in decimal notation. Default is `0.0001`, or .01% | ❌ |
</TabItem>
<TabItem value="Enterprise" label="Enterprise">
Contributor

We are missing a Zero tab.


Set `tracing_sample_rate = 1` if you want to see all requests in the tracings.
1. In the Enterprise Console, navigate to Settings > Tracing

#### Datadog
2. In the "Tracing Provider" dropdown, select "OTLP"

Datadog is a real-time monitoring system that supports distributed tracing and monitoring.
3. Enter your desired sample rate and OTLP endpoint

| Config Key | Description | Required |
| :-- | :-- | --- |
| tracing_datadog_address | `host:port` address of the Datadog Trace Agent. Defaults to `localhost:8126` | ❌ |
4. Optionally, enter a protocol ("grpc" or "http/protobuf"). If the endpoint uses port 4317 or 4318, the protocol will be selected automatically. Port 4317 is the standard for OTLP over gRPC, and 4318 for OTLP over HTTP.
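
The port-based protocol selection described in step 4 can be sketched as a small shell function. `guess_otlp_protocol` is a hypothetical helper written for illustration, not Pomerium's implementation. The fallback to `http/protobuf` for non-standard ports follows the behavior documented for the environment variables and is an assumption for the Enterprise Console.

```bash
# Hypothetical helper (not part of Pomerium): guess the OTLP protocol
# from the port of an endpoint such as http://localhost:4317.
guess_otlp_protocol() {
  endpoint="$1"
  # Strip the scheme (e.g. "http://") and any path to leave host[:port].
  hostport="${endpoint#*://}"
  hostport="${hostport%%/*}"
  case "$hostport" in
    *:4317) echo "grpc" ;;           # standard OTLP gRPC port
    *:4318) echo "http/protobuf" ;;  # standard OTLP HTTP port
    *)      echo "http/protobuf" ;;  # assumed fallback for any other port
  esac
}
```

For example, `guess_otlp_protocol http://localhost:4317` prints `grpc`, while an endpoint on any other port prints `http/protobuf`.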

#### Jaeger (partial)
![Enterprise tracing config](./img/tracing/tracing-otlp.png)

**Warning** At this time, the Jaeger protocol does not capture spans inside the Proxy Service. Please use the Zipkin protocol with Jaeger for full support.
</TabItem>
</Tabs>

[Jaeger](https://www.jaegertracing.io/) is a distributed tracing system released as open source by Uber Technologies. It is used for monitoring and troubleshooting microservices-based distributed systems, including:
## Examples

- Distributed context propagation
- Distributed transaction monitoring
- Root cause analysis
- Service dependency analysis
- Performance / latency optimization
### Using Jaeger to visualize trace data

| Config Key | Description | Required |
| :-- | :-- | --- |
| tracing_jaeger_collector_endpoint | Url to the Jaeger HTTP Thrift collector. | ✅ |
| tracing_jaeger_agent_endpoint | Send spans to jaeger-agent at this address. | ✅ |
[Jaeger](https://www.jaegertracing.io/) is a popular open-source tracing platform. It can be used to collect trace data and visualize it in the browser.

For quick local testing, use Jaeger all-in-one, which is an executable designed to launch the Jaeger UI, jaeger-collector, jaeger-query, and jaeger-agent, with an in-memory storage component.
1. Run Jaeger in all-in-one mode with Docker:

```bash
docker run -d --name jaeger \
-e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
-e COLLECTOR_OTLP_ENABLED=true \
-p 6831:6831/udp \
-p 6832:6832/udp \
-p 5778:5778 \
$ docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
-p 14250:14250 \
-p 14268:14268 \
-p 14269:14269 \
-p 9411:9411 \
jaegertracing/all-in-one:1.45

jaegertracing/jaeger:latest
```

Pomerium settings
2. Run Pomerium with OpenTelemetry environment variables set:

```yaml
tracing_provider: jaeger
tracing_jaeger_collector_endpoint: http://localhost:14268/api/traces
tracing_jaeger_agent_endpoint: localhost:6831
```bash
$ OTEL_TRACES_EXPORTER=otlp OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 pomerium --config path/to/your/config.yaml
```

Open Jaeger UI at `http://localhost:16686` in the browser to view Pomerium traces.
3. Navigate to a Pomerium route defined in the config file

#### Zipkin
4. Open your browser to http://localhost:16686 to view traces in the Jaeger UI.

Zipkin is an open-source distributed tracing system and protocol.
### Tracing errors

Many tracing backends support Zipkin either directly or through intermediary agents, including Jaeger. For full tracing support, we recommend using the Zipkin tracing protocol.
A typo in the OAuth2 issuer URL configuration is a common mistake that can lead to unexpected errors. A user attempting to navigate to a Pomerium route that requires authentication might see an error page instead of being redirected to the Identity Provider. In the Jaeger UI, traces that contain errors are highlighted and easy to find:

| Config Key | Description | Required |
| :---------------------- | :------------------------------- | -------- |
| tracing_zipkin_endpoint | Url to the Zipkin HTTP endpoint. | ✅ |
![Jaeger trace list](./img/tracing/jaeger-trace-list-err.png)

</TabItem>
<TabItem value="Enterprise" label="Enterprise">
Clicking on this trace will show us the original unauthenticated request (`GET https://verify.localhost.pomerium.io/`) and that it was redirected to sign in. When attempting to initiate the auth flow, an error was encountered, which was recorded in the trace:

Configure **Tracing** in the Console:
![Jaeger error trace](./img/tracing/error-flow.png)

1. Select a **Tracing Provider** and set **Sample Tracing Rate**
Clicking on the span that recorded the error will show the error message: the issuer URL is missing its trailing slash.

![Select tracing provider](./img/tracing/tracing-providers.png)
### Tracing upstream applications

![Select tracing provider and sample rate](./img/tracing/default-tracing.png)
If upstream applications also have OpenTelemetry support, traces will propagate through Pomerium to those applications and the combined trace data will be visible.

2. Configure tracing **Endpoints**
#### Example: Grafana

![Set Jaeger endpoints](./img/tracing/jaeger-endpoints.png)
[Grafana](https://grafana.com/) is a good example of an upstream application that has tracing support and is easily integrated with Pomerium.

![Set Zipkin endpoint](./img/tracing/zipkin-endpoint.png)
The [Securing Grafana with Pomerium](../guides/grafana.mdx) guide can help you get started with a new Grafana deployment.

</TabItem>
</Tabs>
To enable OpenTelemetry traces in Grafana, set the environment variable `GF_TRACING_OPENTELEMETRY_OTLP_ADDRESS` to the same ip:port (without scheme) as the OTLP endpoint configured in Pomerium.

Alternatively, this can be set in the Grafana config file:

```ini
# grafana.ini
[tracing.opentelemetry.otlp]
address = x.x.x.x:4317
```
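
For the environment-variable route, the Grafana address can be derived from the Pomerium endpoint by dropping the scheme. The following is a minimal sketch, assuming the Pomerium endpoint is `http://localhost:4317`; the shell variables `pomerium_otlp_endpoint` and `grafana_otlp_address` are names introduced here for illustration.

```bash
# Assumed Pomerium OTLP endpoint (example value).
pomerium_otlp_endpoint="http://localhost:4317"

# Strip the scheme and any trailing path to leave host:port.
grafana_otlp_address="${pomerium_otlp_endpoint#*://}"
grafana_otlp_address="${grafana_otlp_address%%/*}"

# Grafana reads the OTLP address from this variable at startup.
export GF_TRACING_OPENTELEMETRY_OTLP_ADDRESS="$grafana_otlp_address"
```

With the example endpoint, the exported value is `localhost:4317`.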

Note that at the time of writing, Grafana only supports exporting OTLP traces over gRPC. Most tracing backends, such as Jaeger or the OTel Collector, accept both gRPC and HTTP.

With tracing enabled in both Pomerium and Grafana, navigate to your Grafana route. After a few seconds, the combined traces should be visible in Jaeger:

![Grafana traces in Jaeger](./img/tracing/grafana-trace-list.png)

The bottom trace (occurred first) is the initial unauthenticated request to Pomerium. The top trace is the authenticated request, after the user signed in and was redirected. This trace includes spans exported by Grafana itself, which we can see in detail:

![Grafana trace details](./img/tracing/grafana-trace.png)

Grafana exports very detailed traces, which can be helpful in debugging complex issues. The combined trace data makes it easy to visualize the request flow between Pomerium and Grafana, or any other upstream application.

### Visualizing the Pomerium auth flow

Pomerium can trace a request's entire journey through the authentication process, across multiple individual redirects between Pomerium services and the Identity Provider.

For example, this trace shows an unauthenticated request (`GET https://verify.localhost.pomerium.io/`) that triggered a sequence of redirects to perform the auth flow:

![Auth flow](./img/tracing/auth-flow.png)

The trace above ends with a final redirect to repeat the original request, but this time the user is authenticated:

### Examples
![Auth flow 2](./img/tracing/auth-flow-2.png)

![jaeger example trace](img/jaeger.png)
This trace ends with the proxied request to the upstream server.
3 changes: 2 additions & 1 deletion cspell.json
@@ -125,6 +125,7 @@
"pomerium",
"posix",
"proto",
"protobuf",
"proxied",
"proxying",
"psql",
@@ -231,4 +232,4 @@
"sidebars.js",
"static/_redirects"
]
}
}