Add support for emitting exemplars in metric logs #873

Open
dtrejod opened this issue Dec 5, 2024 · 0 comments
Labels
enhancement New feature or request

dtrejod commented Dec 5, 2024

What happened?

Currently, the metric.1 log type in WGS does not support the inclusion of "Samples" (also known as exemplars) in histogram and timer measurements. This limits the ability to directly correlate metric spikes with specific trace data, making it difficult to debug and understand the root cause of issues observed in metrics.

What did you want to happen?

Add support for "Samples" (exemplars) to the metric.1 log type, specifically for histogram and timer measurements. Each sample should carry the most recent exact measurement value along with a traceId that links to the corresponding parent trace. This lets users jump from a metric to a sampled trace, which is especially useful when analyzing latency and investigating slow requests.

Background

Exemplars allow you to quickly find sampled traces when looking at metrics, providing valuable context for understanding metric spikes. They are particularly useful when investigating latency issues, as they help correlate specific high-latency requests with trace data.

A raw metric generally represents a statistical aggregation of numerous values with the same labels over a scraping interval (30s in our case). In contrast, an exemplar is a single exact measurement of that value, along with its actual timestamp. By selecting a sampled measurement with span information available, users can easily look up the trace in the trace viewer.

Proposed Changes

  1. Extend the metric.1 log type to include a samples field for histogram and timer measurements.
  2. Define the samples field as a list of Sample objects.
  3. Each Sample object should include:
    • value: The exact value of the sample.
    • time: The RFC3339Nano UTC datetime string of when the sample was taken.
    • traceId: The optional Zipkin trace id associated with this sample, if available.

Example Exemplar

{
  "type": "metric.1",
  "time": "2024-12-05T15:10:32.59714886Z",
  "metricName": "server.request.size",
  "metricType": "histogram",
  "values": {
    "max": 695,
    "p95": 387.0,
    "p99": 448.0,
    "p999": 695.0,
    "count": 467012
  },
  "samples": [
    {
      "value": 387,
      "time": "2024-12-05T15:10:10.161091075Z",
      "traceId": "a5a56a355d767807"
    }
  ],
  "tags": {
    "endpoint": "put",
    "framework": "conjure-undertow"
  },
  "unsafeParams": {}
}
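On the consuming side, a log line of this shape could be decoded to pull out the trace IDs to look up in a trace viewer. The following is a rough sketch under assumptions: `metricLog` and `TraceIDs` are hypothetical names, and the struct models only the fields needed here rather than the full metric.1 schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metricLog is a hypothetical partial view of a metric.1 log line.
type metricLog struct {
	Type       string `json:"type"`
	MetricName string `json:"metricName"`
	Samples    []struct {
		Value   float64 `json:"value"`
		Time    string  `json:"time"`
		TraceID string  `json:"traceId"`
	} `json:"samples"`
}

// TraceIDs returns the trace IDs of all samples in the line that have one.
func TraceIDs(line []byte) ([]string, error) {
	var m metricLog
	if err := json.Unmarshal(line, &m); err != nil {
		return nil, err
	}
	var ids []string
	for _, s := range m.Samples {
		if s.TraceID != "" { // traceId is optional in the proposal
			ids = append(ids, s.TraceID)
		}
	}
	return ids, nil
}

func main() {
	line := []byte(`{"type":"metric.1","metricName":"server.request.size",
		"samples":[{"value":387,"time":"2024-12-05T15:10:10.161091075Z","traceId":"a5a56a355d767807"}]}`)
	ids, err := TraceIDs(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```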