add uptime metric [SLT-372] (#3321)
* uptime metrics

* update

---------

Co-authored-by: Trajan0x <[email protected]>
trajan0x and trajan0x authored Oct 21, 2024
1 parent c5c990e commit 112a9ab
Showing 3 changed files with 55 additions and 3 deletions.
9 changes: 9 additions & 0 deletions core/metrics/README.md
@@ -61,6 +61,14 @@ The metrics endpoint is exposed on `/metrics` on port `8080` by default and is c

**Note: this server failing to bind to `METRICS_PORT` will not cause the application to fail to start. The error will be logged.**

Most metrics come with a `# HELP` line that describes them, for example:

```promql
# HELP process_uptime_seconds The uptime of the process in seconds
# TYPE process_uptime_seconds gauge
process_uptime_seconds{otel_scope_name="standard_metrics",otel_scope_version=""} 24.241680459
```

## Logger

Currently, the entire sanguine codebase uses [ipfs go-log](https://github.com/ipfs/go-log). As pointed out in [#1521](https://github.com/synapsecns/sanguine/issues/1521), this is not a good long-term solution since the logs are not currently appended to opentelemetry, and so new traces require telemetry.
@@ -80,3 +88,4 @@ Note: because both [ipfs go-log]("https://github.com/ipfs/go-log") and [otelzap
### Using the logger

Since the logger depends on the `context` to derive the current span, you always need to use `logger.Ctx(ctx)` or `logger.InfoCtx`. One thing under consideration is removing the non-ctx methods.

8 changes: 5 additions & 3 deletions core/metrics/base.go
@@ -3,8 +3,6 @@ package metrics
import (
	"context"
	"fmt"
	"net/http"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/synapsecns/sanguine/core"
@@ -30,6 +28,7 @@ import (
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
	"go.opentelemetry.io/otel/trace"
	"gorm.io/gorm"
	"net/http"
)

const pyroscopeEndpoint = internal.PyroscopeEndpoint
@@ -44,7 +43,8 @@ type baseHandler struct {
	tracer     trace.Tracer
	name       string
	propagator propagation.TextMapPropagator
	meter      MeterProvider
	// Deprecated: will be removed in a future version
	meter      MeterProvider
	// handler is an integrated handler for everything exported over http. This includes prometheus
	// or http-based sampling methods for other providers.
	handler    http.Handler
@@ -78,6 +78,8 @@ func (b *baseHandler) Start(ctx context.Context) error {
	otel.SetMeterProvider(b.meter)
	b.handler = promhttp.Handler()

	newStandardMetrics(ctx, b)

	go func() {
		<-ctx.Done()
		// shutting down this way will not flush.
41 changes: 41 additions & 0 deletions core/metrics/standard.go
@@ -0,0 +1,41 @@
package metrics

import (
	"context"
	"go.opentelemetry.io/otel/metric"
	"time"
)

// standardMetrics records metrics across any service using the metrics handler.
type standardMetrics struct {
	metrics     Handler
	meter       metric.Meter
	uptimeGauge metric.Float64ObservableGauge
	startTime   time.Time
}

const processUptimeSecondsMetric = "process_uptime_seconds"

func newStandardMetrics(ctx context.Context, handler Handler) {
	str := standardMetrics{
		metrics:   handler,
		meter:     handler.Meter("standard_metrics"),
		startTime: time.Now(),
	}

	var err error
	if str.uptimeGauge, err = str.meter.Float64ObservableGauge(processUptimeSecondsMetric, metric.WithDescription("The uptime of the process in seconds"), metric.WithUnit("seconds")); err != nil {
		handler.ExperimentalLogger().Errorf(ctx, "failed to create %s gauge: %v", processUptimeSecondsMetric, err)
	}

	// Register callback
	if _, err = str.meter.RegisterCallback(str.uptimeCallback, str.uptimeGauge); err != nil {
		handler.ExperimentalLogger().Warnf(ctx, "failed to register callback: %v", err)
	}
}

func (str *standardMetrics) uptimeCallback(_ context.Context, observer metric.Observer) error {
	uptimeDuration := time.Since(str.startTime).Seconds()
	observer.ObserveFloat64(str.uptimeGauge, uptimeDuration)
	return nil
}
