Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics API] Ability to record a metric with a custom timestamp #4343

Open
ThePlenkov opened this issue Dec 16, 2024 · 10 comments
Open

[Metrics API] Ability to record a metric with a custom timestamp #4343

ThePlenkov opened this issue Dec 16, 2024 · 10 comments
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted triage:followup Needs follow up during triage

Comments

@ThePlenkov
Copy link

ThePlenkov commented Dec 16, 2024

What are you trying to achieve?

I'd like to be able to develop so-called proxy layers with SDK, where I'd like to register metrics such as gauge and histogram providing custom timestamps. Currently timestamp defaults to the current time such as Date.now( ) in JavaScript, but in case if we provide metrics exposed from alternative sources ( not with SDK ) - we're not able to proxy those metrics correctly because of the different timestamp.

This issue can be easily implemented, but will make SDK much more flexible in terms of registering metrics in a right moment.

So the proposal is:

  • method add (counters) should contain optional parameter timestamp which is default to a current time
  • method record (gauge,histogram) should contain optional parameter timestamp which is default to a current time

Additional context.

In our ABAP runtime (internal SAP language) we do not have OTel SDK available out-of-the box. For that purpose we created own SDK ( https://github.com/abapify/otel/ ) and set of exporters ( such as MQTT ) which are streaming telemetry events in a ABAP compatible format ( not OTLP, own format chosen by set of reasons explained here ). So as intermediate layer - we created a NodeJS service which is not generating, but streaming/proxying OpenTelemetry events using JS SDK.

Because of that our charts attract metrics according to how they were processed by proxy service but as they were actually recorded. Currently we face some shift, but there could be more complicated patterns when service is temporary down - metrics will be shown completely wrong..

Unfortunately there are few problem points here. This is one of the problems we do not have a solution for yet.

@ThePlenkov ThePlenkov added the spec:metrics Related to the specification/metrics directory label Dec 16, 2024
@trask
Copy link
Member

trask commented Dec 17, 2024

hi @ThePlenkov! have you looked at writing a metric producer to solve your needs?

bridges to third-party metric sources

@trask trask added the triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details label Dec 17, 2024
@ThePlenkov
Copy link
Author

hi @ThePlenkov! have you looked at writing a metric producer to solve your needs?

That's exactly what I've been advised here

Going to try it today

@ThePlenkov
Copy link
Author

Here is our solution:

  • we parse metric message ( it's not OTLP, own format ) and create a metric using metric SDK
  • when we record/add a data point - we also provide a technical attribute TS containing original timestamp
  • During inititalisation we created a class inherited from PeriodicExportingMetricReader and redefined collect method
  • In redefined method we first collect resource metrics from super.collect( ) and then replace start time from attribute TS
  • To avoid conflicts we also update end time with keeping same difference as it was before comparing to the initial start time
  • Finally we delete TS attribute from the resource metric

So as you can see we managed to solve a problem , but the solution still looks a bit "hacky", but indeed much less risky than redefining _record method of the metric.

Unfortunately I didn't find the solution how we could just implement the whole MetricProducer in a standalone class from the scratch without extending MetricReader. We would prefer still to construct resource metrics with SDK rather than define it manually.

Having the option to provide metric timestamps when recording them would help to avoid even doing this and would help to construct resource metrics with SDK.

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Jan 1, 2025
@danielgblanco danielgblanco added triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted and removed triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details triage:followup Needs follow up during triage labels Jan 6, 2025
@danielgblanco
Copy link
Contributor

@ThePlenkov I think I understand the use case, but I'll leave this open for discussion by the community to collected feedback.

@danielgblanco
Copy link
Contributor

Hi @ThePlenkov, one thing I'd like to follow up on is the original use case. What you're referring to is not changing the timestamp on the metric producer as it produces a batch of metrics, but being able to change it at the measurement level? This effectively generates a single data point for each of the measurements at a given timestamp (not an aggregation across multiple measurements between timestamps).

The Logs/Events API allows to specify a (optional) timestamp. Would events be an alternative here instead of metrics?

@ThePlenkov
Copy link
Author

Hi @danielgblanco . Well let me please explain the case please.

We're building metrics in our own runtime ( SAP ABAP ). We would like to register metrics with timestamps from that exact moment when is registered in a code ( like metric->add_counter( 1 ) for example ). So what happens after - we're storing those metrics in a certain format and export from the system to a layer called OTLP proxy which is written in TS and already can use OTLP SDK. This layer receives messages from our service ( not in OTLP format, we use own simplified format ) and using SDK it tries to re-create same records programmatically. What works with traces and logs doesn't work with metrics - that's right.

So our latest solution is to use a special TS attribute when we put the actual timestamp and then we redefine it in a collect method:

import { millisToHrTime } from '@opentelemetry/core'
import type { CollectionResult } from '@opentelemetry/sdk-metrics'
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics'
import type { CollectionOptions } from '@opentelemetry/sdk-metrics/build/src/types'

/**
 * An extension of the `PeriodicExportingMetricReader` that supports external timestamps.
 *
 * This class allows data points to include a custom `__TS__` attribute that specifies
 * the original timestamp at which the measurement occurred. This is useful for
 * scenarios where metrics are collected late (e.g., after transmission delays) and
 * need to reflect their actual occurrence time rather than the collection time.
 *
 * When a data point includes a `__TS__` attribute (string or number), this attribute
 * will be parsed as a timestamp and used to adjust the `startTime` and `endTime`
 * of the data point. After adjusting the timestamps, the `__TS__` attribute is removed
 * from the attributes to avoid polluting the exported data.
 */
// eslint-disable-next-line id-length
export class PeriodicExportingMetricReaderWithTimestamps extends PeriodicExportingMetricReader {
  /**
   * Collects metrics and adjusts their timestamps if a `__TS__` attribute is present.
   *
   * @param {CollectionOptions} [options] - Optional collection options.
   * @returns {Promise<CollectionResult>} A promise that resolves to the collected metrics.
   *
   * During collection, each metric's data points are examined. If a data point has an
   * attribute named `__TS__` that can be interpreted as a valid date, the data point's
   * `startTime` and `endTime` are shifted so that the `startTime` aligns with the
   * provided timestamp. The `endTime` is adjusted to maintain the original duration.
   * After adjusting timestamps, the `__TS__` attribute is removed.
   */
  async collect(options?: CollectionOptions): Promise<CollectionResult> {
    const result = await super.collect(options)

    for (const scopeMetric of result.resourceMetrics.scopeMetrics) {
      for (const metric of scopeMetric.metrics) {
        for (const dataPoint of metric.dataPoints) {
          const ts = dataPoint.attributes.__TS__
          if (
            (typeof ts === 'string' || typeof ts === 'number') &&
            !Number.isNaN(new Date(ts).getTime())
          ) {
            const startTime = millisToHrTime(new Date(ts).getTime())

            // Calculate the time difference between endTime and startTime
            // as we don't know why end time differs from startTime,
            // so we just copy the difference
            const durationSeconds = dataPoint.endTime[0] - dataPoint.startTime[0]
            const durationNanos = dataPoint.endTime[1] - dataPoint.startTime[1]

            // Adjust startTime and endTime
            ;(dataPoint as { startTime: [number, number] }).startTime = startTime
            ;(dataPoint as { endTime: [number, number] }).endTime = [
              startTime[0] + durationSeconds,
              startTime[1] + durationNanos,
            ]

            // console.log('new time', dataPoint.startTime)

            // Remove the custom timestamp attribute
            delete dataPoint.attributes.__TS__
          }
        }
      }
    }

    return result
  }
}

I know this is probably wrong from the point of SDK - but so far this is the easiest way we found to reach this goal.

So once again - the goal is using SDK to construct the metric message ( while it was received from the external source, not created with this SDK ).

Does it make sense?

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Jan 21, 2025
@trask
Copy link
Member

trask commented Jan 21, 2025

hi @ThePlenkov, can you give a bit more info on why MetricProducer isn't working, e.g. in Java I think what you're proposing should be a straightforward implementation of https://github.com/open-telemetry/opentelemetry-java/blob/13e2b8c4dfeba09f361b47986febbd792ab1ae60/sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/export/MetricProducer.java#L33-L42, and shouldn't require pushing the original timestamp into a fake attribute

is the problem that you also need to do your own aggregation?

have you explored using a collector receiver to do this?

@trask trask removed the triage:followup Needs follow up during triage label Jan 21, 2025
@ThePlenkov
Copy link
Author

Hi @trask for sure I might not see the full picture and of course we only have experience of working with JS lib but I guess it's similar in any language.

So what is out problem.

We have developed a service written in JS which is receiving message from SAP in SAP compatible format ( that's how we essentially simplify the amount of the custom code on SAP side ).

This service receives the information about metric event ( like count, or add ) with value, attributes and the actual timestamp.

Since the process of this message delivery is asyncronous - there might be some slight delay in actual metric raising and the moment we handle it.

So to forward this metric further we wrote some piece of code which is using SDK like this:

metric_api->get_meter( )->create_counter( )->add_value( ).

which is recreating the metric in the SDK and luckily ( because we have a central OTel pipeline, managed by dedicated team ) those metrics got exposed automatically.

Alternatively as far as I understand if we need to implement own MetricProducer we have to implement this method: https://open-telemetry.github.io/opentelemetry-js/interfaces/_opentelemetry_sdk_metrics.MetricProducer.html#collect and in this case we need to construct the result manually. WHich is indeed possible - but i just think we loose the power of using SDK in this case.

Did i understand it right?

Thanks
Petr

@trask
Copy link
Member

trask commented Jan 22, 2025

@ThePlenkov makes sense, I'm just not sure if there's an easy solution to this, as adding custom timestamp to the metrics API would probably add considerable complexity to all of the language implementations since they would then need to bucket things by timestamp as well, and it's unclear when you would export out those old buckets (how do you know when the last timestamp for an old bucket was received), or can a single old bucket be sent across multiple exports, which adds even more complexity

@ThePlenkov
Copy link
Author

well - the solution mentioned above is easy but ugly =) Meanwhile it's covering our needs fully so far. So far we didn't notice any deviations in metric numbers.

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted triage:followup Needs follow up during triage
Projects
None yet
Development

No branches or pull requests

3 participants