Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demo - remove root namespace from system-specific attributes and metrics #1711

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Dec 23, 2024

Demonstrates #1708 (comment) and #1618 (comment)

  • we're inconsistent when naming system-specific attributes and metrics. E.g. we have
    • jvm.* and not runtime.jvm.*
    • db.cosmosdb and not cosmosdb
    • messaging.kafka and not kafka
    • gcp and not cloud.gcp
    • ...
  • Any system, even if designed for a specific domain, is wider that this domain. E.g.
    • redis is mostly used as a cache, we call it a database, but it's also a messaging system - would we provide cache.redis, db.redis and messaging.redis?
    • Azure EventHubs is a messaging system, but we need metrics for connection management, load balancing, etc - having everything in eventhubs (or az.eventhubs) makes more sense.

This PR shows how the world would look like if we (in general) removed domain prefix from all system-specific things.

The naming guidance would be:

  • if metric/event/attribute describes a general concept applicable to multiple system within this domain, it should start with this domain (e.g. db.namespace or db.client.operation.duration)
  • if metrics/event/attribute is specific to a certain system, it should start with this system name (e.g. jvm.cpu.time, cosmosdb.client.request_charge)

| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the Elasticsearch cluster which the client connects to. [10] | `customers`; `test.users` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [11] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.query.text`](/docs/attributes-registry/db.md) | string | The request body for a [search-type query](https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html), as a json string. [12] | `"{\"query\":{\"term\":{\"user.id\":\"kimchy\"}}}"` | `Recommended` [13] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`elasticsearch.node.name`](/docs/attributes-registry/elasticsearch.md) | string | Represents the human-readable identifier of the node/instance to which a request was routed. [14] | `instance-0000000001` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what Elastic folks think

@AlexanderWert @gregkalapos

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elasticsearch.node.name seems ok to me. I think for Elasticsearch the proposed change would not make a significant difference, because it naturally falls under the db domain - so there was no discrepancy there anyways. E.g. the redis example in the PR description shows the problem very well - I'd say we haven't had this issue with Elasticsearch - definitely not to that extent.

So, having said that, I'd say, for consistency (if we go with this proposal and drop it due to other use-cases, e.g. redis), it'd be ok to drop db. for the Elasticsearch specific attributes.

| [`db.cosmosdb.regions_contacted`](/docs/attributes-registry/db.md) | string[] | List of regions contacted during operation in the order that they were contacted. If there is more than one region listed, it indicates that the operation was performed on multiple regions i.e. cross-regional call. [3] | `["North Central US", "Australia East", "Australia Southeast"]` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`db.cosmosdb.request_charge`](/docs/attributes-registry/db.md) | double | Request units consumed for the operation. | `46.18`; `1.0` | `Conditionally Required` when available | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`db.cosmosdb.sub_status_code`](/docs/attributes-registry/db.md) | int | Cosmos DB sub status code. | `1000`; `1002` | `Conditionally Required` when response was received and contained sub-code. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`cosmosdb.connection_mode`](/docs/attributes-registry/cosmosdb.md) | string | Cosmos client connection mode. | `gateway`; `direct` | `Conditionally Required` [1] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what CosmosDB folks would think about this (removing db in front of cosmosdb everywhere)

@jcocchi @sourabh1007

| [`messaging.batch.message_count`](/docs/attributes-registry/messaging.md) | int | The number of messages sent, received, or processed in the scope of the batching operation. [2] | `0`; `1`; `2` | `Conditionally Required` [3] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`messaging.destination.name`](/docs/attributes-registry/messaging.md) | string | The message destination name [4] | `MyQueue`; `MyTopic` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`messaging.kafka.message.tombstone`](/docs/attributes-registry/messaging.md) | boolean | A boolean that is true if the message is a tombstone. | | `Conditionally Required` [6] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`kafka.message.tombstone`](/docs/attributes-registry/kafka.md) | boolean | A boolean that is true if the message is a tombstone. | | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what messaging folks think (about removing messaging prefix from system-specific attributes and metrics)

@joaopgrassi @pyohannes @carlosalberto @trask

| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [9] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [8] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`servicebus.disposition_status`](/docs/attributes-registry/servicebus.md) | string | Describes the [settlement type](https://learn.microsoft.com/azure/service-bus-messaging/message-transfers-locks-settlement#peeklock). | `complete`; `abandon`; `dead_letter` | `Conditionally Required` if and only if `messaging.operation` is `settle`. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`servicebus.message.delivery_count`](/docs/attributes-registry/servicebus.md) | int | Number of deliveries that have been attempted for this message. | `2` | `Conditionally Required` [9] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

@lmolkova lmolkova Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the owner of the corresponding instrumentations, I'd be really excited about this change - I struggle with trying to fit everything about specific messaging lib/system into messaging namespace.

| [`gen_ai.openai.request.seed`](/docs/attributes-registry/gen-ai.md) | int | Requests with same seed value more likely to return same result. | `100` | `Conditionally Required` if the request includes a seed | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`gen_ai.openai.request.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier requested. May be a specific tier, default, or auto. | `auto`; `default` | `Conditionally Required` [4] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`gen_ai.openai.response.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier used for the response. | `scale`; `default` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`openai.request.response_format`](/docs/attributes-registry/openai.md) | string | The response format that is requested. | `json` | `Conditionally Required` if the request includes a response_format | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what gen_ai folks think about removing gen_ai prefix from system-specific conventions like openai.

E.g. what good does gen_ai prefix does for something like file upload - would we want to report gen_ai.client.openai.uploaded_files vs openai.client.uploaded_files metric?

@open-telemetry/semconv-genai-approvers @xrmx @aabmass @codefromthecrypt @trentm @lzchen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm gut feel is that we want the dimension/correlation of the source type vs embed it inside a naming convention, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above comment was about the attribute in general, but now noticing your question was limited to if it should be prefixed or not (nothing else changes except the prefixing).

Answer is similar, I think the attribute name should be owned by the blob upload component, not gen_ai. However, if this is about bascially all openai related properties becoming top-level and not being under gen_ai, main concern is it crowds the top-level namespace.

  • This puts a lot more pressure to pick names top-level names that are not ambiguous across domains. This might work for trademarked names better than non-trademarked ones.
  • This can help when other types of attributes are considered, like do we feel like saying GENAI_OPENAI_API_VERSION vs OPENAI_API_VERSION
  • semconv ownership is easier to figure out when it is prefixed, e.g. this team owns openai attributes, however, this is maybe not very useful to end users vs us

Gut feel is that removing the top-level namespaces for trademarked names, ones that don't have registry clash risk, sounds like it helps more than hurts.

I know it is a different issue, but in any text, most of these attributes are about the api or library in use, not necessarily the LLM provider (e.g xAI provider has openai compat support and would use these attributes for those traffic). We would need to keep that in mind when attributing to what's currently called "system". This clarification may not end up in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a question whether gen_ai prefix add any value for any attribute that describes something specific to a certain library/system/api/provider/whatever. If something is specific to openai API, we'd still use openai somewhere in the name.

| [`rpc.grpc.status_code`](/docs/attributes-registry/rpc.md) | int | The [numeric status code](https://github.com/grpc/grpc/blob/v1.33.2/doc/statuscodes.md) of the gRPC request. | `0`; `1`; `2` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`rpc.grpc.request.metadata.<key>`](/docs/attributes-registry/rpc.md) | string[] | gRPC request metadata, `<key>` being the normalized gRPC Metadata key (lowercase), the value being the metadata values. [1] | `rpc.grpc.request.metadata.my-custom-metadata-attribute=["1.2.3.4", "1.2.3.5"]` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`rpc.grpc.response.metadata.<key>`](/docs/attributes-registry/rpc.md) | string[] | gRPC response metadata, `<key>` being the normalized gRPC Metadata key (lowercase), the value being the metadata values. [2] | `rpc.grpc.response.metadata.my-custom-metadata-attribute=["attribute_value"]` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`grpc.status_code`](/docs/attributes-registry/grpc.md) | int | The [numeric status code](https://github.com/grpc/grpc/blob/v1.33.2/doc/statuscodes.md) of the gRPC request. | `0`; `1`; `2` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsuereth what would be your thoughts on getting rid of rpc. prefix from rpc.grpc? (and other similar changes in PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants