Add OpenTelemetry sampling conventions #793
@lmolkova @pyohannes Thank you! I realized from your responses that this repository needs to contain more detail, whereas I had been planning on rolling that detail into open-telemetry/opentelemetry-specification#3910 and now I've put some of that text into this PR. I created a file in the attributes registry and moved the attribute definitions there. I am hoping to justify the use of "sampling" as a prefix, for historical reasons. I removed the Since spans record |
Thanks a lot for the additional context. I still have some questions and suggestions.
I'm very happy about this document.
Co-authored-by: Kent Quirk <[email protected]>
Reviewers: I see no reason to continue promoting Therefore, I have revised this PR only to specify |
…entions into jmacd/sampling_convs
it looks like you need to regenerate the markdown to fully remove |
Updated: |
The OpenTelemetry sampling decision is defined in terms of a Threshold
value and a Randomness value, each containing 56 bits of information.

A constant known as _maximum adjusted count_ (`MaxAdjustedCount`),
It could be just me, but I think that `max`-something suggests inclusiveness, so this can be confusing. How about `AdjustedCountLimit`?
I am fine with `MaxAdjustedCount`. It is inclusive with respect to the adjusted count. However, I understand that it can be a little confusing as it is also used as an exclusive upper limit for the threshold and the random value.
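The inclusive/exclusive distinction discussed in this thread can be illustrated with a minimal Go sketch. It assumes `MaxAdjustedCount` equals 2^56 (matching the 56-bit width quoted above) and that an item is kept when Randomness is greater than or equal to Threshold, as in OTEP 235. The function names `ShouldSample` and `AdjustedCount` are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// MaxAdjustedCount is assumed here to be 2^56, matching the 56 bits of
// information carried by the Threshold and Randomness values.
const MaxAdjustedCount uint64 = 1 << 56

// ShouldSample (hypothetical name) keeps an item when its randomness is at
// least the rejection threshold. Both inputs are expected to lie in
// [0, MaxAdjustedCount), so MaxAdjustedCount is an exclusive upper limit here.
func ShouldSample(randomness, threshold uint64) bool {
	return randomness >= threshold
}

// AdjustedCount (hypothetical name) returns how many original items one
// sampled item represents. The result ranges from 1 up to and including
// MaxAdjustedCount, which is why the constant is inclusive for this quantity.
func AdjustedCount(threshold uint64) (float64, error) {
	if threshold >= MaxAdjustedCount {
		return 0, fmt.Errorf("threshold %d is outside [0, 2^56)", threshold)
	}
	return float64(MaxAdjustedCount) / float64(MaxAdjustedCount-threshold), nil
}

func main() {
	// A threshold of 2^55 rejects half of the randomness range.
	threshold := uint64(1) << 55
	count, err := AdjustedCount(threshold)
	if err != nil {
		panic(err)
	}
	fmt.Println(ShouldSample(threshold, threshold), count)
}
```

With the largest valid threshold (2^56 - 1), the adjusted count reaches exactly `MaxAdjustedCount`, while no valid threshold or randomness value ever equals it.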
A minor wording clarification, but otherwise this looks good.
docs/sampling/README.md
Outdated
### Sampling randomness

When determining the Randomness value from an item of telemetry,
sampler implementations SHOULD:
Suggested change:
- sampler implementations SHOULD:
+ sampler implementations SHOULD evaluate the following in order:
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@jmacd do we have a consensus on this PR within the sampling WG?
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Miscellaneous updates: Apologies for stalling. I am re-opening this with the intention to start it moving again next week.

Related: @kalyanaj is working on minor revisions to OTEP 235 based on our analysis of the W3C tracestate level 2 "random" flag.

Related: The semantic conventions here go slightly beyond OTEP 235, which focuses on tracestate and does not explicitly document our approach to logs sampling. The new semantic conventions in this PR could be added back into OTEP 235 as there is no disagreement in the Sampling SIG, or we could just approve them here -- I am referring to the new

The OTel Collector prototype for this is in its final stage of review. If we don't get this work approved and merged soon, the work done there will be at risk of near-future breaking changes. Please see open-telemetry/opentelemetry-collector-contrib#31894.
Co-authored-by: Kent Quirk <[email protected]>
I will re-open this PR soon, but will close it for now. In particular, the review for open-telemetry/opentelemetry-collector-contrib#31894 raises the question of whether the behavior adopted in

If you have a hashing algorithm to construct some number of bits used for a consistent threshold-based approach and you want to record your sampling decision in a collection-path sampler such as the referenced component, you SHOULD synthesize an
…rt OTEP 235) (#31894)

**Description:** Creates new sampler modes named "equalizing" and "proportional". Preserves existing functionality under the mode named "hash_seed".

Fixes #31918

This is the final step in a sequence, the whole of this work was factored into 3+ PRs, including the new `pkg/sampling` and the previous step, #31946. The two new Sampler modes enable mixing OTel sampling SDKs with Collectors in a consistent way.

The existing hash_seed mode is also a consistent sampling mode, which makes it possible to have a 1:1 mapping between its decisions and the OTEP 235 randomness and threshold values. Specifically, the 14-bit hash value and sampling probability are mapped into 56-bit R-value and T-value encodings, so that all sampling decisions in all modes include threshold information.

This implements the semantic conventions of open-telemetry/semantic-conventions#793, namely the `sampling.randomness` and `sampling.threshold` attributes used for logs where there is no tracestate.

The default sampling mode remains HashSeed. We consider a future change of default to Proportional to be desirable, because:

1. Sampling probability is the same, only the hashing algorithm changes
2. Proportional respects and preserves information about earlier sampling decisions, which HashSeed can't do, so it has greater interoperability with OTel SDKs which may also adopt OTEP 235 samplers.

**Link to tracking Issue:** Draft for open-telemetry/opentelemetry-specification#3602.

Previously #24811, see also open-telemetry/oteps#235
Part of #29738

**Testing:** New testing has been added.

**Documentation:** ✅

---------

Co-authored-by: Juraci Paixão Kröhling <[email protected]>
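The commit message above says a 14-bit hash value and a sampling probability can be mapped 1:1 into 56-bit randomness and threshold values. The Go sketch below shows one way such a mapping could work. It is an illustration under stated assumptions, not the Collector's actual code: it assumes the legacy decision has the form "sample when hash < scaled rate", and all identifiers (`legacyDecision`, `toRandomness`, `toThreshold`) are hypothetical.

```go
package main

import "fmt"

const (
	hashBits       = 14
	numHashBuckets = 1 << hashBits // 16384 possible hash values
	shift          = 56 - hashBits // widen 14-bit values to 56 bits
)

// legacyDecision models the assumed hash-based rule: sample when the 14-bit
// hash falls below the scaled sampling rate, which lies in [0, numHashBuckets].
func legacyDecision(hashed, scaledRate uint32) bool {
	return hashed < scaledRate
}

// toRandomness inverts the hash so that larger values are more likely to be
// sampled, then shifts the result into the top 14 of 56 bits.
func toRandomness(hashed uint32) uint64 {
	hashed &= numHashBuckets - 1 // keep only the low 14 bits
	return uint64(numHashBuckets-1-hashed) << shift
}

// toThreshold converts the scaled rate into a rejection threshold. A scaled
// rate of 0 yields 2^56, which no 56-bit randomness value can reach, so
// nothing is sampled in that case.
func toThreshold(scaledRate uint32) uint64 {
	return uint64(numHashBuckets-scaledRate) << shift
}

func main() {
	scaledRate := uint32(numHashBuckets / 4) // 25% sampling
	for _, hashed := range []uint32{0, 4095, 4096, 16383} {
		legacy := legacyDecision(hashed, scaledRate)
		mapped := toRandomness(hashed) >= toThreshold(scaledRate)
		fmt.Println(hashed, legacy, mapped)
	}
}
```

Because `hashed < scaledRate` holds exactly when `numHashBuckets-1-hashed >= numHashBuckets-scaledRate`, and the left shift preserves ordering, the two decision rules agree for every hash value and rate in this sketch.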
Changes
Introduces 3 conventional attributes to describe sampling in OpenTelemetry collection pipelines. This specification refers to OTEP 235.
Related to the specification change (from OTEP 235) in open-telemetry/opentelemetry-specification#3910.
Related to the specification about representing trace context in logs in open-telemetry/opentelemetry-specification#3909.
Prototype for a Collector sampler based on these attributes in open-telemetry/opentelemetry-collector-contrib#29720 and [WIP] open-telemetry/opentelemetry-collector-contrib#24811.
Part of open-telemetry/opentelemetry-specification#1413
Merge requirement checklist
[chore]