-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166
Conversation
Feedback from the OTel Spec SIG meeting discussion cc/ @jsuereth:
Update: 68fa270 |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
…ication into jmacd/otep235
…ication into jmacd/otep235
This reduces the number of lines of diff in PR 4166, which replaces the entire `tracestate-probability-sampling.md` file with new contents. Part of #4166. ## Changes Move a file, place a link to it and explain that a change is in progress.
@kalyanaj @PeterF778 @oertl @kentquirk Please take another look at this PR, especially the file |
@open-telemetry/specs-trace-approvers @open-telemetry/specs-approvers @open-telemetry/technical-committee this PR has reached consensus in the Sampling SIG, we have multiple prototypes implemented, and we are looking for final approvals. |
@PeterF778 @oertl @kentquirk @kalyanaj @yuanyuanzhao3, please take another look. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
I will place this PR in draft while the Sampling SIG discusses at least one more detail raised by @yuanyuanzhao3, which is (in my words) a concern about the ambiguity between |
specification/trace/sdk.md
Outdated
|
||
#### AlwaysOn | ||
|
||
* Returns `RECORD_AND_SAMPLE` always. | ||
* Description MUST be `AlwaysOnSampler`. | ||
* If the incoming TraceState has a valid OpenTelemetry TraceState `th` sub-key, the the returned TraceState is unmodified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part would be correct if the AlwaysOnSampler
decided to sample by looking at the parent sampled
flag. But it is not supposed to. It is expected to always sample, also when the parent was not sampled. Thus th:0
is always correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed--
As discussed in the SIG today, we are missing a specification for a ConsistentParentBased sampler. We should probably not use the AlwaysOn sampler in this case.
Assertion: Consistent probability samplers should not inspect the sampled flag.
ConsistentParentBased: when it is invoked in child context, it simply copies the Th value and returns the sampled flag as its decision. If there is a sampled flag and no th value: leave th unset and respect the sampled flag. This is an error case.
I will revert commit e6dc409
Then, I will make a new PR to specify the ConsistentParentBased sampler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that span metrics in this service would be skewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if service B uses AlwaysOnSampler
, all of its spans will get sampled (regardless of the sampling decisions for service A upstream). This might mean incomplete traces, but the metrics for service B will be correct with th:0
.
…metry#4168) This reduces the number of lines of diff in PR 4166, which replaces the entire `tracestate-probability-sampling.md` file with new contents. Part of open-telemetry#4166. ## Changes Move a file, place a link to it and explain that a change is in progress.
This reverts commit e6dc409.
specification/trace/sdk.md
Outdated
* For root spans, always sample a new context. | ||
* For child spans, take the decision of the parent context. | ||
|
||
By using the ParentBased sampler by default, users can change sampling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this should be an "aside" or non-normative callout. I'm not sure we have precedence here, but could you move this to some different markdown structure, perhaps > quote
?
|
||
When a TraceIdRatioBased Sampler makes a decision for a non-root Span | ||
using TraceID randomness, but the Trace random flag was not set, the | ||
SDK SHOULD issue a one-time warning statement in its log with a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: one-time warning - You may need to include criteria of what consistutes a "time", i.e. is it once per line of code, once per span name...
I think this is a bit too vague to implement consistently.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
…ication into jmacd/otep235
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Fixes #1413.
Changes
Updates Trace SDK and TraceState handling specifications with OTEP 235 sampling thresholds. This PR depends on #4162 to introduce the concept of Trace Randomness. This PR is the second part of two, it focuses on thresholds.
TraceIdRatioBased
algorithm section. The existing TODO implies this is not a breaking change.TraceIdRatioBased
constructionTraceIdRatioBased
description (leave unmodified).The content of OTEP 235 was revised for clarity by @kalyanaj in open-telemetry/oteps#261. I've heavily copied from the final text in that still-unmerged OTEP. I introduced new content explaining how to compute thresholds from probabilities with use of variable precision, referring to the OTel Collector-Contrib
pkg/sampling
reference implementation. The new (Golang) demonstration code is validated here, https://go.dev/play/p/7eLM6FkuoA5.A proof of concept for this specification along with #4162 can be found in open-telemetry/opentelemetry-go#5645.
Part of #3602.
Product of the Sampling SIG members @kentquirk @kalyanaj @oertl @PeterF778 and myself.
CHANGELOG.md
spec-compliance-matrix.md