OpenTelemetry TraceIdRatioBased sampler requirements following OTEP 235 #4166
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,7 +30,10 @@ linkTitle: SDK | |
+ [AlwaysOn](#alwayson) | ||
+ [AlwaysOff](#alwaysoff) | ||
+ [TraceIdRatioBased](#traceidratiobased) | ||
- [Requirements for `TraceIdRatioBased` sampler algorithm](#requirements-for-traceidratiobased-sampler-algorithm) | ||
- [`TraceIdRatioBased` sampler configuration](#traceidratiobased-sampler-configuration) | ||
- [`TraceIdRatioBased` sampler algorithm](#traceidratiobased-sampler-algorithm) | ||
- [`TraceIdRatioBased` sampler description](#traceidratiobased-sampler-description) | ||
- [`TraceIdRatioBased` sampler compatibility warning](#traceidratiobased-sampler-compatibility-warning) | ||
+ [ParentBased](#parentbased) | ||
+ [JaegerRemoteSampler](#jaegerremotesampler) | ||
- [Span Limits](#span-limits) | ||
|
@@ -372,12 +375,24 @@ Callers SHOULD NOT cache the returned value. | |
### Built-in samplers | ||
|
||
OpenTelemetry supports a number of built-in samplers to choose from. | ||
The default sampler is `ParentBased(root=AlwaysOn)`. | ||
|
||
The default sampler is `ParentBased(root=AlwaysOn)`, which configures | ||
a policy depending on whether the new span is a root or a child: | ||
|
||
* For root spans, always sample a new context. | ||
* For child spans, take the decision of the parent context. | ||
|
||
By using the ParentBased sampler by default, users can change sampling | ||
across their system by reconfiguring only root span Samplers. To | ||
configure probability-based trace sampling across a system, users may | ||
configure `ParentBased(root=TraceIdRatioBased{probability})`. | ||
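The root/child policy described above can be sketched as follows. This is a simplified illustration, not the SDK's API: `parent_based`, `always_on`, and the `parent_sampled` argument are hypothetical names, and the sketch leaves out the separate remote/local parent delegates that a full `ParentBased` sampler supports.

```python
from typing import Callable, Optional

RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"
DROP = "DROP"


def always_on(trace_id: bytes) -> str:
    """Root sampler that samples every new trace."""
    return RECORD_AND_SAMPLE


def parent_based(root: Callable[[bytes], str]) -> Callable[[bytes, Optional[bool]], str]:
    """Build a sampler that defers to `root` only for root spans.

    `parent_sampled` is None when there is no parent context (a root span);
    otherwise it carries the parent's sampled flag.
    """

    def should_sample(trace_id: bytes, parent_sampled: Optional[bool]) -> str:
        if parent_sampled is None:
            # Root span: the configured root sampler decides.
            return root(trace_id)
        # Child span: take the decision of the parent context.
        return RECORD_AND_SAMPLE if parent_sampled else DROP

    return should_sample


# Equivalent in spirit to the default `ParentBased(root=AlwaysOn)`.
default_sampler = parent_based(always_on)
```

Swapping `always_on` for a ratio-based root sampler changes the sampling rate of the whole trace, because every child span inherits the root's decision.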
|
||
#### AlwaysOn | ||
|
||
* Returns `RECORD_AND_SAMPLE` always. | ||
* Description MUST be `AlwaysOnSampler`. | ||
* If the incoming TraceState has a valid OpenTelemetry TraceState `th` sub-key, then the returned TraceState is unmodified. | ||
Review comment: That part would be correct if the
Review comment: Agreed. Assertion: Consistent probability samplers should not inspect the sampled flag. ConsistentParentBased: when it is invoked in child context, it simply copies the `th` value and returns the sampled flag as its decision. If there is a sampled flag and no `th` value: leave `th` unset and respect the sampled flag. This is an error case. I will revert commit e6dc409. Then I will make a new PR to specify the ConsistentParentBased sampler.
Review comment: The issue is that span metrics in this service would be skewed.
Review comment: No, if service B uses |
||
* If there is no incoming TraceState or the OpenTelemetry TraceState `th` sub-key is not set, the returned TraceState SHOULD include `th:0`. | ||
|
||
#### AlwaysOff | ||
|
||
|
@@ -386,40 +401,78 @@ The default sampler is `ParentBased(root=AlwaysOn)`. | |
|
||
#### TraceIdRatioBased | ||
|
||
|
||
* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the | ||
parent `SampledFlag`, the `TraceIdRatioBased` should be used as a delegate of | ||
the `ParentBased` sampler specified below. | ||
* Description MUST return a string of the form `"TraceIdRatioBased{RATIO}"` | ||
with `RATIO` replaced with the Sampler instance's trace sampling ratio | ||
represented as a decimal number. The precision of the number SHOULD follow | ||
implementation language standards and SHOULD be high enough to identify when | ||
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler | ||
had a sampling ratio of 1 to every 10,000 spans it COULD return | ||
`"TraceIdRatioBased{0.000100}"` as its description. | ||
|
||
TODO: Add details about how the `TraceIdRatioBased` is implemented as a function | ||
of the `TraceID`. [#1413](https://github.com/open-telemetry/opentelemetry-specification/issues/1413) | ||
|
||
##### Requirements for `TraceIdRatioBased` sampler algorithm | ||
|
||
* The sampling algorithm MUST be deterministic. A trace identified by a given | ||
`TraceId` is sampled or not independent of language, time, etc. To achieve this, | ||
implementations MUST use a deterministic hash of the `TraceId` when computing | ||
the sampling decision. By ensuring this, running the sampler on any child `Span` | ||
will produce the same decision. | ||
* A `TraceIdRatioBased` sampler with a given sampling rate MUST also sample all | ||
traces that any `TraceIdRatioBased` sampler with a lower sampling rate would | ||
sample. This is important when a backend system may want to run with a higher | ||
sampling rate than the frontend system, this way all frontend traces will | ||
still be sampled and extra traces will be sampled on the backend only. | ||
* **WARNING:** Since the exact algorithm is not specified yet (see TODO above), | ||
there will probably be changes to it in any language SDK once it is, which | ||
would break code that relies on the algorithm results. | ||
Only the configuration and creation APIs can be considered stable. | ||
It is recommended to use this sampler algorithm only for root spans | ||
(in combination with [`ParentBased`](#parentbased)) because different language | ||
SDKs or even different versions of the same language SDKs may produce inconsistent | ||
results for the same input. | ||
**Status**: [Development](../document-status.md) | ||
|
||
The `TraceIdRatioBased` sampler implements simple, ratio-based probability sampling using randomness features specified in the [W3C Trace Context Level 2][W3CCONTEXTMAIN] Candidate Recommendation. | ||
|
||
OpenTelemetry follows W3C Trace Context Level 2, which specifies 56 bits of randomness, and uses those 56 bits for probabilistic sampling decisions. | ||
|
||
[OpenTelemetry defines consistent probability sampling using 56 bits of randomness][CONSISTENTSAMPLING]. | ||
|
||
The `TraceIdRatioBased` sampler MUST ignore the parent `SampledFlag`. | ||
For respecting the parent `SampledFlag`, see the `ParentBased` sampler specified below. | ||
|
||
|
||
Note that the "ratio-based" part of this Sampler's name implies that | ||
it makes a probability decision directly from the TraceID, even though | ||
it was not originally specified in an exact way. In the present | ||
specification, the Sampler decision is more nuanced: only a portion of | ||
the identifier is used, after checking whether the OpenTelemetry | ||
TraceState field contains an explicit trace randomness value. | ||
|
||
[W3CCONTEXTMAIN]: https://www.w3.org/TR/trace-context-2 | ||
|
||
##### `TraceIdRatioBased` sampler configuration | ||
|
||
The `TraceIdRatioBased` sampler is typically configured using a 32-bit or 64-bit floating point number to express the sampling ratio. | ||
The minimum valid sampling ratio is `2^-56`, and the maximum valid sampling ratio is 1.0. | ||
From an input sampling ratio, a rejection threshold value is calculated; see [consistent-probability sampler requirements][CONSISTENTSAMPLING] for details on converting sampling ratios into thresholds with variable precision. | ||
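As a rough sketch of the conversion mentioned above, the following assumes the rejection threshold is `(1 - ratio) * 2^56` and keeps the full 56 bits of precision. The function name `ratio_to_threshold` is hypothetical, and the variable-precision handling described in the linked document is deliberately left out.

```python
MAX_THRESHOLD = 1 << 56  # 2^56: one past the largest 56-bit randomness value
MIN_RATIO = 2.0 ** -56   # smallest valid sampling ratio


def ratio_to_threshold(ratio: float) -> int:
    """Convert a sampling ratio into a 56-bit rejection threshold.

    Assumes T = (1 - ratio) * 2^56, so a ratio of 1.0 gives T = 0
    (nothing is rejected) and smaller ratios give larger thresholds.
    """
    if not (MIN_RATIO <= ratio <= 1.0):
        raise ValueError(f"sampling ratio out of range: {ratio!r}")
    # Scaling a float by a power of two is exact, so only round() can lose bits.
    return MAX_THRESHOLD - round(ratio * MAX_THRESHOLD)
```

For example, a ratio of `0.25` rejects the lowest three quarters of the randomness range and yields a threshold of `0xc0000000000000`.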
|
||
[CONSISTENTSAMPLING]: ./tracestate-probability-sampling.md | ||
|
||
##### `TraceIdRatioBased` sampler algorithm | ||
|
||
Given a Sampler configured with a sampling threshold `T` and Context with randomness value `R` (typically, the 7 rightmost bytes of the trace ID), when `ShouldSample()` is called, it uses the expression `R >= T` to decide whether to return `RECORD_AND_SAMPLE` or `DROP`. | ||
|
||
* If the randomness value (R) is greater than or equal to the rejection threshold (T), that is, when (R >= T), return `RECORD_AND_SAMPLE`; otherwise, return `DROP`. | ||
* When (R >= T), the OpenTelemetry TraceState SHOULD be modified to include the key-value `th:T` for rejection threshold value (T), as specified for the [OpenTelemetry TraceState `th` sub-key][TRACESTATEHANDLING]. | ||
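The decision rule above can be sketched as follows. The function names and the `explicit_rv` parameter are hypothetical. The sketch assumes a 16-byte trace ID whose 7 rightmost bytes carry the randomness, and assumes the `th` value is written as hexadecimal with trailing zeros removed; check the linked TraceState handling document before relying on that encoding.

```python
from typing import Optional, Tuple

RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"
DROP = "DROP"


def randomness_from_trace_id(trace_id: bytes) -> int:
    """Read R from the 7 rightmost bytes (56 bits) of a 16-byte trace ID."""
    if len(trace_id) != 16:
        raise ValueError("trace ID must be 16 bytes")
    return int.from_bytes(trace_id[-7:], "big")


def encode_threshold(threshold: int) -> str:
    """Assumed `th` encoding: 14 hex digits with trailing zeros dropped."""
    return f"{threshold:014x}".rstrip("0") or "0"


def should_sample(
    threshold: int, trace_id: bytes, explicit_rv: Optional[int] = None
) -> Tuple[str, Optional[str]]:
    """Return the decision and, when sampled, the `th` value to record."""
    # Prefer an explicit randomness value when one was propagated.
    r = explicit_rv if explicit_rv is not None else randomness_from_trace_id(trace_id)
    if r >= threshold:
        return RECORD_AND_SAMPLE, encode_threshold(threshold)
    return DROP, None


trace_id = bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736")
decision, th = should_sample(0x80000000000000, trace_id)
```

Here R is `0xce929d0e0e4736`, which is at least the 50% threshold `0x80000000000000`, so the span is sampled and `th` is `8`.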
|
||
[TRACESTATEHANDLING]: ./tracestate-handling.md#sampling-threshold-value-th | ||
|
||
##### `TraceIdRatioBased` sampler description | ||
|
||
The `TraceIdRatioBased` GetDescription MUST return a string of the form `"TraceIdRatioBased{RATIO}"` | ||
with `RATIO` replaced with the Sampler instance's trace sampling ratio | ||
represented as a decimal number. The precision of the number SHOULD follow | ||
implementation language standards and SHOULD be high enough to identify when | ||
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler | ||
had a sampling ratio of 1 in every 10,000 spans it could return | ||
`"TraceIdRatioBased{0.000100}"` as its description. | ||
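One way to produce a description of this form is sketched below. The function name `describe` is hypothetical, and the fixed six decimal places are only an assumption chosen to reproduce the example above; ratios that differ beyond the sixth decimal place would need more precision to stay distinguishable.

```python
def describe(ratio: float) -> str:
    """Format the sampler description with six decimal places."""
    # Doubled braces emit literal '{' and '}' around the formatted ratio.
    return f"TraceIdRatioBased{{{ratio:f}}}"


description = describe(0.0001)  # 1 in every 10,000 spans
```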
|
||
|
||
##### `TraceIdRatioBased` sampler compatibility warning | ||
|
||
This specification has been revised from the original | ||
`TraceIdRatioBased` Sampler definition. The present definition for | ||
`TraceIdRatioBased` uses a new definition for trace randomness, where | ||
unless an explicit trace randomness value is set in the OpenTelemetry | ||
TraceState `rv` sub-key, Samplers are meant to presume that TraceIDs | ||
contain the necessary 56 bits of randomness. | ||
|
||
When a TraceIdRatioBased Sampler makes a decision for a non-root Span | ||
based on TraceID randomness, there is a possibility that the TraceID | ||
was in fact generated by an older SDK, unaware of this specification. | ||
The Trace random flag lets us disambiguate these two cases. This flag | ||
propagates information to let TraceIdRatioBased Samplers confirm that | ||
TraceIDs are random; however, this requires W3C Trace Context Level 2 | ||
to be supported by every Trace SDK that has handled the context. | ||
|
||
When a TraceIdRatioBased Sampler makes a decision for a non-root Span | ||
using TraceID randomness, but the Trace random flag was not set, the | ||
SDK SHOULD issue a one-time warning statement in its log with a | ||
Review comment: nit: one-time warning. You may need to include criteria for what constitutes a "time", i.e. is it once per line of code, once per span name... I think this is a bit too vague to implement consistently. |
||
compatibility warning. As an example of this compatibility warning: | ||
|
||
``` | ||
WARNING: The TraceIdRatioBased sampler is presuming TraceIDs are random | ||
and expects the Trace random flag to be set in confirmation. Please | ||
upgrade your caller(s) to use W3C Trace Context Level 2. | ||
``` | ||
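A minimal sketch of the one-time warning follows, assuming "one-time" means once per helper instance; the specification text does not define the scope. `CompatibilityWarner` and its argument names are hypothetical, and the caller is assumed to have already determined whether the span is a root, whether the Trace random flag was set, and whether an explicit `rv` value was present.

```python
import logging
import threading
from typing import Optional

COMPAT_WARNING = (
    "The TraceIdRatioBased sampler is presuming TraceIDs are random "
    "and expects the Trace random flag to be set in confirmation. "
    "Please upgrade your caller(s) to use W3C Trace Context Level 2."
)


class CompatibilityWarner:
    """Logs the compatibility warning at most once per instance."""

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        self._logger = logger if logger is not None else logging.getLogger(__name__)
        self._lock = threading.Lock()
        self._warned = False

    def maybe_warn(self, is_root: bool, random_flag_set: bool, has_explicit_rv: bool) -> bool:
        """Warn for a non-root span sampled on unconfirmed TraceID randomness.

        Returns True only on the call that actually emits the warning.
        """
        if is_root or random_flag_set or has_explicit_rv:
            return False
        with self._lock:
            if self._warned:
                return False
            self._warned = True
        self._logger.warning(COMPAT_WARNING)
        return True
```

The lock only guards the flag, so concurrent `ShouldSample()` calls cannot both emit the message.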
|
||
#### ParentBased | ||
|
||
|
Review comment: Nit: this should be an "aside" or non-normative callout. I'm not sure we have precedent here, but could you move this to some different markdown structure, perhaps a `> quote` block?