-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sampling constant #468
base: main
Are you sure you want to change the base?
Conversation
trace-id = 32HEXDIGLC ; 16 bytes array identifier. All zeroes forbidden | ||
parent-id = 16HEXDIGLC ; 8 bytes array identifier. All zeroes forbidden | ||
trace-flags = 2HEXDIGLC ; 8 bit flags. Currently, only one bit is used. See below for details | ||
sampling-constant = sampling-random-value parent-sampling-constant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sampling-constant = sampling-random-value parent-sampling-constant | |
sampling-constant = sampling-random-value parent-sampling-rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not really the rate. It is a constant used to calculate the probability. For example, a value of 2 works out to a probability of p = 2^-x = 2^-2 = 0.25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the term "parent sampling constant" has no meaning to me. It is still a sampling rate, it's just encoded in a particular manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we have it specified in open-telemetry/oteps#168, this field contains the "adjusted count" specifically encoded by the logarithm for non-zero values and as 63 for the zero case.
trace-flags = 2HEXDIGLC ; 8 bit flags. Currently, only one bit is used. See below for details | ||
sampling-constant = sampling-random-value parent-sampling-constant | ||
sampling-random-value = 2HEXDIGLC ; geometrically distributed 8-bit random value used for consistent sampling | ||
parent-sampling-constant = 2HEXDIGLC ; 8-bit value used to represent the head sampling probability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intention of limiting values to 6-bits only was that on the wire the combined constant could be represented with just 3 hexes, not 4. Was this discussed and decided against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood the original 6-bit requirement was because of the desire to use a 64-bit int, but due to some restrictions of some languages (e.g. java) it is easier to ignore the sign bit hence 63 bits of data. Then a state for 0 was needed because there is no x
such that 2^-x = 0
leaving 62 bits. From there, only 6 bits is needed to represent the number of leading 0s in a 62-bit binary number and 62 was the highest required exponent because 2^-62
was the minimum sampling rate.
AFAIK the 6-bit + 6-bit 3-hex-character representation was just a happy coincidence of this earlier decision.
In any case, the decision to use 4 or 3 characters doesn't matter that much to me. 4 is slightly easier to parse (parse 2 hex bytes vs some bit shifting needed) and much easier for a human to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on both counts.
The use of 4 bytes is simpler to explain and implement, and 3 is definitely acceptable.
One reason to go with 4 instead of 3 is that it makes #467 orthogonal. The random value is only needed when the randomness flag is not set, according to that proposal (@bogdandrutu). Combining this proposal with #467, the p
value is only needed when sampled and the r
value is built in--we arrive at a nice outcome:
- With randomness flag set: 2 bytes extra when sampled, 0 bytes extra when not sampled
- Without randomness flag set: 4 bytes extra when sampled, 2 bytes extra when not sampled
Having now specified how OpenTelemetry could use Keeping in mind the original purpose of this work--which is to count spans and logs that happen in traced contexts and translate them into estimated metrics about how many associated spans and logs are happening in all contexts--we're better off with on-by-default implementations of probability sampling. The combination of these proposals leads to easier integration for span-to-metrics pipelines at a cost of 2 bytes per sampled context and 0 bytes per unsampled context. |
This draft proposal is an early proposal at solving #463
It does not directly compete with, but is overlapping with, #467
Preview | Diff