-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
otel: Add OpenTelemetry functionality to NGINX Unit #1463
Conversation
bb5da68
to
dac737c
Compare
Opentelemetry performance test results!The tests use wrk, were ran on a 12c/24t system with 64gb ram. wrk invocation as follows: With otel change, tracing enabled (sample rate 1.0, batch size 20, HTTP Transport)Latency:
Requests per second:
With otel change, tracing NOT enabled:Latency:
Requests per second:
Built without OpenTelemetry support:Latency:
Requests per second:
I conclude that the enclosed change does not measurably impact Unit's performance when not configured. The differences in numbers between the 2nd and 3rd dataset is far smaller than either of their deviance values, and I find the data is always similar across test runs. I also conclude that in a very high throughput deployment opentelemetry tracing will have a small impact. This test represents a high throughput deployment with the sample ratio cranked up unreasonably high. Best practices in any kind of situation in which unit will be receiving this kind of traffic are to reduce the sampling ratio to minimize the load of opentelemetry span processing and generation. Even with the sampling ratio cranked up to maximum (100%) opentelemetry processing for every single request (over 30k requests per second) merely adds half a millisecond on average. For more information you can reproduce my tests with these scripts: |
Just wanted to offer a note for configuration. The settings object has been extended with a new telemetry field. This field contains currently 4 items:
Example configuration: curl -X PUT 127.0.0.1:8080/config -d '{
"settings": {
"telemetry": {
"batch_size": 20,
"endpoint": "http://lgtm:4318/v1/traces",
"protocol": "http",
"sampling_ratio": 1
}
},
"listeners": {
"*:80": {
"pass": "routes"
}
},
"routes": [
{
"match": {
"headers": {
"accept": "*text/html*"
}
},
"action": {
"share": "/usr/share/unit/welcome/welcome.html"
}
},
{
"action": {
"share": "/usr/share/unit/welcome/welcome.md"
}
}
]
}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the commits with Ava's Co-authored-by
, the commit tags should look like
Signed-off-by: Gabor Javorszky <[email protected]>
Co-developed-by: Ava Hahn <[email protected]>
Signed-off-by: Ava Hahn <[email protected]>
I.e. A c-[ad]-b tag should be immediately followed by a s-o-b from the same person.
Note: that the person that last handled the patch should be the last s-o-b, so there's a logical progression of people who where involved on its way into the repo, hence the slight re-ordering.
Also note we use Co-developed-by
.
For .editorconfig: fix bracket balance of editorconfig file
and docs/openapi: update OpenAPI references
the s-o-b's should be swapped around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit
otel: add build tooling to include otel code
I think the following would be better in ./configure
diff --git ./auto/make ./auto/make
index f21a2dfc..e7d29ba3 100644
--- ./auto/make
+++ ./auto/make
@@ -7,6 +7,11 @@
$echo "creating $NXT_MAKEFILE"
+if [ $NXT_OTEL = "NO" ]; then
+ NXT_OTEL_LIB_LOC=
+ NXT_OTEL_BUILD_FLAG=
+ NXT_OTEL_LIB_DIR=
+fi
cat << END > $NXT_MAKEFILE
@@ -138,14 +143,14 @@ cat << END >> $NXT_MAKEFILE
libnxt: $NXT_BUILD_DIR/lib/$NXT_LIB_SHARED $NXT_BUILD_DIR/lib/$NXT_LIB_STATIC
-$NXT_BUILD_DIR/lib/$NXT_LIB_SHARED: \$(NXT_LIB_OBJS)
+$NXT_BUILD_DIR/lib/$NXT_LIB_SHARED: \$(NXT_LIB_OBJS) $NXT_OTEL_LIB_LOC
\$(PP_LD) \$@
\$(v)\$(NXT_SHARED_LOCAL_LINK) -o \$@ \$(NXT_LIB_OBJS) \\
- $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS
+ $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS $NXT_OTEL_LIB_LOC
-$NXT_BUILD_DIR/lib/$NXT_LIB_STATIC: \$(NXT_LIB_OBJS)
+$NXT_BUILD_DIR/lib/$NXT_LIB_STATIC: \$(NXT_LIB_OBJS) $NXT_OTEL_LIB_LOC
\$(PP_AR) \$@
- \$(v)$NXT_STATIC_LINK \$@ \$(NXT_LIB_OBJS)
+ \$(v)$NXT_STATIC_LINK \$@ \$(NXT_LIB_OBJS) $NXT_OTEL_LIB_LOC
$NXT_BUILD_DIR/lib/$NXT_LIB_UNIT_STATIC: \$(NXT_LIB_UNIT_OBJS) \\
$NXT_BUILD_DIR/share/pkgconfig/unit.pc \\
@@ -359,11 +364,11 @@ $echo >> $NXT_MAKEFILE
cat << END >> $NXT_MAKEFILE
$NXT_BUILD_DIR/sbin/$NXT_DAEMON: $NXT_BUILD_DIR/lib/$NXT_LIB_STATIC \\
- \$(NXT_OBJS)
+ \$(NXT_OBJS) $NXT_OTEL_LIB_LOC
\$(PP_LD) \$@
\$(v)\$(NXT_EXEC_LINK) -o \$@ \$(CFLAGS) \\
\$(NXT_OBJS) $NXT_BUILD_DIR/lib/$NXT_LIB_STATIC \\
- $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS
+ $NXT_LIBM $NXT_LIBS $NXT_LIB_AUX_LIBS $NXT_OTEL_LIB_LOC
END
Where you only need to adjust the NXT_LIB_AUX_CFLAGS
and NXT_LIB_AUX_LIBS
variables.
E.g. in my http compression patches I do
diff --git ./configure ./configure
index 6929d41d..f33134b7 100755
--- ./configure
+++ ./configure
@@ -127,6 +127,7 @@ NXT_LIBRT=
. auto/unix
. auto/os/conf
. auto/ssltls
+. auto/compression
if [ $NXT_REGEX = YES ]; then
. auto/pcre
@@ -169,11 +170,13 @@ END
NXT_LIB_AUX_CFLAGS="$NXT_OPENSSL_CFLAGS $NXT_GNUTLS_CFLAGS \\
$NXT_CYASSL_CFLAGS $NXT_POLARSSL_CFLAGS \\
- $NXT_PCRE_CFLAGS"
+ $NXT_PCRE_CFLAGS $NXT_ZLIB_CFLAGS $NXT_ZSTD_CFLAGS \\
+ $NXT_BROTLI_CFLAGS"
NXT_LIB_AUX_LIBS="$NXT_OPENSSL_LIBS $NXT_GNUTLS_LIBS \\
$NXT_CYASSL_LIBS $NXT_POLARSSL_LIBS \\
- $NXT_PCRE_LIB"
+ $NXT_PCRE_LIB $NXT_ZLIB_LIBS $NXT_ZSTD_LIBS \\
+ $NXT_BROTLI_LIBS"
if [ $NXT_NJS != NO ]; then
. auto/njs
For
I would make the prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otel: configuration items and their validation
Adds code responsible for users to apply the `telemetry` configuration
options.
I would like to see example Unit configuration here...
dac737c
to
6ebfe70
Compare
6ebfe70
to
c2d6ce7
Compare
This comment was marked as duplicate.
This comment was marked as duplicate.
c2d6ce7
to
ee3b440
Compare
21fd6be
to
02f7630
Compare
02f7630
to
9a497ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{
"settings": {
"telemetry": {
"batch_size": 20,
"endpoint": "http://lgtm:4318/v1/traces",
At some point this will require a DNS lookup.
I assume this happens in the OTEL library in its own thread away from the actual router threads?
Would this lookup happen once at startup for for every datum or batch of data sent?
On a related note is there a pool of connections to the endpoint created or is a new connection established for each bit of data sent?
59ad421
to
ddea9a8
Compare
I did wonder what might happen here...
Yes, so the thread-local storage variable has been initialised by the compiler, but we've never called
Hmm, not ideal... but stderr does get directed to the unit log... The fact eprintln!() panics if it can't write to stderr, is slightly concerning... But I don't see a good alternative... if we are going to be puttting more rust code in Unit, then we'll probably need to come up with a rust version of nxt_log() at some point. |
I wanted to double check this, so opened this PR to my own branch with bogus changes for the sole purpose of figuring out whether GitHub also parses the Whereas the commit with only the We could use both tags, but not sure what we'd gain with it. There's a way to customize what trailers we can use and how, but we don't have control over how GitHub specifically works. |
Are you guys not seeing the rust warnings? This patch is needed to get rid of these warnings
diff --git ./src/otel/src/lib.rs ./src/otel/src/lib.rs
index 7dbed005..dbf8e6f4 100644
--- ./src/otel/src/lib.rs
+++ ./src/otel/src/lib.rs
@@ -1,3 +1,7 @@
+#![allow(
+ non_camel_case_types,
+)]
+
use opentelemetry::global::BoxedSpan;
use opentelemetry::trace::{
Span, SpanBuilder, SpanKind, TraceId, Tracer, TracerProvider,
@@ -15,15 +19,10 @@ use std::sync::{Arc, OnceLock};
use tokio::sync::mpsc;
use tokio::sync::mpsc::{Receiver, Sender};
-#[allow(
- non_camel_case_types,
-)]
-
const TRACEPARENT_HEADER_LEN: u8 = 55;
const TIMEOUT: time::Duration = std::time::Duration::from_secs(10);
-const NXT_LOG_ERR: u32 = 1;
-const NXT_LOG_INFO: u32 = 4;
+const NXT_LOG_ERR: u32 = 1;
#[repr(C)]
pub struct nxt_str_t { That still leaves
Which can be dealt with by (don't see any point leaving it in with an "_") diff --git ./src/otel/src/lib.rs ./src/otel/src/lib.rs
index 7dbed005..68e5d2b0 100644
--- ./src/otel/src/lib.rs
+++ ./src/otel/src/lib.rs
@@ -118,7 +117,6 @@ unsafe fn nxt_otel_rs_init(
*/
Ok(_) => {
std::thread::spawn(move || nxt_otel_rs_runtime(
- log_callback,
ep,
proto,
batch_size,
@@ -141,7 +139,6 @@ unsafe fn nxt_otel_rs_init(
*/
#[tokio::main]
async unsafe fn nxt_otel_rs_runtime(
- log_callback: unsafe extern "C" fn(log_level: nxt_uint_t, msg: *const c_char),
endpoint: String,
proto: Protocol,
batch_size: f64, Which needless to say should be applied to commit 39284a2 ("otel: add opentelemetry rust crate code") |
ddea9a8
to
2ba5a0a
Compare
Thanks for catching these. I found a few more things. I applied the patches with some additions:
|
But that's exactly what the patch does!
Oh, right, good catch!
That's also what the patch does...
Hmm.. usize will be wrong for x86-64... What platform are you building on? I guess it isn't x86-64...
|
Apologies, I didn't want to take credit for something you did. The reason I reworked is because I assume when you refer to the patch, it's the changeset you added to the comment, ie this bit: Which looking at it does exactly what you said it does. I'm... unsure why I thought it did something different, there was a version where the same attribute was just moved a couple of lines further up without adding the |
I'm using a dockerfile that begins with The built image has a
My host system has a
Looking at the code, on x86_64 usize shouldn't be used: #[cfg(target_arch = "x86_64")]
pub type nxt_uint_t = ::std::os::raw::c_uint;
#[cfg(not(target_arch = "x86_64"))]
pub type nxt_uint_t = usize;
// ...
log_callback: unsafe extern "C" fn(log_level: nxt_uint_t, msg: *const c_char), This should work on both x86_64 and not x86_64 🤔 I see what you're saying. So basically this patch should also be applied: - const NXT_LOG_ERR: usize = 1;
+ const NXT_LOG_ERR: nxt_uint_t = 1; |
2ba5a0a
to
90f1972
Compare
Hah, I totally missed this (apologies, I'm having a weird day today. I applied that patch, nxt log err is now the nxt_uint_t type. |
Exactly! That'll be right regardless of architecture. The reason for doing this is to make it match the C side of things... |
OK, taking one last look through the code, just a couple of minor things diff --git ./src/nxt_conf_validation.c ./src/nxt_conf_validation.c
index b0f5c530..0dde39ff 100644
--- ./src/nxt_conf_validation.c
+++ ./src/nxt_conf_validation.c
@@ -1522,7 +1522,7 @@ nxt_otel_validate_batch_size(nxt_conf_validation_t *vldt,
batch_size = nxt_conf_get_number(value);
if (batch_size <= 0) {
- return NXT_ERROR;
+ return NXT_ERROR;
}
return NXT_OK; Should be applied to e0b640b ("otel: configuration items and their validation") diff --git ./src/nxt_http.h ./src/nxt_http.h
index 456a96ce..19bbdda3 100644
--- ./src/nxt_http.h
+++ ./src/nxt_http.h
@@ -10,6 +10,7 @@
#include <nxt_regex.h>
#include <nxt_otel.h>
+
typedef enum {
NXT_HTTP_UNSET = -1,
NXT_HTTP_INVALID = 0, Should be applied to 209c084 ("otel: add build tooling to include otel code") |
90f1972
to
c3f14ca
Compare
These have been applied to their respective commits. Thank you 🙂 |
Almost there, lets just get the commit tags right.. For a572a81 ("otel: add opentelemetry rust crate code") If you're adding my c-a-b, I need to sign-off on it (which I don't mind doing) as a c-a-b should always be immediately followed by a s-o-b. Seeing as Ava is the author, she doesn't need a c-a-b. Lastly, seeing as Gabor is the committer his should be the last s-o-b as he was the last one to handle the patch..., so that would make the commit tags look like
For b1a1009 ("otel: add build tooling to include otel code"), following the above rules
804fe14 ("otel: add header parsing and test call state")
3d2beb4 ("otel: configuration items and their validation")
eb9b0ff (".editorconfig: fix bracket balance of editorconfig file")
c3f14ca ("docs/openapi: update OpenAPI references")
|
For the first commit I'm equally happy if you just remove my c-a-b, I'm not sure it really warrants it... |
This is purely the source code of the rust end of opentelemetry. It does not have build tooling wired up yet, nor is this used from the C code. Signed-off-by: Ava Hahn <[email protected]> Signed-off-by: Gabor Javorszky <[email protected]>
Adds the --otel flag to the configure command and the various build time variables and checks that are needed in this flow. It also includes the nxt_otel.c and nxt_otel.h files that are needed for the rest of Unit to talk to the compiled static library that's generated from the rust crate. Signed-off-by: Ava Hahn <[email protected]> Co-authored-by: Gabor Javorszky <[email protected]> Signed-off-by: Gabor Javorszky <[email protected]>
Enables Unit to parse the tracestate and traceparent headers and add it to the list, as well as calls to nxt_otel_test_and_call_state. Signed-off-by: Ava Hahn <[email protected]> Signed-off-by: Gabor Javorszky <[email protected]>
Adds code responsible for users to apply the `telemetry` configuration options. configuration snippet as follows: { "settings": { "telemetry": { "batch_size": 20, "endpoint": "http://lgtm:4318/v1/traces", "protocol": "http", "sampling_ratio": 1 } }, "listeners": { "*:80": { "pass": "routes" } }, "routes": [ { "match": { "headers": { "accept": "*text/html*" } }, "action": { "share": "/usr/share/unit/welcome/welcome.html" } }, { "action": { "share": "/usr/share/unit/welcome/welcome.md" } } ] } Signed-off-by: Ava Hahn <[email protected]> Signed-off-by: Gabor Javorszky <[email protected]>
Tiny bracket balance fix. Signed-off-by: Ava Hahn <[email protected]> Signed-off-by: Gabor Javorszky <[email protected]>
These changes are generated by the openapi generator through a make command. Signed-off-by: Ava Hahn <[email protected]> Signed-off-by: Gabor Javorszky <[email protected]>
c3f14ca
to
25625b9
Compare
I shuffled the trailers around per your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think we can call this done and release it into the wild!
I know this took some time but thanks for sticking with it and hopefully you'll agree... it's better for it...
Oh well, all the s-o-b's are now the wrong way round, never mind, maybe next time just have one person doing the committing... |
Adds OpenTelemetry implementation via a Rust crate compiled to a static C library and linked into the existing Unit codebase with the necessary build and configuration steps.
--otel
build argument works from fb15df7 (add build tooling to include otel code) onwards.settings.telemetry
to Unit's config.json works from 290fffd (configuration items and their validation) onwardsSigned-off-by
trailers, and all of them are co-authored between @avahahn and myself.Closes #1283