-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenTelemetry Implementation #1431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phase 1 OpenTelemetry Proof of Concept
* configure feature for OpenTelemetry support
* feature gated request fields for traceparent and tracestate headers
* feature gated triggers to OpenTelemetry state machine
* feature gated logic to intialize OpenTelemetry state structure
* temporary call to initialize OpenTelemetry tracer
* temporary logging callback
* OpenTelemetry state machine to manage calls into rust crate
* incoming header/field parsing for traceparent and tracestate
* Rust crate holding all the tracer logic.
Hmm, you may have noticed that we generally use English prose to describe changes to Unit, not just a list of bullet points, of which this tells me there is way too much happening in this commit.
So at a basic level this will need splitting up. You can take a look at the cgroup work
$ git log 9466daf9bdafa3e00f521a47f4ce218353bf7f86^..f67a01b88fd7c7057767e18a3dd06c24e94c8aa8
The work to switch from clone(2) to fork(2) + unshare(2) on Linux
$ git log 763396b8be07be41b1baf336952fd222cbeb8db7^..96f33c0395538256ab192ea7b26203a2061c4941
The introduction of the wasm language module
$ git log 46573e6993552e0ecbcbcd6f4e191b4b2e3f5dc9^..47ff51009fa05d83bb67cd5db16829ab4c0081d7
The introduction of the wasm-wasi-component language module
$ git log f2e6447567eef6eeafa833adae0ef155568ec20f^..4e6d7e87685c30a62a654fc91e271d34dd642202
for some examples of how this can be done...
Co-authored-by: Gabor Javorszky <[email protected]>
Signed-off-by: Ava Hahn <[email protected]>
s/Co-authored-by/Co-developed-by/
Also this should be immediately followed by a s-o-b
from the same person.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of files which do not have any change in them besides a rearrange / alignment. Those files should not form part of this PR.
Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first. |
They are just alignment changes, things like - something
- .call()
+ something.call() or - something(StructName{property});
+ something( StructName { property } ); I don't see how these are prep works 🙁 , though I might be missing something. |
Well, if they're not being subsequently touched then leave them alone full stop... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
I wonder if nxt_otel_send_trace()
is asynchronous or synchronous?
I would like to see some document describing the overall architecture of this... especially as a lot of it is effectively hidden away in a black box... |
that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns. The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it. |
The sender which calls |
To be clear, the other side of this channel is tokio? I think what @hongzhidao is getting at is whether this will cause Unit to block. IIRC the plan was to have a thread pool rust side that would deal with the actual sending of the telemetry data and thus allowing Unit to get back to handling client requests ASAP. |
when the nxt_otel_init() is called, we spawn a new thread from the rust code with the receiver passed to it, and in that thread the function called |
Correct. |
No, it will not block. |
That would be great. |
I think it would be a big help if you were to squash these two commits together, then see how it should be split up. E.g.
So we end up with a logical set of commits that progress the incorporation of this feature. Each commit should make sense and be reviewable on its own. That would significantly improve the review process. |
I agree, if we just get it linted and merged we wont have to deal with it again. |
In this case it is simple: the Span object has a custom destructor (implementation of the Drop trait) defined in the library we are using that calls the linked tracer object to send off the span before it is deallocated. |
* configure feature for OpenTelemetry support * feature gated request fields for traceparent and tracestate headers * feature gated triggers to OpenTelemetry state machine * feature gated logic to intialize OpenTelemetry state structure * temporary call to initialize OpenTelemetry tracer * temporary logging callback * OpenTelemetry state machine to manage calls into rust crate * incoming header/field parsing for traceparent and tracestate * Rust crate holding all the tracer logic. Co-authored-by: Gabor Javorszky <[email protected]> Signed-off-by: Ava Hahn <[email protected]>
* migrate opentelemetry tracer lifetime into router process * extend unit configuration API with validators for otel config * update openapi spec * regenerate unitctl openapi package * add method and path fields to opentelemetry span * spans export in configurable batches on background thread Co-authored-by: Gabor Javorszky <[email protected]> Signed-off-by: Ava Hahn <[email protected]>
Signed-off-by: Ava Hahn <[email protected]>
This would imply it's more than just that...
Which seems to include some name changes... |
Let me quote from our CONTRIBUTING.md
This could be be expanded out to something like
Now I didn't just make that up... That's from the liburing CONTRIBUTING.md. Just saying... |
Signed-off-by: Ava Hahn <[email protected]>
Can I kindly ask you to present these patches in a coherent manner. As it currently stands it would be easier to review it if was all one big patch, which is how I'm currently reviewing it (git diff master..) If we don't follow our own advice how can we expect others to? |
Oopsie, looks like an errant change... diff --git ./auto/ssltls ./auto/ssltls
index 6512d330..d8b79558 100644
--- ./auto/ssltls
+++ ./auto/ssltls
@@ -212,4 +212,4 @@ if [ $NXT_POLARSSL = YES ]; then
$echo
exit 1;
fi
-fi
+fi
\ No newline at end of file |
Another newline issue... diff --git ./auto/otel ./auto/otel
new file mode 100644
index 00000000..11387a1b
--- /dev/null
+++ ./auto/otel
@@ -0,0 +1,25 @@
+
+# Copyright (C) NGINX, Inc.
+
+
+nxt_found=no
+NXT_HAVE_OTEL=NO
+
+NXT_OTEL_LIB_LOC=src/otel/target/debug/libotel.a
+
+cat << END >> $NXT_AUTO_CONFIG_H
+#ifndef NXT_HAVE_OTEL
+#define NXT_HAVE_OTEL 1
+#endif
+END
+
+NXT_LIB_AUX_LIBS="$NXT_LIB_AUX_LIBS $(pkgconf openssl --cflags --libs || "-lssl -lcrypto") $NXT_OTEL_LIB_LOC"
+
+cat << END >> $NXT_MAKEFILE
+
+$NXT_OTEL_LIB_LOC:
+ pushd $NXT_BUILD_DIR/src/otel/
+ cargo build
+ cd ../../
+
+END
\ No newline at end of file What editor are you using? I've also noticed some erratic indenting in your patches... The
But maybe your editor doesn't use it... |
src/nxt_otel.h
Outdated
typedef struct { | ||
u_char *trace_id, *version, *parent_id, *trace_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please declare these on separate lines. Really don't want this spreading to structures...
I am working on that after I take care of low hanging fruit like nits and style preferences. |
Signed-off-by: Ava Hahn <[email protected]>
Emacs primarily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi all,
Please help me understand what otel does in http?
It adds spans such as "headers", "body", and then collect/pass them to the thread running Rust, right?
If yes, could we just do it when the request is finalized, It should happen in the same place as logging. Is it good enough to do them in function like nxt_http_trace(r)
?
What's the difference between the two ways?
First thing is that when the traceparent header is received it is parsed to get the span ID and parent ID. Spans may come from many other services and all end up in one unified hierarchy in the collector service. Next the span is initialized. We make one span per request. It is either linked to the parent declared in the Traceparent header or the span id and parent may be fully autogenerated. Once all header fields are done parsing they are added to the span. At that point it is guaranteed that any traceparent header has been parsed and that the span is fully initialized. After the body has been processed the size is added to the span. Finally as soon as we are sure there is no more data to parse the span is finished and sent to the background thread that sends all spans in batches to the tracer.
What would the benefit of that be over the current state machine? It seems like a slower design that would be more error prone. The current state machine for otel uses pre-existing routines in Unit like the header field parsing and generates the span as soon as it can be in small amounts. Either way might work but I don't understand why you would prefer it all to be done at once at the end of request processing. I would worry that the request data doesnt live long enough to be copied during the request closing, and I would worry that concentrating all the work at the end would add latency to request closing. |
What's the currently expected way to build this?
There is #ifndef NXT_HAVE_OTEL
#define NXT_HAVE_OTEL 1
in There is seemingly no src/otel/target/debug/libotel.a:
pushd $NXT_BUILD_DIR/src/otel/
cargo build
cd ../../ target in |
Is it right that we add serval spans but only passed them to Rust engine once? |
we make one span, and add several events to it. |
The two functions are defined in rust part?
Will both of them be communicating with rust? |
Trying to manually build the libotel part Compiling otel v0.1.0 (/home/andrew/src/unit/src/otel)
error[E0308]: mismatched types
--> src/lib.rs:199:35
|
199 | assert_eq!(traceparent.len(), TRACEPARENT_HEADER_LEN);
| ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
|
help: you can convert a `u8` to a `usize`
|
199 | assert_eq!(traceparent.len(), TRACEPARENT_HEADER_LEN.into());
| +++++++
error[E0308]: mismatched types
--> src/lib.rs:204:9
|
201 | ptr::copy_nonoverlapping(
| ------------------------ arguments to this function are incorrect
...
204 | TRACEPARENT_HEADER_LEN,
| ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
|
note: function defined here
--> /builddir/build/BUILD/rustc-1.81.0-src/library/core/src/intrinsics.rs:3036:21
help: you can convert a `u8` to a `usize`
|
204 | TRACEPARENT_HEADER_LEN.into(),
| +++++++
error[E0308]: mismatched types
--> src/lib.rs:207:14
|
207 | *buf.add(TRACEPARENT_HEADER_LEN) = b'\0' as _;
| --- ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
| |
| arguments to this method are incorrect
|
note: method defined here
--> /builddir/build/BUILD/rustc-1.81.0-src/library/core/src/ptr/mut_ptr.rs:1039:25
help: you can convert a `u8` to a `usize`
|
207 | *buf.add(TRACEPARENT_HEADER_LEN.into()) = b'\0' as _;
| +++++++
For more information about this error, try `rustc --explain E0308`.
error: could not compile `otel` (lib) due to 3 previous errors $ cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)
$ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04) (Fedora 1.81.0-1.fc40 |
Here "event" refers to additional properties of the request: its http verb, the headers used, the path, the body size (if there's any), etc... Event should probably be called Attribute (there's a set_attribute method on the span), but the function of the add event fn is to collect the properties of the single request we have a span for. |
Thanks for the explanation.
What does collector mean here? Does it mean the thread running rust or the otel server outside of Unit? |
There are 4 parts to most OpenTelemetry functionality:
This code implementation is mainly part 1, and merely configuring part 2. Parts 3 and 4 are vendor supplied pieces that we hook up. This is what I used for local testing: https://github.com/grafana/docker-otel-lgtm |
After "fixing" the compile errors
Hmm...
Crikey!
Rust eh?1 ... and
Now I know libotel is statically linked. But even the libc .a is only 6.6MB (Just observing...) |
How do you configure this in Unit? I'd expect to see an example in the commit message... |
Ugh. My plan is to get it running then observe it under wireshark... |
Closing for now, as some large work has to be done in order to address some of these comments. I am excited to put it back up once everything has been taken care of! |
This PR contains an OpenTelemetry module that is compiled during the configure phase and statically linked during compilation of Unit. The OpenTelemetry module is a Rust crate that encapsulates the logic needed to generate and send spans and traces, as well as managing batching and background threads.
During operation, Unit will initialize the OpenTelemetry tracer when it is configured by the user in the router thread. If configured, spans will be generated and stored in batches during HTTP request decoding and processing. deconfiguration and reconfiguration will reset and reinitialize the OpenTelemetry tracer.
The following changes are made to the code:
* extend unit configuration API with validators for otel config
* add method and path fields to opentelemetry span
* spans export in configurable batches on background thread
* configure feature for OpenTelemetry support
* feature gated request fields for traceparent and tracestate headers
* feature gated triggers to OpenTelemetry state machine
* feature gated logic to initialize OpenTelemetry state structure
* incoming header/field parsing for traceparent and tracestate
* Rust crate holding all the tracer logic.
* update openapi spec
* regenerate unitctl openapi package
Things to do before merge: