Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry Implementation #1431

Closed
wants to merge 5 commits into from
Closed

Conversation

avahahn
Copy link
Contributor

@avahahn avahahn commented Sep 16, 2024

This PR contains an OpenTelemetry module that is compiled during the configure phase and statically linked during compilation of Unit. The OpenTelemetry module is a Rust crate that encapsulates the logic needed to generate and send spans and traces, as well as managing batching and background threads.

During operation, Unit will initialize the OpenTelemetry tracer when it is configured by the user in the router thread. If configured, spans will be generated and stored in batches during HTTP request decoding and processing. deconfiguration and reconfiguration will reset and reinitialize the OpenTelemetry tracer.

The following changes are made to the code:
* extend unit configuration API with validators for otel config
* add method and path fields to opentelemetry span
* spans export in configurable batches on background thread
* configure feature for OpenTelemetry support
* feature gated request fields for traceparent and tracestate headers
* feature gated triggers to OpenTelemetry state machine
* feature gated logic to initialize OpenTelemetry state structure
* incoming header/field parsing for traceparent and tracestate
* Rust crate holding all the tracer logic.
* update openapi spec
* regenerate unitctl openapi package

Things to do before merge:

  • test gRPC configuration
  • test HTTP request error path
  • decide if we want to generate a span for response as part of this changeset or as an additional improvement later down the road.

Copy link
Member

@ac000 ac000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 1 OpenTelemetry Proof of Concept

* configure feature for OpenTelemetry support
* feature gated request fields for traceparent and tracestate headers
* feature gated triggers to OpenTelemetry state machine
* feature gated logic to intialize OpenTelemetry state structure
* temporary call to initialize OpenTelemetry tracer
* temporary logging callback
* OpenTelemetry state machine to manage calls into rust crate
* incoming header/field parsing for traceparent and tracestate
* Rust crate holding all the tracer logic.

Hmm, you may have noticed that we generally use English prose to describe changes to Unit, not just a list of bullet points, of which this tells me there is way too much happening in this commit.

So at a basic level this will need splitting up. You can take a look at the cgroup work

$ git log 9466daf9bdafa3e00f521a47f4ce218353bf7f86^..f67a01b88fd7c7057767e18a3dd06c24e94c8aa8

The work to switch from clone(2) to fork(2) + unshare(2) on Linux

$ git log 763396b8be07be41b1baf336952fd222cbeb8db7^..96f33c0395538256ab192ea7b26203a2061c4941

The introduction of the wasm language module

$ git log 46573e6993552e0ecbcbcd6f4e191b4b2e3f5dc9^..47ff51009fa05d83bb67cd5db16829ab4c0081d7

The introduction of the wasm-wasi-component language module

$ git log f2e6447567eef6eeafa833adae0ef155568ec20f^..4e6d7e87685c30a62a654fc91e271d34dd642202

for some examples of how this can be done...

Co-authored-by: Gabor Javorszky <[email protected]>
Signed-off-by: Ava Hahn <[email protected]>

s/Co-authored-by/Co-developed-by/

Also this should be immediately followed by a s-o-b from the same person.

auto/otel Outdated Show resolved Hide resolved
auto/otel Outdated Show resolved Hide resolved
auto/otel Outdated Show resolved Hide resolved
src/nxt_h1proto.c Show resolved Hide resolved
src/nxt_http_request.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
Copy link
Contributor

@javorszky javorszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of files which do not have any change in them besides a rearrange / alignment. Those files should not form part of this PR.

tools/unitctl/unit-openapi/.openapi-generator/FILES Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/applications.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/edit.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/execute.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/listeners.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/mod.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/save.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/cmd/status.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/main.rs Outdated Show resolved Hide resolved
tools/unitctl/unitctl/src/wait.rs Outdated Show resolved Hide resolved
@ac000
Copy link
Member

ac000 commented Sep 17, 2024

There are a bunch of files which do not have any change in them besides a rearrange / alignment. Those files should not form part of this PR.

Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first.

@javorszky
Copy link
Contributor

There are a bunch of files which do not have any change in them besides a rearrange / alignment. Those files should not form part of this PR.

Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first.

They are just alignment changes, things like

- something
-   .call()
+ something.call()

or

- something(StructName{property});
+ something( StructName { property } );

I don't see how these are prep works 🙁 , though I might be missing something.

@ac000
Copy link
Member

ac000 commented Sep 17, 2024

I don't see how these are prep works 🙁 , though I might be missing something.

Well, if they're not being subsequently touched then leave them alone full stop...

Copy link
Contributor

@hongzhidao hongzhidao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!
I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

src/nxt_conf_validation.c Outdated Show resolved Hide resolved
src/nxt_conf_validation.c Outdated Show resolved Hide resolved
src/nxt_conf_validation.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.h Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
src/nxt_otel.c Outdated Show resolved Hide resolved
@ac000
Copy link
Member

ac000 commented Sep 17, 2024

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

I would like to see some document describing the overall architecture of this... especially as a lot of it is effectively hidden away in a black box...

@javorszky
Copy link
Contributor

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.

The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

@hongzhidao
Copy link
Contributor

hongzhidao commented Sep 17, 2024

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.

The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

The sender which calls nxt_otel_send_trace() synchronously runs in the threads of the router process,
where does the receiver which receives span and handles it asynchronously run?

@ac000
Copy link
Member

ac000 commented Sep 17, 2024

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.

The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

To be clear, the other side of this channel is tokio?

I think what @hongzhidao is getting at is whether this will cause Unit to block.

IIRC the plan was to have a thread pool rust side that would deal with the actual sending of the telemetry data and thus allowing Unit to get back to handling client requests ASAP.

@javorszky
Copy link
Contributor

Great job! I wonder if nxt_otel_send_trace() is asynchronous or synchronous?

that function call is synchronous. It grabs the span at the pointer, and sends it to the channel and returns.
The span will then be handled asynchronously on the other end of the channel, but NGINX no longer needs to know about it.

The sender which calls nxt_otel_send_trace() synchronously runs in the threads of the router process, where does the receiver which receives span and handles it asynchronously run?

when the nxt_otel_init() is called, we spawn a new thread from the rust code with the receiver passed to it, and in that thread the function called runtime will loop over messages coming from the channel. As long as the channel is open, and no shutdown message is received, that function loops in a separate thread.

@hongzhidao
Copy link
Contributor

hongzhidao commented Sep 17, 2024

I think what @hongzhidao is getting at is whether this will cause Unit to block.

Correct.

@javorszky
Copy link
Contributor

I think what @hongzhidao is getting at is whether this will cause Unit to block.

Correct.

No, it will not block.

@hongzhidao
Copy link
Contributor

I think what @hongzhidao is getting at is whether this will cause Unit to block.

Correct.

No, it will not block.

That would be great.
Btw, you could test it with some performance tool likewrk.

src/nxt_otel.h Outdated Show resolved Hide resolved
src/nxt_otel.h Outdated Show resolved Hide resolved
src/nxt_otel.h Outdated Show resolved Hide resolved
@ac000
Copy link
Member

ac000 commented Sep 17, 2024

@avahahn

I think it would be a big help if you were to squash these two commits together, then see how it should be split up.

E.g.

  • Add core OTEL stuff (C & Rust)
  • Add build infrastructure
  • Add tracepoints
  • Wire it up to the config system

So we end up with a logical set of commits that progress the incorporation of this feature. Each commit should make sense and be reviewable on its own.

That would significantly improve the review process.

@avahahn
Copy link
Contributor Author

avahahn commented Sep 17, 2024

Well, they can (and probably should) be part of the PR, as I assume they form preparatory work, but should be done as separate commits first.

I agree, if we just get it linted and merged we wont have to deal with it again.

@avahahn
Copy link
Contributor Author

avahahn commented Sep 17, 2024

I would like to see some document describing the overall architecture of this... especially as a lot of it is effectively hidden away in a black box...

In this case it is simple: the Span object has a custom destructor (implementation of the Drop trait) defined in the library we are using that calls the linked tracer object to send off the span before it is deallocated.

src/nxt_otel.c Outdated Show resolved Hide resolved
avahahn and others added 3 commits September 17, 2024 14:58
* configure feature for OpenTelemetry support
* feature gated request fields for traceparent and tracestate headers
* feature gated triggers to OpenTelemetry state machine
* feature gated logic to intialize OpenTelemetry state structure
* temporary call to initialize OpenTelemetry tracer
* temporary logging callback
* OpenTelemetry state machine to manage calls into rust crate
* incoming header/field parsing for traceparent and tracestate
* Rust crate holding all the tracer logic.

Co-authored-by: Gabor Javorszky <[email protected]>
Signed-off-by: Ava Hahn <[email protected]>
* migrate opentelemetry tracer lifetime into router process
* extend unit configuration API with validators for otel config
* update openapi spec
* regenerate unitctl openapi package
* add method and path fields to opentelemetry span
* spans export in configurable batches on background thread

Co-authored-by: Gabor Javorszky <[email protected]>
Signed-off-by: Ava Hahn <[email protected]>
@ac000
Copy link
Member

ac000 commented Sep 19, 2024

All the tools/unitctl stuff needs to go into their own commit(s)...

Yep, that's what I was going for with this comment: #1431 (review)

Every single change in the unitctl package is just code alignment, and I get that they may be prep work (#1431 (comment)), they shouldn't form part here 🙂

This would imply it's more than just that...

* update openapi spec
* regenerate unitctl openapi package

Which seems to include some name changes...

@ac000
Copy link
Member

ac000 commented Sep 19, 2024

Let me quote from our CONTRIBUTING.md

- Create atomic commits.  A commit should do just one thing, i.e. you
  shouldn't mix refactoring with functional code changes.  Do the
  refactoring in one or more commits first.

  Ideally you should rebase locally and force push new commits up.

This could be be expanded out to something like

Each commit should do one thing, and one thing only. If you find        
yourself, in the commit message, adding phrases like "Also do [...]" or 
"While in here [...]", then that's a sign that the change should have   
been split into multiple commits. If your change includes some          
refactoring of code to make your change possible, then that refactoring 
should be a separate commit, done first. That means this preparatory     
commit won't have any functional changes, and hence should be a no-op.   
It also means that your main commit, with the change that you actually  
care about, will be smaller and easier to review.                       
                                                                        
Each commit must stand on its own in terms of what it provides, and how 
it works. Lots of changes are just a single commit, but for something a 
bit more involved, it's not uncommon to have a pull request contain     
multiple commits. Make each commit as simple as possible, and not any   
simpler. We'd much rather see 10 simple commits than 2 more complicated 
ones. If you stumble across something that needs fixing while making an 
unrelated change, then please make that change as a separate commit,    
explaining why it's being made.                                         
                                                                        
Each commit in a series must be buildable, it's not enough that the end 
result is buildable. See reason 3 in the introduction for why that's the
case.                                                                   
                                                                        
No fixup commits! Sometimes people post a change and errors are pointed 
out in the commit, and the author then does a followup fix for that     
error. This isn't acceptable, please squash fixup commits into the      
commit that introduced the problem in the first place. This is done by  
amending the fix into the original commit that caused the issue. You can
do that with git rebase -i and arrange the commit order such that the   
fixup is right after the original commit, and then use 's' (for squash) 
to squash the fixup into the original commit. Don't forget to edit the  
commit message while doing that, as git will combine the two commit     
messages into one. Or you can do it manually. Once done, force push your
rewritten git history. See reasons 1-3 in the introduction series for   
why that is. 

Now I didn't just make that up... That's from the liburing CONTRIBUTING.md.

Just saying...

src/otel/src/lib.rs Outdated Show resolved Hide resolved
@ac000
Copy link
Member

ac000 commented Sep 19, 2024

@avahahn

Can I kindly ask you to present these patches in a coherent manner.

As it currently stands it would be easier to review it if was all one big patch, which is how I'm currently reviewing it (git diff master..)

If we don't follow our own advice how can we expect others to?

@ac000
Copy link
Member

ac000 commented Sep 19, 2024

Oopsie, looks like an errant change...

diff --git ./auto/ssltls ./auto/ssltls
index 6512d330..d8b79558 100644
--- ./auto/ssltls
+++ ./auto/ssltls
@@ -212,4 +212,4 @@ if [ $NXT_POLARSSL = YES ]; then
         $echo
         exit 1;
     fi
-fi
+fi
\ No newline at end of file

@ac000
Copy link
Member

ac000 commented Sep 19, 2024

Another newline issue...

diff --git ./auto/otel ./auto/otel
new file mode 100644
index 00000000..11387a1b
--- /dev/null
+++ ./auto/otel
@@ -0,0 +1,25 @@
+
+# Copyright (C) NGINX, Inc.
+
+
+nxt_found=no
+NXT_HAVE_OTEL=NO
+
+NXT_OTEL_LIB_LOC=src/otel/target/debug/libotel.a
+
+cat << END >> $NXT_AUTO_CONFIG_H
+#ifndef NXT_HAVE_OTEL
+#define NXT_HAVE_OTEL 1
+#endif
+END
+
+NXT_LIB_AUX_LIBS="$NXT_LIB_AUX_LIBS $(pkgconf openssl --cflags --libs || "-lssl -lcrypto") $NXT_OTEL_LIB_LOC"
+
+cat << END >> $NXT_MAKEFILE
+
+$NXT_OTEL_LIB_LOC:
+    pushd $NXT_BUILD_DIR/src/otel/
+    cargo build
+    cd ../../
+
+END
\ No newline at end of file

What editor are you using? I've also noticed some erratic indenting in your patches...

The .editorconfig does cover these files

[{auto/**,*.toml}]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

But maybe your editor doesn't use it...

src/nxt_otel.h Outdated
Comment on lines 48 to 49
typedef struct {
u_char *trace_id, *version, *parent_id, *trace_flags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please declare these on separate lines. Really don't want this spreading to structures...

@avahahn
Copy link
Contributor Author

avahahn commented Sep 19, 2024

Can I kindly ask you to present these patches in a coherent manner.

As it currently stands it would be easier to review it if was all one big patch, which is how I'm currently reviewing it (git diff master..)

If we don't follow our own advice how can we expect others to?

I am working on that after I take care of low hanging fruit like nits and style preferences.

Signed-off-by: Ava Hahn <[email protected]>
@avahahn
Copy link
Contributor Author

avahahn commented Sep 19, 2024

What editor are you using? I've also noticed some erratic indenting in your patches...

Emacs primarily

Copy link
Contributor

@hongzhidao hongzhidao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all,
Please help me understand what otel does in http?
It adds spans such as "headers", "body", and then collect/pass them to the thread running Rust, right?
If yes, could we just do it when the request is finalized, It should happen in the same place as logging. Is it good enough to do them in function like nxt_http_trace(r)?
What's the difference between the two ways?

@avahahn
Copy link
Contributor Author

avahahn commented Sep 20, 2024

Please help me understand what otel does in http?
The otel state is allocated when a new request is made.

First thing is that when the traceparent header is received it is parsed to get the span ID and parent ID. Spans may come from many other services and all end up in one unified hierarchy in the collector service.

Next the span is initialized. We make one span per request. It is either linked to the parent declared in the Traceparent header or the span id and parent may be fully autogenerated.

Once all header fields are done parsing they are added to the span. At that point it is guaranteed that any traceparent header has been parsed and that the span is fully initialized.

After the body has been processed the size is added to the span.

Finally as soon as we are sure there is no more data to parse the span is finished and sent to the background thread that sends all spans in batches to the tracer.

What's the difference between the two ways?

What would the benefit of that be over the current state machine? It seems like a slower design that would be more error prone. The current state machine for otel uses pre-existing routines in Unit like the header field parsing and generates the span as soon as it can be in small amounts.

Either way might work but I don't understand why you would prefer it all to be done at once at the end of request processing. I would worry that the request data doesnt live long enough to be copied during the request closing, and I would worry that concentrating all the work at the end would add latency to request closing.

@ac000
Copy link
Member

ac000 commented Sep 20, 2024

@avahahn

What's the currently expected way to build this?

$ ./configure ... --otel
$ make
...
  AR     build/lib/libnxt.a
  CC     build/src/nxt_main.o
  LD     build/sbin/unitd
/usr/bin/ld: cannot find src/otel/target/debug/libotel.a: No such file or directory
collect2: error: ld returned 1 exit status
make: *** [build/Makefile:2044: build/sbin/unitd] Error 1
$

There is

#ifndef NXT_HAVE_OTEL
#define NXT_HAVE_OTEL 1

in build/include/nxt_auto_config.h

There is seemingly no

src/otel/target/debug/libotel.a:
    pushd $NXT_BUILD_DIR/src/otel/
    cargo build
    cd ../../

target in build/Makefile

@hongzhidao
Copy link
Contributor

Please help me understand what otel does in http?
The otel state is allocated when a new request is made.

First thing is that when the traceparent header is received it is parsed to get the span ID and parent ID. Spans may come from many other services and all end up in one unified hierarchy in the collector service.

Next the span is initialized. We make one span per request. It is either linked to the parent declared in the Traceparent header or the span id and parent may be fully autogenerated.

Once all header fields are done parsing they are added to the span. At that point it is guaranteed that any traceparent header has been parsed and that the span is fully initialized.

After the body has been processed the size is added to the span.

Finally as soon as we are sure there is no more data to parse the span is finished and sent to the background thread that sends all spans in batches to the tracer.

What's the difference between the two ways?

What would the benefit of that be over the current state machine? It seems like a slower design that would be more error prone. The current state machine for otel uses pre-existing routines in Unit like the header field parsing and generates the span as soon as it can be in small amounts.

Either way might work but I don't understand why you would prefer it all to be done at once at the end of request processing. I would worry that the request data doesnt live long enough to be copied during the request closing, and I would worry that concentrating all the work at the end would add latency to request closing.

Is it right that we add serval spans but only passed them to Rust engine once?
If yes, what's the good to add spans in different places but not add them when they are prepared to send?

@avahahn
Copy link
Contributor Author

avahahn commented Sep 20, 2024

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

@hongzhidao
Copy link
Contributor

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

The two functions are defined in rust part?

nxt_otel_add_event_to_trace
nxt_otel_send_trace

Will both of them be communicating with rust?

@ac000
Copy link
Member

ac000 commented Sep 20, 2024

Trying to manually build the libotel part

   Compiling otel v0.1.0 (/home/andrew/src/unit/src/otel)
error[E0308]: mismatched types
   --> src/lib.rs:199:35
    |
199 |     assert_eq!(traceparent.len(), TRACEPARENT_HEADER_LEN);
    |                                   ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
    |
help: you can convert a `u8` to a `usize`
    |
199 |     assert_eq!(traceparent.len(), TRACEPARENT_HEADER_LEN.into());
    |                                                         +++++++

error[E0308]: mismatched types
   --> src/lib.rs:204:9
    |
201 |     ptr::copy_nonoverlapping(
    |     ------------------------ arguments to this function are incorrect
...
204 |         TRACEPARENT_HEADER_LEN,
    |         ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
    |
note: function defined here
   --> /builddir/build/BUILD/rustc-1.81.0-src/library/core/src/intrinsics.rs:3036:21
help: you can convert a `u8` to a `usize`
    |
204 |         TRACEPARENT_HEADER_LEN.into(),
    |                               +++++++

error[E0308]: mismatched types
   --> src/lib.rs:207:14
    |
207 |     *buf.add(TRACEPARENT_HEADER_LEN) = b'\0' as _;
    |          --- ^^^^^^^^^^^^^^^^^^^^^^ expected `usize`, found `u8`
    |          |
    |          arguments to this method are incorrect
    |
note: method defined here
   --> /builddir/build/BUILD/rustc-1.81.0-src/library/core/src/ptr/mut_ptr.rs:1039:25
help: you can convert a `u8` to a `usize`
    |
207 |     *buf.add(TRACEPARENT_HEADER_LEN.into()) = b'\0' as _;
    |                                    +++++++

For more information about this error, try `rustc --explain E0308`.
error: could not compile `otel` (lib) due to 3 previous errors
$ cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)
$ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04) (Fedora 1.81.0-1.fc40

@javorszky
Copy link
Contributor

javorszky commented Sep 20, 2024

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

The two functions are defined in rust part?

nxt_otel_add_event_to_trace
nxt_otel_send_trace

Will both of them be communicating with rust?

nxt_otel_add_event_to_trace adds events to a span we started with nxt_otel_get_or_create_trace

nxt_otel_send_trace is used when we've finished decorating the span with information as we had access to them, and at that point the span from Unit is deemed "complete", and can be sent off to the collector.

Here "event" refers to additional properties of the request: its http verb, the headers used, the path, the body size (if there's any), etc... Event should probably be called Attribute (there's a set_attribute method on the span), but the function of the add event fn is to collect the properties of the single request we have a span for.

@hongzhidao
Copy link
Contributor

Is it right that we add serval spans but only passed them to Rust engine once?

we make one span, and add several events to it.

The two functions are defined in rust part?

nxt_otel_add_event_to_trace
nxt_otel_send_trace

Will both of them be communicating with rust?

nxt_otel_add_event_to_trace adds events to a span we started with nxt_otel_get_or_create_trace

nxt_otel_send_trace is used when we've finished decorating the span with information as we had access to them, and at that point the span from Unit is deemed "complete", and can be sent off to the collector.

Here "event" refers to additional properties of the request: its http verb, the headers used, the path, the body size (if there's any), etc... Event should probably be called Attribute (there's a set_attribute method on the span), but the function of the add event fn is to collect the properties of the single request we have a span for.

Thanks for the explanation.

can be sent off to the collector.

What does collector mean here? Does it mean the thread running rust or the otel server outside of Unit?

@javorszky
Copy link
Contributor

javorszky commented Sep 20, 2024

What does collector mean here? Does it mean the thread running rust or the otel server outside of Unit?

There are 4 parts to most OpenTelemetry functionality:

  1. the instrumented code: this is the bit that lives inside your functions and adds the attributes / events to spans, creates the spans, measures things, etc
  2. the exporter: it's part of the SDK that you add to the code. The finished spans get collected here, batched, and once they reach a certain number, the exporter will open a connection to someplace to send the data. This is the part that exfils the tracing / metrics / logs / whatever you configured out of your application
  3. the collector: this is what receives the data from the exporter in most cases. It usually lives as a sidecar or daemon to your application, but can be in a completely different pod / container / server. In most cases it means the thing called "OpenTelemetry Collector", see https://opentelemetry.io/docs/collector/. Its job is to receive all the data, do some transformation on it optionally, and then forward it to someplace that we can use to make sense of the raw data
  4. the dashboard / ui / aggregator. This will receive the transformed / filtered data from the collector, and will draw charts and dashboards for us to look at. Grafana is an example.

This code implementation is mainly part 1, and merely configuring part 2.

Parts 3 and 4 are vendor supplied pieces that we hook up. This is what I used for local testing: https://github.com/grafana/docker-otel-lgtm

@ac000
Copy link
Member

ac000 commented Sep 20, 2024

After "fixing" the compile errors

$ cargo build
   Compiling otel v0.1.0 (/home/andrew/src/unit/src/otel)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.69s
$ make
  AR     build/lib/libnxt.a
  LD     build/sbin/unitd
$

Hmm...

$ ls -oh /opt/unit/sbin/unitd 
-rwxr-xr-x 1 andrew 2.4M Sep 18 14:08 /opt/unit/sbin/unitd
$ ls -oh build/sbin/unitd 
-rwxr-xr-x 1 andrew 92M Sep 20 16:54 build/sbin/unitd

Crikey!

$ strip build/sbin/unitd
$ ls -oh build/sbin/unitd 
-rwxr-xr-x 1 andrew 21M Sep 20 16:56 build/sbin/unitd

Rust eh?1

... and /opt/unit/sbin/unitd wasn't even stripped...

$ strip /opt/unit/sbin/unitd
$ ls -oh /opt/unit/sbin/unitd 
-rwxr-xr-x 1 andrew 580K Sep 20 17:01 /opt/unit/sbin/unitd

Now I know libotel is statically linked. But even the libc .a is only 6.6MB

(Just observing...)

@ac000
Copy link
Member

ac000 commented Sep 20, 2024

How do you configure this in Unit?

I'd expect to see an example in the commit message...

@ac000
Copy link
Member

ac000 commented Sep 20, 2024

Parts 3 and 4 are vendor supplied pieces that we hook up. This is what I used for local testing: https://github.com/grafana/docker-otel-lgtm

Ugh. My plan is to get it running then observe it under wireshark...

@avahahn
Copy link
Contributor Author

avahahn commented Sep 20, 2024

Closing for now, as some large work has to be done in order to address some of these comments. I am excited to put it back up once everything has been taken care of!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants