
BatchProcessor shutdown issue #2478

Open
cijothomas opened this issue Dec 27, 2024 · 7 comments
Comments

@cijothomas (Member) commented Dec 27, 2024

BatchProcessor relies on a channel to pass messages, which include telemetry items (Log/Span) as well as control signals like Shutdown/Flush. If the channel is full, Shutdown signals get lost. For example, suppose the channel already holds 500 items and the thread is in a blocking export() call; during this time, 2000 more items arrive. Any new items are dropped, which is by design, but if a shutdown is initiated during this window, that message is lost as well. When the thread gets back to processing the queue, it drains the normal telemetry items, but the shutdown signal is lost forever. Even when the shutdown message is not lost, it lands at the last position in the queue, so it is processed only after all other messages, potentially after multiple export() calls.
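A minimal standalone sketch of the failure mode (with a hypothetical `Message` enum, not the SDK's internal type): once the bounded channel is full, the Shutdown control message is dropped just like any telemetry item.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::time::Duration;

// Hypothetical stand-in for the processor's internal message type.
enum Message {
    Export(String),
    Shutdown,
}

fn main() {
    // Bounded channel standing in for the batch processor's queue.
    let (tx, rx) = sync_channel::<Message>(2);

    // The worker is "blocked in export()", so nothing is being consumed.
    tx.try_send(Message::Export("log-1".into())).unwrap();
    tx.try_send(Message::Export("log-2".into())).unwrap();

    // Queue is full: the Shutdown control message is silently dropped.
    if let Err(TrySendError::Full(_)) = tx.try_send(Message::Shutdown) {
        println!("shutdown signal lost: queue is full");
    }

    // When the worker resumes, it only ever sees telemetry items.
    while let Ok(msg) = rx.recv_timeout(Duration::from_millis(10)) {
        match msg {
            Message::Export(item) => println!("exporting {item}"),
            Message::Shutdown => println!("shutting down"),
        }
    }
}
```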

We may need to use a separate channel for control flow (shutdown/flush) to mitigate the issue. This may not be easy to eliminate fully, because a blocking export call is still being made, and anything that occurs during that call is hard to respond to promptly.
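One possible shape of that mitigation, sketched with std::sync::mpsc and hypothetical type names (not the SDK's actual types): the worker drains the control channel first, and on shutdown it also drains whatever telemetry is already queued.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};
use std::time::Duration;

// Hypothetical names, purely illustrative.
enum Control {
    FlushRequested,
    ShutdownRequested,
}
struct LogRecord; // placeholder for a log/span payload

fn worker_loop(control_rx: Receiver<Control>, data_rx: Receiver<LogRecord>) {
    loop {
        // Control messages take priority and cannot be crowded out by telemetry.
        match control_rx.try_recv() {
            Ok(Control::ShutdownRequested) => {
                // Drain telemetry that arrived before the shutdown, then exit.
                while let Ok(_record) = data_rx.try_recv() {
                    // export(_record)
                }
                return;
            }
            Ok(Control::FlushRequested) => {
                // export the current batch
            }
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => {}
        }

        // Wait briefly for telemetry so control messages are still seen promptly.
        if let Ok(_record) = data_rx.recv_timeout(Duration::from_millis(100)) {
            // buffer _record; export when the batch is full
        }
    }
}
```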

PeriodicReader has a similar problem, but it is less severe: the Shutdown message is not processed right away, but it is reacted to as soon as the blocking export is done.

cijothomas added this to the Logging SDK Stable milestone on Dec 27, 2024
@lalitb (Member) commented Dec 27, 2024

Good point. I observed this while writing unit tests for the dedicated-thread span processor. A separate control channel for shutdown/flush looks like a good idea, with the control channel being read first. However, there may then be log events in the data queue that arrived before the shutdown message; we need to make sure those are also drained and exported, in addition to the ones already in the internal vector.

@utpilla (Contributor) commented Jan 3, 2025

There is another issue with the current implementation of BatchLogProcessor. The background thread is woken up for every log record that gets emitted. This is not ideal. We want the background thread to have low utilization so that it doesn't aggressively compete with the user's application threads. We only want to wake up the background thread on these conditions:

  1. The number of log records in the channel has reached max_export_batch_size.
  2. The scheduled_delay interval has elapsed.
  3. force_flush is called.
  4. shut_down is called.

I have created #2494 to show one possible approach to handle both the shutdown issue and the background thread's utilization issue.
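Purely as an illustration (not the approach taken in #2494), here is a rough sketch of waking the worker only on those four conditions, using a hypothetical wake-signal channel plus an atomic counter; all names below are made up for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError, SyncSender};
use std::sync::Arc;
use std::time::Duration;

// Hypothetical names, purely illustrative.
enum Wake {
    BatchFull,
    Flush,
    Shutdown,
}

struct Handle {
    queued: Arc<AtomicUsize>,
    max_export_batch_size: usize,
    wake_tx: SyncSender<Wake>,
}

impl Handle {
    // emit() does NOT wake the worker per record; it only signals once the
    // batch-size threshold is crossed.
    fn emit(&self) {
        let n = self.queued.fetch_add(1, Ordering::Relaxed) + 1;
        if n >= self.max_export_batch_size {
            let _ = self.wake_tx.try_send(Wake::BatchFull);
        }
    }
}

fn worker_loop(wake_rx: Receiver<Wake>, scheduled_delay: Duration) {
    loop {
        // Sleep until scheduled_delay elapses or an explicit wake signal arrives.
        match wake_rx.recv_timeout(scheduled_delay) {
            Ok(Wake::BatchFull) | Err(RecvTimeoutError::Timeout) => {
                // export the current batch
            }
            Ok(Wake::Flush) => {
                // export everything currently queued
            }
            Ok(Wake::Shutdown) => {
                // final export, then exit
                return;
            }
            Err(RecvTimeoutError::Disconnected) => return,
        }
    }
}
```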

@ThomsonTan (Contributor) commented Jan 6, 2025

For the missing shutdown/force_flush, could we just use a CondVar instead of a separate channel? It should be fine to aggregate multiple flushes into one.

Ideally, we could replace the channel here with our own circular buffer, as in the C++/C# implementations, to avoid potentially excessive synchronization.
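A rough sketch of the CondVar idea, with hypothetical flag names and the telemetry queue itself elided: repeated force_flush calls just re-set a single pending bit, so multiple flush requests naturally coalesce into one.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

// Hypothetical control flags, purely illustrative.
#[derive(Default)]
struct ControlState {
    flush_requested: bool,
    shutdown_requested: bool,
}

#[derive(Default)]
struct Control {
    state: Mutex<ControlState>,
    cv: Condvar,
}

impl Control {
    fn request_flush(&self) {
        self.state.lock().unwrap().flush_requested = true;
        self.cv.notify_one();
    }

    fn request_shutdown(&self) {
        self.state.lock().unwrap().shutdown_requested = true;
        self.cv.notify_one();
    }
}

fn worker_loop(control: Arc<Control>, scheduled_delay: Duration) {
    loop {
        let guard = control.state.lock().unwrap();
        // Wake on notify or after scheduled_delay; spurious wakeups are fine
        // because the flags are re-checked on every iteration.
        let (mut state, _timed_out) = control
            .cv
            .wait_timeout(guard, scheduled_delay)
            .unwrap();
        if state.shutdown_requested {
            // final export, then exit
            return;
        }
        if state.flush_requested {
            state.flush_requested = false;
            // export the current batch now
        }
        // Otherwise the timer fired: export on schedule.
    }
}
```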

@ThomsonTan (Contributor)

There could also be another issue with BatchProcessor. Since it spawns its own thread and the other threads are not blocked before it is ready, a shutdown or log call will fail before that. I think we can add an extra synchronization when creating the background thread to notify the user thread that the BatchProcessor is ready.
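A minimal sketch of that extra synchronization, using a hypothetical spawn_worker constructor: the spawned thread signals readiness over a one-shot channel before entering its loop, so the creating thread only returns once the worker is up.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical constructor: block the caller until the background thread
// has finished its setup and signalled readiness.
fn spawn_worker() -> thread::JoinHandle<()> {
    let (ready_tx, ready_rx) = channel::<()>();
    let handle = thread::spawn(move || {
        // ... create channels, buffers, etc. ...
        ready_tx.send(()).expect("creator is waiting for the ready signal");
        // ... enter the processing loop ...
    });
    // The user thread only gets the processor back once the worker is ready.
    ready_rx.recv().expect("worker thread failed to start");
    handle
}
```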

@lalitb (Member) commented Jan 6, 2025

> Since it spawns its own thread and the other threads are not blocked before it is ready, a shutdown or log call will fail before that.

I don’t think this would really be an issue. In the current design, the background thread for the BatchLogProcessor is ready to handle requests as soon as the LoggerProvider (or OpenTelemetryTracingBridge) is fully initialized. It’s important for the application to make sure that shutdown or force_flush are only called after everything is properly set up.

If those methods are called too early, they’d essentially run on Noop objects, which should be expected behavior.

@cijothomas (Member, Author)

> > Since it spawns its own thread and the other threads are not blocked before it is ready, a shutdown or log call will fail before that.
>
> I don’t think this would really be an issue. In the current design, the background thread for the BatchLogProcessor is ready to handle requests as soon as the LoggerProvider (or OpenTelemetryTracingBridge) is fully initialized. It’s important for the application to make sure that shutdown or force_flush are only called after everything is properly set up.
>
> If those methods are called too early, they’d essentially run on Noop objects, which should be expected behavior.

> It’s important for the application to make sure that shutdown or force_flush are only called after everything is properly set up.

Is that something users have to worry about? They can call shutdown only on an instance of TracerProvider, and they won't have a TracerProvider unless the processors (including Batch) are already set up, which would not happen without the required threads being spun up.

@lalitb (Member) commented Jan 7, 2025

> Is that something users have to worry about? They can call shutdown only on an instance of TracerProvider, and they won't have a TracerProvider unless the processors (including Batch) are already set up, which would not happen without the required threads being spun up.

Yes, I was thinking about global::shutdown_tracer_provider() being invoked on another thread while the main thread is still initializing the TracerProvider. We don't have that scenario anymore :)
