Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK] Fix lifetime of GlobalLogHandler #3221

Merged

Conversation

owent
Copy link
Member

@owent owent commented Dec 24, 2024

Fixes #3215

Changes

  • Keep the lifetime of GlobalLogHandler when calling OTEL_INTERNAL_LOG_*.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

Copy link

netlify bot commented Dec 24, 2024

Deploy Preview for opentelemetry-cpp-api-docs canceled.

Name Link
🔨 Latest commit f7d1993
🔍 Latest deploy log https://app.netlify.com/sites/opentelemetry-cpp-api-docs/deploys/6787d65006545800080651a0

Copy link

codecov bot commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 86.95652% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.71%. Comparing base (62b7197) to head (f7d1993).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sdk/src/common/global_log_handler.cc 86.96% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3221      +/-   ##
==========================================
- Coverage   87.73%   87.71%   -0.02%     
==========================================
  Files         198      198              
  Lines        6258     6271      +13     
==========================================
+ Hits         5490     5500      +10     
- Misses        768      771       +3     
Files with missing lines Coverage Δ
...lude/opentelemetry/sdk/common/global_log_handler.h 46.16% <ø> (-18.84%) ⬇️
sdk/src/common/global_log_handler.cc 77.28% <86.96%> (+6.44%) ⬆️

{
return GetHandlerAndLevel().first;
if OPENTELEMETRY_UNLIKELY_CONDITION (GetHandlerAndLevel().destroyed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I interpret the standard correctly, this may be an undefined behavior.

[basic.life] says that "[t]he lifetime of an object o of type T ends when: <...> if T is a class type, the destructor call starts", and "after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. <...> The program has undefined behavior if: <...> the pointer is used to access a non-static data member or call a non-static member function of the object".

Copy link
Member Author

@owent owent Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a trick that the memory address of a static local variable will always be available in practice. So we can visit it even if it's destructed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a UB, the compiler can optimize away the destroyed = true; statement in the destructor. Clang does not, but gcc does:

#include <iostream>

class test {
public:
    bool destroyed;

    test() : destroyed(false)
    {}

    void hello()
    {
        std::cout << "Hello\n";
    }

    ~test()
    {
        destroyed = true;
    }
};

int main()
{
    test t;
    t.hello();
    t.~test();

    return (int)t.destroyed;
}
main:
.LFB2064:
        .cfi_startproc
        endbr64
        sub     rsp, 8
        .cfi_def_cfa_offset 16
        mov     edx, 6
        lea     rsi, .LC0[rip]
        lea     rdi, _ZSt4cout[rip]
        call    _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@PLT
        xor     eax, eax
        add     rsp, 8
        .cfi_def_cfa_offset 8
        ret

In other words, the result is not guaranteed if you rely upon undefined behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I checked our singlehon codes before, it use a static variable but not the member variable to check if the singlehon object is destroyed, and it's not effected by optimization of both gcc and clang. I will modify the codes and try to add a unit test later.

Comment on lines 97 to 111
struct GlobalLogHandlerData
{
nostd::shared_ptr<LogHandler> handler;
LogLevel log_level;
bool destroyed;

GlobalLogHandlerData();
~GlobalLogHandlerData();

GlobalLogHandlerData(const GlobalLogHandlerData &) = delete;
GlobalLogHandlerData(GlobalLogHandlerData &&) = delete;

GlobalLogHandlerData &operator=(const GlobalLogHandlerData &) = delete;
GlobalLogHandlerData &operator=(GlobalLogHandlerData &&) = delete;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having one structure with all the state is much better than having this from the existing code:

std::pair<nostd::shared_ptr<LogHandler>, LogLevel>

I would be even more aggressive with the cleanup, and:

  • move the declaration of struct GlobalLogHandlerData to the cc file, so it is never seen in the header
  • remove GetHandlerAndLevel() entirely
  • move all the implementation, like code for GetLogLevel() and SetLogLevel(), from the header to the cc file, leaving only declarations in the header (without inline implementation)

And then, as @sjinks suggested, change the bool destroyed to a plain static in the cc file, instead of a static member. Given how this code can be executed by multiple threads, we may need an atomic boolean as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On top of the issue of random order of static initialization and destruction, there is also the issue of multiple threads making calls to the global handler, and races:

  • a thread still tries to export data and fails, writing an error to the log
  • another thread change the log handler to add/remove a custom handler on startup or cleanup

This is unsafe at the moment, so we may need a mutex to protect all operations on GlobalLogHandlerData::handle

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last, instead of returning a shared pointer with the handler in each call to GetLogHandler(), which imply a counter increment and decrement, we could have a GlobalLogHandler::Handle() method, to be invoked by OTEL_INTERNAL_LOG_DISPATCH, to actually write to the log.

This method would lock the mutex, to prevent races with SetLogHandler().

Currently, I am not convinced the shared pointer returned by GetLogHandler() is safe, when SetLogHandler is called concurrently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having one structure with all the state is much better than having this from the existing code:

std::pair<nostd::shared_ptr<LogHandler>, LogLevel>

I would be even more aggressive with the cleanup, and:

  • move the declaration of struct GlobalLogHandlerData to the cc file, so it is never seen in the header
  • remove GetHandlerAndLevel() entirely
  • move all the implementation, like code for GetLogLevel() and SetLogLevel(), from the header to the cc file, leaving only declarations in the header (without inline implementation)

And then, as @sjinks suggested, change the bool destroyed to a plain static in the cc file, instead of a static member. Given how this code can be executed by multiple threads, we may need an atomic boolean as well.

Done

On top of the issue of random order of static initialization and destruction, there is also the issue of multiple threads making calls to the global handler, and races:

This checking is used when the whole program is exiting and some singleton objects or global objects call OTEL_INTERNAL_LOG_* when destroying. I think the behaviour should be UB and this checking is just used to prevent crash.If user want to write logs when deinitializing, they should do it before program exiting.

Changing value of a shared_ptr is not thread-safe, but increase and descrease shared_ptr's counter is thread-safe. Return a shared pointer will only increase the counter. This PR do not solve the thread-safe problem here, but we can discuss, should we use and pay for a lock to keep thread-safe here?I think SetLogHandler will not be call frequently.

Copy link
Member

@marcalff marcalff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix.

@marcalff marcalff changed the title Try to fix lifetime of GlobalLogHandler [SDK] Fix lifetime of GlobalLogHandler Jan 15, 2025
@marcalff marcalff merged commit 95baed7 into open-telemetry:main Jan 15, 2025
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garbage pointer dereference when using OTLP GRPC exporter with inacessible endpoint
3 participants