Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lsan race with thread initialization and fast_unwind_on_malloc=0 #1836

Open
peff opened this issue Jan 2, 2025 · 1 comment
Open

lsan race with thread initialization and fast_unwind_on_malloc=0 #1836

peff opened this issue Jan 2, 2025 · 1 comment

Comments

@peff
Copy link

peff commented Jan 2, 2025

In the Git project we've seen some false positives running our test suite with LSan. They seem to be caused by racing between a thread being created and another one exiting the process (and thus triggering a leak check). Here's a simplified reproduction program:

#include <stdlib.h>
#include <pthread.h>

static void *run(void *data)
{
        return NULL;
}

int main(void)
{
        pthread_t threads[256];

        for (int i = 0; i < sizeof(threads)/sizeof(*threads); i++)
                pthread_create(&threads[i], NULL, run, NULL);
        exit(0);
}

If I compile with LSan (this is using gcc 14.2.0, but I get the same result with clang 19.1.6):

gcc -fsanitize=leak foo.c

and then run it a few times (the race usually triggers within a handful of runs), with fast_unwind_on_malloc=0:

while LSAN_OPTIONS=fast_unwind_on_malloc=0 ./a.out
do
  : nothing
done

then I get this leak reported:

=================================================================
==3861584==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7f52d6c14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
    #1 0x7f52d6a9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
    #2 0x7f52d6c2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
    #3 0x7f52d6c25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
    #4 0x7f52d6c17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
    #5 0x7f52d6c143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
    #6 0x7f52d6a9bf51 in start_thread nptl/pthread_create.c:447
    #7 0x7f52d6b1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).

It looks like setting up the stack requires pthread_getattr_np, which allocates. Presumably the allocation is properly cleaned up, but if the original thread calls exit at the wrong time, we never get a chance to do so. And since we haven't yet set up the thread's stack, it's not considered reachable.

Curiously, without fast_unwind_on_malloc=0, I haven't been able to trigger the problem. I don't know if that's because it is just less likely to trigger the race somehow, or if it uses a different code path for setting up the thread stack.

Allocating in pthread_getattr_np is going to be system dependent. In this case it's a Debian Linux/glibc system.

@peff
Copy link
Author

peff commented Jan 2, 2025

You can avoid the race if you use a barrier to stop all threads until they are all finished initializing, like:

#include <stdlib.h>
#include <pthread.h>

static pthread_barrier_t barrier;

static void *run(void *data)
{
        pthread_barrier_wait(&barrier);
        return NULL;
}

int main(void)
{
        pthread_t threads[256];

        pthread_barrier_init(&barrier, NULL, sizeof(threads)/sizeof(*threads)+1);
        for (int i = 0; i < sizeof(threads)/sizeof(*threads); i++)
                pthread_create(&threads[i], NULL, run, NULL);
        pthread_barrier_wait(&barrier);
        exit(0);
}

That's effectively stopping the world during initialization so that nobody can call exit. But it's not a very flexible solution.

My guess (as a complete layman who is not at all familiar with the sanitizer code) is that the pthread_create interceptor could probably take a lock/mutex while setting up the thread, and likewise the atexit sanitizer run would wait on the same lock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant