This repository has been archived by the owner on Jan 7, 2022. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Don't start worker threads until Processor initialization is complete
Summary: When creating various components in Processor::init() (watchdog thread, health monitor, failure detector, etc), at first create all of them in a dormant state, then start all of them. In particular, pause all worker threads until later in the startup sequence. This prevents race conditions when the parts initialized early may try to access parts of the Processor that are not initialized yet. In particular, fixes this race when assigning Processor::failure_detector_ in a unit test (it's a test-only code path, but afaict the same thing can happen in server): ================== WARNING: ThreadSanitizer: data race (pid=886149) Write of size 8 at 0x7b88001dfe10 by main thread: #0 _ZSt4swapIPN8facebook9logdevice15FailureDetectorEENSt9enable_ifIXsr6__and_ISt6__not_ISt15__is_tuple_likeIT_EESt21is_move_constructibleIS7_ESt18is_move_assignableIS7_EEE5valueEvE4typeERS7_SG_ at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/move.h:199 #1 std::unique_ptr<facebook::logdevice::FailureDetector, std::default_delete<facebook::logdevice::FailureDetector> >::reset(facebook::logdevice::FailureDetector*) at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:374 #2 _ZNSt10unique_ptrIN8facebook9logdevice15FailureDetectorESt14default_deleteIS2_EEaSINS1_19MockFailureDetectorES3_IS7_EEENSt9enable_ifIXsr6__and_ISt6__and_IJSt14is_convertibleINS_IT_T0_E7pointerEPS2_ESt6__not_ISt8is_arrayISC_EESt5__or_IJSA_IJSt12is_referenceIS4_ESt7is_sameIS4_SD_EEESA_IJSI_ISO_ESB_ISD_S4_EEEEEEESt13is_assignableIRS4_OSD_EEE5valueERS5_E4typeEOSE_ at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:303 #3 (anonymous namespace)::make_processor_with_detector(short, unsigned long, facebook::logdevice::GossipSettings const&) at ./logdevice/server/test/FailureDetectorTest.cpp:185 #4 (anonymous namespace)::create_processors_and_detectors(unsigned long, facebook::logdevice::GossipSettings const&) at ./logdevice/server/test/FailureDetectorTest.cpp:266 #5 (anonymous namespace)::simulate(unsigned long, facebook::logdevice::GossipSettings const&) at ./logdevice/server/test/FailureDetectorTest.cpp:296 #6 FailureDetectorTest_RandomGossip_Test::TestBody() at ./logdevice/server/test/FailureDetectorTest.cpp:313 #7 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) at /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2417 #8 main at ./common/gtest/LightMain.cpp:19 #9 ?? ??:0 Previous read of size 8 at 0x7b88001dfe10 by thread T79: #0 std::__uniq_ptr_impl<facebook::logdevice::FailureDetector, std::default_delete<facebook::logdevice::FailureDetector> >::_M_ptr() const at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:147 #1 std::unique_ptr<facebook::logdevice::FailureDetector, std::default_delete<facebook::logdevice::FailureDetector> >::get() const at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:337 #2 facebook::logdevice::ServerWorker::onServerConfigUpdated() at ./logdevice/server/ServerWorker.cpp:190 #3 facebook::logdevice::Worker::onNodesConfigurationUpdated() at ./logdevice/common/Worker.cpp:393 #4 facebook::logdevice::Worker::initializeSubscriptions() at ./logdevice/common/Worker.cpp:541 #5 facebook::logdevice::Worker::setupWorker() at ./logdevice/common/Worker.cpp:582 #6 facebook::logdevice::ServerWorker::setupWorker() at ./logdevice/server/ServerWorker.cpp:201 #7 facebook::logdevice::ServerProcessor::createWorker(folly::Executor::KeepAlive<folly::Executor>, facebook::logdevice::worker_id_t, facebook::logdevice::WorkerType)::$_0::operator()() const at ./logdevice/server/ServerProcessor.cpp:26 #8 void folly::detail::function::FunctionTraits<void ()>::callSmall<facebook::logdevice::ServerProcessor::createWorker(folly::Executor::KeepAlive<folly::Executor>, facebook::logdevice::worker_id_t, facebook::logdevice::WorkerType)::$_0>(folly::detail::function::Data&) at ./folly/Function.h:376 #9 folly::detail::function::FunctionTraits<void ()>::operator()() at ./folly/Function.h:392 #10 facebook::logdevice::Worker::addWithPriority(folly::Function<void ()>, signed char)::$_9::operator()() at ./logdevice/common/Worker.cpp:1413 #11 void folly::detail::function::FunctionTraits<void ()>::callBig<facebook::logdevice::Worker::addWithPriority(folly::Function<void ()>, signed char)::$_9>(folly::detail::function::Data&) at ./folly/Function.h:383 #12 folly::detail::function::FunctionTraits<void ()>::operator()() at ./folly/Function.h:392 #13 facebook::logdevice::EventLoopTaskQueue::executeTasks(unsigned int) at ./logdevice/common/EventLoopTaskQueue.cpp:154 #14 facebook::logdevice::EventLoopTaskQueue::haveTasksEventHandler()::$_1::operator()(unsigned int) const at ./logdevice/common/EventLoopTaskQueue.cpp:101 #15 void facebook::logdevice::LifoEventSemImpl<std::atomic>::AsyncWaiter::processBatch<facebook::logdevice::EventLoopTaskQueue::haveTasksEventHandler()::$_1&>(facebook::logdevice::EventLoopTaskQueue::haveTasksEventHandler()::$_1&, unsigned int) at ./logdevice/common/LifoEventSem.h:368 #16 facebook::logdevice::EventLoopTaskQueue::haveTasksEventHandler() at ./logdevice/common/EventLoopTaskQueue.cpp:106 #17 facebook::logdevice::EventLoopTaskQueue::EventLoopTaskQueue(facebook::logdevice::EvBase&, unsigned long, std::array<unsigned int, 3ul> const&)::$_0::operator()() const at ./logdevice/common/EventLoopTaskQueue.cpp:36 #18 void folly::detail::function::FunctionTraits<void ()>::callSmall<facebook::logdevice::EventLoopTaskQueue::EventLoopTaskQueue(facebook::logdevice::EvBase&, unsigned long, std::array<unsigned int, 3ul> const&)::$_0>(folly::detail::function::Data&) at ./folly/Function.h:376 #19 folly::detail::function::FunctionTraits<void ()>::operator()() at ./folly/Function.h:392 #20 facebook::logdevice::EventLegacy::evCallback(int, short, void*) at ./logdevice/common/libevent/EventLegacy.cpp:41 #21 event_persist_closure at ./logdevice/external/libevent-2.1.3-alpha/event.c:1452 #22 event_process_active_single_queue at ./logdevice/external/libevent-2.1.3-alpha/event.c:1508 #23 event_process_active at ./logdevice/external/libevent-2.1.3-alpha/event.c:1596 #24 ld_event_base_loop at ./logdevice/external/libevent-2.1.3-alpha/event.c:1819 #25 facebook::logdevice::EvBaseLegacy::loop() at ./logdevice/common/libevent/EvBaseLegacy.cpp:58 #26 facebook::logdevice::EvBase::loop() at ./logdevice/common/libevent/LibEventCompatibility.cpp:52 W1101 14:22:40.569391 907396 [logdevice:WG0] Worker.cpp:1319] processRequest() Request queued for 433 msec: NODE_STATE_UPDATED (id: 4294979886), p :LO_PRI #27 facebook::logdevice::EventLoop::run() at ./logdevice/common/EventLoop.cpp:140 W1101 14:22:40.584447 907396 [logdevice:WG0] Worker.cpp:1319] processRequest() Request queued for 446 msec: NODE_STATE_UPDATED (id: 4294979887), p :LO_PRI #28 facebook::logdevice::EventLoop::EventLoop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, facebook::logdevice::ThreadID::Type, unsigned long, bool, std::array<unsigned int, 3ul> const&, facebook::logdevice::EvBase::EvBaseType)::$_0::operator()() const at ./logdevice/common/EventLoop.cpp:81 #29 std::_Function_handler<void (), facebook::logdevice::EventLoop::EventLoop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, facebook::logdevice::ThreadID::Type, unsigned long, bool, std::array<unsigned int, 3ul> const&, facebook::logdevice::EvBase::EvBaseType)::$_0>::_M_invoke(std::_Any_data const&) at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/std_function.h:316 W1101 14:22:40.603377 907396 [logdevice:WG0] Worker.cpp:1319] processRequest() Request queued for 468 msec: NODE_STATE_UPDATED (id: 4294979888), p :LO_PRI #30 std::function<void ()>::operator()() const at ./third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/std_function.h:706 W1101 14:22:40.621593 907396 [logdevice:WG0] Worker.cpp:1319] processRequest() Request queued for 491 msec: NODE_STATE_UPDATED (id: 4294979892), p :LO_PRI #31 facebook::logdevice::thread_func(void*) at ./logdevice/common/PThread.cpp:18 W1101 14:22:40.645430 907396 [logdevice:WG0] Worker.cpp:1319] processRequest() Request queued for 505 msec: NODE_STATE_UPDATED (id: 4294979893), p :LO_PRI #32 __tsan_thread_start_func at tsan.c:? Reviewed By: mcrnic Differential Revision: D18286463 fbshipit-source-id: 1ea365415f1e9dc8e907eabcbdd8a8249652a16e
- Loading branch information