Test test_global_control hangs in release and fails in debug #745

Open
phprus opened this issue Jan 21, 2022 · 13 comments · Fixed by #781

@phprus
Contributor

phprus commented Jan 21, 2022

A new bug was discovered after applying PR #739.

Commit: 0ef8048

Release build with LTO:

phprus@mbp release % ps ax | grep test
31860 s011  S+     0:04.12 ./appleclang_13.0_cxx17_64_release/test_global_control
31876 s014  R+     0:00.00 grep test
phprus@mbp release % lldb --attach-pid 31860
(lldb) process attach --pid 31860
Process 31860 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00000001b079370c libsystem_kernel.dylib`__ulock_wait + 8
libsystem_kernel.dylib`__ulock_wait:
->  0x1b079370c <+8>:  b.lo   0x1b079372c               ; <+40>
    0x1b0793710 <+12>: pacibsp
    0x1b0793714 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0793718 <+20>: mov    x29, sp
Target 0: (test_global_control) stopped.

Executable module set to "/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/build/release/appleclang_13.0_cxx17_64_release/test_global_control".
Architecture set to: arm64e-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001b079370c libsystem_kernel.dylib`__ulock_wait + 8
    frame #1: 0x00000001b07cf574 libsystem_pthread.dylib`_pthread_join + 452
    frame #2: 0x00000001b072873c libc++.1.dylib`std::__1::thread::join() + 36
    frame #3: 0x000000010290632c test_global_control`DOCTEST_ANON_FUNC_51() + 108
    frame #4: 0x0000000102903694 test_global_control`main + 11016
    frame #5: 0x0000000102ced0f4 dyld`start + 520
(lldb) thread select 2
* thread #2
    frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1b0794ebc <+8>:  b.lo   0x1b0794edc               ; <+40>
    0x1b0794ec0 <+12>: pacibsp
    0x1b0794ec4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0794ec8 <+20>: mov    x29, sp
(lldb) bt
* thread #2
  * frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
    frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
    frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
    frame #5: 0x0000000102a9f104 libtbb.12.6.dylib`tbb::detail::r1::task_dispatcher::execute_and_wait(tbb::detail::d1::task*, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&) + 1028
    frame #6: 0x000000010291915c test_global_control`tbb::detail::d1::task_arena_function<TestBlockingTerminateNS::ExceptionTest2::operator()()::'lambda'(), void>::operator()() const + 276
    frame #7: 0x0000000102a8f458 libtbb.12.6.dylib`tbb::detail::r1::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) + 312
    frame #8: 0x0000000102909344 test_global_control`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void utils::NativeParallelFor<int, DOCTEST_ANON_FUNC_51()::$_4>(int, DOCTEST_ANON_FUNC_51()::$_4 const&)::'lambda'()> >(void*) + 432
    frame #9: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 3
* thread #3
    frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x1b0791990 <+8>: ret

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x1b0791994 <+0>: mov    x16, #-0x25
    0x1b0791998 <+4>: svc    #0x80
    0x1b079199c <+8>: ret
(lldb) bt
* thread #3
  * frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000102a9c0fc libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 680
    frame #2: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 4
* thread #4
    frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
libsystem_kernel.dylib`semaphore_wait_trap:
->  0x1b0791990 <+8>: ret

libsystem_kernel.dylib`semaphore_wait_signal_trap:
    0x1b0791994 <+0>: mov    x16, #-0x25
    0x1b0791998 <+4>: svc    #0x80
    0x1b079199c <+8>: ret
(lldb) bt
* thread #4
  * frame #0: 0x00000001b0791990 libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000102a9c0fc libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 680
    frame #2: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 5
* thread #5
    frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1b0794ebc <+8>:  b.lo   0x1b0794edc               ; <+40>
    0x1b0794ec0 <+12>: pacibsp
    0x1b0794ec4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0794ec8 <+20>: mov    x29, sp
(lldb) bt
* thread #5
  * frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
    frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
    frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
    frame #5: 0x0000000102a8d56c libtbb.12.6.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1444
    frame #6: 0x0000000102a99a9c libtbb.12.6.dylib`tbb::detail::r1::market::process(rml::job&) + 52
    frame #7: 0x0000000102a9bf60 libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 268
    frame #8: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb) thread select 6
* thread #6
    frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
libsystem_kernel.dylib`__semwait_signal:
->  0x1b0794ebc <+8>:  b.lo   0x1b0794edc               ; <+40>
    0x1b0794ec0 <+12>: pacibsp
    0x1b0794ec4 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1b0794ec8 <+20>: mov    x29, sp
(lldb) bt
* thread #6
  * frame #0: 0x00000001b0794ebc libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001b069fd88 libsystem_c.dylib`nanosleep + 216
    frame #2: 0x00000001b0728820 libc++.1.dylib`std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) + 84
    frame #3: 0x0000000102919c08 test_global_control`TestBlockingTerminateNS::ExceptionTest2::Body::operator()(int) const + 168
    frame #4: 0x0000000102919948 test_global_control`tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>, tbb::detail::d1::parallel_for_body_wrapper<TestBlockingTerminateNS::ExceptionTest2::Body, int>, tbb::detail::d1::auto_partitioner const>::execute(tbb::detail::d1::execution_data&) + 1204
    frame #5: 0x0000000102a8d56c libtbb.12.6.dylib`tbb::detail::r1::arena::process(tbb::detail::r1::thread_data&) + 1444
    frame #6: 0x0000000102a99a9c libtbb.12.6.dylib`tbb::detail::r1::market::process(rml::job&) + 52
    frame #7: 0x0000000102a9bf60 libtbb.12.6.dylib`tbb::detail::r1::rml::private_worker::thread_routine(void*) + 268
    frame #8: 0x00000001b07cd240 libsystem_pthread.dylib`_pthread_start + 148
(lldb)

Output:

[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options










^C===============================================================================
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:239:
TEST CASE:  prolong lifetime advanced

/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:239: FATAL ERROR: test case CRASHED: SIGINT - Terminal interrupt signal

===============================================================================
[doctest] test cases: 3 | 2 passed | 1 failed | 1 skipped
[doctest] assertions: 9 | 9 passed | 0 failed |
[doctest] Status: FAILURE!
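In the release-build backtrace, thread #1 is blocked in std::thread::join() inside the test case, while thread #2, which was started through utils::NativeParallelFor, is still executing ExceptionTest2::Body inside a task arena (threads #5 and #6 are TBB workers running the same body; threads #3 and #4 are idle workers). The sketch below is a hypothetical helper written only with std::thread; it is not the source of utils::NativeParallelFor. It illustrates the spawn-then-join pattern and why one body that never returns leaves the caller stuck in join().

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a NativeParallelFor-style test helper
// (NOT the actual utils::NativeParallelFor source): run body(i) for
// i in [0, n) on n native threads, then join every thread.
template <typename Index, typename Body>
void native_parallel_for(Index n, const Body& body) {
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(n));
    for (Index i = 0; i < n; ++i) {
        // Each native thread runs exactly one index.
        threads.emplace_back([&body, i] { body(i); });
    }
    // join() has no timeout: it blocks until the body returns. If any
    // body never finishes (for example, it is stuck waiting inside a
    // task arena), the calling thread blocks here forever, which is the
    // state of thread #1 in the backtrace above (std::thread::join).
    for (auto& t : threads) {
        t.join();
    }
}
```

Because join() cannot time out, a hang in any single body becomes a hang of the whole test process, which is why the release run had to be interrupted with ^C.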

Debug build:

[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
Assertion pred() failed (located in the enforce function, line in file: 173)
===============================================================================
/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:250:
TEST CASE:  prolong lifetime multiple wait

/Users/phprus/Devel/oneapi-src/tmp/pr739/oneTBB-0ef804884856e791fe7bb173078a92daf3929f76/test/tbb/test_global_control.cpp:250: FATAL ERROR: test case CRASHED: SIGABRT - Abort (abnormal termination) signal

===============================================================================
[doctest] test cases:  4 |  3 passed | 1 failed | 0 skipped
[doctest] assertions: 58 | 58 passed | 0 failed |
[doctest] Status: FAILURE!
zsh: abort      ./appleclang_13.0_cxx17_64_debug/test_global_control

Assert:
https://github.com/oneapi-src/oneTBB/blob/0ef804884856e791fe7bb173078a92daf3929f76/src/tbb/market.h#L171-L174

See:
#739 (comment)
#739 (comment)

@phprus
Contributor Author

phprus commented Jan 27, 2022

Commit: cd6a5f9
The behavior has not changed.

@alexey-katranov
Contributor

> The behavior has not changed.

Yes, it seems to be a different issue. The observed assertion concerns the relationship between the market and the arena, while #739 fixes an issue in private_server (the thread pool).

@phprus
Contributor Author

phprus commented Feb 16, 2022

Any news?

@alexey-katranov
Contributor

It seems the testing approach is broken on ARM: any test using utils::SpinBarrier might hang even when there is no real issue in the library. We are thinking about a better approach for testing.
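The thread does not say why utils::SpinBarrier misbehaves on ARM. As background only, here is a hypothetical sense-reversing spin barrier (the class name sense_barrier is invented here and is not the utils::SpinBarrier implementation) built on std::atomic. It shows where acquire/release ordering is required; x86's stronger hardware memory model can hide missing ordering that weakly ordered ARM cores expose.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical sense-reversing spin barrier, for illustration only.
// It is NOT the implementation of oneTBB's utils::SpinBarrier.
class sense_barrier {
public:
    explicit sense_barrier(std::size_t n) : total_(n), remaining_(n) {
        assert(n > 0);
    }

    // Blocks (by spinning) until total_ threads have called wait().
    void wait() {
        // Sense value of the phase this thread is arriving in.
        const bool phase = sense_.load(std::memory_order_acquire);
        // acq_rel: the last arriver acquires every earlier arriver's writes.
        if (remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last thread: re-arm the counter, then release the waiters.
            // The release store makes the reset (and all prior writes)
            // visible to any thread that observes the flipped sense.
            remaining_.store(total_, std::memory_order_relaxed);
            sense_.store(!phase, std::memory_order_release);
        } else {
            // Spin until the last arriver flips the sense. The acquire
            // load pairs with the release store above; with relaxed
            // ordering, a weakly ordered CPU such as ARM could let this
            // thread proceed without seeing writes made before the barrier.
            while (sense_.load(std::memory_order_acquire) == phase) {
                std::this_thread::yield();
            }
        }
    }

private:
    const std::size_t total_;
    std::atomic<std::size_t> remaining_;
    std::atomic<bool> sense_{false};
};
```

Independently of memory ordering, any spin barrier waits forever if one of the expected participants never arrives, so a test built on one can hang without a bug in the code under test.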

@phprus
Contributor Author

phprus commented Feb 16, 2022

So is this a bug in the tests rather than in the oneTBB library?
That would be good news for me!

@alexey-katranov
Contributor

alexey-katranov commented Feb 16, 2022

Currently, it is difficult to say that the observed anomalies are entirely unrelated to the oneTBB library. The deadlocks are mostly related to the testing approach, but the assertion seems to be of a different nature.

@alexey-katranov
Contributor

@phprus, can you please check that the test does not fail in the debug build?

Reopening the issue because it is not fixed in release yet.

@phprus
Contributor Author

phprus commented Feb 21, 2022

@alexey-katranov

Debug build works fine:

phprus@mbp debug % ./appleclang_13.0_cxx17_64_debug/test_global_control
[doctest] doctest version is "2.4.7"
[doctest] run with "--help" for options
===============================================================================
[doctest] test cases:  4 |  4 passed | 0 failed | 0 skipped
[doctest] assertions: 86 | 86 passed | 0 failed |
[doctest] Status: SUCCESS!

The release build still hangs.

@alexey-katranov
Contributor

Thank you for the confirmation. As for the release build, we have some experiments in the dev/alexey-katranov/exp-seq-cst branch, but it is not clear whether we will go with this approach because it might affect performance.
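The branch name dev/alexey-katranov/exp-seq-cst suggests (an assumption; the thread does not describe the change) experimenting with std::memory_order_seq_cst in place of weaker orderings. The classic store-buffering litmus test below illustrates the trade-off: sequential consistency forbids an outcome that release/acquire ordering permits, but it can require stronger instructions or extra fences on some architectures, which is the kind of performance cost mentioned above. The function and struct names are invented for this sketch.

```cpp
#include <atomic>
#include <thread>

// Store-buffering (Dekker-style) litmus test: each thread publishes its
// own flag and then reads the other thread's flag.
//
// With memory_order_seq_cst on all four accesses, r1 == 0 && r2 == 0 is
// forbidden: one of the stores comes first in the single total order,
// so at least one load must see 1.
// With release stores and acquire loads instead, r1 == 0 && r2 == 0 is
// allowed by the C++ memory model and is observable on common hardware
// (including x86), because a store may be reordered after a later load
// of a different location.
struct sb_result {
    int r1;
    int r2;
};

sb_result store_buffering_seq_cst() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    // join() synchronizes with the end of each thread, so reading
    // r1 and r2 afterwards is race-free.
    a.join();
    b.join();
    return {r1, r2};
}
```

Running this many times never yields {0, 0}; switching the stores to memory_order_release and the loads to memory_order_acquire makes {0, 0} a legal result.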

@phprus
Contributor Author

phprus commented Feb 21, 2022

@alexey-katranov

Commit: 29fb19b (https://github.com/oneapi-src/oneTBB/tree/dev/alexey-katranov/exp-seq-cst)

test_global_control and test_arena_constraints (#756) work without hangs.

@alexey-katranov
Contributor

> test_global_control and test_arena_constraints (#756) work without hangs.

Sounds great. Unfortunately, conformance_parallel_pipeline still hangs after ~859800 runs...

@phprus
Contributor Author

phprus commented Feb 21, 2022

I limited the total run time to 15 minutes...
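Hangs that appear only after hundreds of thousands of runs are easier to catch with a harness that repeats a test body under a fixed time budget and reports when progress stops. The sketch below is hypothetical: stress_for and abort_on_stall are invented names, not oneTBB test utilities. It cannot interrupt a hung iteration; it only detects that the iteration counter has stopped advancing.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <thread>

// Hypothetical stress-loop helper: run `iteration` repeatedly until
// `budget` elapses, while a watchdog thread checks that the counter of
// completed iterations keeps advancing. If no iteration completes
// within `stall_limit`, `on_stall` is called with the count so far.
std::uint64_t stress_for(std::chrono::steady_clock::duration budget,
                         std::chrono::steady_clock::duration stall_limit,
                         const std::function<void()>& iteration,
                         const std::function<void(std::uint64_t)>& on_stall) {
    using clock = std::chrono::steady_clock;
    std::atomic<std::uint64_t> done{0};
    std::atomic<bool> finished{false};

    std::thread watchdog([&] {
        std::uint64_t last = done.load();
        auto last_progress = clock::now();
        while (!finished.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            const std::uint64_t now_done = done.load();
            if (now_done != last) {
                last = now_done;
                last_progress = clock::now();
            } else if (clock::now() - last_progress > stall_limit) {
                on_stall(now_done);
                last_progress = clock::now();  // avoid reporting every tick
            }
        }
    });

    const auto deadline = clock::now() + budget;
    while (clock::now() < deadline) {
        iteration();
        done.fetch_add(1);
    }
    finished.store(true);
    watchdog.join();
    return done.load();
}

// Example stall handler: report and abort so that a hang is not silent
// and a debugger or crash reporter can capture the thread stacks.
inline void abort_on_stall(std::uint64_t completed) {
    std::fprintf(stderr, "no progress after %llu iterations\n",
                 static_cast<unsigned long long>(completed));
    std::abort();
}
```

For a 15-minute cap, a call might look like stress_for(std::chrono::minutes(15), std::chrono::seconds(60), run_one_iteration, abort_on_stall), where run_one_iteration is a placeholder for the test body.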

@phprus
Contributor Author

phprus commented Jun 7, 2022

Is there any news on this issue?


3 participants