Investigate test hang #1954
Comments
I have reproduced the hang locally using the following script:

import pytest
import dpctl.tensor as dpt
from dpctl.tests.helper import get_queue_or_skip, skip_if_dtype_not_supported
from dpctl.tests.elementwise.utils import _integral_dtypes

@pytest.mark.parametrize("iters", range(1, 2001))
@pytest.mark.parametrize("dtype", ["?"] + _integral_dtypes)
def test_bitwise_xor_inplace_python_scalar(dtype, iters):
    assert iters
    q = get_queue_or_skip()
    skip_if_dtype_not_supported(dtype, q)
    X = dpt.zeros((10, 10), dtype=dtype, sycl_queue=q)
    dt_kind = X.dtype.kind
    if dt_kind == "b":
        X ^= False
    else:
        X ^= int(0)

and then running

$ CL_CONFIG_CPU_TARGET_ARCH=corei7-avx ONEAPI_DEVICE_SELECTOR=*:cpu pytest test_bitwise_xor.py

The test ended up hanging for me after a time.

$ CL_CONFIG_CPU_TARGET_ARCH=corei7-avx ONEAPI_DEVICE_SELECTOR=*:cpu pytest test_bitwise_xor.py
================================================= test session starts ==================================================
platform linux -- Python 3.11.9, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/ngrigori/test
plugins: hypothesis-6.104.2
collected 18000 items
test_bitwise_xor.py ................................................................................

Introducing

=========================================== 18000 passed in 69.62s (0:01:09) ===========================================
It turns out that specifying the architecture is not necessary:

$ ONEAPI_DEVICE_SELECTOR=*:cpu pytest test_bitwise_xor.py
================================================= test session starts ==================================================
platform linux -- Python 3.11.9, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/ngrigori/test
plugins: hypothesis-6.104.2
collected 18000 items
test_bitwise_xor.py ............................................................................................ [ 0%]
................................................................................................................ [ 1%]
................................................................................................................ [ 1%]
................................................................................................................ [ 2%]
................................................................................................................ [ 3%]
................................................................................................................ [ 3%]
................................................................................................................ [ 4%]
................................

meaning the hang can, in theory, be reproduced on any CPU.
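Since the run only hangs some of the time, one way to catch it without watching the terminal is to rerun the command under a timeout. The following is a minimal sketch using only the Python standard library; the helper name `run_until_hang`, the attempt count, and the timeout value are illustrative assumptions, not part of dpctl or its test suite.

```python
import os
import subprocess
import sys


def run_until_hang(cmd, attempts=10, timeout=600.0, extra_env=None):
    """Run `cmd` up to `attempts` times.

    Returns the 1-based attempt number on which the command exceeded
    `timeout` seconds (treated here as a hang), or None if every run
    finished in time. The exit status of finished runs is ignored.
    """
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(cmd, timeout=timeout, env=env, check=False)
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before re-raising,
            # so no hung process is left behind.
            return attempt
    return None
```

For the reproducer above, a call could look like `run_until_hang([sys.executable, "-m", "pytest", "test_bitwise_xor.py"], extra_env={"ONEAPI_DEVICE_SELECTOR": "*:cpu"})`. The timeout should be well above the roughly 70 seconds a passing run took in the log above. Note that this sketch kills the hung process, so it only detects the hang; inspecting the hung state still requires a debugger session.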
I ran the test twice on my Core i7 1185G7 CPU, and all 18000 tests passed both times. This is consistent with the hang being encountered in the CI only occasionally. @ndgrigorian If you can reproduce the hang somewhat reliably, please try running it under
@ndgrigorian I was able to reproduce the hang using

I encountered the hang on the umpteenth attempt, and the backtrace seems to suggest a deadlock in the CPU runtime:

GDB threads backtrace in the hung state
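A `thread apply all bt` dump is long, so it can help to reduce it to the innermost frames of each thread and see which threads are parked in the same wait. Below is a rough sketch that parses a saved GDB dump; the function names `innermost_frames` and `threads_waiting_in` are made up for illustration, and the regular expressions assume the frame layout GDB prints in this thread (`#N [0xADDR in] function (args) ...`).

```python
import re

# "Thread 8 (Thread 0x7fffaeffd6c0 (LWP 568168) "python"):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#1 0x00007fffc0a42064 in tbb::detail::r1::futex_wait (futex=..." or
# "#0 syscall () at ..."; the address part is optional.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def innermost_frames(bt_text, depth=2):
    """Map each GDB thread number to its first `depth` function names."""
    frames = {}
    current = None
    for raw in bt_text.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            frames[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and len(frames[current]) < depth:
            frames[current].append(frame.group(2))
    return frames


def threads_waiting_in(frames, needle):
    """Return sorted thread numbers whose recorded frames mention `needle`."""
    return sorted(
        tid for tid, names in frames.items()
        if any(needle in name for name in names)
    )
```

With the dump saved to a file (say a hypothetical `bt.txt`), `threads_waiting_in(innermost_frames(open("bt.txt").read()), "tbb::detail::r1::futex_wait")` lists the threads blocked in the TBB semaphore wait. This only summarizes where threads are sleeping; it does not by itself prove a deadlock.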
@oleksandr-pavlyk backtrace

(gdb) thread apply all bt
Thread 8 (Thread 0x7fffaeffd6c0 (LWP 568168) "python"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007fffc0a42064 in tbb::detail::r1::futex_wait (futex=0x7fffaeffb330, comparand=2) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:101
#2 tbb::detail::r1::binary_semaphore::P (this=0x7fffaeffb330) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:253
#3 tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0x7fffaeffb300) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/concurrent_monitor.h:170
#4 0x00007fffc0a41c8f in tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (this=0x7fffbf7f3a80, node=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/concurrent_monitor.h:232
#5 tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (this=0x7fffbf7f3a80, node=..., pred=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/concurrent_monitor.h:262
#6 tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (this=0x7fffaeffb4a8, uniq_tag=<optimized out>, wakeup_condition=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/waiters.h:133
#7 tbb::detail::r1::external_waiter::pause (this=this@entry=0x7fffaeffb4a8) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/waiters.h:160
#8 0x00007fffc0a41364 in tbb::detail::r1::task_dispatcher::receive_or_steal_task<false, tbb::detail::r1::external_waiter> (this=0x7fffbf7e2880, tls=..., ed=..., waiter=..., isolation=0, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.h:232
#9 tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (this=this@entry=0x7fffbf7e2880, t=<optimized out>, t@entry=0x0, waiter=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.h:362
#10 0x00007fffc0a3f10c in tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (this=0x7fffbf7e2880, t=0x0, waiter=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.h:470
#11 tbb::detail::r1::task_dispatcher::execute_and_wait (t=0x0, wait_ctx=..., w_ctx=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.cpp:168
#12 0x00007fffcc654e8a in tbb::detail::d2::task_group_base::wait() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#13 0x00007fffcc65a055 in tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorWaiter, void>::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#14 0x00007fffc0a28a37 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=warning: RTTI symbol not found for class 'tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorWaiter, void>'
...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:821
#15 0x00007fffcc659c9d in Intel::OpenCL::TaskExecutor::TaskGroup::WaitForAll() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#16 0x00007fffcc659b5d in tbb::detail::d1::task_arena_function<TaskGroupWaiter, void>::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#17 0x00007fffc0a2a41d in tbb::detail::r1::delegated_task::execute(tbb::detail::d1::execution_data&)::{lambda()#1}::operator()() const (this=<optimized out>) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:734
#18 tbb::detail::d0::try_call_proxy<tbb::detail::r1::delegated_task::execute(tbb::detail::d1::execution_data&)::{lambda()#1}>::on_completion<tbb::detail::r1::delegated_task::execute(tbb::detail::d1::execution_data&)::{lambda()#2}>(tbb::detail::r1::delegated_task::execute(tbb::detail::d1::execution_data&)::{lambda()#2}) (this=<optimized out>, on_completion_body=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/../../include/oneapi/tbb/detail/_template_helpers.h:230
#19 tbb::detail::r1::delegated_task::execute (this=0x7fffffff8f00, ed=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:735
#20 0x00007fffc0a410ae in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (this=this@entry=0x7fffbf7e2880, t=0x7fffffff8f00, t@entry=0x0, waiter=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/../../include/oneapi/tbb/task_group.h:382
#21 0x00007fffc0a3f10c in tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (this=0x7fffbf7e2880, t=0x0, waiter=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.h:470
#22 tbb::detail::r1::task_dispatcher::execute_and_wait (t=0x0, wait_ctx=..., w_ctx=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.cpp:168
#23 0x00007fffcc654e8a in tbb::detail::d2::task_group_base::wait() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#24 0x00007fffcc65a055 in tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorWaiter, void>::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#25 0x00007fffc0a28a37 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=warning: RTTI symbol not found for class 'tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorWaiter, void>'
...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:821
#26 0x00007fffcc659dd0 in Intel::OpenCL::TaskExecutor::SpawningTaskGroup::WaitForAll() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#27 0x00007fffcc640436 in Intel::OpenCL::TaskExecutor::out_of_order_executor_task::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#28 0x00007fffcc659a88 in tbb::detail::d1::enqueue_task<Intel::OpenCL::TaskExecutor::ArenaFunctorRunner<Intel::OpenCL::TaskExecutor::out_of_order_executor_task> >::execute(tbb::detail::d1::execution_data&) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#29 0x00007fffc0a410ae in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (this=this@entry=0x7fffbf7e2880, t=0x7fffbf7efa00, t@entry=0x0, waiter=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/../../include/oneapi/tbb/task_group.h:382
#30 0x00007fffc0a3f10c in tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (this=0x7fffbf7e2880, t=0x0, waiter=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.h:470
#31 tbb::detail::r1::task_dispatcher::execute_and_wait (t=0x0, wait_ctx=..., w_ctx=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/task_dispatcher.cpp:168
#32 0x00007fffcc654e8a in tbb::detail::d2::task_group_base::wait() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#33 0x00007fffcc65a055 in tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorWaiter, void>::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#34 0x00007fffc0a28a37 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=warning: RTTI symbol not found for class 'tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorWaiter, void>'
...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:821
#35 0x00007fffcc659dd0 in Intel::OpenCL::TaskExecutor::SpawningTaskGroup::WaitForAll() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#36 0x00007fffcc640436 in Intel::OpenCL::TaskExecutor::out_of_order_executor_task::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#37 0x00007fffcc659b12 in tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorRunner<Intel::OpenCL::TaskExecutor::out_of_order_executor_task>, void>::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#38 0x00007fffc0a28a37 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=warning: RTTI symbol not found for class 'tbb::detail::d1::task_arena_function<Intel::OpenCL::TaskExecutor::ArenaFunctorRunner<Intel::OpenCL::TaskExecutor::out_of_order_executor_task>, void>'
...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:821
#39 0x00007fffcc656c42 in Intel::OpenCL::TaskExecutor::out_of_order_command_list::LaunchExecutorTask(bool, Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::TaskExecutor::ITaskBase> const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#40 0x00007fffcc65633e in Intel::OpenCL::TaskExecutor::base_command_list::InternalFlush(bool) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#41 0x00007fffcc6561e2 in Intel::OpenCL::TaskExecutor::base_command_list::WaitForCompletion(Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::TaskExecutor::ITaskBase> const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#42 0x00007fffcc62e3f5 in Intel::OpenCL::CPUDevice::CPUDevice::clDevCommandListWaitCompletion(void*, cl_dev_cmd_desc*) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#43 0x00007fffcc5e7149 in Intel::OpenCL::Framework::IOclCommandQueueBase::WaitForCompletion(Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::Framework::QueueEvent> const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#44 0x00007fffcc5b52a4 in Intel::OpenCL::Framework::EventsManager::WaitForEvents(unsigned int, _cl_event* const*, bool) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#45 0x00007fffcc5ca033 in Intel::OpenCL::Framework::ExecutionModule::WaitForEvents(unsigned int, _cl_event* const*) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#46 0x00007fffcc523856 in clWaitForEvents () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#47 0x00007fffd5b5436a in urEventWait () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_opencl.so.0
#48 0x00007ffff3a7f9eb in ur_loader::urEventWait(unsigned int, ur_event_handle_t_* const*) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
#49 0x00007ffff3a92c27 in urEventWait () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
#50 0x00007ffff4446c64 in sycl::_V1::detail::DispatchHostTask::waitForEvents() const () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#51 0x00007ffff4445aa4 in sycl::_V1::detail::DispatchHostTask::operator()() const () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#52 0x00007ffff4380dd5 in sycl::_V1::detail::ThreadPool::worker() () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#53 0x00007ffff654cdb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#54 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#55 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 7 (Thread 0x7fffaffff6c0 (LWP 568167) "python"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007fffc0a3af06 in tbb::detail::r1::futex_wait (futex=0x7fffbfa7f024, comparand=2) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:101
#2 tbb::detail::r1::binary_semaphore::P (this=0x7fffbfa7f024) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:253
#3 tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x7fffbfa7f020) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/rml_thread_monitor.h:235
#4 tbb::detail::r1::rml::private_worker::run (this=0x7fffbfa7f000) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/private_server.cpp:273
#5 0x00007fffc0a3adc6 in tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffbfa7f024) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/private_server.cpp:221
#6 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#7 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 6 (Thread 0x7fffaf7fe6c0 (LWP 568166) "python"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007fffc0a3af06 in tbb::detail::r1::futex_wait (futex=0x7fffbfa7f124, comparand=2) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:101
#2 tbb::detail::r1::binary_semaphore::P (this=0x7fffbfa7f124) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:253
#3 tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x7fffbfa7f120) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/rml_thread_monitor.h:235
#4 tbb::detail::r1::rml::private_worker::run (this=0x7fffbfa7f100) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/private_server.cpp:273
#5 0x00007fffc0a3adc6 in tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffbfa7f124) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/private_server.cpp:221
#6 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#7 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 5 (Thread 0x7fffb4a346c0 (LWP 568165) "python"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007fffc0a3af06 in tbb::detail::r1::futex_wait (futex=0x7fffbfa7f0a4, comparand=2) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:101
#2 tbb::detail::r1::binary_semaphore::P (this=0x7fffbfa7f0a4) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:253
#3 tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x7fffbfa7f0a0) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/rml_thread_monitor.h:235
#4 tbb::detail::r1::rml::private_worker::run (this=0x7fffbfa7f080) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/private_server.cpp:273
#5 0x00007fffc0a3adc6 in tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffbfa7f0a4) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/private_server.cpp:221
#6 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#7 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 4 (Thread 0x7fffebbfd6c0 (LWP 568159) "python"):
#0 0x00007ffff7d3fd61 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff29c8860 <thread_status+352>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7ffff29c8860 <thread_status+352>) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff29c8860 <thread_status+352>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7d427dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffff29c8810 <thread_status+272>, cond=0x7ffff29c8838 <thread_status+312>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x7ffff29c8838 <thread_status+312>, mutex=0x7ffff29c8810 <thread_status+272>) at ./nptl/pthread_cond_wait.c:627
#5 0x00007ffff191cb8b in blas_thread_server () from /home/ngrigori/miniforge3/envs/dpctl_dev/lib/python3.11/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#6 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#7 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 3 (Thread 0x7ffff03fe6c0 (LWP 568158) "python"):
#0 0x00007ffff7d3fd61 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff29c87e0 <thread_status+224>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7ffff29c87e0 <thread_status+224>) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff29c87e0 <thread_status+224>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7d427dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffff29c8790 <thread_status+144>, cond=0x7ffff29c87b8 <thread_status+184>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x7ffff29c87b8 <thread_status+184>, mutex=0x7ffff29c8790 <thread_status+144>) at ./nptl/pthread_cond_wait.c:627
#5 0x00007ffff191cb8b in blas_thread_server () from /home/ngrigori/miniforge3/envs/dpctl_dev/lib/python3.11/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#6 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#7 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 2 (Thread 0x7ffff0bff6c0 (LWP 568157) "python"):
#0 0x00007ffff7d3fd61 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7ffff29c8760 <thread_status+96>) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x7ffff29c8760 <thread_status+96>) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff29c8760 <thread_status+96>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7d427dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7ffff29c8710 <thread_status+16>, cond=0x7ffff29c8738 <thread_status+56>) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x7ffff29c8738 <thread_status+56>, mutex=0x7ffff29c8710 <thread_status+16>) at ./nptl/pthread_cond_wait.c:627
#5 0x00007ffff191cb8b in blas_thread_server () from /home/ngrigori/miniforge3/envs/dpctl_dev/lib/python3.11/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-ff651d7f.so
#6 0x00007ffff7d43a94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#7 0x00007ffff7dd0c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Thread 1 (Thread 0x7ffff7ca0b80 (LWP 568154) "python"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007fffc0a2a244 in tbb::detail::r1::futex_wait (futex=0x7fffffff8ef8, comparand=2) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:101
#2 tbb::detail::r1::binary_semaphore::P (this=0x7fffffff8ef8) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/semaphore.h:253
#3 tbb::detail::r1::sleep_node<unsigned long>::wait (this=0x7fffffff8ed0) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/concurrent_monitor.h:170
#4 0x00007fffc0a287a3 in tbb::detail::r1::concurrent_monitor_base<unsigned long>::commit_wait (this=0x7fffbf7e23a8, node=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/concurrent_monitor.h:232
#5 tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /localdisk/tmp/onetbb-ci/onetbb_source_code/src/tbb/arena.cpp:797
#6 0x00007fffcc6570f4 in Intel::OpenCL::TaskExecutor::out_of_order_command_list::WaitForIdle() () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#7 0x00007fffcc6561f2 in Intel::OpenCL::TaskExecutor::base_command_list::WaitForCompletion(Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::TaskExecutor::ITaskBase> const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#8 0x00007fffcc62e3f5 in Intel::OpenCL::CPUDevice::CPUDevice::clDevCommandListWaitCompletion(void*, cl_dev_cmd_desc*) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#9 0x00007fffcc5e7149 in Intel::OpenCL::Framework::IOclCommandQueueBase::WaitForCompletion(Intel::OpenCL::Utils::SharedPtr<Intel::OpenCL::Framework::QueueEvent> const&) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#10 0x00007fffcc5b52a4 in Intel::OpenCL::Framework::EventsManager::WaitForEvents(unsigned int, _cl_event* const*, bool) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#11 0x00007fffcc5ca033 in Intel::OpenCL::Framework::ExecutionModule::WaitForEvents(unsigned int, _cl_event* const*) () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#12 0x00007fffcc523856 in clWaitForEvents () from /opt/intel/oneapi/compiler/2025.0/lib/libintelocl.so
#13 0x00007fffd5b5436a in urEventWait () from /opt/intel/oneapi/compiler/2025.0/lib/libur_adapter_opencl.so.0
#14 0x00007ffff3a7f9eb in ur_loader::urEventWait(unsigned int, ur_event_handle_t_* const*) () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
#15 0x00007ffff3a92c27 in urEventWait () from /opt/intel/oneapi/compiler/2025.0/lib/libur_loader.so.0
#16 0x00007ffff4376aef in sycl::_V1::detail::event_impl::waitInternal(bool*) () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#17 0x00007ffff4376fa2 in sycl::_V1::detail::event_impl::wait(std::shared_ptr<sycl::_V1::detail::event_impl>, bool*) () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#18 0x00007ffff447976e in sycl::_V1::event::wait() () from /opt/intel/oneapi/compiler/2025.0/lib/libsycl.so.8
#19 0x00007fffe8788024 in ?? () from /home/ngrigori/repos/dpctl/dpctl/tensor/_tensor_impl.cpython-311-x86_64-linux-gnu.so
#20 0x00007fffe834672f in ?? () from /home/ngrigori/repos/dpctl/dpctl/tensor/_tensor_impl.cpython-311-x86_64-linux-gnu.so
#21 0x00007fffe832a06f in ?? () from /home/ngrigori/repos/dpctl/dpctl/tensor/_tensor_impl.cpython-311-x86_64-linux-gnu.so
#22 0x0000555555755b06 in cfunction_call (func=0x7fffe9209da0, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/methodobject.c:542
#23 0x00005555557348b3 in _PyObject_MakeTpCall (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7fffe9209da0, args=<optimized out>, nargs=0, keywords=0x7ffff6fbebb0) at /usr/local/src/conda/python-3.11.9/Objects/call.c:214
#24 0x00005555557423b6 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae2ab8, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#25 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae2ab8, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#26 _PyEval_Vector (kwnames=<optimized out>, argcount=2, args=0x7fffffffa228, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#27 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffffffa228, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#28 0x00007fffe55b9a36 in __pyx_pf_5dpctl_6tensor_9_usmarray_11usm_ndarray_68__setitem__(PyUSMArrayObject*, _object*, _object*) () from /home/ngrigori/repos/dpctl/dpctl/tensor/_usmarray.cpython-311-x86_64-linux-gnu.so
#29 0x0000555555746071 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae2768, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:2297
#30 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae2768, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#31 _PyEval_Vector (kwnames=<optimized out>, argcount=3, args=0x7fffffffa480, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#32 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffffffa480, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#33 0x00007fffe55ae5db in __pyx_pw_5dpctl_6tensor_9_usmarray_11usm_ndarray_127__ixor__(_object*, _object*) () from /home/ngrigori/repos/dpctl/dpctl/tensor/_usmarray.cpython-311-x86_64-linux-gnu.so
#34 0x00005555557c5984 in binary_iop1 (op_slot=112, iop_slot=216, w=0x555555aa7ac0 <_Py_FalseStruct>, v=0x7fffd5e36960) at /usr/local/src/conda/python-3.11.9/Objects/abstract.c:1190
#35 binary_iop (op_name=0x555555889b71 "^=", op_slot=112, iop_slot=216, w=0x555555aa7ac0 <_Py_FalseStruct>, v=0x7fffd5e36960) at /usr/local/src/conda/python-3.11.9/Objects/abstract.c:1215
#36 PyNumber_InPlaceXor (v=0x7fffd5e36960, w=0x555555aa7ac0 <_Py_FalseStruct>) at /usr/local/src/conda/python-3.11.9/Objects/abstract.c:1248
#37 0x0000555555742fe9 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae26c8, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5548
#38 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae26c8, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#39 _PyEval_Vector (kwnames=<optimized out>, argcount=0, args=0x7fffd5e48058, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#40 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffd5e48058, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#41 0x000055555576f6b0 in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7ffff5f867a0, func=0x555555765800 <_PyFunction_Vectorcall>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:257
#42 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7ffff5f867a0, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:328
#43 PyObject_Call (callable=0x7ffff5f867a0, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:355
#44 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x7fffd5e45ec0, callargs=0x555555ab65f8 <_PyRuntime+58904>, func=0x7ffff5f867a0, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#45 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae2610, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#46 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae2610, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#47 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7fffd71062a8, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#48 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffd71062a8, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#49 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x0, callargs=0x7fffd7106290, func=0x7ffff7118fe0, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#50 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae23a8, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#51 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae23a8, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#52 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7fffd5e28058, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#53 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffd5e28058, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#54 0x0000555555739420 in _PyObject_FastCallDictTstate (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff76ae480, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:152
#55 0x000055555576d399 in _PyObject_Call_Prepend (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, callable=callable@entry=0x7ffff76ae480, obj=obj@entry=0x7ffff6de5bc0, args=args@entry=0x555555ab65f8 <_PyRuntime+58904>, kwargs=kwargs@entry=0x7fffd5e45a80) at /usr/local/src/conda/python-3.11.9/Objects/call.c:482
#56 0x000055555583f108 in slot_tp_call (self=0x7ffff6de5bc0, args=0x555555ab65f8 <_PyRuntime+58904>, kwds=0x7fffd5e45a80) at /usr/local/src/conda/python-3.11.9/Objects/typeobject.c:7624
#57 0x00005555557348b3 in _PyObject_MakeTpCall (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff6de5bc0, args=<optimized out>, nargs=0, keywords=0x7ffff7113af0) at /usr/local/src/conda/python-3.11.9/Objects/call.c:214
#58 0x00005555557423b6 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae22c8, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#59 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae22c8, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#60 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff6ebfaa8, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#61 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff6ebfaa8, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#62 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x0, callargs=0x7ffff6ebfa90, func=0x7ffff704efc0, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#63 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae2060, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#64 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae2060, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#65 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7fffd5e48418, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#66 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffd5e48418, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#67 0x0000555555739420 in _PyObject_FastCallDictTstate (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff76ae480, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:152
#68 0x000055555576d399 in _PyObject_Call_Prepend (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, callable=callable@entry=0x7ffff76ae480, obj=obj@entry=0x7ffff6de5da0, args=args@entry=0x555555ab65f8 <_PyRuntime+58904>, kwargs=kwargs@entry=0x7fffd7cf2800) at /usr/local/src/conda/python-3.11.9/Objects/call.c:482
#69 0x000055555583f108 in slot_tp_call (self=self@entry=0x7ffff6de5da0, args=args@entry=0x555555ab65f8 <_PyRuntime+58904>, kwds=0x7fffd7cf2800) at /usr/local/src/conda/python-3.11.9/Objects/typeobject.c:7624
#70 0x000055555576f63e in _PyObject_Call (kwargs=<optimized out>, args=0x555555ab65f8 <_PyRuntime+58904>, callable=0x7ffff6de5da0, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:343
#71 PyObject_Call (callable=0x7ffff6de5da0, args=0x555555ab65f8 <_PyRuntime+58904>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:355
#72 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x7fffd7cf2800, callargs=0x555555ab65f8 <_PyRuntime+58904>, func=0x7ffff6de5da0, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#73 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae1d00, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#74 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae1d00, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#75 _PyEval_Vector (kwnames=<optimized out>, argcount=2, args=0x7fffe92ad618, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#76 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffe92ad618, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#77 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x0, callargs=0x7fffe92ad600, func=0x7ffff704ed40, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#78 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae1a98, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#79 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae1a98, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#80 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff715b878, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#81 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff715b878, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#82 0x0000555555739420 in _PyObject_FastCallDictTstate (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff76ae480, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:152
#83 0x000055555576d399 in _PyObject_Call_Prepend (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, callable=callable@entry=0x7ffff76ae480, obj=obj@entry=0x7ffff6de5f30, args=args@entry=0x555555ab65f8 <_PyRuntime+58904>, kwargs=kwargs@entry=0x7fffd5dd8800) at /usr/local/src/conda/python-3.11.9/Objects/call.c:482
#84 0x000055555583f108 in slot_tp_call (self=0x7ffff6de5f30, args=0x555555ab65f8 <_PyRuntime+58904>, kwds=0x7fffd5dd8800) at /usr/local/src/conda/python-3.11.9/Objects/typeobject.c:7624
#85 0x00005555557348b3 in _PyObject_MakeTpCall (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff6de5f30, args=<optimized out>, nargs=0, keywords=0x7ffff735d180) at /usr/local/src/conda/python-3.11.9/Objects/call.c:214
#86 0x00005555557423b6 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae1a00, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#87 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae1a00, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#88 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff6ebcda8, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#89 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff6ebcda8, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#90 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x0, callargs=0x7ffff6ebcd90, func=0x7ffff704fe20, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#91 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae1798, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#92 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae1798, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#93 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff715b898, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#94 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff715b898, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#95 0x0000555555739420 in _PyObject_FastCallDictTstate (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff76ae480, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:152
#96 0x000055555576d399 in _PyObject_Call_Prepend (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, callable=callable@entry=0x7ffff76ae480, obj=obj@entry=0x7ffff6de6020, args=args@entry=0x555555ab65f8 <_PyRuntime+58904>, kwargs=kwargs@entry=0x7ffff6e91480) at /usr/local/src/conda/python-3.11.9/Objects/call.c:482
#97 0x000055555583f108 in slot_tp_call (self=0x7ffff6de6020, args=0x555555ab65f8 <_PyRuntime+58904>, kwds=0x7ffff6e91480) at /usr/local/src/conda/python-3.11.9/Objects/typeobject.c:7624
#98 0x00005555557348b3 in _PyObject_MakeTpCall (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff6de6020, args=<optimized out>, nargs=0, keywords=0x7ffff75a90f0) at /usr/local/src/conda/python-3.11.9/Objects/call.c:214
#99 0x00005555557423b6 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae15f0, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#100 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae15f0, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#101 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff6ebcaa8, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#102 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff6ebcaa8, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#103 0x00005555557466e4 in do_call_core (use_tracing=<optimized out>, kwdict=0x0, callargs=0x7ffff6ebca90, func=0x7ffff704fce0, tstate=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:7349
#104 _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae1388, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:5376
#105 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae1388, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#106 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff715b858, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#107 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff715b858, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#108 0x0000555555739420 in _PyObject_FastCallDictTstate (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff76ae480, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:152
#109 0x000055555576d399 in _PyObject_Call_Prepend (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, callable=callable@entry=0x7ffff76ae480, obj=obj@entry=0x7ffff6de5350, args=args@entry=0x555555ab65f8 <_PyRuntime+58904>, kwargs=kwargs@entry=0x7ffff6eab5c0) at /usr/local/src/conda/python-3.11.9/Objects/call.c:482
#110 0x000055555583f108 in slot_tp_call (self=0x7ffff6de5350, args=0x555555ab65f8 <_PyRuntime+58904>, kwds=0x7ffff6eab5c0) at /usr/local/src/conda/python-3.11.9/Objects/typeobject.c:7624
#111 0x00005555557348b3 in _PyObject_MakeTpCall (tstate=0x555555ad0998 <_PyRuntime+166328>, callable=0x7ffff6de5350, args=<optimized out>, nargs=0, keywords=0x7ffff75a97b0) at /usr/local/src/conda/python-3.11.9/Objects/call.c:214
#112 0x00005555557423b6 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae11b8, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#113 0x00005555557f9a8d in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae11b8, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#114 _PyEval_Vector (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, func=func@entry=0x7ffff7c21f80, locals=locals@entry=0x7ffff7c3eb80, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#115 0x00005555557f911f in PyEval_EvalCode (co=co@entry=0x7ffff6fdddf0, globals=globals@entry=0x7ffff7c3eb80, locals=locals@entry=0x7ffff7c3eb80) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:1148
#116 0x00005555558106ee in builtin_exec_impl (module=<optimized out>, closure=<optimized out>, locals=0x7ffff7c3eb80, globals=0x7ffff7c3eb80, source=0x7ffff6fdddf0) at /usr/local/src/conda/python-3.11.9/Python/bltinmodule.c:1077
#117 builtin_exec (module=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Python/clinic/bltinmodule.c.h:465
#118 0x000055555574efbf in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7ffff7bd0f90, args=0x7ffff7ae1180, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.11.9/Include/cpython/methodobject.h:52
#119 0x000055555574eeac in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff7bd0f90, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_call.h:92
#120 PyObject_Vectorcall (callable=0x7ffff7bd0f90, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:299
#121 0x00005555557423b6 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x555555ad0998 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7ffff7ae1020, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:4769
#122 0x0000555555765981 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7ae1020, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Include/internal/pycore_ceval.h:73
#123 _PyEval_Vector (kwnames=<optimized out>, argcount=2, args=0x7ffff779ea98, locals=0x0, func=<optimized out>, tstate=0x555555ad0998 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.9/Python/ceval.c:6434
#124 _PyFunction_Vectorcall (func=<optimized out>, stack=0x7ffff779ea98, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.11.9/Objects/call.c:393
#125 0x0000555555823158 in pymain_run_module (modname=<optimized out>, set_argv0=set_argv0@entry=1) at /usr/local/src/conda/python-3.11.9/Modules/main.c:300
#126 0x0000555555822ad9 in pymain_run_python (exitcode=0x7fffffffc724) at /usr/local/src/conda/python-3.11.9/Modules/main.c:595
#127 Py_RunMain () at /usr/local/src/conda/python-3.11.9/Modules/main.c:680
#128 0x00005555557e9027 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.9/Modules/main.c:734
#129 0x00007ffff7cd11ca in __libc_start_call_main (main=main@entry=0x5555557e8f80 <main>, argc=argc@entry=4, argv=argv@entry=0x7fffffffc988) at ../sysdeps/nptl/libc_start_call_main.h:58
#130 0x00007ffff7cd128b in __libc_start_main_impl (main=0x5555557e8f80 <main>, argc=4, argv=0x7fffffffc988, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffc978) at ../csu/libc-start.c:360
#131 0x00005555557e8ecd in _start () |
I modified the test script to print the data type being worked on, the device, and the PID of the test process, as follows:

import pytest
import os
import dpctl.tensor as dpt
from dpctl.tests.helper import get_queue_or_skip, skip_if_dtype_not_supported
from dpctl.tests.elementwise.utils import _integral_dtypes


@pytest.mark.parametrize("iters", range(1, 2001))
@pytest.mark.parametrize("dtype", ["?"] + _integral_dtypes)
def test_bitwise_xor_inplace_python_scalar(dtype, iters):
    assert iters
    q = get_queue_or_skip()
    skip_if_dtype_not_supported(dtype, q)
    if iters == 1:
        # Print once per dtype so the hung process can be identified later
        print((dtype, q.sycl_device, os.getpid()))
    X = dpt.zeros((10, 10), dtype=dtype, sycl_queue=q)
    dt_kind = X.dtype.kind
    if dt_kind == "b":
        X ^= False
    else:
        X ^= int(0)

I then created a shell script that runs the test repeatedly to reproduce the hang:

export ONEAPI_DEVICE_SELECTOR=opencl:cpu
for i in `seq 0 60`;
do
    echo "Run ${i}"
    taskset -c 0-3 python -m pytest -s test_hang.py
done

Once the test hangs (the hang may occur for any data type), attach gdb to the hung process. In gdb, thread states are as follows:
Threads 6-8 are spawned by openBLAS, since their backtrace is
The top of the thread 1 backtrace is:
Frame 2 of the backtrace indicates that the hang is in the loop of
The registers in frame 0 (syscall.S:38) are as follows:
The value of
So this hang is likely an adverse interaction between GNU OpenMP threads spawned by openBLAS and TBB threads. It might be instructive to try to reproduce the hang with NumPy from the Intel channel, which is linked to Intel MKL, and run the test with |
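The shell loop above can also be sketched as a small Python driver that stops at the first run exceeding a time limit and leaves the hung process alive, so that a debugger can be attached to its PID. This is only an illustration: the function name `run_until_hang`, the 600-second limit, and the default of 61 runs (mirroring `seq 0 60`) are assumptions, not part of the original scripts.

```python
import os
import subprocess


def run_until_hang(cmd, runs=61, timeout_s=600.0, env=None):
    """Run `cmd` up to `runs` times.

    Returns the still-running Popen object of the first run that does not
    finish within `timeout_s` seconds, or None if every run finished in time.
    The hung child is deliberately NOT killed, so it can be inspected.
    """
    for i in range(runs):
        print(f"Run {i}", flush=True)
        proc = subprocess.Popen(cmd, env=env)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Treat a timeout as a hang; report the PID and hand the
            # process back to the caller.
            print(f"Run {i} appears hung, PID {proc.pid}", flush=True)
            return proc
    return None


# Hypothetical usage mirroring the shell script (Linux, `taskset` on PATH):
#
#   env = dict(os.environ, ONEAPI_DEVICE_SELECTOR="opencl:cpu")
#   cmd = ["taskset", "-c", "0-3", "python", "-m", "pytest", "-s", "test_hang.py"]
#   hung = run_until_hang(cmd, env=env)
```

A timeout is only a proxy for a hang, so the limit should be chosen well above the duration of a normal run.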
The hang is still reproducible when NumPy linked against MKL rather than openBLAS is used. We could not reproduce the hang on a NUC with Tiger Lake CPU and Ubuntu 22.04, with NumPy powered by openBLAS. |
Sometimes, and more frequently than I would like, the Linux test run times out, seemingly hanging on with this being the last output:
Usually, rerunning the test_linux step runs fine.
I was not able to reproduce locally yet.
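When a hang shows up only in CI, one way to learn which test was running is a watchdog based on the standard-library `faulthandler` module. The sketch below is an assumption about how this could be wired in, not something the test suite is known to do. The helper names and the 300-second default are hypothetical. Note that `faulthandler` reports Python-level frames only, so native frames such as those in the gdb backtrace would still require a debugger.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=300.0, file=None):
    """Dump the Python tracebacks of all threads to `file` (default: stderr)
    if the process is still running `timeout_s` seconds from now."""
    if file is None:
        file = sys.stderr
    # exit=False keeps the process alive after the dump, so it can still
    # be inspected with a debugger.
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=file, exit=False)


def disarm_hang_watchdog():
    """Cancel a pending dump, e.g. once a test has finished normally."""
    faulthandler.cancel_dump_traceback_later()
```

Arming the watchdog before a test and disarming it afterwards yields a traceback only for a test that overruns. If I recall correctly, pytest offers a similar built-in `faulthandler_timeout` configuration option, which may be simpler than hand-written helpers.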