Commit

Fix thread-lock issue when exec enqueue fails (#24)
If enqueueing an executable fails, execution terminates immediately without
adjusting the semaphore values. This leaves the timeline in a bad state: the
semaphore value that the next enqueued work waits on is never reached.
Enqueueing a barrier on failure corrects the sacred timeline so that
subsequent tasks execute as expected.
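To see why a skipped signal stalls later work, here is a minimal, single-threaded model of a timeline semaphore. It is a sketch under stated assumptions: `TimelineSemaphore`, `WaitFor`, and `RunChain` are hypothetical names, not the IREE HAL API, and the waits time out only so the stall is observable instead of hanging forever. Submission #1 waits for 0 and signals 1; submission #2 waits for 1 and signals 2. If #1 fails to enqueue and nothing signals 1, #2 can never proceed. A barrier that waits on the same fence and signals the same value restores the chain.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a timeline semaphore (NOT the IREE HAL API):
// a monotonically increasing 64-bit payload that waiters block on.
class TimelineSemaphore {
 public:
  void Signal(uint64_t value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(value > value_ && "timeline values must only increase");
      value_ = value;
    }
    cv_.notify_all();
  }

  // Returns true if the payload reached `value` within `timeout`. A real
  // device wait would block indefinitely instead of timing out.
  bool WaitFor(uint64_t value, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [&] { return value_ >= value; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  uint64_t value_ = 0;
};

// Two chained submissions on one timeline: #1 waits for 0 and signals 1,
// #2 waits for 1 and signals 2. Returns true if the chain reaches 2.
bool RunChain(bool first_enqueue_fails, bool use_barrier) {
  using std::chrono::milliseconds;
  TimelineSemaphore sem;

  if (!first_enqueue_fails) {
    sem.Signal(1);  // Submission #1 ran and signaled its fence.
  } else if (use_barrier) {
    // Barrier: no work, just wait on the same fence and signal the value
    // that submission #1 would have signaled.
    if (sem.WaitFor(0, milliseconds(10))) sem.Signal(1);
  }
  // Otherwise the timeline is stuck at 0 (the "bad state").

  // Submission #2 depends on value 1; without it this wait never succeeds.
  if (!sem.WaitFor(1, milliseconds(50))) return false;
  sem.Signal(2);
  return sem.WaitFor(2, milliseconds(50));
}
```

In this model, `RunChain(true, false)` returns `false` because the timeline never reaches 1, while `RunChain(true, true)` returns `true` because the barrier stands in for the failed submission.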
rsuderman authored Mar 15, 2023
1 parent 3675069 commit e29a158
Showing 2 changed files with 14 additions and 2 deletions.
.gitignore (1 change: 1 addition, 0 deletions)
@@ -7,3 +7,4 @@ external/
 .vscode/
 __pycache__/
 user.bazelrc
+__pycache__/
iree/integrations/pjrt/common/api_impl.cc (15 changes: 13 additions, 2 deletions)
@@ -1499,11 +1499,22 @@ iree_status_t LoadedExecutableInstance::BatchExecute(
   iree_status_t status = iree_ok_status();
   for (size_t dev_index = 0; dev_index < args->num_devices; ++dev_index) {
     auto& inv = invs[dev_index];
-    status = iree_vm_invoke(
+    auto new_status = iree_vm_invoke(
         inv.res_exe->vm_context.get(), inv.res_exe->main_function,
         IREE_VM_INVOCATION_FLAG_NONE,
         /*policy=*/nullptr, inv.inputs.get(), inv.outputs.get(), allocator);
-    if (!iree_status_is_ok(status)) break;
+    // Any invocation that fails needs a barrier so that signal fence is incremented
+    // otherwise future waits will fail. We do this instead of incrementing as only
+    // a subset of devices may fail.
+    if (!iree_status_is_ok(new_status)) {
+      status = new_status;
+      // We can ignore the error as we are already erroring out earlier.
+      IREE_IGNORE_ERROR(iree_hal_device_queue_barrier(
+          inv.res_exe->device_instance->device(),
+          IREE_HAL_QUEUE_AFFINITY_ANY,
+          iree_hal_fence_semaphore_list(inv.wait_fence.get()),
+          iree_hal_fence_semaphore_list(inv.signal_fence.get())));
+    }
   }

   // Process results.
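The control flow of the patched loop can be modeled in isolation. The sketch below is a simplified, single-threaded stand-in: `DeviceFences`, `BatchResult`, this `BatchExecute` signature, the `invoke` callback, and the synchronous "barrier" are all hypothetical and are not IREE's API. It mirrors three properties of the diff: the loop no longer breaks on failure, the most recent failure is kept (like `status = new_status`), and each failing device forwards its wait value to its signal value so downstream waiters can still progress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-device state: one counter standing in for a timeline
// semaphore, plus the values a submission waits on and signals.
struct DeviceFences {
  uint64_t payload = 0;       // current semaphore value
  uint64_t wait_value = 0;    // value the submission waits for
  uint64_t signal_value = 1;  // value the submission signals
};

struct BatchResult {
  bool ok = true;
  std::string error;  // most recent failure, mirroring `status = new_status`
};

// Invoke every device without breaking early. On failure, record the error
// and emulate a barrier that forwards wait -> signal with no work.
BatchResult BatchExecute(std::vector<DeviceFences>& devices,
                         const std::function<bool(size_t)>& invoke) {
  BatchResult result;
  for (size_t i = 0; i < devices.size(); ++i) {
    DeviceFences& dev = devices[i];
    assert(dev.wait_value < dev.signal_value && "signal must advance");
    if (invoke(i)) {
      // Success: model the enqueued work as completing and signaling.
      dev.payload = dev.signal_value;
      continue;
    }
    result.ok = false;
    result.error = "invoke failed on device " + std::to_string(i);
    // A real queue barrier would defer until the wait is satisfied; this
    // model only signals if the wait value has already been reached.
    if (dev.payload >= dev.wait_value) dev.payload = dev.signal_value;
  }
  return result;
}

// True if every device's fence reached the value waiters expect.
bool AllSignaled(const std::vector<DeviceFences>& devices) {
  for (const auto& dev : devices) {
    if (dev.payload < dev.signal_value) return false;
  }
  return true;
}
```

With three devices where device 1 fails, the batch reports an error but every fence still reaches its signal value. Under the pre-patch `break`, device 2 would never have been invoked and neither device 1's nor device 2's fence would have advanced.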