radv: Remove check_status
Following discussion on kernel mailing list[1], we are not gaining
anything from this right now, and it does not handle soft recovery.

We will hear about the context loss and rationale when we vkQueueSubmit
next.

We can come back to this if there is ever a Vulkan extension for
figuring out innocent vs guilty like GL_EXT_robustness.
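
The innocent versus guilty distinction that the removed radv_check_status drew can be illustrated with a small standalone sketch. The enum and function names below are hypothetical stand-ins, not RADV or winsys API; the only behaviour taken from the removed code is that a guilty context takes precedence and that the scan continues past an innocent one.

```c
#include <stddef.h>

/* Hypothetical stand-ins for per-context reset states (not the RADV enum). */
enum reset_status {
   RESET_NONE,
   RESET_INNOCENT, /* a reset happened, caused by some other context */
   RESET_GUILTY,   /* a reset happened, caused by this context */
};

/* Summarize several per-context states into one.
 * A guilty context wins immediately; an innocent one is remembered
 * while the scan continues, in case a guilty one follows. */
static enum reset_status
summarize_reset_status(const enum reset_status *statuses, size_t count)
{
   enum reset_status summary = RESET_NONE;

   for (size_t i = 0; i < count; i++) {
      if (statuses[i] == RESET_GUILTY)
         return RESET_GUILTY;
      if (statuses[i] == RESET_INNOCENT)
         summary = RESET_INNOCENT;
   }
   return summary;
}
```

Core Vulkan only exposes VK_ERROR_DEVICE_LOST, so a summary like this has no dedicated way to reach the application, which is why the message defers it to a possible future extension.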

This does mean however that we return VK_SUCCESS for cancelled semaphore
and fence waits, but this is legal per the Vulkan spec:

"Commands that wait indefinitely for device execution (namely
vkDeviceWaitIdle, vkQueueWaitIdle, vkWaitForFences with a maximum
timeout, and vkGetQueryPoolResults with the VK_QUERY_RESULT_WAIT_BIT
bit set in flags) must return in finite time even in the case of a lost
device, and return either VK_SUCCESS or VK_ERROR_DEVICE_LOST."

"If device loss occurs (see Lost Device) before the timeout has expired,
vkWaitSemaphores must return in finite time with either VK_SUCCESS or
VK_ERROR_DEVICE_LOST."
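
From the application side, the consequence of the quoted text can be sketched as follows. This is a minimal model under stated assumptions, not Vulkan code: `enum result` stands in for VK_SUCCESS and VK_ERROR_DEVICE_LOST, and `frame_step` and `struct frame_state` are hypothetical names. The point is that a successful wait does not prove the device is healthy, so the result of the next submit has to be checked as well.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VK_SUCCESS and VK_ERROR_DEVICE_LOST. */
enum result {
   RESULT_SUCCESS,
   RESULT_DEVICE_LOST,
};

/* Hypothetical per-application bookkeeping. */
struct frame_state {
   bool device_lost;
};

/* Process the results of one wait followed by one submit.
 * Returns true if the frame may proceed, false if the device was lost. */
static bool
frame_step(struct frame_state *state, enum result wait_result,
           enum result submit_result)
{
   /* The wait is allowed to report the loss directly. */
   if (wait_result == RESULT_DEVICE_LOST) {
      state->device_lost = true;
      return false;
   }

   /* The wait is also allowed to return success on a lost device,
    * so the loss may first become visible at the next submit. */
   if (submit_result == RESULT_DEVICE_LOST) {
      state->device_lost = true;
      return false;
   }

   return true;
}
```

In a real application the two results would come from calls such as vkWaitForFences and vkQueueSubmit, and the recovery path would typically tear down and recreate the device.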

[1]: https://lists.freedesktop.org/archives/amd-gfx/2024-January/103337.html

Signed-off-by: Joshua Ashton <[email protected]>

Reviewed-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Friedrich Vock <[email protected]>
misyltoad committed Jan 18, 2024
1 parent 1e40dd6 commit c55ac77
Showing 2 changed files with 2 additions and 27 deletions.
27 changes: 0 additions & 27 deletions src/amd/vulkan/radv_device.c
@@ -598,32 +598,6 @@ init_dispatch_tables(struct radv_device *device, struct radv_physical_device *ph
    add_entrypoints(&b, &vk_common_device_entrypoints, RADV_DISPATCH_TABLE_COUNT);
 }
 
-static VkResult
-radv_check_status(struct vk_device *vk_device)
-{
-   struct radv_device *device = container_of(vk_device, struct radv_device, vk);
-   enum radv_reset_status status;
-   bool context_reset = false;
-
-   /* If an INNOCENT_CONTEXT_RESET is found in one of the contexts, we need to
-    * keep querying in case there's a guilty one, so we can correctly log if the
-    * hung happened in this app or not */
-   for (int i = 0; i < RADV_NUM_HW_CTX; i++) {
-      if (device->hw_ctx[i]) {
-         status = device->ws->ctx_query_reset_status(device->hw_ctx[i]);
-
-         if (status == RADV_GUILTY_CONTEXT_RESET)
-            return vk_device_set_lost(&device->vk, "GPU hung detected in this process");
-         else if (status == RADV_INNOCENT_CONTEXT_RESET)
-            context_reset = true;
-      }
-   }
-
-   if (context_reset)
-      return vk_device_set_lost(&device->vk, "GPU hung triggered by other process");
-   return VK_SUCCESS;
-}
-
 static VkResult
 capture_trace(VkQueue _queue)
 {
@@ -816,7 +790,6 @@ radv_CreateDevice(VkPhysicalDevice physicalDevice, const VkDeviceCreateInfo *pCr
    device->vk.capture_trace = capture_trace;
 
    device->vk.command_buffer_ops = &radv_cmd_buffer_ops;
-   device->vk.check_status = radv_check_status;
 
    device->instance = physical_device->instance;
    device->physical_device = physical_device;
2 changes: 2 additions & 0 deletions src/amd/vulkan/radv_queue.c
@@ -31,6 +31,8 @@
 #include "vk_semaphore.h"
 #include "vk_sync.h"
 
+#include "ac_debug.h"
+
 enum radeon_ctx_priority
 radv_get_queue_global_priority(const VkDeviceQueueGlobalPriorityCreateInfoKHR *pObj)
 {