make sure main thread can extract from executor if possible #635

Open. Wants to merge 1 commit into main.

src/queue.cpp: 12 changes (12 additions, 0 deletions)
```diff
@@ -371,6 +371,18 @@ cvk_executor_thread::extract_cmds_required_by(bool only_non_batch_cmds,
         }
     }
     if (executor_cmds->commands.size() > 0) {
+        // Make sure there is something to extract if possible for the main
+        // thread to avoid having the executor to signal the main thread to get
+        // to completion.
+        if (executor_cmds->commands.size() > 1) {
+            auto cmd = executor_cmds->commands.back();
+            executor_cmds->commands.pop_back();
+            m_groups.push_back(std::move(executor_cmds));
+            queue->group_sent();
+
+            executor_cmds.reset(new cvk_command_group());
+            executor_cmds->commands.push_front(cmd);
+        }
```
Owner

AFAIU after a quick review, this just splits the group of commands that the app thread is not interested in into two groups before sending them back to the executor. It also sends the last command appended first. The extraction logic does support splitting groups, so the only reason I can see for doing this is to enable the app thread to execute commands in parallel with, or independently of, the executor in the absence of dependencies, which as far as I can tell never happens. The last command appended will depend on the previous ones because of in-order semantics. I see the following two scenarios:

  1. The app thread can extract all required commands, and using one or two groups does not matter.
  2. The executor has begun processing the last required group. Without this change, the app thread has to wait for the executor to finish executing the group. With this change, the app thread still has to wait for the executor to complete its group (the commands depend on each other to satisfy in-order execution at present) before it can execute the remaining command split out by this change.
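
To make the split concrete, here is a minimal C++ sketch that models it with simplified, hypothetical types (`command`, `command_group`, `send_to_executor` and `app_thread_extract` are illustrative names, not clvk's classes). Like the diff, it moves the last command of a multi-command group into a group of its own before both are queued for the executor, so that if the executor has already picked up the first group, the app thread can still extract the trailing one.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-ins for clvk's command types: a command is
// just an integer ID and a group is an ordered list of commands.
using command = int;
struct command_group {
    std::deque<command> commands;
};
using group_queue = std::deque<std::unique_ptr<command_group>>;

// Mirrors the split in the diff: when a group handed back to the executor
// holds more than one command, its last command is moved into a new group
// that is queued right after it.
void send_to_executor(std::unique_ptr<command_group> group,
                      group_queue& executor_groups) {
    assert(group != nullptr);
    if (group->commands.empty()) {
        return;
    }
    if (group->commands.size() > 1) {
        command last = group->commands.back();
        group->commands.pop_back();
        executor_groups.push_back(std::move(group));
        group.reset(new command_group());
        group->commands.push_front(last);
    }
    executor_groups.push_back(std::move(group));
}

// Hypothetical extraction step for the app thread: take the most recently
// queued group. In this model the executor consumes groups from the front,
// so anything still in the queue has not been started yet.
std::unique_ptr<command_group> app_thread_extract(group_queue& executor_groups) {
    if (executor_groups.empty()) {
        return nullptr;
    }
    std::unique_ptr<command_group> group = std::move(executor_groups.back());
    executor_groups.pop_back();
    return group;
}
```

For example, a group holding commands 1, 2 and 3 is queued as {1, 2} followed by {3}, while a single-command group is queued unchanged. Whether this helps depends on the timing questions discussed in this thread, not on the split itself.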

I guess I must be missing something. Is it possible to write a test that hits the case of interest?

Contributor Author

The interesting one is scenario 2.

It has to be put in the context of timeline semaphores. This PR makes it possible to reduce latency on the app thread, which in some cases has been measured at 500 µs on ChromeOS.

The idea is that, instead of having the executor thread wait for the command to complete and then notify the app thread, the app thread can now wait on the command by itself. This removes a cross-thread notification that can be costly, because the notified thread needs to be woken up and might start running at a low frequency.
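
The two completion paths can be contrasted with a small C++ model. It assumes a hypothetical `timeline` class (a mutex-protected counter that threads block on until it reaches a value) as a CPU stand-in for a Vulkan timeline semaphore, and hypothetical functions `complete_via_executor` and `complete_on_app_thread`. It only counts the completion handoffs on the path to the app thread; it does not attempt to reproduce the 500 µs measurement.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical CPU model of a timeline semaphore: a monotonically increasing
// counter that threads can block on until it reaches a given value.
class timeline {
public:
    void signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > value_) {
                value_ = value;
            }
        }
        cv_.notify_all();
    }

    void wait(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return value_ >= value; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t value_ = 0;
};

// Before the change: the executor thread waits for completion and then has to
// notify the app thread. Returns the number of handoffs before the app thread
// can continue.
int complete_via_executor(timeline& gpu, std::uint64_t value) {
    std::atomic<int> handoffs{0};
    timeline executor_to_app;  // the extra cross-thread notification
    std::thread executor([&] {
        gpu.wait(value);
        ++handoffs;  // handoff 1: completion reaches the executor thread
        executor_to_app.signal(1);
    });
    executor_to_app.wait(1);
    ++handoffs;  // handoff 2: the executor's notification reaches the app thread
    executor.join();
    return handoffs.load();
}

// After the change: the app thread waits on the timeline itself.
int complete_on_app_thread(timeline& gpu, std::uint64_t value) {
    gpu.wait(value);
    return 1;  // single handoff: completion reaches the app thread directly
}
```

In this model the relay path always involves two handoffs (completion to the executor, then the executor to the app thread), while the direct path involves one; the removed hop is the thread wake-up the comment above describes.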

Owner

But the app thread still has to wait for the other dependent commands that have been picked up by the executor thread in that case. That's what I don't understand.

Contributor Author

We cannot avoid waiting for dependencies. What we are trying to change here is what happens when a dependency completes. With timeline semaphores, we do not want the executor thread to wait on the timeline semaphore and then notify the app thread; we want the app thread to wait on the timeline semaphore instead.
I think the use case here is kernels executing followed by a read buffer. We want the app thread to take care of the read buffer itself, waiting on the timeline semaphore rather than on a notification from the executor thread.
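
A hedged sketch of that use case, as a CPU-only C++ model: a hypothetical `timeline` counter stands in for the timeline semaphore, the executor thread plays the role of the device by running toy "kernels" that each signal their own timeline value, and the app thread performs the read itself as soon as the last kernel's value is reached. `kernels_then_read` and the increment kernel are illustrative assumptions, not clvk code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical CPU model of a timeline semaphore (monotonic counter + wait).
class timeline {
public:
    void signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > value_) {
                value_ = value;
            }
        }
        cv_.notify_all();
    }

    void wait(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return value_ >= value; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t value_ = 0;
};

// Kernels run on the executor thread and kernel k signals timeline value k.
// The trailing read was extracted by the app thread, which waits for the last
// kernel's value and then copies the buffer itself, with no separate
// executor-to-app notification in between.
std::vector<int> kernels_then_read(int num_kernels, std::size_t size) {
    assert(num_kernels >= 0);
    std::vector<int> device_buffer(size, 0);  // stands in for device memory
    timeline done;
    std::thread executor([&] {
        for (int k = 1; k <= num_kernels; ++k) {
            for (int& x : device_buffer) {
                x += 1;  // toy "kernel": increment every element
            }
            done.signal(static_cast<std::uint64_t>(k));
        }
    });
    // App thread: the read depends only on the last kernel's timeline value.
    done.wait(static_cast<std::uint64_t>(num_kernels));
    std::vector<int> host_copy(device_buffer);  // the "read buffer" command
    executor.join();
    return host_copy;
}
```

For example, `kernels_then_read(3, 4)` returns four elements, each equal to 3. The mutex inside `timeline` orders the executor's writes before the app thread's copy, playing the role that the semaphore wait would play on a real device.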

Owner

Thanks for the explanation. Depending on how the timeline is implemented, I can see this change being a win.

```diff
         m_groups.push_back(std::move(executor_cmds));
         queue->group_sent();
     }
```