make sure main thread can extract from executor if possible #635
    executor_cmds.reset(new cvk_command_group());
    executor_cmds->commands.push_front(cmd);
}
AFAIU after a quick review, this just splits the group of commands that the app thread is not interested in into two groups before sending them back to the executor. It also sends the last command appended first. The extraction logic already supports splitting groups, so the only reason I can see for doing this is to let the app thread execute commands in parallel with, or independently of, the executor in the absence of dependencies, which as far as I can tell never happens. The last command appended will depend on the previous ones because of in-order semantics. I see the following two scenarios:
- The app thread can extract all required commands, and using one or two groups does not matter.
- The executor has begun processing the last required group. Without this change, the app thread has to wait for the executor to finish executing that group. With this change, the app thread still has to wait for the executor to complete its group before it can execute the remaining command that was split out (at present the commands depend on each other to satisfy in-order execution).
I guess I must be missing something. Is it possible to write a test that hits the case of interest?
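To make the split described above concrete, here is a minimal sketch of moving the last appended command into a group of its own. The `command` and `command_group` types and the `split_last` helper are hypothetical stand-ins invented for illustration; they are not the project's `cvk_command_group` or the code in this PR.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not the project's real types.
struct command {
    std::string name;
};

struct command_group {
    std::deque<command> commands;
};

// Moves the most recently appended command of `group` into a new group and
// returns that new group. `group` keeps the earlier commands in their
// original order, so in-order semantics between the two groups still have
// to be enforced by whoever executes them.
std::unique_ptr<command_group> split_last(command_group& group) {
    assert(!group.commands.empty());
    auto tail = std::make_unique<command_group>();
    tail->commands.push_front(std::move(group.commands.back()));
    group.commands.pop_back();
    return tail;
}
```

With a group holding two kernels followed by a read, `split_last` would leave the two kernels in the original group and return a one-element group holding the read.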
The interesting one is scenario 2.
It has to be put in the context of timeline semaphores. This PR reduces latency on the app thread, which in some cases has been measured at 500 µs on ChromeOS.
The idea is that, instead of having the executor thread wait for the command to complete and then tell the app thread, the app thread can now wait on the command itself. This removes a thread hand-off that can be costly, because the thread needs to be woken up and might start at a low frequency.
But the app thread still has to wait for the other dependent commands that have been picked up by the executor thread in that case. That's what I don't understand.
We cannot avoid waiting for dependencies. What we are trying to change here is what happens when a dependency completes. In the case of a timeline semaphore, we do not want the executor thread to wait for the timeline semaphore and then notify the app thread. We want the app thread to wait for the timeline semaphore instead.
I think the use case here is when we have kernels executing followed by a readbuffer. We want the app thread to take care of the readbuffer, with the app thread waiting on the timeline semaphore rather than on a notification from the executor thread.
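The difference between the two wait paths can be sketched on the host side as follows. This is an illustration under stated assumptions, not the project's implementation: the `timeline` class is a hypothetical counter guarded by a condition variable that stands in for a real timeline semaphore, and the two functions only count how many thread hand-offs sit between "the kernel finished" and "the app thread can start the read".

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical host-side stand-in for a timeline semaphore: a counter that
// only ever increases, plus a way to block until it reaches a value.
class timeline {
public:
    void signal(uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(value > value_); // timeline values must increase
            value_ = value;
        }
        cv_.notify_all();
    }
    void wait(uint64_t value) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return value_ >= value; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    uint64_t value_ = 0;
};

// Relayed path: the executor waits on the timeline, then notifies the app
// thread. Returns the number of hand-offs on the critical path.
int read_after_kernel_relayed() {
    timeline tl;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    int hops = 0;
    std::thread device([&] { tl.signal(1); }); // simulated kernel completion
    std::thread executor([&] {
        tl.wait(1);
        ++hops; // hand-off 1: timeline -> executor
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_one();
    });
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return done; });
    }
    ++hops; // hand-off 2: executor -> app thread
    device.join();
    executor.join();
    return hops;
}

// Direct path: the app thread waits on the timeline itself.
int read_after_kernel_direct() {
    timeline tl;
    int hops = 0;
    std::thread device([&] { tl.signal(1); }); // simulated kernel completion
    tl.wait(1);
    ++hops; // single hand-off: timeline -> app thread
    device.join();
    return hops;
}
```

In this sketch the relayed path has two hand-offs and the direct path has one. The sketch says nothing about how expensive each hand-off is; that depends on the platform and on how the timeline is implemented.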
Thanks for the explanation. Depending on how the timeline is implemented, I can see this change being a win.