Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make krun-server wait for child processes #34

Merged
merged 1 commit into from
Jun 13, 2024

Conversation

teohhanhui
Copy link
Collaborator

@teohhanhui teohhanhui commented Jun 9, 2024

Continue to accept and handle incoming connections, until the server is idle and all child processes have exited.

Fixes #33

Tested working with all of the scenarios mentioned in the linked issue, and confirmed that the server exits correctly once all the child processes have exited (doesn't seem to run into deadlocks).

@teohhanhui teohhanhui requested a review from slp as a code owner June 9, 2024 09:28
Cargo.lock Show resolved Hide resolved
@teohhanhui teohhanhui force-pushed the fix/krun-server-wait-for-children branch 2 times, most recently from 397b12b to 6e60d61 Compare June 9, 2024 09:49
@teohhanhui teohhanhui force-pushed the fix/krun-server-wait-for-children branch from 6e60d61 to 569f6ca Compare June 9, 2024 10:24
@slp
Copy link
Collaborator

slp commented Jun 10, 2024

Frankly, I'm not a big fan of this approach. Here we're sacrificing joining cleaning the thread and introducing an overly complex pattern for what we really need to do. And, what's worse, we're not reaping the children until the process is going to end, meaning we're leaving zombie processes in the system.

The hardest part to solve is the latter. We basically have 3 options:

  1. A signal handler for SIGCHLD with a global Arc<Mutex<Vec<Child>>> that's kept updated by everyone. When exiting, wait on whatever children are still on the Vec. Not a big fan of this one either.
  2. Spawn a thread with each children and have it wait on the child. Put all threads in a Arc<Mutex<Vec<JoinHandle>>> shared by krun-server.rs:main() and server.rs. On exit, join both the the server thread and all process monitoring threads.
  3. Bite the bullet and use Tokyo, which allows dealing with children (and the server) in an asynchronous fashion.

I think I would favor 2, but I suspect we're going to need Tokyo sooner than later (a crate I'm looking for changing the way in which the network interface is configured requires it, but maybe there's some other option), so I'm fine with either.

@teohhanhui
Copy link
Collaborator Author

teohhanhui commented Jun 11, 2024

@slp I'd favour tokio. But I think the implementation will only end up being more complex, not less. (I think it's pretty much unavoidable.)

Would you be okay with merging this as-is? I think it really helps with usability despite the issues you've identified above. I'll work on a follow-up PR.

@teohhanhui teohhanhui force-pushed the fix/krun-server-wait-for-children branch from 569f6ca to fe48ccc Compare June 11, 2024 14:40
@slp
Copy link
Collaborator

slp commented Jun 11, 2024

@slp I'd favour tokio. But I think the implementation will only end up being more complex, not less. (I think it's pretty much unavoidable.)

It'll be more complex internally, but the code for us would be simpler thanks to tokio/async's sugar syntax.

Would you be okay with merging this as-is? I think it really helps with usability despite the issues you've identified above. I'll work on a follow-up PR.

Since we're not in a hurry to provide this to users, I think we should take our time to do it right. I might be able to work on a first implementation of tokio later this week, if you don't beat me to it ;-)

@teohhanhui
Copy link
Collaborator Author

teohhanhui commented Jun 11, 2024

It'll be more complex internally, but the code for us would be simpler thanks to tokio/async's sugar syntax.

I guess we could use JoinSet

Since we're not in a hurry to provide this to users, I think we should take our time to do it right. I might be able to work on a first implementation of tokio later this week, if you don't beat me to it ;-)

Alright. I'll start working on this tomorrow today. 😄

@teohhanhui teohhanhui force-pushed the fix/krun-server-wait-for-children branch from fe48ccc to 92fc18e Compare June 12, 2024 00:04
@teohhanhui
Copy link
Collaborator Author

teohhanhui commented Jun 12, 2024

Done. 🚀

Tested working with the same scenarios.

tokio::select! {
res = &mut server_handle, if !server_died => {
// If an error is received here, accepting connections from the
// TCP listener failed due to non-transient errors and the
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: This is a lie. We're logging and ignoring all connection errors at this time... We should continue on transient errors but bail out on non-transient errors.

https://book.async.rs/patterns/accept-loop

Continue to accept and handle incoming connections, until the
server is idle and all child processes have exited.

Signed-off-by: Teoh Han Hui <[email protected]>
@teohhanhui teohhanhui force-pushed the fix/krun-server-wait-for-children branch from 92fc18e to 8eb3261 Compare June 12, 2024 00:13
.with_context(|| format!("Failed to execute {command:?} as child process"));
if let Err(err) = &res {
let msg = format!("{err:?}");
stream.write_all(msg.as_bytes()).await.ok();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_all is not cancellation safe. Expect Heisenbugs!

@slp
Copy link
Collaborator

slp commented Jun 13, 2024

Woah, that was fast! I think we could refactor start_server's loop in various methods to make it look better, but I don't want to delay this PR just for that. Let's get it merged!

@slp slp merged commit 3c6f941 into AsahiLinux:main Jun 13, 2024
1 check passed
@teohhanhui teohhanhui deleted the fix/krun-server-wait-for-children branch June 13, 2024 21:04
@teohhanhui
Copy link
Collaborator Author

teohhanhui commented Jun 13, 2024

Working on the refactoring of start_server :D

EDIT: #35

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

krun-server should outlive first command
2 participants