Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EAGAIN / EWOULDBLOCK when writing logs to stderr, causing failure #378

Open
piegamesde opened this issue Feb 26, 2023 · 10 comments · May be fixed by #382
Open

EAGAIN / EWOULDBLOCK when writing logs to stderr, causing failure #378

piegamesde opened this issue Feb 26, 2023 · 10 comments · May be fixed by #382

Comments

@piegamesde
Copy link
Contributor

Separating out the issue from #367. Backtrace:

Resource temporarily unavailable (os error 11)
Stack backtrace:
0: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
1: bupstash::client::SendSession::send_dir
2: bupstash::client::send
3: bupstash::put_main
4: bupstash::main
5: std::sys_common::backtrace::__rust_begin_short_backtrace
6: main

I'll try to get a better trace with line numbers, but this might already be sufficient for tracking it down.

This might very well be an error caused on my side due to how I deploy it (systemd service), but currently I don't understand enough about what's going wrong there to be sure.

@piegamesde
Copy link
Contributor Author

Got another backtrace, not sure if a good one:

0: bupstash::put_main::{{closure}}
1: bupstash::put::BatchFileProcessor::process_batch
2: core::ops::function::FnOnce::call_once{{vtable.shim}}
3: std::sys_common::backtrace::__rust_begin_short_backtrace
4: core::ops::function::FnOnce::call_once{{vtable.shim}}
5: std::sys::unix::thread::Thread::new::thread_start
6: start_thread
7: clone3

@andrewchambers
Copy link
Owner

andrewchambers commented Apr 6, 2023

It looks like your OS is not letting you create new processes/threads - you have run out of them for some reason.

I think the solution is to find every place bupstash is spawning threads and add better error messages and also reevaluate the defaults for how many threads are being created.

For your system as a work around I think you need to find the right resource limit to increase.

@piegamesde
Copy link
Contributor Author

I think it can't be threads, because then it would be a panic, but instead put fails on an Err

@andrewchambers
Copy link
Owner

Is the error in clone3? that looks like creating a thread, see https://man.archlinux.org/man/clone3.2.en

@piegamesde
Copy link
Contributor Author

piegamesde commented Apr 12, 2023

You are reading the backtrace backwards, the most recent call is at the top. So clone is not where it ends, but instead where it starts, which makes sense. I added some .context to various error sites:

bupstash put: Failed to backup the data
Caused by:
    0: Failed to process a batch
    1: Failed to log file action
    2: Resource temporarily unavailable (os error 11)
Stack backtrace:
   0: bupstash::put_main::{{closure}}
   1: bupstash::put::BatchFileProcessor::process_batch
   2: core::ops::function::FnOnce::call_once{{vtable.shim}}
   3: std::sys_common::backtrace::__rust_begin_short_backtrace
   4: core::ops::function::FnOnce::call_once{{vtable.shim}}
   5: std::sys::unix::thread::Thread::new::thread_start
   6: start_thread
   7: clone3

So the error happens while trying to log the file action to stderr for some reason. Now I can finally debug why that happens (probably some resource limit somewhere?) and how to best fix it


As per travis-ci/travis-ci#4704 (comment), it might be the case that stderr is somehow opened in non-blocking mode (n.b. "Resource temporarily unavailable" is the name for EAGAIN as per errno -l). And it works fine most of the time because the buffer is usually large enough, except when logging a lot of information at once. I see that we are creating a progress bar even if it is hidden, maybe the progress libraries below set the file to nonblocking?

@andrewchambers
Copy link
Owner

andrewchambers commented Apr 13, 2023

Great detective work - sorry for reading it backwards - must be tired. That would make some sense, stderr might be backed up on a busy system when writing to the systemd journal.

@piegamesde piegamesde changed the title OS error 11: resource temporarily unavailable EAGAIN / EWOULDBLOCK when writing logs to stderr, causing failure Apr 13, 2023
@piegamesde
Copy link
Contributor Author

fcntl(2, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)

I can now confirm that stderr is set to non-blocking mode for some reason, that this is the failure cause and that it already is set like that from the very first log_fn call.

I am not sure why and how and where it is set to non-blocking though, my strace skills are honestly not strong enough for that. (I thought I found something interesting, then it turned out to be in the ssh sub-process and thus irrelevant)

@piegamesde

This comment was marked as outdated.

piegamesde added a commit to piegamesde/bupstash that referenced this issue Apr 17, 2023
piegamesde added a commit to piegamesde/bupstash that referenced this issue Apr 17, 2023
@piegamesde
Copy link
Contributor Author

I thought I found something interesting, then it turned out to be in the ssh sub-process and thus irrelevant

Well, turns out that fd properties may be shared across processes, so this is not only relevant but turns out to actually be the root cause. #382 has a workaround-fix for it.

@jtrees
Copy link

jtrees commented Dec 25, 2023

I think I may also be running into this. Tried setting up a systemd service but it won't run for more than a second or so.

@piegamesde piegamesde linked a pull request Feb 3, 2024 that will close this issue
piegamesde added a commit to piegamesde/bupstash that referenced this issue Feb 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants