diff --git a/docs/EINTR.md b/docs/EINTR.md
new file mode 100644
index 0000000000..8d1ab52e2d
--- /dev/null
+++ b/docs/EINTR.md
@@ -0,0 +1,91 @@
+# EINTR
+
+## The problem
+
+If your code is blocked in a system call when a signal needs to be delivered,
+the kernel needs to interrupt that system call. For something like a read(2)
+call where some data has already been read, the call can just return with
+what data it has. (This is one reason why read(2) sometimes returns less data
+than you asked for, even though more data is available. It also explains why
+such behavior is relatively rare, and a cause of bugs.)
+
+But what if read(2) hasn't read any data yet? Or what if you've made some other
+system call, for which there is no equivalent "partial" success, such as
+poll(2)? In poll(2)'s case, there's either something to report (in which
+case the system call would already have returned), or there isn't.
+
+The kernel's solution to this problem is to return failure (-1) and set
+errno to `EINTR`: "interrupted system call".
+
+### Can I just opt out?
+
+Technically, yes. In practice on Android, no. Technically if a signal's
+disposition is set to ignore, the kernel doesn't even have to deliver the
+signal, so your code can just stay blocked in the system call it was already
+making. In practice, though, you can't guarantee that all signals are either
+ignored or will kill your process... Unless you're a small single-threaded
+C program that doesn't use any libraries, you can't realistically make this
+guarantee. If any code has installed a signal handler, you need to cope with
+`EINTR`. And if you're an Android app, the zygote has already installed a whole
+host of signal handlers before your code even starts to run. (And, no, you
+can't ignore them instead, because some of them are critical to how ART works.
+For example: Java `NullPointerException`s are optimized by trapping `SIGSEGV`
+signals so that the code generated by the JIT doesn't have to insert explicit
+null pointer checks.)
+
+### Why don't I see this in Java code?
+
+You won't see this in Java because the decision was taken to hide this issue
+from Java programmers. Basically, all the libraries like `java.io.*` and
+`java.net.*` hide this from you. (The same should be true of `android.*` too,
+so it's worth filing bugs if you find any exceptions that aren't documented!)
+
+### Why doesn't libc do that too?
+
+For most people, things would be easier if libc hid this implementation
+detail. But there are legitimate use cases, and automatically retrying
+would hide those. For example, you might want to use signals and `EINTR`
+to interrupt another thread (in fact, that's how interruption of threads
+doing I/O works in Java behind the scenes!). As usual, C/C++ choose the more
+powerful but more error-prone option.
+
+## The fix
+
+### Easy cases
+
+In most cases, the fix is simple: wrap the system call with the
+`TEMP_FAILURE_RETRY` macro. This is basically a while loop that retries the
+system call as long as the result is -1 and errno is `EINTR`.
+
+So, for example:
+```
+  n = read(fd, buf, buf_size); // BAD!
+  n = TEMP_FAILURE_RETRY(read(fd, buf, buf_size)); // GOOD!
+```
+
+### close(2)
+
+TL;DR: *never* wrap close(2) calls with `TEMP_FAILURE_RETRY`.
+
+The case of close(2) is complicated. POSIX explicitly says that close(2)
+shouldn't close the file descriptor if it returns `EINTR`, but that's *not*
+true on Linux (and thus on Android). See
+[Returning EINTR from close()](https://lwn.net/Articles/576478/)
+for more discussion.
+
+Given that most Android code (and especially "all apps") is multithreaded,
+retrying close(2) is especially dangerous because the file descriptor might
+already have been reused by another thread, so the "retry" succeeds, but
+actually closes a *different* file descriptor belonging to a *different*
+thread.
+
+### Timeouts
+
+System calls with timeouts are the other interesting case where "just wrap
+everything with `TEMP_FAILURE_RETRY()`" doesn't work. Because some amount of
+time will have elapsed, you'll want to recalculate the timeout. Otherwise you
+can end up with your 1 minute timeout being indefinite if you're receiving
+signals at least once per minute, say. In this case you'll want to do
+something like adding an explicit loop around your system call, calculating
+the timeout _inside_ the loop, and using `continue` each time the system call
+fails with `EINTR`.