From 1470e32543e8bf9402db4057abe830e1cbe49bdb Mon Sep 17 00:00:00 2001 From: Julian Squires Date: Thu, 12 Oct 2017 12:51:31 -0400 Subject: [PATCH] Start fleshing out text of stages --- README.md | 11 +- stage_1.md | 158 +++++++++++++++++++++++------ stage_1/10-multiline-list.t | 2 + stage_2.md | 98 ++++++++++++------ stage_2/08-builtin-in-pipeline.t | 4 + stage_3.md | 147 +++++++++++++++++++++++++-- stage_4.md | 167 ++++++++++++++++++++++++++++--- stage_5.md | 74 +++++++++++--- 8 files changed, 559 insertions(+), 102 deletions(-) create mode 100644 stage_1/10-multiline-list.t create mode 100644 stage_2/08-builtin-in-pipeline.t diff --git a/README.md b/README.md index d2f3044..b196186 100644 --- a/README.md +++ b/README.md @@ -115,13 +115,14 @@ of the shells that were written as a result of this workshop here. ## Documents - [Advanced Programming in the Unix Environment] by Stevens covers - all this stuff and is a must-read. + all this stuff and is a must-read. I call this *APUE* throughout + this tutorial. - Chet Ramey describes [the Bourne-Again Shell] in [the Architecture of Open Source Applications]; this is probably the best thing to read to understand the structure of a real shell. - Michael Kerrisk's [the Linux Programming Interface], though fairly Linux-specific, has some great coverage of many of the topics we'll - touch on; + touch on. I call this *LPI* throughout this tutorial. - [Unix system programming in OCaml] shows the development of a simple shell. - [Advanced Unix Programming] by Rochkind; chapter 5 has a simple shell. - the [tour of the Almquist shell] is outdated but may help you find @@ -135,10 +136,10 @@ go far enough, but all of these are worth reading, especially if you're having trouble with a stage they cover: - Stephen Brennan's [Write a Shell in C] is a more detailed look at - what is [stage 1](stage_1) here. + what is [stage 1](stage_1.md) here. - Jesse Storimer's [A Unix Shell in Ruby] gets as far as pipes. 
- Nelson Elhage's [Signalling and Job Control] covers some - of [stage 3](stage_3)'s material. + of [stage 3](stage_3.md)'s material. ### References @@ -168,11 +169,11 @@ you're having trouble with a stage they cover: - [zsh]: C; extremely maximal. - [fish]: C++11; has expect-based interactive tests. - [Thompson shell]: C; the original Unix shell; very minimal. - - [xonsh]: Python. - [scsh]: Scheme and C; intended for scripting. - [cash]: OCaml; based on scsh. - [eshell]: Emacs Lisp. - [oil]: Python and C++; has an extensive test suite. + - [xonsh]: Python. - [oh]: Go. [busybox]: https://git.busybox.net/busybox/tree/shell diff --git a/stage_1.md b/stage_1.md index 161f793..ffb45c4 100644 --- a/stage_1.md +++ b/stage_1.md @@ -2,9 +2,12 @@ ## Ingredients -At least `fork(2)`, `execve(2)`, and `wait(2)`. - -`chdir(2)` + - [`fork(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html) + ([Linux](http://man7.org/linux/man-pages/man2/fork.2.html), + [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=fork&manpath=FreeBSD+11.1-RELEASE+and+Ports)) + - [`execve(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html) + - [`wait(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html) + - [`chdir(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/chdir.html) ## Instructions @@ -21,44 +24,116 @@ loop waitpid(pid) ``` -To that, we should add at least a builtin for `cd`, which will call -`chdir(2)` with the supplied argument. Think about why we couldn't -implement this as an external command. +(Warning: note that most of these functions can fail; you'll want +error handling in your actual code.) We'll do a bit of extra work in this stage, compared to many simple shell tutorials, since it will prepare us for the later stages. -To run the test suite, invoke it as `./validate /path/to/your/shell`. 
- ### Basics You should - make your shell print the `$ ` prompt; - - accept a line of input; - - `chdir` if the command is `cd`, otherwise - - execute that as a command (which might be absolute, relative, or in - `PATH`) with space-delimited arguments; + - accept a line of input, and split it on space; + - if the first word is a builtin (a command your shell will handle + itself instead of executing an external program), call it; + otherwise + - execute that as a command with space-delimited arguments; - repeat until you receive EOF on `stdin`. +### Executing a command + +First, we `fork`, which creates a child process whose state is a copy +of our shell's; the memory looks the same, and any file descriptors +open are the same. After the fork, there are two parallel universes +happening: the parent (who gets the process ID of the child from +`fork`) and the child (who gets 0 from `fork`). + +In the child, we `execve` the command we want to run, with its +arguments; this replaces the running process (the child copy of the +shell) with the new command. So now we have both our shell and the +command running, and the shell knows the process ID of the command. +(See Patrick Mooney's talk [On Wings of +exec(2)](https://systemswe.love/archive/minneapolis-2017/patrick-mooney) +for deeper details of what happens after we `execve`.) + +The parent now waits for the child to complete; we do this with +`wait`. This lets the shell sleep while the command runs, and wakes +us up with the exit status of the command. + +This part is explained in a lot of detail in the resources listed in +[the main README](README.md), so if this isn't clear, please refer to +the books and tutorials cited. + +### Running the tests + +This should be enough to get at least the first test of stage 1 to +pass. Run `./validate ../path-to-my-shell/my-shell` and see how far +you get. + The test suite will try to execute various well-known POSIX commands inside your shell. 
Make sure actual binaries of `true`, `false`, -`ls`, and `echo` are in your PATH. - -Let's also allow `\` at the end of a line as a continuation character. -Print `> ` as the prompt while reading a continued line. +`cat`, `pwd`, and `echo` are in `/bin`. About the prompt: the test suite avoids testing the specific prompt, because it turns out this is something people like to have fun with, but I recommend emitting at least some prompt for each of the cases I mention, because this will make it easier for you to debug your shell, -and eventually to use it. +and eventually to use it. Unfortunately, if your default prompt and +display is too elaborate (e.g.: `zsh` with RPROMPT, `fish`), the tests +may not work, as they are not terribly robust. + +### The `cd` builtin + +To that, we should add at least a builtin for `cd`, which will call +`chdir(2)` with the supplied argument. Think about why we couldn't +implement this as an external command. If you're not sure, try +implementing `cd` as a standalone binary that invokes `chdir` and see +what happens. You can exec `pwd` or call `getcwd(3)` to find out the +current directory: you might want to print it as part of your prompt. + +### Searching `PATH` + +We don't want to have to type the explicit path to every command +you're executing. One of the crucial conveniences of every shell is +that, if we find an unqualified command name (one that doesn't contain +the `/` character), we look for it in every directory specified in the +`PATH` environment variable. + +Many languages provide versions of `exec` that do some of this extra +work for you. That's probably fine as long as you know who is +responsible for what. When you add completion in stage 5, you'll want +to be able to search the path yourself. + +We'll talk more about the environment in stage 4. 
For now, you can use +whatever your language provides for accessing environment variables +(`getenv(3)` in C) to get the value of `PATH`, then split it on the +`:` character to get a list of directories to search. + +For a more rigorous specification, see [Command Search and Execution] +in the POSIX standard. (This also tells you when you're allowed to +cache the location of a command.) `PATH` itself is described in +[Other Environment Variables]. + +[Command Search and Execution]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01 +[Other Environment Variables]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03 ### Exit status and `!` -- how to get the exit status, and its general importance +We mentioned that you get the exit status of the child from `wait`. +Don't just throw this away; this will be a key part of the shell's +power, and it's why there are commands like `/bin/true` and +`/bin/false`. -If the command begins with `!` (separated by whitespace), we will -negate the exit status, which will come in handy shortly. +There is an unfortunate amount of information packed into the status +returned to us by `wait`, but we're only interested in *exit status* +right now. The Unix convention is that a zero exit status represents +success, and any non-zero exit status represents failure. + +If the first word we read is `!`, we negate the exit status of what +follows. Note that's word, not character, so `! true` returns a +non-zero exit code, but `!true` is not defined in POSIX, and is +usually part of a history mechanism in `ksh` descendents. **Bonus:** Change the prompt to red if the last command run exited with a non-zero status? (Don't worry about termcap and supporting @@ -67,8 +142,6 @@ for now.) [ANSI escape codes]: https://en.wikipedia.org/wiki/ANSI_escape_code#Colors -### `exit` builtin - ### Lists of commands Split your input on `;`, `&&`, and `||`. 
Commands separated by any of @@ -85,8 +158,15 @@ false && echo foo || echo bar true || echo foo && echo bar ``` -(The standard calls these "sequential lists", "AND lists", and "OR -lists", respectively.) +The standard calls these *sequential lists*, *AND lists*, and *OR +lists*, respectively. See section 2.9.3, [Lists]. We'll look at +*asynchronous lists* in stage 3. + +(If you want to add support for compound lists surrounded by braces, +go ahead, but I didn't consider them important enough for an +interactive shell to bother testing them.) + +[Lists]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_03 ### `exec` builtin @@ -103,7 +183,7 @@ The `exec` builtin has another use we will discuss in the next stage. ### Subshells Now, for the other half of `fork`/`exec` -- what if we `fork` (and -`wait`), but don't `exec`? If a command is surrounded in parenthesis, +`wait`), but don't `exec`? If a command is surrounded in parentheses, we fork, and in the parent, wait for the child as we normally would, and in the child, we process the command normally. @@ -115,10 +195,25 @@ This allows us to do things in an isolated shell environment. Should print `/tmp` and whatever your previous working directory was. -(This isn't that important, relative to the other things we're going -to talk about, but having enough of a parser in place that you can -read the parentheses correctly will be helpful for the following -stages.) +This isn't that important, relative to the other things we're going to +talk about, but having enough of a parser in place that you can read +the parentheses correctly will be helpful for the following stages. + +### Line continuation + +Let's allow `\` at the end of a line as a continuation character. +Print `> ` as the prompt while reading a continued line. + +Likewise, if a line ends with `&&` or `||`, you'll want to continue +reading on the next line as part of the same list. 
+ +### Aside: parsing + +See [Token Recognition] in the standard. Writing a tokenizer that +follows this, ignoring the parts we aren't doing yet, will make +writing your parser easier. I'll expand this section more, shortly. + +[Token Recognition]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03 ## Notes @@ -161,6 +256,11 @@ Compare the various libc implementations of `posix_spawn`: See also https://ewontfix.com/7/ +### Further Reading + +Stevens, APUE, chapter 8. +Kerrisk, LPI, chapters 24 through 28. + ["A much faster popen and system implementation for Linux"]: https://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/ [FreeBSD's posix_spawn]: https://github.com/freebsd/freebsd/blob/master/lib/libc/gen/posix_spawn.c [glibc's generic implementation of posix_spawn]: https://github.com/bminor/glibc/blob/master/sysdeps/posix/spawni.c diff --git a/stage_1/10-multiline-list.t b/stage_1/10-multiline-list.t new file mode 100644 index 0000000..c78e190 --- /dev/null +++ b/stage_1/10-multiline-list.t @@ -0,0 +1,2 @@ +→ true &&⏎false ||⏎echo foo ||⏎echo bar⏎ +← foo diff --git a/stage_2.md b/stage_2.md index 86fd5aa..20af0d7 100644 --- a/stage_2.md +++ b/stage_2.md @@ -2,7 +2,18 @@ ## Dramatis Personae -`dup2(2)`, `pipe(2)`, `open(2)`, `close(2)` + - [`dup2(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/dup.html) + ([Linux](http://man7.org/linux/man-pages/man2/dup2.2.html), + [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=dup2&sektion=2&manpath=FreeBSD+11.1-RELEASE+and+Ports)) + - [`pipe(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pipe.html) + ([Linux](http://man7.org/linux/man-pages/man2/pipe.2.html), + [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=pipe&sektion=2&manpath=FreeBSD+11.1-RELEASE+and+Ports)) + - [`open(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html) + ([Linux](http://man7.org/linux/man-pages/man2/open.2.html), + 
[FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=open&sektion=2&manpath=FreeBSD+11.1-RELEASE+and+Ports)) + - [`close(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html) + ([Linux](http://man7.org/linux/man-pages/man2/close.2.html), + [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=close&sektion=2&manpath=FreeBSD+11.1-RELEASE+and+Ports)) ## Prologue @@ -12,23 +23,9 @@ correspond to standard input (stdin), output (stdout), and error feature: pipes and redirections. First, we'll add a bit more syntax. This is where a simple parsing -framework might start to be useful. - -You'll need to separate your input by occurrances of `|`, and for each -command, find expressions of the form `n< path`, `n> path`, and `n>> -path`, where `n` is an optional integer. - -If you'd like to handle quoting now, you might find it handy to look -at the [POSIX shell grammar] specification. I don't think you -actually need to do this yet, though. Just consider in your design -that splitting on whitespace won't be enough. - -Note that you now may have to read multiple lines of input, if the -last token read is a pipe. Take this opportunity to also support `\` -(backslash) as a line-continuation character. - -If you're planning to use a parser generator, consider -[what Chet Ramey has to say] about bash and bison: +framework might start to be useful. If you're planning to use a +parser generator, consider [what Chet Ramey has to say] about bash and +bison: > One thing I've considered multiple times, but never done, is > rewriting the bash parser using straight recursive-descent rather @@ -38,6 +35,21 @@ If you're planning to use a parser generator, consider > bash from scratch, I probably would have written a parser by > hand. It certainly would have made some things easier. +You've already broken input up into lists; within each list, you have +conceptually one *pipeline*, which might be several commands connected +together. 
You'll need to split these pipelines by occurrences of `|`, +and for each command, find expressions of the form `n< path`, `n> +path`, and `n>> path`, where `n` is an optional integer. They don't +get passed to the underlying command. + +If you'd like to handle quoting now, you might find it handy to look +at the [POSIX shell grammar] specification. I don't think you +actually need to do this yet, though. Just consider in your design +that splitting on whitespace won't be enough. + +Note that if the last token read is a pipe, you'll need to continue +reading the next line, just like with `\`, `||`, and `&&`. + [POSIX shell grammar]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10 [what Chet Ramey has to say]: http://www.aosabook.org/en/bash.html @@ -90,37 +102,67 @@ The following table summarizes the fd redirections: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07 +Note that `!`, which you added in stage 1, may appear only at the +start of a pipeline, and affects the return value of the whole +pipeline. + +(`hush` uses the term *squirrel* to refer to an fd that's been +redirected where it wants to "squirrel away" the original fd, but I +initially thought it referred to `<&` and `&>`, which look like +squirrels; so now I'm calling those operators the squirrels.) + ## Act 2 -Let's also add process substitution. If you find `<(...)`, open a -pipe as before, but replace the substitution with a reference to -`/dev/fd/n`. +Let's also add process substitution and command substitution. + +If you find `$(...)`, execute the contents of the parentheses as shell +input, capture its output in a string, and insert it in the argument +list. (Feel free to also handle backtick syntax.) -Although it's not related, now that you're handling balanced -parenthesis syntax, let's add command substitution. If you find -`$(...)`, execute the program, capture its output in a string, and -insert it in the argument list. 
(Feel free to also handle backtick -syntax.) +If you find `<(...)`, open a pipe as before, but replace the +substitution with a reference to `/dev/fd/n`. This isn't POSIX sh, +but it's really convenient. ## Epilogue ### CLOEXEC -(why it's important, and why you can't use it here) +You might have a bunch of files open in your shell, for example, files +for maintaining history or configuration. Your children shouldn't +have to care about these files; file descriptors are a limited +resource and most people wouldn't appreciate having that limit +unnecessarily decreased just because you were lazy. How can you prevent your children from inheriting fds you don't want -them to have? +them to have? Classically, people would loop over some number of fds, +closing them, which is time-consuming and error-prone. A moderately +more recent idea is the `CLOEXEC` option on various fd-opening +syscalls, which tells the operating system to close this fd when +`execve` happens. A lot of modern programming language libraries do +this by default, so you may already be safe, but it's worth thinking +about, particularly when writing a library that might open fds. + +Recently, Linux [added an `fdmap` syscall] that could be used for +this. + +See Chris Siebenmann's [fork() and closing file descriptors] and +CERT's [FIO22-C] (close files before spawning processes). You can use tools like `lsof` to debug problems with fd redirection. Under Linux, you can also try running `ls -l /proc/self/fd` inside your shell, with various redirections, and see what happens. +[fork() and closing file descriptors]: https://utcc.utoronto.ca/~cks/space/blog/unix/ForkFDsAndRaces +[FIO22-C]: https://www.securecoding.cert.org/confluence/display/c/FIO22-C.+Close+files+before+spawning+processes +[added an `fdmap` syscall]: https://lwn.net/Articles/734709/ + ### Builtins So far your builtins have been simple and haven't interacted much with commands. What if we had a builtin in a pipeline? 
If you don't run the builtin in a subshell, the pipeline may stall. +So you'll need to fork, but only when in a multi-command pipeline. ### posix_spawn diff --git a/stage_2/08-builtin-in-pipeline.t b/stage_2/08-builtin-in-pipeline.t new file mode 100644 index 0000000..84ce145 --- /dev/null +++ b/stage_2/08-builtin-in-pipeline.t @@ -0,0 +1,4 @@ +→ echo foo | cd /tmp | pwd⏎ +≠ /tmp +→ echo foo | cd /tmp | echo bar⏎ +← bar diff --git a/stage_3.md b/stage_3.md index 64a0052..2d6b623 100644 --- a/stage_3.md +++ b/stage_3.md @@ -2,43 +2,168 @@ ## Reagents -`sigsetaction`, `sigwait`, `kill` - -`tcsetpgrp`, `setpgid`, `tcgetpgrp`, `killpg` + - [`sigaction`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html) + - [`tcsetpgrp`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/tcsetpgrp.html) + - [`setpgid`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/setpgrp.html) + - [`tcgetpgrp`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/tcgetpgrp.html) + - [`killpg`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/killpg.html) + - [`isatty`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/isatty.html) ## Synthesis -Extend your input handling to separate lines by `&` and `;`. +An outline: + +``` +open /dev/tty +ignore SIGTTOU in the shell +tcgetpgrp to get the current process group +create new pgrp (setpgid) with first child's PID +setpgid and tcsetpgrp in both child and parent +when returning control to the shell, tcsetpgrp to shell's pgrp +``` + +See APUE chapter 9, and glibc's [Implementing a Job Control Shell]. + +[Implementing a Job Control Shell]: https://www.gnu.org/software/libc/manual/html_node/Implementing-a-Shell.html + +### Background processes -A pipeline forms a single process group. +Extend your input handling to separate lines by `&`. Commands ending +with `&` are run in the background, which means you aren't going to +`wait()` on them after you've forked. 
Instead, you have a few choices +of how to deal with your background processes. + +When a child dies, the parent is notified with a signal, `SIGCHLD`. +Signals and the complications and dangers they present are too huge a +topic for this workshop, so I recommend APUE chapter 10 and LPI +chapters 20-22. You may want an event loop that can deal with signals for you, like `libev`. For diving deeper, take a look at `sigfd` on Linux and `kqueue`'s signal support on BSDs. +- different ways of dealing with waiting for children +- zombies +- orphaned process groups + +### Process groups + +A pipeline forms a single process group. This is called a *job*. + +When you create a pipeline, the first child should put itself in a new +process group, with `setpgid(getpid(), getpid())`, and every other +child should `setpgid(getpid(), pgrp_of_pipeline)` (you'll need the +parent to keep track of the first child's PID and make sure the other +children have access to it). You should do this in both the parent +and the child, to avoid races. + +This is also where those negative arguments to `kill(2)` come in +handy: when you send `SIGCONT`, you'll want to send it to the whole +process group, not just the child alone. + +### Terminal foreground process group + +How do chords like `^C` know to interrupt the foreground process and +not the shell? `tcsetpgrp` tells the tty driver, which is what +translates hitting `^C` into sending `SIGINT`, that the given process +group is the one in charge of the terminal right now. + +This is also prone to races, so you'll need to `tcsetpgrp` in both the +parent and the child. + +And crucially, you'll need to `tcsetpgrp` back to the shell's process +group every time control returns to the shell: when a foreground child +exits or is stopped. + +### Signals + We'll need to handle `SIGTSTP`, `SIGTTIN`, `SIGTTOU`, and we'll end up sending `SIGCONT`. 
-The built-ins `fg` and `bg` should send `SIGCONT` to the current job, -doing `waitpid` in the former case and continuing onwards in the -latter. +The built-ins `fg` and `bg` should send `SIGCONT` to the current job +(its process group), doing `waitpid` in the former case and continuing +onwards in the latter. + +From [hush.c:1640](https://git.busybox.net/busybox/tree/shell/hush.c#n1640) +```c +/* Basic theory of signal handling in shell +[...] + * Signals are handled only after each pipe ("cmd | cmd | cmd" thing) + * is finished or backgrounded. It is the same in interactive and + * non-interactive shells, and is the same regardless of whether + * a user trap handler is installed or a shell special one is in effect. + * ^C or ^Z from keyboard seems to execute "at once" because it usually + * backgrounds (i.e. stops) or kills all members of currently running + * pipe. +[...] + * Commands which are run in command substitution ("`cmd`") + * have SIGTTIN, SIGTTOU, SIGTSTP set to SIG_IGN. + * + * Ordinary commands have signals set to SIG_IGN/DFL as inherited + * by the shell from its parent. + * + * Signals which differ from SIG_DFL action + * (note: child (i.e., [v]forked) shell is not an interactive shell): + * + * SIGQUIT: ignore + * SIGTERM (interactive): ignore + * SIGHUP (interactive): + * send SIGCONT to stopped jobs, send SIGHUP to all jobs and exit + * SIGTTIN, SIGTTOU, SIGTSTP (if job control is on): ignore + * Note that ^Z is handled not by trapping SIGTSTP, but by seeing + * that all pipe members are stopped. Try this in bash: + * while :; do :; done - ^Z does not background it + * (while :; do :; done) - ^Z backgrounds it + * SIGINT (interactive): wait for last pipe, ignore the rest + * of the command line, show prompt. NB: ^C does not send SIGINT + * to interactive shell while shell is waiting for a pipe, + * since shell is bg'ed (is not in foreground process group). +``` + +### Job control + +Keep track of backgrounded jobs. 
`fg` brings the most recent +backgrounded job into the foreground (do the `tcsetpgrp` dance, send +`SIGCONT` if it was stopped, and `waitpid` for it). `^Z` will send +`SIGTSTP` to a foreground job and suspend it; `waitpid` will tell you +the child got suspended, so don't just forget about it; keep track of +it as a stopped job. `bg` puts the most recently stopped job into the +background: just send `SIGCONT` to it (and keep track of it), but +don't give it back the TTY. See https://blog.nelhage.com/2010/01/a-brief-introduction-to-termios-signaling-and-job-control/ For fun, you may want to implement `jobs` (to list running jobs). +From the bash source (jobs.c): + +```c + /* Set the process group before trying to mess with the terminal's + process group. This is mandated by POSIX. */ + /* This is in accordance with the Posix 1003.1 standard, + section B.7.2.4, which says that trying to set the terminal + process group with tcsetpgrp() to an unused pgrp value (like + this would have for the first child) is an error. Section + B.4.3.3, p. 237 also covers this, in the context of job control + shells. */ + if (setpgid (mypid, pipeline_pgrp) < 0) + sys_error (_("child setpgid (%ld to %ld)"), (long)mypid, (long)pipeline_pgrp); +``` + +### `SIGTTIN`/`SIGTTOU` + about sigttin/ttou: http://curiousthing.org/sigttin-sigttou-deep-dive-linux ### SIGHUP When you exit, you should send SIGHUP to your children. But first, -you will want to make sure they're not stopped, so continue all your -jobs then HUP them. +you will want to make sure they're not stopped, so send `SIGCONT` to +all of them first, then HUP them. You may implement the builtin `disown` to remove a job from the list of active jobs, so it won't be sent HUP in this case. -## +## Notes Interaction with builtins is again a strange topic. 
Compare how various shells handle sending `^Z` to `sleep 20 | false` and then diff --git a/stage_4.md b/stage_4.md index 51d3d49..f6cffeb 100644 --- a/stage_4.md +++ b/stage_4.md @@ -2,41 +2,176 @@ ## Ingredients -a map type + - a map type; + - [`getpwnam(3)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getpwnam.html); + - [`glob(3)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/glob.html); + - some way to get at the environment; this varies by language. -see also `setenv(3)`, `getenv(3)` +See for example +[`getenv(3)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html). ## Instructions -This is the point at which you may want to use a generated parser or -parser combinators rather than dealing with ad hoc parsing of user -input. +We'll implement the basics of environments, variables, and expansions, +because they're useful and good to understand. We will, however, +greatly simplify some things, because I don't think they matter too +much for an interactive shell. For example, shell purists will be +furious that we don't deal with `IFS`. Sorry. + +### Environment variables + +Extend your input parser so that, if a command starts with any words +containing the `=` character, these are treated as variable +assignments, and the actual command starts with the first word not +containing an `=` (and there might only be variable assignments on a +line, with no command). + +So you're looking to recognize something like: +``` +foo=baz bar=quux ls +``` + +This runs `ls` with an environment where `baz` is assigned to `foo` +(that is, `getenv("foo")` returns `"baz"`, or `echo $foo` echos `baz`) +and `quux` is assigned to `bar`. Note that the current shell's +environment is not modified by this. + +Meanwhile, +``` +foo=baz bar=quux +``` +without a command will set these variables in the local environment. + +A key thing here is how exported variables work. 
There is a lot of +misconception around when `export` needs to be called in shells. Once +a variable has been exported, it remains exported. Exported variables +will be passed on to child processes, while unexported variables are +only visible in the current shell. -See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_23 +let's look at how various libcs implement `getenv` and friends: +FreeBSD, musl, glibc -Each word containing the `=` character on the command line +From [hush.c:717](https://git.busybox.net/busybox/tree/shell/hush.c#n717): -`if` +```c +/* On program start, environ points to initial environment. + * putenv adds new pointers into it, unsetenv removes them. + * Neither of these (de)allocates the strings. + * setenv allocates new strings in malloc space and does putenv, + * and thus setenv is unusable (leaky) for shell's purposes */ +#define setenv(...) setenv_is_leaky_dont_use() +``` +You might enjoy ["the setenv fiasco"]. +### Special parameters -let's look at how various libcs implement `getenv` and friends: -FreeBSD, musl, glibc +There are some special "variables" that can be expanded with `$`, +which the standard calls *special parameters*. Most of them are +primarily useful for scripting, so we won't implement them. -You might enjoy ["the setenv fiasco"]. +You should implement `$?`, which expands to the exit status of the +last pipeline executed, `$$`, which expands to the PID of the current +shell, and `$!` which expands to the PID of the most recent background +command. + +See [Special Parameters] in the standard for the gory details. + +[Special Parameters]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02 + + +### Quoting + +Quoting in shell is notoriously hard, and the edge cases are still a +subject of active debate. 
Don't try for perfect; the important thing +is to really get an understanding of the difference between single +(`'`) and double (`"`) quotes; to understand why `"foo$(echo +"$bar")baz"` is safe and `"foo"$bar"baz"` isn't. + +I think the most common misconception among casual users of the shell +for scripting is that quotes form some kind of special data type; +implementing quoting and realizing it's all just strings will greatly +improve any shell scripts you write in the future. + +See 2.2 [Quoting] in the standard. + +[Quoting]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02 +### Heredocs -# Globbing +```shell +sudo -u postgres psql <