Start fleshing out text of stages

tokenrove · Oct 17, 2017 · 1470e32
1 parent 5b5fe8d

Showing 8 changed files with 559 additions and 102 deletions.
diff --git a/README.md b/README.md
@@ -115,13 +115,14 @@ of the shells that were written as a result of this workshop here.
 ## Documents
 
  - [Advanced Programming in the Unix Environment] by Stevens covers
-   all this stuff and is a must-read.
+   all this stuff and is a must-read.  I call this *APUE* throughout
+   this tutorial.
  - Chet Ramey describes [the Bourne-Again Shell] in [the Architecture
    of Open Source Applications]; this is probably the best thing to
    read to understand the structure of a real shell.
  - Michael Kerrisk's [the Linux Programming Interface], though fairly
    Linux-specific, has some great coverage of many of the topics we'll
-   touch on;
+   touch on.  I call this *LPI* throughout this tutorial.
  - [Unix system programming in OCaml] shows the development of a simple shell.
  - [Advanced Unix Programming] by Rochkind; chapter 5 has a simple shell.
  - the [tour of the Almquist shell] is outdated but may help you find
@@ -135,10 +136,10 @@ go far enough, but all of these are worth reading, especially if
 you're having trouble with a stage they cover:
 
  - Stephen Brennan's [Write a Shell in C] is a more detailed look at
-   what is [stage 1](stage_1) here.
+   what is [stage 1](stage_1.md) here.
  - Jesse Storimer's [A Unix Shell in Ruby] gets as far as pipes.
  - Nelson Elhage's [Signalling and Job Control] covers some
-   of [stage 3](stage_3)'s material.
+   of [stage 3](stage_3.md)'s material.
 
 ### References
 
@@ -168,11 +169,11 @@ you're having trouble with a stage they cover:
  - [zsh]: C; extremely maximal.
  - [fish]: C++11; has expect-based interactive tests.
  - [Thompson shell]: C; the original Unix shell; very minimal.
- - [xonsh]: Python.
  - [scsh]: Scheme and C; intended for scripting.
  - [cash]: OCaml; based on scsh.
  - [eshell]: Emacs Lisp.
  - [oil]: Python and C++; has an extensive test suite.
+ - [xonsh]: Python.
  - [oh]: Go.
 
 [busybox]: https://git.busybox.net/busybox/tree/shell

diff --git a/stage_1.md b/stage_1.md
@@ -2,9 +2,12 @@
 
 ## Ingredients
 
-At least `fork(2)`, `execve(2)`, and `wait(2)`.
-
-`chdir(2)`
+ - [`fork(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html)
+   ([Linux](http://man7.org/linux/man-pages/man2/fork.2.html),
+   [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=fork&manpath=FreeBSD+11.1-RELEASE+and+Ports))
+ - [`execve(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html)
+ - [`wait(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html)
+ - [`chdir(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/chdir.html)
 
 ## Instructions
 
@@ -21,44 +24,116 @@ loop
     waitpid(pid)
 ```
 
-To that, we should add at least a builtin for `cd`, which will call
-`chdir(2)` with the supplied argument.  Think about why we couldn't
-implement this as an external command.
+(Warning: note that most of these functions can fail; you'll want
+error handling in your actual code.)
 
 We'll do a bit of extra work in this stage, compared to many simple
 shell tutorials, since it will prepare us for the later stages.
 
-To run the test suite, invoke it as `./validate /path/to/your/shell`.
-
 ### Basics
 
 You should
  - make your shell print the `$ ` prompt;
- - accept a line of input;
- - `chdir` if the command is `cd`, otherwise
- - execute that as a command (which might be absolute, relative, or in
-   `PATH`) with space-delimited arguments;
+ - accept a line of input, and split it on spaces;
+ - if the first word is a builtin (a command your shell will handle
+   itself instead of executing an external program), call it;
+   otherwise
+ - execute that as a command with space-delimited arguments;
  - repeat until you receive EOF on `stdin`.
 
+### Executing a command
+
+First, we `fork`, which creates a child process whose state is a copy
+of our shell's; the memory looks the same, and any file descriptors
+open are the same.  After the fork, there are two parallel universes
+happening: the parent (who gets the process ID of the child from
+`fork`) and the child (who gets 0 from `fork`).
+
+In the child, we `execve` the command we want to run, with its
+arguments; this replaces the running process (the child copy of the
+shell) with the new command.  So now we have both our shell and the
+command running, and the shell knows the process ID of the command.
+(See Patrick Mooney's talk [On Wings of
+exec(2)](https://systemswe.love/archive/minneapolis-2017/patrick-mooney)
+for deeper details of what happens after we `execve`.)
+
+The parent now waits for the child to complete; we do this with
+`wait`.  This lets the shell sleep while the command runs, and wakes
+us up with the exit status of the command.
+
+This part is explained in a lot of detail in the resources listed in
+[the main README](README.md), so if this isn't clear, please refer to
+the books and tutorials cited.
+
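As a rough illustration of the fork/exec/wait sequence described above, here is a minimal sketch in Python, whose `os` module wraps the same system calls. The helper name `run_command` is mine, not part of the workshop, and the sketch assumes `argv[0]` is already a full path (no `PATH` search yet).

```python
import os
import sys


def run_command(argv):
    """Fork, exec argv in the child, and return the raw status from waitpid."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this copy of the shell with the command.
        try:
            os.execve(argv[0], argv, os.environ)
        except OSError as err:
            print(f"{argv[0]}: {err.strerror}", file=sys.stderr)
            sys.stderr.flush()
        # Only reached if execve failed.  os._exit skips the cleanup the
        # parent shell would do; 127 is the status shells conventionally
        # report when a command can't be found.
        os._exit(127)
    # Parent: sleep until the child finishes, then pick up its status.
    _, status = os.waitpid(pid, 0)
    return status
```

The value returned is the raw wait status, not yet the exit status; decoding it is a separate step.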
+### Running the tests
+
+This should be enough to get at least the first test of stage 1 to
+pass.  Run `./validate ../path-to-my-shell/my-shell` and see how far
+you get.
+
 The test suite will try to execute various well-known POSIX commands
 inside your shell.  Make sure actual binaries of `true`, `false`,
-`ls`, and `echo` are in your PATH.
-
-Let's also allow `\` at the end of a line as a continuation character.
-Print `> ` as the prompt while reading a continued line.
+`cat`, `pwd`, and `echo` are in `/bin`.
 
 About the prompt: the test suite avoids testing the specific prompt,
 because it turns out this is something people like to have fun with,
 but I recommend emitting at least some prompt for each of the cases I
 mention, because this will make it easier for you to debug your shell,
-and eventually to use it.
+and eventually to use it.  Unfortunately, if your default prompt and
+display are too elaborate (e.g., `zsh` with RPROMPT, `fish`), the tests
+may not work, as they are not terribly robust.
+
+### The `cd` builtin
+
+To that, we should add at least a builtin for `cd`, which will call
+`chdir(2)` with the supplied argument.  Think about why we couldn't
+implement this as an external command.  If you're not sure, try
+implementing `cd` as a standalone binary that invokes `chdir` and see
+what happens.  You can exec `pwd` or call `getcwd(3)` to find out the
+current directory: you might want to print it as part of your prompt.
+
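One possible shape for builtins, sketched in Python: a table that maps names to handlers, consulted before anything is forked. `BUILTINS`, `builtin_cd`, and `try_builtin` are hypothetical names, and treating a bare `cd` as "go to `$HOME`" is my assumption rather than something the tests require.

```python
import os
import sys


def builtin_cd(args):
    """Change the shell's own working directory; return an exit status."""
    # Assumption: a bare `cd` goes to $HOME, as in most shells.
    target = args[0] if args else os.environ.get("HOME", "/")
    try:
        os.chdir(target)
    except OSError as err:
        print(f"cd: {target}: {err.strerror}", file=sys.stderr)
        return 1
    return 0


# Hypothetical dispatch table: builtin name -> handler.
BUILTINS = {"cd": builtin_cd}


def try_builtin(words):
    """Return a builtin's exit status, or None if words[0] isn't a builtin."""
    handler = BUILTINS.get(words[0])
    return None if handler is None else handler(words[1:])
```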
+### Searching `PATH`
+
+We don't want to have to type the explicit path to every command
+we're executing.  One of the crucial conveniences of every shell is
+that, if we find an unqualified command name (one that doesn't contain
+the `/` character), we look for it in every directory specified in the
+`PATH` environment variable.
+
+Many languages provide versions of `exec` that do some of this extra
+work for you.  That's probably fine as long as you know who is
+responsible for what.  When you add completion in stage 5, you'll want
+to be able to search the path yourself.
+
+We'll talk more about the environment in stage 4.  For now, you can use
+whatever your language provides for accessing environment variables
+(`getenv(3)` in C) to get the value of `PATH`, then split it on the
+`:` character to get a list of directories to search.
+
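A minimal sketch of that search in Python, under a few assumptions: `find_in_path` is a hypothetical helper name, an empty `PATH` entry is treated as the current directory (a legacy convention), and no caching is attempted.

```python
import os


def find_in_path(name, path=None):
    """Return the first executable match for name, or None."""
    if "/" in name:
        # Qualified names are used as-is; no search happens.
        return name
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(":"):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```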
+For a more rigorous specification, see [Command Search and Execution]
+in the POSIX standard.  (This also tells you when you're allowed to
+cache the location of a command.)  `PATH` itself is described in
+[Other Environment Variables].
+
+[Command Search and Execution]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01
+[Other Environment Variables]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
 
 ### Exit status and `!`
 
-- how to get the exit status, and its general importance
+We mentioned that you get the exit status of the child from `wait`.
+Don't just throw this away; this will be a key part of the shell's
+power, and it's why there are commands like `/bin/true` and
+`/bin/false`.
 
-If the command begins with `!` (separated by whitespace), we will
-negate the exit status, which will come in handy shortly.
+There is an unfortunate amount of information packed into the status
+returned to us by `wait`, but we're only interested in *exit status*
+right now.  The Unix convention is that a zero exit status represents
+success, and any non-zero exit status represents failure.
+
+If the first word we read is `!`, we negate the exit status of what
+follows.  Note that's word, not character, so `! true` returns a
+non-zero exit code, but `!true` is not defined in POSIX, and is
+usually part of a `csh`-style history mechanism (as in `bash`).
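
To make the status handling concrete, here is a small Python sketch. Both function names are hypothetical, `run` stands in for whatever your shell uses to execute a list of words and return an exit status, and reporting death by signal as 128 plus the signal number is a common shell convention I'm assuming here, not something the tests require.

```python
import os


def exit_status(wait_status):
    """Reduce a raw wait status to a shell-style exit status."""
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    if os.WIFSIGNALED(wait_status):
        # Assumption: report death by signal as 128 + signal number.
        return 128 + os.WTERMSIG(wait_status)
    return 1


def apply_bang(words, run):
    """Run words via run(words), negating the status after a leading `!`."""
    if words and words[0] == "!":
        # `!` must be its own word: ["!", "true"], never ["!true"].
        return 0 if run(words[1:]) != 0 else 1
    return run(words)
```

Note that negation collapses every non-zero status to 0 and turns 0 into 1, so the original failure code is lost.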
 
 **Bonus:** Change the prompt to red if the last command run exited
 with a non-zero status?  (Don't worry about termcap and supporting
@@ -67,8 +142,6 @@ for now.)
 
 [ANSI escape codes]: https://en.wikipedia.org/wiki/ANSI_escape_code#Colors
 
-### `exit` builtin
-
 ### Lists of commands
 
 Split your input on `;`, `&&`, and `||`.  Commands separated by any of
@@ -85,8 +158,15 @@ false && echo foo || echo bar
 true || echo foo && echo bar
 ```
 
-(The standard calls these "sequential lists", "AND lists", and "OR
-lists", respectively.)
+The standard calls these *sequential lists*, *AND lists*, and *OR
+lists*, respectively.  See section 2.9.3, [Lists].  We'll look at
+*asynchronous lists* in stage 3.
+
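Here is one way the evaluation of these lists might look in Python. It is only a sketch: `run_list` is a hypothetical name, the input is assumed to be already split into words with each operator as its own word, `run` is whatever executes one command and returns its exit status, and syntax errors such as an empty command are silently ignored.

```python
def run_list(words, run):
    """Evaluate words containing `;`, `&&`, `||`; return the last exit status."""
    status = 0
    pending = ";"   # operator that precedes the command being collected
    command = []
    for word in words + [";"]:
        if word not in (";", "&&", "||"):
            command.append(word)
            continue
        if command:
            # `&&` runs only after success, `||` only after failure;
            # both have equal precedence and associate left to right.
            skip = (pending == "&&" and status != 0) or \
                   (pending == "||" and status == 0)
            if not skip:
                status = run(command)
        pending, command = word, []
    return status
```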
+(If you want to add support for compound lists surrounded by braces,
+go ahead, but I didn't consider them important enough for an
+interactive shell to bother testing them.)
+
+[Lists]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_03
 
 ### `exec` builtin
 
@@ -103,7 +183,7 @@ The `exec` builtin has another use we will discuss in the next stage.
 ### Subshells
 
 Now, for the other half of `fork`/`exec` -- what if we `fork` (and
-`wait`), but don't `exec`?  If a command is surrounded in parenthesis,
+`wait`), but don't `exec`?  If a command is surrounded in parentheses,
 we fork, and in the parent, wait for the child as we normally would,
 and in the child, we process the command normally.
 
@@ -115,10 +195,25 @@ This allows us to do things in an isolated shell environment.
 
 Should print `/tmp` and whatever your previous working directory was.
 
-(This isn't that important, relative to the other things we're going
-to talk about, but having enough of a parser in place that you can
-read the parentheses correctly will be helpful for the following
-stages.)
+This isn't that important, relative to the other things we're going to
+talk about, but having enough of a parser in place that you can read
+the parentheses correctly will be helpful for the following stages.
+
+### Line continuation
+
+Let's allow `\` at the end of a line as a continuation character.
+Print `> ` as the prompt while reading a continued line.
+
+Likewise, if a line ends with `&&` or `||`, you'll want to continue
+reading on the next line as part of the same list.
+
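A rough Python sketch of joining continued lines follows. `read_logical_line` is a hypothetical helper that takes an iterator over physical lines (for example `sys.stdin`); printing the `> ` prompt before each continued read is left to the caller, and an escaped backslash at the end of a line is not handled.

```python
def read_logical_line(lines):
    """Join physical lines into one logical line; None at end of input."""
    logical = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            # Backslash-newline is removed entirely.
            logical += line[:-1]
        elif line.rstrip().endswith(("&&", "||")):
            # A trailing operator still needs its right-hand command.
            logical += line + " "
        else:
            return logical + line
    # Input ended: return any partial line, or None if there was nothing.
    return logical or None
```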
+### Aside: parsing
+
+See [Token Recognition] in the standard.  Writing a tokenizer that
+follows this, ignoring the parts we aren't doing yet, will make
+writing your parser easier.  I'll expand this section more, shortly.
+
+[Token Recognition]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
 
 ## Notes
 
@@ -161,6 +256,11 @@ Compare the various libc implementations of `posix_spawn`:
 
 See also https://ewontfix.com/7/
 
+### Further Reading
+
+Stevens, APUE, chapter 8.
+Kerrisk, LPI, chapters 24 through 28.
+
 ["A much faster popen and system implementation for Linux"]: https://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/
 [FreeBSD's posix_spawn]: https://github.com/freebsd/freebsd/blob/master/lib/libc/gen/posix_spawn.c
 [glibc's generic implementation of posix_spawn]: https://github.com/bminor/glibc/blob/master/sysdeps/posix/spawni.c

diff --git a/stage_1/10-multiline-list.t b/stage_1/10-multiline-list.t
@@ -0,0 +1,2 @@
+→ true &&⏎false ||⏎echo foo ||⏎echo bar⏎
+← foo