Start fleshing out text of stages
tokenrove committed Oct 17, 2017
1 parent 5b5fe8d commit 1470e32
Showing 8 changed files with 559 additions and 102 deletions.
11 changes: 6 additions & 5 deletions README.md
@@ -115,13 +115,14 @@ of the shells that were written as a result of this workshop here.
## Documents

- [Advanced Programming in the Unix Environment] by Stevens covers
all this stuff and is a must-read.
all this stuff and is a must-read. I call this *APUE* throughout
this tutorial.
- Chet Ramey describes [the Bourne-Again Shell] in [the Architecture
of Open Source Applications]; this is probably the best thing to
read to understand the structure of a real shell.
- Michael Kerrisk's [the Linux Programming Interface], though fairly
Linux-specific, has some great coverage of many of the topics we'll
touch on;
touch on. I call this *LPI* throughout this tutorial.
- [Unix system programming in OCaml] shows the development of a simple shell.
- [Advanced Unix Programming] by Rochkind; chapter 5 has a simple shell.
- the [tour of the Almquist shell] is outdated but may help you find
@@ -135,10 +136,10 @@ go far enough, but all of these are worth reading, especially if
you're having trouble with a stage they cover:

- Stephen Brennan's [Write a Shell in C] is a more detailed look at
what is [stage 1](stage_1) here.
what is [stage 1](stage_1.md) here.
- Jesse Storimer's [A Unix Shell in Ruby] gets as far as pipes.
- Nelson Elhage's [Signalling and Job Control] covers some
of [stage 3](stage_3)'s material.
of [stage 3](stage_3.md)'s material.

### References

@@ -168,11 +169,11 @@ you're having trouble with a stage they cover:
- [zsh]: C; extremely maximal.
- [fish]: C++11; has expect-based interactive tests.
- [Thompson shell]: C; the original Unix shell; very minimal.
- [xonsh]: Python.
- [scsh]: Scheme and C; intended for scripting.
- [cash]: OCaml; based on scsh.
- [eshell]: Emacs Lisp.
- [oil]: Python and C++; has an extensive test suite.
- [xonsh]: Python.
- [oh]: Go.

[busybox]: https://git.busybox.net/busybox/tree/shell
158 changes: 129 additions & 29 deletions stage_1.md
@@ -2,9 +2,12 @@

## Ingredients

At least `fork(2)`, `execve(2)`, and `wait(2)`.

`chdir(2)`
- [`fork(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html)
([Linux](http://man7.org/linux/man-pages/man2/fork.2.html),
[FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=fork&manpath=FreeBSD+11.1-RELEASE+and+Ports))
- [`execve(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html)
- [`wait(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html)
- [`chdir(2)`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/chdir.html)

## Instructions

@@ -21,44 +24,116 @@ loop
```
waitpid(pid)
```

To that, we should add at least a builtin for `cd`, which will call
`chdir(2)` with the supplied argument. Think about why we couldn't
implement this as an external command.
(Warning: note that most of these functions can fail; you'll want
error handling in your actual code.)

We'll do a bit of extra work in this stage, compared to many simple
shell tutorials, since it will prepare us for the later stages.

To run the test suite, invoke it as `./validate /path/to/your/shell`.

### Basics

You should
- make your shell print the `$ ` prompt;
- accept a line of input;
- `chdir` if the command is `cd`, otherwise
- execute that as a command (which might be absolute, relative, or in
`PATH`) with space-delimited arguments;
- accept a line of input, and split it on space;
- if the first word is a builtin (a command your shell will handle
itself instead of executing an external program), call it;
otherwise
- execute that as a command with space-delimited arguments;
- repeat until you receive EOF on `stdin`.
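
As a rough illustration of the "split it on space" step, here is a minimal C sketch. The name `split_words` is made up for this example, and treating tabs and newlines as separators too is my assumption; it does no quoting or escaping.

```c
#include <stddef.h>
#include <string.h>

/* Split `line` in place on blanks and newlines.
 * Stores at most max-1 words in argv, then a terminating NULL
 * (the form execve expects). `max` must be at least 1.
 * Returns the number of words stored. */
size_t split_words(char *line, char *argv[], size_t max)
{
    size_t n = 0;
    char *p = line;

    while (n + 1 < max) {
        p += strspn(p, " \t\n");   /* skip separators */
        if (*p == '\0')
            break;
        argv[n++] = p;             /* start of a word */
        p += strcspn(p, " \t\n");  /* advance to the end of the word */
        if (*p == '\0')
            break;
        *p++ = '\0';               /* terminate the word in place */
    }
    argv[n] = NULL;
    return n;
}
```

The words point into the original buffer, so the line has to stay alive for as long as `argv` is in use.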

### Executing a command

First, we `fork`, which creates a child process whose state is a copy
of our shell's; the memory looks the same, and any file descriptors
open are the same. After the fork, there are two parallel universes
happening: the parent (who gets the process ID of the child from
`fork`) and the child (who gets 0 from `fork`).

In the child, we `execve` the command we want to run, with its
arguments; this replaces the running process (the child copy of the
shell) with the new command. So now we have both our shell and the
command running, and the shell knows the process ID of the command.
(See Patrick Mooney's talk [On Wings of
exec(2)](https://systemswe.love/archive/minneapolis-2017/patrick-mooney)
for deeper details of what happens after we `execve`.)

The parent now waits for the child to complete; we do this with
`wait`. This lets the shell sleep while the command runs, and wakes
us up with the exit status of the command.

This part is explained in a lot of detail in the resources listed in
[the main README](README.md), so if this isn't clear, please refer to
the books and tutorials cited.
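
The three steps above might look like this in C. This is a sketch rather than the reference implementation: `run_command` is a hypothetical name, it expects `argv[0]` to already be a usable path, and exiting with 127 when `execve` fails is a common shell convention that I'm assuming here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork, execve argv[0] in the child, and wait for it in the parent.
 * Returns the raw status filled in by waitpid, or -1 if fork or
 * waitpid itself failed. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this copy of the shell with the command. */
        execve(argv[0], argv, environ);
        /* execve only returns if it failed. */
        perror(argv[0]);
        _exit(127);
    }
    /* Parent: sleep until the child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return status;
}
```

Note the `_exit` in the child: if `execve` fails, we must not fall through and continue running a second copy of the shell's main loop.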

### Running the tests

This should be enough to get at least the first test of stage 1 to
pass. Run `./validate ../path-to-my-shell/my-shell` and see how far
you get.

The test suite will try to execute various well-known POSIX commands
inside your shell. Make sure actual binaries of `true`, `false`,
`ls`, and `echo` are in your PATH.

Let's also allow `\` at the end of a line as a continuation character.
Print `> ` as the prompt while reading a continued line.
`cat`, `pwd`, and `echo` are in `/bin`.

About the prompt: the test suite avoids testing the specific prompt,
because it turns out this is something people like to have fun with,
but I recommend emitting at least some prompt for each of the cases I
mention, because this will make it easier for you to debug your shell,
and eventually to use it.
and eventually to use it. Unfortunately, if your default prompt and
display are too elaborate (e.g., `zsh` with RPROMPT, `fish`), the tests
may not work, as they are not terribly robust.

### The `cd` builtin

To that, we should add at least a builtin for `cd`, which will call
`chdir(2)` with the supplied argument. Think about why we couldn't
implement this as an external command. If you're not sure, try
implementing `cd` as a standalone binary that invokes `chdir` and see
what happens. You can exec `pwd` or call `getcwd(3)` to find out the
current directory: you might want to print it as part of your prompt.
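
A builtin can be as small as a function that runs inside the shell process itself. The sketch below uses a hypothetical `builtin_cd`; treating a missing argument as an error is my simplification, not something this stage specifies.

```c
#include <stdio.h>
#include <unistd.h>

/* Change the shell's own working directory.
 * Returns 0 on success and 1 on failure, like an exit status. */
int builtin_cd(const char *dir)
{
    if (dir == NULL) {
        fprintf(stderr, "cd: missing argument\n");
        return 1;
    }
    if (chdir(dir) != 0) {
        perror("cd");
        return 1;
    }
    return 0;
}
```

Because this calls `chdir` in the shell's own process rather than in a forked child, the change is still in effect when the next command runs.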

### Searching `PATH`

We don't want to have to type the explicit path to every command
we're executing. One of the crucial conveniences of every shell is
that, if we find an unqualified command name (one that doesn't contain
the `/` character), we look for it in every directory specified in the
`PATH` environment variable.

Many languages provide versions of `exec` that do some of this extra
work for you. That's probably fine as long as you know who is
responsible for what. When you add completion in stage 5, you'll want
to be able to search the path yourself.

We'll talk more about the environment in stage 4. For now, you can use
whatever your language provides for accessing environment variables
(`getenv(3)` in C) to get the value of `PATH`, then split it on the
`:` character to get a list of directories to search.
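
Here is one way the search could look in C. `search_path` and `is_executable` are names I made up for this sketch, and it ignores the caching that the standard permits. It treats an empty `PATH` entry as the current directory, which is the traditional behaviour.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if `p` names a regular file we are allowed to execute. */
static int is_executable(const char *p)
{
    struct stat st;
    return stat(p, &st) == 0 && S_ISREG(st.st_mode) && access(p, X_OK) == 0;
}

/* Resolve `name` against the colon-separated list `path`.
 * Names containing '/' are used as they are. On success, writes the
 * resolved path to `out` (of size `outlen`) and returns 0; otherwise
 * returns -1. */
int search_path(const char *name, const char *path, char *out, size_t outlen)
{
    int n;

    if (strchr(name, '/') != NULL) {
        n = snprintf(out, outlen, "%s", name);
        if (n < 0 || (size_t)n >= outlen)
            return -1;
        return is_executable(out) ? 0 : -1;
    }
    if (path == NULL)
        return -1;

    for (const char *p = path;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 0)   /* empty entry: current directory */
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)len, p, name);

        if (n > 0 && (size_t)n < outlen && is_executable(out))
            return 0;
        if (end == NULL)
            return -1;
        p = end + 1;
    }
}
```

A caller would typically pass `getenv("PATH")` as the second argument and hand the resulting path to `execve`.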

For a more rigorous specification, see [Command Search and Execution]
in the POSIX standard. (This also tells you when you're allowed to
cache the location of a command.) `PATH` itself is described in
[Other Environment Variables].

[Command Search and Execution]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_01_01
[Other Environment Variables]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03

### Exit status and `!`

- how to get the exit status, and its general importance
We mentioned that you get the exit status of the child from `wait`.
Don't just throw this away; this will be a key part of the shell's
power, and it's why there are commands like `/bin/true` and
`/bin/false`.

If the command begins with `!` (separated by whitespace), we will
negate the exit status, which will come in handy shortly.
There is an unfortunate amount of information packed into the status
returned to us by `wait`, but we're only interested in *exit status*
right now. The Unix convention is that a zero exit status represents
success, and any non-zero exit status represents failure.

If the first word we read is `!`, we negate the exit status of what
follows. Note that's word, not character, so `! true` returns a
non-zero exit code, but `!true` is not defined in POSIX, and is
usually treated as `csh`-style history expansion by shells that
support it, such as `bash` and `zsh`.
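
To make this concrete, here is a small C sketch of pulling the exit status out of what `wait` gives us, and of negating it for `!`. Both function names are hypothetical, and reporting a signal death as 128 plus the signal number is a widespread shell convention that I'm assuming rather than something this stage requires.

```c
#include <sys/wait.h>

/* Reduce the raw status from wait/waitpid to a shell exit status. */
int exit_status_of(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);      /* normal exit */
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);   /* killed by a signal (convention) */
    return 1;                             /* anything else: call it failure */
}

/* The `!` prefix: success becomes failure, and any failure becomes success. */
int negate_status(int status)
{
    return status == 0 ? 1 : 0;
}
```

Keeping the last exit status around in a variable is also what the red-prompt bonus needs.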

**Bonus:** Change the prompt to red if the last command run exited
with a non-zero status? (Don't worry about termcap and supporting
@@ -67,8 +142,6 @@ for now.)

[ANSI escape codes]: https://en.wikipedia.org/wiki/ANSI_escape_code#Colors

### `exit` builtin

### Lists of commands

Split your input on `;`, `&&`, and `||`. Commands separated by any of
@@ -85,8 +158,15 @@ false && echo foo || echo bar
```
true || echo foo && echo bar
```

(The standard calls these "sequential lists", "AND lists", and "OR
lists", respectively.)
The standard calls these *sequential lists*, *AND lists*, and *OR
lists*, respectively. See section 2.9.3, [Lists]. We'll look at
*asynchronous lists* in stage 3.
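
Since `&&` and `||` have equal precedence and associate to the left, a list can be evaluated in a single left-to-right pass that skips a command whenever the operator before it says so. The C sketch below assumes the input has already been split into commands and operators; `run_list`, `enum list_op`, and the stand-in `demo_run` are all hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum list_op { OP_SEQ, OP_AND, OP_OR };   /* `;`, `&&`, `||` */

/* Run cmds[0..n-1], where ops[i] is the operator between cmds[i] and
 * cmds[i+1]. `run` executes one command and returns its exit status.
 * Returns the status of the last command that actually ran. */
int run_list(const char *const cmds[], const enum list_op ops[], size_t n,
             int (*run)(const char *))
{
    int status = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            if (ops[i - 1] == OP_AND && status != 0)
                continue;   /* `&&` after a failure: skip */
            if (ops[i - 1] == OP_OR && status == 0)
                continue;   /* `||` after a success: skip */
        }
        status = run(cmds[i]);
    }
    return status;
}

/* Stand-in for the real fork/exec/wait: "false" fails, and
 * everything else succeeds. */
int demo_run(const char *cmd)
{
    printf("run: %s\n", cmd);
    return strcmp(cmd, "false") == 0 ? 1 : 0;
}
```

A skipped command leaves the status untouched, which is why `false && echo foo || echo bar` still runs `echo bar`.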

(If you want to add support for compound lists surrounded by braces,
go ahead, but I didn't consider them important enough for an
interactive shell to bother testing them.)

[Lists]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_03

### `exec` builtin

@@ -103,7 +183,7 @@ The `exec` builtin has another use we will discuss in the next stage.
### Subshells

Now, for the other half of `fork`/`exec` -- what if we `fork` (and
`wait`), but don't `exec`? If a command is surrounded in parenthesis,
`wait`), but don't `exec`? If a command is surrounded in parentheses,
we fork, and in the parent, wait for the child as we normally would,
and in the child, we process the command normally.
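
In C, the fork-without-exec idea might be sketched as follows. `run_subshell` takes a callback standing in for "process the command normally"; that callback interface and the `demo_cd_tmp` helper are assumptions I'm making for illustration only.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `body` in a forked copy of the shell and wait for it.
 * Returns the child's exit status, or -1 if fork/waitpid failed. */
int run_subshell(int (*body)(void *), void *arg)
{
    fflush(NULL);               /* don't let the child inherit pending output */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        int rc = body(arg);     /* child: no exec, just keep being a shell */
        fflush(NULL);
        _exit(rc);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

/* Example body: change directory and print it, as `(cd /tmp; pwd)` would. */
int demo_cd_tmp(void *arg)
{
    char buf[4096];
    (void)arg;
    if (chdir("/tmp") != 0 || getcwd(buf, sizeof buf) == NULL)
        return 1;
    puts(buf);
    return 0;
}
```

The `chdir` happens only in the child, so the parent shell's working directory is unchanged once `run_subshell` returns.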

@@ -115,10 +195,25 @@ This allows us to do things in an isolated shell environment.

Should print `/tmp` and whatever your previous working directory was.

(This isn't that important, relative to the other things we're going
to talk about, but having enough of a parser in place that you can
read the parentheses correctly will be helpful for the following
stages.)
This isn't that important, relative to the other things we're going to
talk about, but having enough of a parser in place that you can read
the parentheses correctly will be helpful for the following stages.

### Line continuation

Let's allow `\` at the end of a line as a continuation character.
Print `> ` as the prompt while reading a continued line.

Likewise, if a line ends with `&&` or `||`, you'll want to continue
reading on the next line as part of the same list.
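
One simple way to decide whether to print `> ` and keep reading is to inspect the end of the line just read. The function below is a sketch with a made-up name, `needs_continuation`; it knows nothing about quoting, which a real tokenizer would have to handle.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `line` (with or without its trailing newline) should be
 * continued on the next line, and 0 otherwise. */
int needs_continuation(const char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        len--;

    /* A backslash right before the newline continues the line, unless
     * it is itself escaped: an odd run of backslashes means continue. */
    size_t backslashes = 0;
    while (backslashes < len && line[len - 1 - backslashes] == '\\')
        backslashes++;
    if (backslashes % 2 == 1)
        return 1;

    /* Otherwise, ignore trailing blanks and look for `&&` or `||`. */
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;
    if (len >= 2 && (strncmp(line + len - 2, "&&", 2) == 0 ||
                     strncmp(line + len - 2, "||", 2) == 0))
        return 1;

    return 0;
}
```

In the read loop, you would keep appending lines (dropping the backslash and newline in the `\` case) until this returns 0.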

### Aside: parsing

See [Token Recognition] in the standard. Writing a tokenizer that
follows this, ignoring the parts we aren't doing yet, will make
writing your parser easier. I'll expand this section more, shortly.

[Token Recognition]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03

## Notes

@@ -161,6 +256,11 @@ Compare the various libc implementations of `posix_spawn`:

See also https://ewontfix.com/7/

### Further Reading

- Stevens, APUE, chapter 8.
- Kerrisk, LPI, chapters 24 through 28.

["A much faster popen and system implementation for Linux"]: https://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/
[FreeBSD's posix_spawn]: https://github.com/freebsd/freebsd/blob/master/lib/libc/gen/posix_spawn.c
[glibc's generic implementation of posix_spawn]: https://github.com/bminor/glibc/blob/master/sysdeps/posix/spawni.c
2 changes: 2 additions & 0 deletions stage_1/10-multiline-list.t
@@ -0,0 +1,2 @@
→ true &&⏎false ||⏎echo foo ||⏎echo bar⏎
← foo