diff --git a/Implementation.md b/Implementation.md new file mode 100644 index 0000000..6831ac4 --- /dev/null +++ b/Implementation.md @@ -0,0 +1,599 @@ +# lmt - literate markdown tangle + +This file implements a tangle program for a literate programming style +where the code is weaved into markdown code blocks. There is no corresponding +weave, because the markdown itself can already be read as the documentation, +either through a text editor or through an online system that already renders +markdown such as GitHub. + +## Why? + +[Literate programming](https://en.wikipedia.org/wiki/Literate_programming) is a +style of programming where, instead of directly writing source code, the programmer +writes their reasoning in human prose, and intersperses fragments of code which +can be extracted into the compilable source code with one tool (called "tangle"), +and conversely can be converted to a human-readable document explaining the code +with another (called "weave"). + +[Markdown](http://daringfireball.net/projects/markdown/syntax) is a plaintextish +format popular with programmers. It's simple, easy to write, and already has support +for embedding code blocks using triple backticks (```), mostly for the purposes +of syntax highlighting in documentation. + +The existing literate programming tools for markdown seem too heavyweight for me, +and too much like learning a new domain-specific language, which defeats the +purpose of using markdown. + +The tool is written in Go, because the Go tooling (notably `go fmt`) lends itself well +to writing in this paradigm. + +## Syntax + +To be useful for literate programming, code blocks need a few features that don't +exist in standard markdown: + +1. The ability to embed macros, which will get expanded upon tangle. +2. The ability to denote code blocks as the macro to be expanded when referenced. +3. The ability to either append to or replace code blocks/macros, so that we can + expand on our train of thought incrementally. +4.
The ability to redirect a code block into a file (while expanding macros.) + +Since markdown codeblocks will already let you specify the language of the block +for syntax highlighting purposes by naming the language after the three backticks, we can extend that by adding the file/codeblock name on the same line, after +the language name. + +For a convention, we'll say that a string with quotations denotes the name of a +code block, and a string without quotations denotes a filename to put the code +block into. If a code block header ends in `+=` it'll mean "append to the named +code block", otherwise it'll mean "create or replace the existing code block." +We'll use a line inside of a code block containing nothing but a title inside +`<<<` and `>>>` (with optional whitespace) as a macro to expand, because it's a +convention that's unlikely to be used otherwise inside of source code in any +language. + +### Implementation/Example. + +The above paragraph fully defines our spec. So, an example of a file code block +might look like this: + +```go main.go +package main + +import ( + <<>> +) + +<<>> + +<<>> + +func main() { + <<
>> +} +``` + +For our implementation, we'll need to parse the markdown file one line at a +time, starting from the top to ensure we replace code blocks in the right +order (which file? We'll use the arguments from the command line). If there +are multiple files, we'll process them in the order they were passed on the +command line. + +For now, we don't need to process any command line flags; we'll just assume +every argument is a file. + +So an example of a named code block looks like this: + +```go "main implementation" +files := os.Args +for _, file := range files { + <<>> +} +``` + +How do we process a file? We'll need to keep 2 maps: one for named macros, and +one for file output content. We won't do any expansion until all the files have +been processed, because a block might refer to another block that either hasn't +been defined yet, or later has its definition changed. Let's define our maps, +define a stub of a `process file` function, and redefine our +`main implementation` to take that into account. + +First, the maps, with some types defined for good measure. + +(Note: we start by adding a new macro reference to our "global variables" macro so that it can be redefined in further patches without overwriting other +content that was appended.) + +```go "global variables" +<<>> +``` + +Now let's actually define the `"global block variables"` types and maps: + +```go "global block variables" +type File string +type CodeBlock string +type BlockName string + +var blocks map[BlockName]CodeBlock +var files map[File]CodeBlock +``` + +We'll similarly add a `"ProcessFile Declaration"` macro to our +`"other functions"` macro, so that it can be redefined in later +patches. + +```go "other functions" +<<>> +``` + +And then define the function prototype, leaving the implementation +to a macro for now. + +```go "ProcessFile Declaration" +// Updates the blocks and files map for the markdown read from r.
+func ProcessFile(r io.Reader) error { + <<>> +} +``` + +Our main function, recall, is going to initialize the program, +process each command line argument in order, and then output +files. + +```go "main implementation" +<<>> + +// os.Args[0] is the command name, "lmt". We don't want to process it. +for _, file := range os.Args[1:] { + <<>> + +} +<<>> +``` + +We used a few standard library packages, so let's import them before +declaring the blocks we just used. + +```go "main.go imports" +"fmt" +"os" +"io" +``` + +Initializing the maps is pretty straightforward; that code goes in a block +named `"Initialize"`: + +```go "Initialize" +// Initialize the maps +blocks = make(map[BlockName]CodeBlock) +files = make(map[File]CodeBlock) +``` + +Opening and processing files is fairly straightforward as well, since we +already declared the ProcessFile function and we just need to open the +file to turn it into an `io.Reader`: + +```go "Open and process file" +f, err := os.Open(file) +if err != nil { + fmt.Fprintln(os.Stderr, "error: ", err) + continue +} + +if err := ProcessFile(f); err != nil { + fmt.Fprintln(os.Stderr, "error: ", err) +} +// Don't defer since we're in a loop; we don't want to wait until the function +// exits. +f.Close() +``` + +### Processing Files + +Now that we've got the obvious overhead out of the way, we need to begin +implementing the code which parses a file. + +We'll start by scanning each line.
The Go `bufio` package has a Reader which +has a `ReadString` method that will stop at a delimiter (in our case, `'\n'`). + +We can use this bufio Reader to iterate through lines like so: + +```go "process file implementation" +scanner := bufio.NewReader(r) +var err error +var line string +for { + line, err = scanner.ReadString('\n') + switch err { + case io.EOF: + return nil + case nil: + // Nothing special + default: + return err + } + <<>> + +} +``` + +We'll also need to import the `bufio` package we just used, by +appending it to `"main.go imports"`: + +```go "main.go imports" += +"bufio" +``` + +How do we handle a line? We'll need to keep track of a little state: + +1. Are we in a code block? +2. If so, what name or file is it for? +3. Are we ending a code block? If so, update the map (either replace or append.) + +So let's add a little state to our implementation: + +```go "process file implementation" +scanner := bufio.NewReader(r) +var err error +var line string + +var inBlock, appending bool +var bname BlockName +var fname File +var block CodeBlock + +for { + line, err = scanner.ReadString('\n') + switch err { + case io.EOF: + return nil + case nil: + // Nothing special + default: + return err + } + <<>> +} +``` + +We'll reset all of these variables to their zero values whenever we're not in a +block. + +The flow of handling a line will be something like: + +```go "Handle file line" +if inBlock { + if line == "```\n" { + <<>> + continue + } else { + <<>> + } +} else { + <<>> +} +``` + +Handling a code block line is easy: we just add it to the `block` if it's not +a block ending, and update the map/reset all the variables if it is. + +`"Handle block line"`: +```go "Handle block line" +block += CodeBlock(line) +``` + +`"Handle block ending"`: +```go "Handle block ending" +// Update the files map if it's a file.
+if fname != "" { + if appending { + files[fname] += block + } else { + files[fname] = block + } +} + +// Update the named block map if it's a named block. +if bname != "" { + if appending { + blocks[bname] += block + } else { + blocks[bname] = block + } +} + +<<>> +``` + +Since we've used a `"Reset block flags"` macro, we need to define +it. We said we were going to reset our state variables to their zero +value. + +```go "Reset block flags" +inBlock = false +appending = false +bname = "" +fname = "" +block = "" +``` + +#### Processing Non-Block lines + +Processing non-block lines is easy: we mostly don't have to do anything, since we +are only concerned with code blocks. + +If the line is empty we can skip it, and if it doesn't start with a backtick we don't need to care and can just reset the flags. +Otherwise, for triple backticks, we can just check the first three characters +of the line (we don't care if there's a language specified or not). + +```go "Handle nonblock line" +if line == "" { + continue +} + +switch line[0] { +case '`': + <<>> +default: + <<>> +} +``` + +When a code block is reached, we will need to reset the flags and parse the line +for the following information: + + - a filename + - a block name/label + - an append flag + +`"Check block start"`: +```go "Check block start" +if len(line) >= 3 && line[0:3] == "```" { + inBlock = true + <<>> +} +``` + +#### Parsing Headers With a Regex + +Parsing headers is a little more difficult, but shouldn't be too hard with +a regular expression. There are five potential components: + + 1. 3 or more '`' characters. We don't care how many there are. + 2. 0 or more non-whitespace characters, which will be the language type. + 3. 0 or more alphanumeric characters, which can be a file name. + 4. 0 or 1 string enclosed in quotation marks. + 5. It may or may not end in the string literal `+=`.
+ +So the regex will look something like ```/^(`+)([a-zA-Z0-9\.]*)(".*"){0,1}(\+=){0,1}$/``` +(there are more characters that might be in a file name, but to keep the regex simple +we'll just assume letters, numbers, and dots.) + +That regex is already starting to look hairy, so instead let's split it up into +two: one for checking if it's a named block, and, if that fails, one for checking +if it's a file name. It means we can't have a block which is *both* a named +block and *also* goes into a filename, but that's probably not a very useful +case and can always be done with two blocks (one named, and a file which only +contains a macro expanding to the named block.) + +In fact, we'll put the whole thing into a function to make it easier to debug +and write tests if we want to. + +```go "Check block header" +fname, bname, appending = parseHeader(line) +// We're outside of a block, so just blindly reset it. +block = "" +``` + +Then we need to define our parseHeader function: + +`"other functions" +=`: +```go "other functions" += +<<>> +``` + +`"ParseHeader Declaration"`: +```go "ParseHeader Declaration" +func parseHeader(line string) (File, BlockName, bool) { + line = strings.TrimSpace(line) + <<>> +} +``` + +Our implementation is going to use a regex for a namedBlock, and compare the +line against it, so let's start by importing the `regexp` package. + +```go "main.go imports" += +"regexp" +``` + +```go "parseHeader implementation" +namedBlockRe := regexp.MustCompile("^([`]+\\s?)[\\w\\+]*[\\s]*\"(.+)\"[\\s]*([+][=])?$") +matches := namedBlockRe.FindStringSubmatch(line) +if matches != nil { + return "", BlockName(matches[2]), (matches[3] == "+=") +} +<<>> +return "", "", false +``` + +There's no reason to keep re-compiling namedBlockRe on every call; we can just +make it global and compile it once on initialization.
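As a standalone sanity check of the two-regex approach, here's a small self-contained program (not part of the tangled `lmt` source) that compiles both header patterns once, at package level, and classifies a few sample headers. The two pattern strings are copied verbatim from the `"Namedblock Regex"` and `"Fileblock Regex"` blocks in this document; the sample headers and the pared-down `File`/`BlockName` types are illustrative stand-ins. Since its fence names a language but no destination, `lmt` itself won't tangle it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Illustrative stand-ins for the types declared in "global block variables".
type File string
type BlockName string

// Compiled once at package initialization rather than on every call.
// The pattern strings are copied from "Namedblock Regex" and "Fileblock Regex".
var (
	namedBlockRe = regexp.MustCompile("^([`]+\\s?)[\\w\\+]+[\\s]+\"(.+)\"[\\s]*([+][=])?$")
	fileBlockRe  = regexp.MustCompile("^([`]+\\s?)[\\w\\+]+[\\s]+([\\w\\.\\-\\/]+)[\\s]*([+][=])?$")
)

// parseHeader tries the named-block pattern first, falls back to the
// file pattern, and returns zero values when neither matches.
func parseHeader(line string) (File, BlockName, bool) {
	line = strings.TrimSpace(line)
	if m := namedBlockRe.FindStringSubmatch(line); m != nil {
		return "", BlockName(m[2]), m[3] == "+="
	}
	if m := fileBlockRe.FindStringSubmatch(line); m != nil {
		return File(m[2]), "", m[3] == "+="
	}
	return "", "", false
}

func main() {
	headers := []string{
		"```go main.go",
		"```go \"main implementation\"",
		"```go \"main.go imports\" +=",
		"```go", // language only: no destination, so nothing gets tangled
	}
	for _, h := range headers {
		f, b, appending := parseHeader(h)
		fmt.Printf("%q -> file=%q block=%q append=%v\n", h, f, b, appending)
	}
}
```

A quoted name always wins because the named-block pattern is tried first, which matches the order used in `"parseHeader implementation"`.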
+ +`"global variables" +=`: +```go "global variables" += +var namedBlockRe *regexp.Regexp +``` + +`"Initialize" +=`: +```go "Initialize" += +<<>> +``` + +`"Namedblock Regex"`: +```go "Namedblock Regex" +namedBlockRe = regexp.MustCompile("^([`]+\\s?)[\\w\\+]+[\\s]+\"(.+)\"[\\s]*([+][=])?$") +``` + +Then our parse implementation without the MustCompile is: + +`"parseHeader implementation"`: +```go "parseHeader implementation" +matches := namedBlockRe.FindStringSubmatch(line) +if matches != nil { + return "", BlockName(matches[2]), (matches[3] == "+=") +} +<<>> +return "", "", false +``` + +Checking a filename header is fairly simple: just make sure the name is made up of +word characters, dots, dashes, or slashes, with no spaces. If it's neither, we can just return the zero +value, since the header must immediately precede the code block according to our +specification. + +This time, we'll just go straight to declaring the regex as a global. + +`"global variables" +=`: +```go "global variables" += +var fileBlockRe *regexp.Regexp +``` + +`"Initialize" +=`: +```go "Initialize" += +<<>> +``` + +`"Fileblock Regex"`: +```go "Fileblock Regex" +fileBlockRe = regexp.MustCompile("^([`]+\\s?)[\\w\\+]+[\\s]+([\\w\\.\\-\\/]+)[\\s]*([+][=])?$") +``` + +`"Check filename header"`: +```go "Check filename header" +matches = fileBlockRe.FindStringSubmatch(line) +if matches != nil { + return File(matches[2]), "", (matches[3] == "+=") +} +``` + +### Outputting The Files + +Now that we've finally finished processing the files, all that remains is going through +the output files that were declared, expanding the macros, and writing them to +disk. Since `files` is a `map[File]CodeBlock`, we can define methods on +`CodeBlock` as needed for things like expanding the macros. + +Let's start by just ranging through our files map, and assuming there's a method +on `CodeBlock` which does the replacing for `"Output files"`.
+ +```go "Output files" +for filename, codeblock := range files { + f, err := os.Create(string(filename)) + if err != nil { + fmt.Fprintf(os.Stderr, "%v\n", err) + continue + } + fmt.Fprintf(f, "%s", codeblock.Replace()) + // We don't defer this, so that each file is closed before the next iteration. + f.Close() + +} +``` + +Now we have to declare the Replace() method that we just used. Replace() +will operate on a CodeBlock, go through it line by line, check if the current +line is a macro, and if so replace the content (recursively). We can use +another regex to determine if it's a macro line, and we can read the block line by +line with a bufio Reader, just like our markdown line scanner. + +`"other functions" +=`: +```go "other functions" += +<<>> +``` + +`"Replace Declaration"`: +```go "Replace Declaration" +// Replace expands all macros in a CodeBlock and returns a CodeBlock with no +// references to macros. +func (c CodeBlock) Replace() (ret CodeBlock) { + <<>> +} +``` + +`"Replace codeblock implementation"`, as described above: +```go "Replace codeblock implementation" +scanner := bufio.NewReader(strings.NewReader(string(c))) + +for { + line, err := scanner.ReadString('\n') + // ReadString will eventually return io.EOF and this will return. + if err != nil { + return + } + <<>> +} +return +``` + +We'll have to import the `strings` package we just used to convert our CodeBlock +into an io.Reader: + +```go "main.go imports" += +"strings" +``` + +Now, our replacement regex should be fairly simple: + +`"global variables" +=`: +```go "global variables" += +var replaceRe *regexp.Regexp +``` + +`"Initialize" +=`: +```go "Initialize" += +<<>> +``` + +`"Replace Regex"`: +```go "Replace Regex" +replaceRe = regexp.MustCompile(`^[\s]*<<<(.+)>>>[\s]*$`) +``` + +Okay, so let's do the actual line handling. If it doesn't match, add it to `ret` +and go on to the next line. If it matches, look up the part that matched in +`blocks` and include the replaced CodeBlock from there.
(If it doesn't exist, +we'll add the line unexpanded and print a warning.) + +```go "Handle replace line" +matches := replaceRe.FindStringSubmatch(line) +if matches == nil { + ret += CodeBlock(line) + continue +} +<<>> +``` + +Looking up a replacement is fairly straightforward, since we have a map by the +time this is called. + +```go "Lookup replacement and add to ret" +bname := BlockName(matches[1]) +if val, ok := blocks[bname]; ok { + ret += val.Replace() +} else { + fmt.Fprintf(os.Stderr, "Warning: Block named %s referenced but not defined.\n", bname) + ret += CodeBlock(line) +} +``` + +## Fin + +And now, our tool is finally done! We've implemented our `lmt` tangle tool, +and can use it to write other literate markdown style programs with the +same syntax. + +The output of running it on itself (including patches), followed by `go fmt`, +is in this repo to make it a go-gettable executable for bootstrapping purposes. + +If you're not familiar with `go get`, see the README for installation +instructions and a simple non-Go example. diff --git a/README.md b/README.md index 15aaffe..23b967f 100644 --- a/README.md +++ b/README.md @@ -1,574 +1,235 @@ -# lmt - literate markdown tangle +# Literate Markdown Tangle -This README describes a tangle program for a literate programming style -where the code is weaved into markdown code blocks. There is no corresponding -weave, because the markdown itself can already be read as the documentation, -either through a text editor or through an online system that already renders -markdown such as GitHub. +[lmt](https://github.com/driusan/lmt) is a tool for extracting text from the +code blocks in markdown files. It allows programmers to write in a [literate programming](https://en.wikipedia.org/wiki/Literate_programming) style using +markdown as the source language. -## Why?
+## Installing lmt -[Literate programming](https://en.wikipedia.org/wiki/Literate_programming) is a -style of programming where, instead of directly writing source code, the programmer -writes their reasoning in human prose, and intersperses fragments of code which -can be extracted into the compilable source code with one tool (called "tangle"), -and conversely can be converted to a human readable document explaining the code -with another (called "weave"). +lmt is a self-contained Go program written using the literate programming (LP) paradigm. The generated Go source is committed alongside its markdown source in this repository for bootstrapping +purposes. -[Markdown](http://daringfireball.net/projects/markdown/syntax) is a plaintextish -format popular with programmers. It's simple, easy and already has support -for embedding code blocks using triple backticks (```), mostly for the purposes -of syntax highlighting in documentation. +You'll need to install the [Go language](https://golang.org/) if you don't already have it. -The existing literate programming for markdown tools seem too heavyweight for me, -and too much like learning a new domain specific language which defeats the -purpose of using markdown. +To build the tool: -I started tangling [the shell](https://github.com/driusan/dsh) that I was writing -to experiment with literate programming using copy and paste. It works, but is -cumbersome. This is a tool to automate that process. - -It's written in Go, because the Go tooling (notably `go fmt`) lends itself well -to writing in this paradigm. - -## Syntax - -To be useful for literate programming code blocks need a few features that don't -exist in standard markdown: - -1. The ability to embed macros, which will get expanded upon tangle. -2. The ability to denote code blocks as the macro to be expanded when referenced. -3. The ability to either append to or replace code blocks/macros, so that we can - expand on our train of thought incrementally. -4.
The ability to redirect a code block into a file (while expanding macros.) - -Since markdown codeblocks will already let you specify the language of the block -for syntax highlighting purposes by naming the language after the three backticks, -my first thought was to put the file/codeblock name on the same line, after the -language name. - -For a convention, we'll say that a string with quotations denotes the name of a -code block, and a string without quotations denotes a filename to put the code -block into. If a code block header ends in `+=` it'll mean "append to the named -code block", otherwise it'll mean "create or replace the existing code block." -We'll use a line inside of a code block containing nothing but a title inside -`<<<` and `>>>` (with optional whitespace) as a macro to expand, because it's a -convention that's unlikely to be used otherwise inside of source code in any -language. - -### Implementation/Example. - -The above paragraph fully defines our spec. So, an example of a file code block -might look like this: - -```go main.go -package main - -import ( - <<>> -) - -<<>> - -<<>> - -func main() { - <<
>> -} -``` - -For our implementation, we'll need to parse the markdown file (which file? We'll -use the arguments from the command line) one line at a time, starting from the -top to ensure we replace code blocks in the right order. If there are multiple -files, we'll process them in the order they were passed on the command line. - -For now, we don't need to process any command line arguments, we'll just assume -everything passed is a file. - -So an example of a named code block is like this: - -```go "main implementation" -files := os.Args -for _, file := range files { - <<>> -} -``` - -How do we process a file? We'll need to keep 2 maps: one for named macros, and -one for file output content. We won't do any expansion until all the files have -been processed, because a block might refer to another block that either hasn't -been defined yet, or later has its definition changed. Let's define our maps, -define a stub of a `process file` function, and redefine our `main implementation` -to take that into account. - -Our maps, with some types defined for good measure: - -```go "global variables" -<<>> -``` - -```go "global block variables" -type File string -type CodeBlock string -type BlockName string - -var blocks map[BlockName]CodeBlock -var files map[File]CodeBlock +```bash +git clone https://github.com/driusan/lmt +cd lmt +go build ``` -Our ProcessFile function: +This will build the binary named `lmt` for your platform in the current +directory. You can use the `-o $path` argument to `go build` to build +the binary in a different location (e.g. `go build -o ~/bin/` to put the +binary in `~/bin/`). -```go "other functions" -<<>> -``` - -```go "ProcessFile Declaration" -// Updates the blocks and files map for the markdown read from r. -func ProcessFile(r io.Reader) error { - <<>> -} -``` - -And our new main: - -```go "main implementation" -<<>> - -// os.Args[0] is the command name, "lmt". We don't want to process it.
-for _, file := range os.Args[1:] { - <<>> +## Demo -} -<<>> -``` +To observe `lmt` at work, put this file in an empty directory, cd to that +directory, and run `lmt README.md`. Now look in the directory and you'll see +files extracted from the code blocks alongside this markdown file. In +literate programming lingo, this extraction is (somewhat counterintuitively) +called "tangling." Generating documentation from the source is called +"weaving", and `lmt` leaves that to existing markdown renderers (such as +the GitHub frontend.) -We used a few packages, so let's import them before declaring the blocks we -just used. - -```go "main.go imports" -"fmt" -"os" -"io" -``` - -Initializing the maps is pretty straight forward: - -```go "Initialize" -// Initialize the maps -blocks = make(map[BlockName]CodeBlock) -files = make(map[File]CodeBlock) -``` - -As is opening the files, since we already declared the ProcessFile function and -we just need to open the file to turn it into an `io.Reader`: - -```go "Open and process file" -f, err := os.Open(file) -if err != nil { - fmt.Fprintln(os.Stderr, "error: ", err) - continue -} - -if err := ProcessFile(f); err != nil { - fmt.Fprintln(os.Stderr, "error: ", err) -} -// Don't defer since we're in a loop, we don't want to wait until the function -// exits. -f.Close() -``` - -### Processing Files - -Now that we've got the obvious overhead out of the way, we need to begin -implementing the code which parses a file. - -We'll start by scanning each line. The Go `bufio` package has a Reader which -has a `ReadString` method that will stop at a delimiter (in our case, '\n') - -We can do use this bufio Reader to iterate through lines like so: - -```go "process file implementation" -scanner := bufio.NewReader(r) -var err error -var line string -for { - line, err = scanner.ReadString('\n') - switch err { - case io.EOF: - return nil - case nil: - // Nothing special - default: - return err - } - <<>> - -} -``` +`lmt` is language agnostic. 
The demonstration of features below is written +in (very trivial) C++ to demonstrate using other languages. -We'll need to import the `bufio` package which we just used too: +### Tangling into a file. -```go "main.go imports" += -"bufio" -``` +The markup for the code block below starts with `​```cpp hello.cpp +=`: + +```cpp hello.cpp += +<<>> +<<>> -How do we handle a line? We'll need to keep track of a little state: - -1. Are we in a code block? -2. If so, what name or file is it for? -3. Are we ending a code block? If so, update the map (either replace or append.) - -So let's add a little state to our implementation: - -```go "process file implementation" -scanner := bufio.NewReader(r) -var err error -var line string - -var inBlock, appending bool -var bname BlockName -var fname File -var block CodeBlock - -for { - line, err = scanner.ReadString('\n') - switch err { - case io.EOF: - return nil - case nil: - // Nothing special - default: - return err - } - <<>> +int main() { + <<>> } ``` -We'll replace all of the variables with their zero value when we're not in a -block. - -The flow of handling a line will be something like: - -```go "Handle file line" -if inBlock { - if line == "```\n" { - <<>> - continue - } else { - <<>> - } -} else { - <<>> -} -``` +The header says 3 things: -Handling a code block line is easy, we just add it to the `block` if it's not -a block ending, and update the map/reset all the variables if it is. +1. `cpp`: the code block is written in C++. In the rendered markdown output, that + affects syntax highlighting; to lmt, it means that language-appropriate + pragma directives will be added so that when debugging the extracted code, + your debugger will show you the line in the original markdown source file. + (If you don't want this effect, you can just use an unrecognized language + name like `cxx`). +2. `hello.cpp`: The code block will be written to the file `hello.cpp`. +3.
`+=`: The code block will be appended to the most recent code block + defining that file, rather than overwriting its content. Since we haven't + written anything to `hello.cpp` yet, the effect is the same, but this + demonstrates the ability to use it. + -```go "Handle block line" -block += CodeBlock(line) -``` +### Macro References -```go "Handle block ending" -// Update the files map if it's a file. -if fname != "" { - if appending { - files[fname] += block - } else { - files[fname] = block - } -} +The `<<<`*string*`>>>` sequences in the body of the code block are called +"macro references." An LMT "macro" is just a variable whose value can be +extracted from one or more code blocks, and will be substituted wherever +its name appears in triple angle brackets on a line. There are no arguments +to `lmt` macros. -// Update the named block map if it's a named block. -if bname != "" { - if appending { - blocks[bname] += block - } else { - blocks[bname] = block - } -} +If we were to run lmt on this file at this point, we would get the warnings: -<<>> ``` - -```go "Reset block flags" -inBlock = false -appending = false -bname = "" -fname = "" -block = "" +Warning: Block named copyright referenced but not defined. +Warning: Block named includes referenced but not defined. +Warning: Block named body of main referenced but not defined. ``` -#### Processing Non-Block lines +This allows us to stub in a macro reference whenever we want in our code, +and only later define them in whatever order best fits our prose. When there +are no more warnings, the `hello.cpp` file should build (assuming we didn't +include any syntax or other compiler errors.) -Processing non-block lines is easy, and we don't have to do anything since we -are only concerned with code blocks. -we don't need to care and can just reset the flags. -Otherwise, for triple backticks, we can just check the first three characters -of the line (we don't care if there's a language specified or not). 
+### Macro Content -```go "Handle nonblock line" -if line == "" { - continue -} +The markup for the code block below starts with `​```cpp "body of main"`: -switch line[0] { -case '`': - <<>> -default: - <<>> -} +```cpp "body of main" +std::cout << "Hello, werld!" << std::endl; ``` -When a code block is reached we will need to reset the flags and parse the line -for the following information: - - - a filename - - a block name/label - - an append flag - -```go "Check block start" -if len(line) >= 3 && line[0:3] == "```" { - inBlock = true - <<>> -} -``` +The double quotes around `body of main` mean that the code block will be +extracted into a macro of that name. You can see where its value will be +injected into hello.cpp via `<<>>`, +[above](#tangling-into-a-file). Since there's no `+=` at the end of the block's +first line of markup, this code block overwrites any existing value the macro +might already have (but since it has no existing value, it's a wash). -#### Parsing Headers With a Regex +`lmt` uses quotation marks to differentiate between macros and file +destinations. If a name is encased in quotes, it's a macro; if not, it's +a file. -Parsing headers is a little more difficult, but shouldn't be too hard with -a regular expression. There's four potential components: +We can later re-define a macro to overwrite it (`​```cpp "body of main"`, +again): - 1. 3 or more '`' characters. We don't care how many there are. - 2. 0 or more non-whitespace characters, which will may be the language type. - 3. 0 or more alphanumeric characters, which can be a file name. - 4. 0 or 1 string enclosed in quotation marks. - 5. It may or may not end in `+=`. - -So the regex will look something like ```/^(`+)([a-zA-Z0-9\.]*)("[.*]"){0,1}(+=){0,1}$/``` -(there are more characters that might be in a file name, but to keep the regex simple -we'll just assume letters, numbers, and dots.)
- -That regex is already starting to look hairy, so instead let's split it up into -two: one for checking if it's a named block, and if that fails one for checking -if it's a file name. It means we can't have a block which is *both* a named -block and *also* goes into a filename, but that's probably not a very useful -case and can always be done with two blocks (one named, and a file which only -contains a macro expanding to the named block.) - -In fact, we'll put the whole thing into a function to make it easier to debug -and write tests if we want to. - -```go "Check block header" -fname, bname, appending = parseHeader(line) -// We're outside of a block, so just blindly reset it. -block = "" +```cpp "body of main" +std::cout << "Hello, world!" << std::endl; ``` -Then we need to define our parseHeader function: +`lmt` parses each file passed on the command line in order. The last +definition of a macro will be used for all references to that macro in +other code blocks (including blocks which preceeded it in the source.) -```go "other functions" += -<<>> -``` - -```go "ParseHeader Declaration" -func parseHeader(line string) (File, BlockName, bool) { - line = strings.TrimSpace(line) - <<>> -} -``` - -Our implementation is going to use a regex for a namedBlock, and compare the -line against it, so let's start by importing the regex package. - -```go "main.go imports" += -"regexp" -``` - -```go "parseHeader implementation" -namedBlockRe := regexp.MustCompile("^([`]+\\s?)[\\w\\+]*[\\s]*\"(.+)\"[\\s]*([+][=])?$") -matches := namedBlockRe.FindStringSubmatch(line) -if matches != nil { - return "", BlockName(matches[2]), (matches[3] == "+=") -} -<<>> -return "", "", false -``` +### Appending To A Macro -There's no reason to constantly be re-compiling the namedBlockRe, we can just -make it global and compile it once on initialization. +We can use `#include`s to demonstrate `+=` on macros. There are two includes in +this program. 
The markup for the following block starts with `​```cpp +"includes"`, which causes the (empty) value of the `includes` macro to be +overwritten. -```go "global variables" += -var namedBlockRe *regexp.Regexp +```cpp "includes" +#include ``` -```go "Initialize" += -<<>> -``` +The markup for the next code block, however, starts with `​```cpp "includes" +=`, +which causes the block to be appended to the `includes` macro. -```go "Namedblock Regex" -namedBlockRe = regexp.MustCompile("^([`]+\\s?)[\\w\\+]+[\\s]+\"(.+)\"[\\s]*([+][=])?$") +```cpp "includes" += +#include ``` -Then our parse implementation without the MustCompile is: +Its value is now: -```go "parseHeader implementation" -matches := namedBlockRe.FindStringSubmatch(line) -if matches != nil { - return "", BlockName(matches[2]), (matches[3] == "+=") -} -<<>> -return "", "", false +```cpp +#include +#include ``` -Checking a filename header is fairly simple: just make sure there's alphanumeric -characters or dots and no spaces. If it's neither, we can just return the zero -value, since the header must immediately preceed the code block according to our -specification. +(the code block above is not being tangled). -This time, we'll just go straight to declaring the regex as a global. +### Hidden content. -```go "global variables" += -var fileBlockRe *regexp.Regexp -``` +The raw markdown in this file contains a comment containing a code block with a +copyright notice. It looks a bit like this one: -```go "Initialize" += -<<>> -``` + -```go "Fileblock Regex" -fileBlockRe = regexp.MustCompile("^([`]+\\s?)[\\w\\+]+[\\s]+([\\w\\.\\-\\/]+)[\\s]*([+][=])?$") -``` +If you're reading the rendered markdown in your browser, you can't see the +*actual* comment, but it still gets tangled into the `copyright` macro, which is +substituted into hello.cpp by the `<<>>` macro reference. This +technique lets you tangle content that you don't want showing up in the +documentation. 
-```go "Check filename header" -matches = fileBlockRe.FindStringSubmatch(line) -if matches != nil { - return File(matches[2]), "", (matches[3] == "+=") -} + -### Outputting The Files +### What Tangles and What Doesn't. -Now, we've finally finished processing the file, all that remains is going through -the output files that were declared, expanding the macros, and writing them to -disk. Since our files is a `map[File]CodeBlock`, we can define methods on -`CodeBlock` as needed for things like expanding the macros. +We can tangle into a random data file (`​```csv data.csv`) -Let's start by just ranging through our files map, and assuming there's a method -on code block which does the replacing. -```go "Output files" -for filename, codeblock := range files { - f, err := os.Create(string(filename)) - if err != nil { - fmt.Fprintf(os.Stderr, "%v\n", err) - continue - } - fmt.Fprintf(f, "%s", codeblock.Replace()) - // We don't defer this so that it'll get closed before the loop finishes. - f.Close() - -} +```csv data.csv +foo, bar, baz, +qix, qux, quux, ``` -Now, we'll have to declare the Replace() method that we just used. The Replace() -will take a codeblock, go through it line by line, check if the current line is -a macro, and if so replace the content (recursively). We can use another regex -to determine if it's a macro line, and we can use a scanner similar to our -markdown line scanner to our previous one, +You need to specify both a language and a destination (macro or file) if +you want the code block tangled: -```go "other functions" += -<<>> -``` +No language (`​``` bar.txt`—note the space): -```go "Replace Declaration" -// Replace expands all macros in a CodeBlock and returns a CodeBlock with no -// references to macros. 
-func (c CodeBlock) Replace() (ret CodeBlock) {
-	<<>>
-}
+``` bar.txt
+This doesn't get tangled anywhere
 ```

-```go "Replace codeblock implementation"
-scanner := bufio.NewReader(strings.NewReader(string(c)))
+No destination, but includes syntax highlighting (`​```cpp`):

-for {
-	line, err := scanner.ReadString('\n')
-	// ReadString will eventually return io.EOF and this will return.
-	if err != nil {
-		return
-	}
-	<<>>
-}
-return
+```cpp
+auto x = "nor does this";
 ```

-We'll have to import the strings package we just used to convert our CodeBlock
-into an io.Reader:
+But any language string and filename (`​```arbitrary foo.txt`) will do:

-```go "main.go imports" +=
-"strings"
+```arbitrary foo.txt
+This gets tangled
+into foo.txt.
 ```

-Now, our replacement regex should be fairly simple:
+Running `lmt` on this file at this point should generate the files `data.csv`,
+`foo.txt`, and `hello.cpp` with the expected contents and produce no warnings.

-```go "global variables" +=
-var replaceRe *regexp.Regexp
-```
+## Building lmt from source

-```go "Initialize" +=
-<<>>
-```
+While the tangled source of `lmt` is included for bootstrapping purposes,
+the markdown is considered the canonical version. The Go source can
+be re-extracted with:

-```go "Replace Regex"
-replaceRe = regexp.MustCompile(`^[\s]*<<<(.+)>>>[\s]*$`)
-```
-
-Okay, so let's do the actual line handling. If it doesn't match, add it to `ret`
-and go on to the next line. If it matches, look up the part that matched in
-blocks and include the replaced CodeBlock from there. (If it doesn't exist,
-we'll add the line unexpanded and print a warning.)
-
-```go "Handle replace line"
-matches := replaceRe.FindStringSubmatch(line)
-if matches == nil {
-	ret += CodeBlock(line)
-	continue
-}
-<<>>
-```
-
-Looking up a replacement is fairly straight forward, since we have a map by the
-time this is called.
-
-```go "Lookup replacement and add to ret"
-bname := BlockName(matches[1])
-if val, ok := blocks[bname]; ok {
-	ret += val.Replace()
-} else {
-	fmt.Fprintf(os.Stderr, "Warning: Block named %s referenced but not defined.\n", bname)
-	ret += CodeBlock(line)
-}
+```shell
+lmt Implementation.md WhitespacePreservation.md SubdirectoryFiles.md LineNumbers.md IndentedBlocks.md
 ```

-## Fin
-
-And now, our tool is finally done! We've finally implemented our `lmt` tool tangle
-tool, and can use it to write other literate markdown style programs with the
-same syntax.
+If you'd like to read the source, the files and patches were written in the
+same order as they are passed on the command line.

-The output of running it on itself (included [patches](#patches) and then running `go fmt`)
-is in this repo to make it a go-gettable executable for bootstrapping purposes.
+ 1. [Basic Implementation](Implementation.md)
+ 2. [Whitespace Preservation](WhitespacePreservation.md)
+ 3. [Subdirectory Files](SubdirectoryFiles.md)
+ 4. [Line Numbers](LineNumbers.md)
+ 5. [Indented Blocks](IndentedBlocks.md)

-To use it after installing it just run, for example
-
-```shell
-lmt README.md WhitespacePreservation.md SubdirectoryFiles.md LineNumbers.md IndentedBlocks.md
-```
+Small bug fixes can be contributed by modifying the prose and code in
+the existing files. Larger features can be included as a patch in a
+new file.

-## Patches
+## Credits

- 1. [Whitespace Preservation](WhitespacePreservation.md)
- 2. [Subdirectory Files](SubdirectoryFiles.md)
- 3. [Line Numbers](LineNumbers.md)
- 3. [Indented Blocks](IndentedBlocks.md)
+`lmt` is primarily authored by Dave MacFarlane ([@driusan](https://github.com/driusan/)). Bryan Allred ([@bmallred](https://github.com/bmallred/)) improved the
+parsing code to include the metadata in the code block header rather than a
+rendered markdown header.
[@mek-apelsin](https://github.com/mek-apelsin/)
+wrote the patch to include pragmas for line numbers, and Dave Abrahams
+([@dabrahams](https://github.com/dabrahams/)) wrote the demo of features in
+this README, making it more user-focused (it previously dove straight into
+implementation).