Skip to content

Latest commit

 

History

History
377 lines (306 loc) · 16 KB

README.md

File metadata and controls

377 lines (306 loc) · 16 KB

goer: A minimal Erlang implemented in Go

The Go's Gopher mascot with the Erlang logo on its belly.

goer is a simple language inspired by Erlang. It implements a subset of Erlang's features with some modifications. I wrote it as a project intended to understand better the concepts behind Erlang, but also to compare the differences in the approach to concurrency in Erlang vs Go. Both Erlang's and Go's treat it as message passing. While the two languages are very different, and Go was designed over twenty years after Erlang, you can find many parallels in how they treat concurrency. By implementing Erlang in Go, I wanted to have a better feeling of how far-reaching are the similarities. Moreover, in the past, I implemented Scheme several times, this time, I wanted to create a parser for a language that has a syntax. Another inspiration for this was Rob Pike's talk about implementing lexers in Go.

goer is a toy language, with a very basic set of functionalities. It doesn't even implement any other numerical types than integers. The set of features is minimal by design, whatever can be implemented using goers primitives, would not be implemented as a part of the language itself. For example, strings have very limited support, if you want to manipulate them, use split to transform them into lists of character-string values.

Features

  • The language uses Erlang-like syntax, the syntax coloring for Erlang would mostly work for it out-of-the-box.
  • As in Erlang, everything is immutable.
  • There's extensive support for pattern matching.
  • The language is tail-call optimized and looping is done by making recursive function calls.
  • It has basic data types like atoms, booleans, integers, strings, lists, and tuples.
  • It has a REPL.

Differences from Erlang

  • It is a small subset of the Erlang language.
  • It is interpreted, not compiled.
  • Names of some functions (e.g. it has print) or the operators (e.g. != instead of /=) differ.
  • Erlang has two different syntaxes for defining anonymous and named functions. In goer functions are only defined using fun keywords (see below).
  • Erlang relies on linked lists, while goer uses Go slices instead, this forces using different programming patterns.
  • There are also some small differences, like _ = _ not raising an error.
  • You wouldn't be able to use it to build a distributed system running on multiple machines, as you would with Erlang.

Language tour

Expressions that are evaluated need to be finished with ., as in Erlang. They can consist of multiple sub-expressions, in such a case, they would be separated with ,.

Data types

  • Atoms are data types defined only by their names. The names are written as words starting with a lowercase letter and can have letters, numbers, _ and @ in their names. Examples are foo, other_@Atom, etc.
  • Booleans: true and false, are special kinds of atoms. There are basic boolean operations like not, and, and or1 that can be applied to booleans.
  • Integers map to Go's int type, so are at least 32-bit signed integers. Arithmetic operations +, -, *, / (integer division), and rem (reminder) can be used with integers. They can also be compared with <, <=2, >, >=, and the standard == and !=3.
  • Strings are surrounded by double quotes, like "Hello, World!". They respect the same escape characters as Go does, so "\"Hello,\nWorld!\"" is a string that has double quotes and a newline. You can use str to convert an arbitrary value to a string (including functions). With split string can be converted to a list of single-character strings, this may be used for string manipulation. Strings can be concatenated with the ++ operator.
  • Tuples like {foo, bar, 1, "hi", {}, false} can contain values of other data types. They can be accessed by pattern-matching their content.
  • Lists of other values [1, 2, [foo, {3, 4}], "bar"]. Lists can be concatenated with ++. They can be cheaply reversed with rev(Lst). Their size can be checked with len(Lst). Their last value can be accessed with last(Lst) and list containing all the values but last with rest(Lst). Additionally, nth(Lst, Idx) allows for accessing the value at the Idx position (zero-indexed) of the list Lst.

There are is_atom, is_bool, is_int, is_str, is_list, and is_tuple functions to check if a value belongs to a specific type.

Functions can have lowercase names as well, so print("hi") is a function with the "hi" argument, not an atom.

The operations in goer, like the arithmetic ones, are evaluated left-to-right but follow the standard rules of precedence, and the precedence is consistent with Erlang.

Lists are slightly different

Erlang, the same as lisps, and functional languages like OCaml or Haskell, extensively use linked lists. Many of the programming patterns in it evolved from this dependency. Linked lists however are in practice less efficient than in theory, and have better modern alternatives. goer uses Go slices instead, which leads to different programming patterns.

In Erlang, the recommended way to add an element to a list is by using the cons operator [Elem | List] (which has O(1) complexity). The syntax is not available in goer. Since underneath goer's lists are Go slices, the cheaper way is to append the new element at the back of the slice. For this reason, goer rather uses the ++ operator, which in Erlang is mostly used to combine lists, e.g. [1,2] ++ [3,4] == [1,2,3,4]. In goer, to add an element to the list, you would use List ++ Elem which adds the element at the back of the list. The ++ operator is overloaded, [foo] ++ bar is interpreted the same as [foo] ++ [bar].

For the same reason as above, head and tail functions are not as useful for dealing with lists as in Erlang. Instead, there are last and rest for accessing the last and everything but the last value.

This leads to different patterns, for example, to reverse a list in Erlang you could use a recursive function:

reverse(Lst) -> reverse(Lst,[]).
reverse([], Acc) -> Acc;
reverse([Head|Tail], Acc) -> reverse(Tail, [Head|Acc]).

in goer the same function would work in reverse (pun intended):

fun reverse
    (Lst) -> reverse(Lst, []);
    ([], Acc) -> Acc;
    (Lst, Acc) -> reverse(rest(Lst), Acc ++ last(Lst))
end.

Everything is immutable

As with every other language, goer has variables. Their names need to start with uppercase letters. The same as with Erlang, the variables however are immutable.

X = 1.
X = 8.    % ERROR: 'X' and '8' do not match
X = X+1.  % ERROR: 'X' and '2' do not match
2+5 = Y.  % also works

This prevents race conditions and promotes communicating by message passing, not sharing memory.

Pattern matching

Many functional languages heavily use pattern matching. The same applies to Erlang and goer. As shown above, assigning a value to the variable is done using the "match" operator =. X = 1 means "match X against 1", if X didn't have any value before, it now is equivalent to 1. The next attempt to match it with a different value would fail with an error. Pattern matching can also be used to access the values of tuples:

{First, 2, Third} = {1, 2, 3}.
First = 1.  % ok
Third = 3.  % ok

There is also the wildcard character _ which matches anything. For example, we could match elements of a four-element list to extract a specific one:

[_, _, X, _] = [foo, bar, baz, qux].
X = baz.  % ok

(The Erlang's [Head | Tail] matching is not implemented.)

Control flow

goer has the standard Erlang's if statement, which can consist of multiple conditional blocks (so it is more like cond in Scheme, than if in other languages).

if
    Cond1 ->
        Body1;
    ...;
    CondN ->
        BodyN
end

The conditions Cond1 ... CondN need to evaluate to booleans. For example,

if
    X > 5 -> yes;
    _ -> no
end

It also has the case statement:

case Expr of
    Pattern1 [when Guard1] ->
        Body1;
    ...;
    PatternN [when GuardN] ->
        BodyN
end

For example, FizzBuzz written using the case statement would be:

fun fizzbuzz (X) ->
    case { X rem 3, X rem 5 } of
        {0, 0} -> fizz_buzz;
        {0, _} -> fizz;
        {_, 0} -> buzz;
        _ -> X
    end
end

We could use the case with the guard (after when) to check if a value is larger than 5 in an overengineered way:

case 10 of
    X when X > 5 -> yes;
    _ -> no
end

Unlike Erlang, there are no restrictions on guard statements other than that they need to evaluate to booleans.

Both if and case would throw an error when no branch matches the conditions.

Instead of Erlang's try ... catch ... end, there is a much simpler try ... recover ... end statement. It evaluates the expression in the try block and in case of an error, and only if, it executes the expression in the recover block. It is not possible to catch specific errors. Neither of the blocks can be empty.

try
    Body1
recover
    Body2
end

Functions

Erlang has two ways of defining the functions. It allows for either anonymous functions

fun(X) -> X + 10 end

or named functions

fact(0) -> 1;
fact(N) when N > 0 ->
    N * fact(N-1).

goer combines both syntaxes. The anonymous functions are written the same way, but named functions are written like the anonymous ones, but with the (atom-like) name after the fun keyword. This syntax is similar to OCaml's function. The fact function from above in goer would be

fun fact
    (0) -> 1;
    (N) when N > 0 ->
        N * fact(N-1)
end

Every function, build-in or user-defined, needs to return something. Side-effects-only functions are not possible. When it is not possible to return anything, some functions would throw an error, for example, last([]) would fail, because an empty list does not have the last value.

Concurrency and message passing

As in Erlang, it is super-easy to start a concurrent process (in goer, there's a goroutine underneath!) with the spawn function. In the example below, it would concurrently print "hi!", which is a completely pointless thing to do. The spawn function returns the process ID (Pid below) of the process. The process ID of the current process can be obtained by calling self().

Pid = spawn(fun() -> print("hi!") end).

The process ID can be used to send messages and to communicate with the other processes. The ! operator means send Msg message to the process identified by Pid. Sending a message to a process that is not alive anymore would have no effect and no error would be shown as well, the same as in Erlang.

Pid ! Msg

The messages can and need to be received. For this, we use the receive block that pattern-matches the messages. If a message that does not match the pattern is received, it is silently dropped. It is possible to specify a timeout so that after a specific time the after block gets evaluated.

receive
    Pattern1 [when Guard1] -> Body1;
    ...;
    PatternN [when GuardN] -> BodyN;
after
    Expr -> BodyM
end

In Erlang, you could use the is_process_alive function to check if the process is still alive. There is no nice way of checking if a goroutine is alive, so such a function is not implemented. It can be done by implementing the processes in such a way that they would respond to a healthcheck query, for example:

fun loop()->
    receive
        {Sender, ping} ->
            Sender ! {self(), pong},
            loop()
    end
end,

Pid = spawn(loop),
Pid ! {self(), ping},
receive
    {Pid, pong} -> ok
after
    100 -> timeout
end.

It is also not possible to kill the goroutine from the "outside", so there's no exit/2 function in goer to do this. It could be implemented in a similar way as above, where the process would terminate itself after receiving the "terminate" message. It is similar to the link function, which makes the linked processes in Erlang die together. Linking could be implemented in goer by the linked processes keeping the list of linked processes and on exit (you could use try ... recover ... end here) they would send the "terminate" signal to the other processes.

Grammar

goer's grammar in EBNF form is:

Dummy           = '_'
Atom            = ( 'a'..'z' ) [ 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '@' ]*
Variable        = ( 'A'..'Z' ) [ 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '@' ]*
Bool            = 'true' | 'false'
Int             = ( '0'..'9' )*
String          = '"' ( Any - '\"' )* '"'
Tuple           = '{' Exprs '}'
List            = '[' Exprs ']'
Bracket         = '(' Expr ')'
Term            = Atom | Bool | Int | String | Tuple | List
Expr            = Term | Variable | Dummy | Bracket | UnaryOperation | BinaryOperation | Call | Fun
Exprs           = Expr [ ',' Expr ]*
Block           = Exprs '.'
Op              = '+' | '-' | '*' | '/' | 'rem' | '==' | '!=' | '<' | '<=' | '>' | '>=' | '=' | '!' | 'and' | 'or'
UnaryOperation  = ( '+' | '-' | 'not' ) Expr
BinaryOperation = Expr Op Expr
Call            = ( Atom | Variable | Bracket ) '(' Exprs ')'
Guard           = 'where' Exprs
FunBranch       = '(' Exprs ')' [ Guard ] '->' Exprs
Fun             = 'fun' [ Atom ] FunBranch [ ';' FunBranch ]* 'end'
IfBranch        = Expr '->' Exprs
If              = 'if' IfBranch [ ';' IfBranch ]* 'end'
CondBranch      = Expr [ Guard ] '->' Exprs
Case            = 'case' Expr 'of' CondBranch [ ';' CondBranch ]* 'end'
Receive         = 'receive' CondBranch [ ';' CondBranch ]* 'after' IfBranch 'end'
Try             = 'try' Exprs 'recover' Exprs 'end'

Footnotes

  1. They are short-circuited operators like andalso and orelse in Erlang.

  2. Erlang uses =<, but <= seems to be more prevalent in other languages.

  3. Erlang uses /=, similar, != seems to be more common.