Skip to content
Riley Peel edited this page Jan 20, 2025 · 2 revisions

What it does

The parser's job is to take a TokenStream and convert it into an abstract syntax tree (AST). The type of parser Heir implements is called a top-down parser.

AST

The AST is a representation of the code as a tree made of SyntaxNodes (the leaves of the tree) that defines the order of execution.

How it does it

The parser achieves this by consuming and matching tokens and building syntax nodes from the top down. Consuming a token is expecting it to be the next token, throwing an error if the next token is not the consumed SyntaxKind, and advancing to the next token if it is. Matching a token is checking if the next token is a given SyntaxKind, advancing and returning true if it is, returning false if it's not.

The following code snippet from the parser shows how operator precedence is handled by using the top-down parsing strategy. Specifically multiplication being grouped tighter than addition.

private Expression ParseAddition()
{
    var left = ParseMultiplication();
    while (Tokens.Match(SyntaxKind.Plus) || Tokens.Match(SyntaxKind.Minus))
    {
        var op = Tokens.Previous!;
        var right = ParseMultiplication();
        left = new BinaryOp(left, op, right);
    }

    return left;
}

private Expression ParseMultiplication()
{
    var left = ParseExponentiation();
    while (Tokens.Match(SyntaxKind.Star) || Tokens.Match(SyntaxKind.Slash) || Tokens.Match(SyntaxKind.SlashSlash) || Tokens.Match(SyntaxKind.Percent))
    {
        var op = Tokens.Previous!;
        var right = ParseExponentiation();
        left = new BinaryOp(left, op, right);
    }

    return left;
}

The following code snippet from Parser.ParseVariableDeclaration() shows how the VariableDeclaration statement is constructed by matching and consuming tokens. I did not do Tokens.Consume(SyntaxKind.LetKeyword) because I wanted the custom error message. TokenStream.Consume() will be revised in the future to accept an error message.

private Statement ParseVariableDeclaration()
{
    var isMutable = Tokens.Match(SyntaxKind.MutKeyword);
    if (!Tokens.Match(SyntaxKind.Identifier))
    {
        _diagnostics.Error(DiagnosticCode.H004C, $"Expected identifier after 'let', got {Tokens.Current}", Tokens.Previous!);
        return new NoOpStatement();
    }

    var identifier = Tokens.Previous!;
        
    TypeRef? type = null;
    if (Tokens.Match(SyntaxKind.Colon))
        type = ParseType();

    Expression? initializer = null;
    if (Tokens.Match(SyntaxKind.Equals))
        initializer = ParseExpression();

    if (initializer == null && type == null)
    {
        _diagnostics.Error(DiagnosticCode.H012, $"Cannot infer type of variable '{identifier.Text}', please add an explicit type or initializer", identifier);
        return new NoOpStatement();
    }

    return new VariableDeclaration(new IdentifierName(identifier), initializer, type, isMutable);
}

Example

From the source code 1 + 2, a syntax tree containing the following BinaryOp (binary operation) node would be created:

BinaryOp(
    Left: Literal(IntLiteral, 1),
    Operator: +,
    Right: Literal(IntLiteral, 2)
)

From something slightly more complex like 3 * 2 + 1, this is what the BinaryOp would look like:

BinaryOp(
    Left: BinaryOp(
        Left: Literal(IntLiteral, 3),
        Operator: *,
        Right: Literal(IntLiteral, 2)
    ),
    Operator: +,
    Right: Literal(IntLiteral, 1)
)
Clone this wiki locally