-
Notifications
You must be signed in to change notification settings - Fork 1
Parser
The parser's job is to take a TokenStream
and convert it into an abstract syntax tree (AST). The type of parser Heir implements is called a top-down parser.
The AST is a representation of the code as a tree made of SyntaxNode
s (the leaves of the tree) that defines the order of execution.
The parser achieves this by consuming and matching tokens and building syntax nodes from the top down. Consuming a token is expecting it to be the next token, throwing an error if the next token is not the consumed SyntaxKind
, and advancing to the next token if it is. Matching a token is checking if the next token is a given SyntaxKind
, advancing and returning true if it is, returning false if it's not.
The following code snippet from the parser shows how operator precedence is handled by using the top-down parsing strategy. Specifically multiplication being grouped tighter than addition.
private Expression ParseAddition()
{
var left = ParseMultiplication();
while (Tokens.Match(SyntaxKind.Plus) || Tokens.Match(SyntaxKind.Minus))
{
var op = Tokens.Previous!;
var right = ParseMultiplication();
left = new BinaryOp(left, op, right);
}
return left;
}
private Expression ParseMultiplication()
{
var left = ParseExponentiation();
while (Tokens.Match(SyntaxKind.Star) || Tokens.Match(SyntaxKind.Slash) || Tokens.Match(SyntaxKind.SlashSlash) || Tokens.Match(SyntaxKind.Percent))
{
var op = Tokens.Previous!;
var right = ParseExponentiation();
left = new BinaryOp(left, op, right);
}
return left;
}
The following code snippet from Parser.ParseVariableDeclaration()
shows how the VariableDeclaration
statement is constructed by matching and consuming tokens. I did not do Tokens.Consume(SyntaxKind.LetKeyword)
because I wanted the custom error message. TokenStream.Consume()
will be revised in the future to accept an error message.
private Statement ParseVariableDeclaration()
{
var isMutable = Tokens.Match(SyntaxKind.MutKeyword);
if (!Tokens.Match(SyntaxKind.Identifier))
{
_diagnostics.Error(DiagnosticCode.H004C, $"Expected identifier after 'let', got {Tokens.Current}", Tokens.Previous!);
return new NoOpStatement();
}
var identifier = Tokens.Previous!;
TypeRef? type = null;
if (Tokens.Match(SyntaxKind.Colon))
type = ParseType();
Expression? initializer = null;
if (Tokens.Match(SyntaxKind.Equals))
initializer = ParseExpression();
if (initializer == null && type == null)
{
_diagnostics.Error(DiagnosticCode.H012, $"Cannot infer type of variable '{identifier.Text}', please add an explicit type or initializer", identifier);
return new NoOpStatement();
}
return new VariableDeclaration(new IdentifierName(identifier), initializer, type, isMutable);
}
From the source code 1 + 2
, a syntax tree containing the following BinaryOp
(binary operation) node would be created:
BinaryOp(
Left: Literal(IntLiteral, 1),
Operator: +,
Right: Literal(IntLiteral, 2)
)
From something slightly more complex like 3 * 2 + 1
, this is what the BinaryOp
would look like:
BinaryOp(
Left: BinaryOp(
Left: Literal(IntLiteral, 3),
Operator: *,
Right: Literal(IntLiteral, 2)
),
Operator: +,
Right: Literal(IntLiteral, 1)
)