Added info on symbolic tokens in design docs #2657

Merged
Changes from 4 commits
24 commits
0ae01ab
Added info on symbolic tokens in design docs
aswin2108 Mar 5, 2023
6b6aac3
Fixed typos!
aswin2108 Mar 5, 2023
bed7bd3
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Mar 13, 2023
73f8d50
Fixed pre-commit issues
aswin2108 Mar 13, 2023
a05a815
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Mar 14, 2023
7d23548
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Mar 20, 2023
b5e73bf
Merge remote-tracking branch 'upstream/trunk' into Add-symbolic-token…
aswin2108 Apr 1, 2023
8206dab
Added reviewed changes and revamped whitespace section
aswin2108 Apr 1, 2023
3fd7770
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Apr 18, 2023
64508f7
Resolved some reviews
aswin2108 Apr 18, 2023
581950e
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Apr 29, 2023
f726bd1
Added missing tokens to the table
aswin2108 Apr 29, 2023
e7f2997
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 May 12, 2023
e0c7c70
Fixed the table
aswin2108 May 12, 2023
39b88f4
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 May 13, 2023
9273b02
Added missing seperators
aswin2108 May 13, 2023
9326978
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 May 24, 2023
5eba8b9
Single table row lists both delimiters
aswin2108 May 24, 2023
28b7054
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Jun 1, 2023
4d05316
Added TODO message
aswin2108 Jun 1, 2023
bb3f2a9
Fixed punctuation mistakes
aswin2108 Jun 1, 2023
f56561d
Edited the details section
aswin2108 Jun 1, 2023
718b873
Merge branch 'carbon-language:trunk' into Add-symbolic-tokens-in-desi…
aswin2108 Jun 2, 2023
2068acd
Removed unwanted lines
aswin2108 Jun 2, 2023
115 changes: 45 additions & 70 deletions docs/design/lexical_conventions/symbolic_tokens.md
@@ -13,7 +13,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Overview](#overview)
- [Details](#details)
- [Symbolic token list](#symbolic-token-list)
- [Whitespace](#whitespace)
- [Alternatives considered](#alternatives-considered)
- [References](#references)

@@ -23,15 +22,14 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

A _symbolic token_ is one of a fixed set of
[tokens](https://en.wikipedia.org/wiki/Lexical_analysis#Token) that consist of
characters that are not valid in identifiers, that is they are tokens consisting
of symbols, not letters or numbers. Operators are one use of symbolic tokens,
but they are also used in patterns `:`, declarations (`->` to indicate return
type, `,` to separate parameters), statements (`;`, `=`, and so on), and other
places (`,` to separate function call arguments).
characters that are not valid in identifiers. That is, they are tokens
consisting of symbols, not letters or numbers. Operators are one use of symbolic
tokens, but they are also used in patterns (`:`), declarations (`->` to indicate
return type, `,` to separate parameters), statements (`;`, `=`, and so on), and
other places (`,` to separate function call arguments).

Carbon has a fixed set of tokens that represent operators, defined by the
language specification. Developers cannot define new tokens to represent new
operators.
Carbon has a fixed set of symbolic tokens, defined by the language
specification. Developers cannot define new symbolic tokens in their own code.

Symbolic tokens are lexed using a "max munch" rule: at each lexing step, the
longest symbolic token defined by the language specification that appears
@@ -49,6 +47,9 @@ follow certain rules:
must be an identifier, a literal, or any kind of opening bracket (for
example, `(`, `[`, or `{`).

These rules enable us to use a token like `*` as a prefix, infix, and postfix
operator, without creating ambiguity.
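
The "max munch" rule mentioned in the overview can be illustrated with a small
Python sketch. This is not how the Carbon toolchain lexes source; the names
`SYMBOLIC_TOKENS` and `lex_symbols` are hypothetical, and the token set is
simply the one listed in this document.

```python
# Hypothetical token set, copied from the symbolic token list in this document.
SYMBOLIC_TOKENS = [
    "+", "-", "*", "/", "%", "=", "^", "&", "|", "<<", ">>", "==", "!=",
    ">", ">=", "<", "<=", "->", "=>", "[", "]", "(", ")", "{", "}",
    ",", ".", ":", ";", ":!",
]

# Try longer tokens first so that the longest match at each position wins.
_BY_LENGTH = sorted(SYMBOLIC_TOKENS, key=len, reverse=True)


def lex_symbols(text):
    """Split a string of symbols (and whitespace) into symbolic tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for tok in _BY_LENGTH:
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            # No symbolic token starts here (for example, a letter or digit).
            raise ValueError(f"no symbolic token at offset {i}: {text[i]!r}")
    return tokens
```

With this token set, `lex_symbols("->=")` yields `->` followed by `=`, because
`->` is the longest token that matches at the first position.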

## Details

Symbolic tokens are intended to be used for widely-recognized operators, such as
@@ -62,67 +63,41 @@ overloading.
The following is the initial list of symbolic tokens recognized in a Carbon
source file:

| Token | Explanation |
| ----- | ---------------------------------------------------------------------------------------------------------- |
| `*` | Indirection, multiplication, and forming pointers |
| `&` | Address-of or Bitwise AND |
| `=` | Assignment |
| `->` | Return type and indirect member access |
| `=>` | Match syntax |
| `[` | Subscript and used for deduced parameter lists |
| `]` | Subscript and used for deduced parameter lists |
| `(` | Separate tuple and struct elements |
| `)` | Separate tuple and struct elements |
| `{` | Struct literals, blocks of control flow statements and the bodies of definitions (classes, functions, etc) |
| `}` | Struct literals, blocks of control flow statements and the bodies of definitions (classes, functions, etc) |
| `,` | Separate tuple and struct elements |
| `.` | Member access |
| `:` | Name bindings |
| `;` | Name bindings |

This list is expected to grow over time as more symbolic tokens are required by
language proposals.

Note: The above list only covers up to
[#601](https://github.com/carbon-language/carbon-lang/pull/601) and more have
been added since that are not reflected here.

### Whitespace

Carbon's rules for whitespace around operators have been designed to allow the
same symbolic token to be used as a prefix operator, infix operator, and postfix
operator in some cases. To make parsing operators unambiguous, we require
whitespace to be present or absent around the operator to indicate its fixity,
with binary operators having whitespace on both sides, and unary operators
lacking whitespace between the operator and its operand. However, there are some
cases where omitting whitespace around a binary operator can aid readability,
such as in expressions like `2*x*x + 3*x + 1`. In such cases, the operator with
whitespace on neither side is treated as binary if the token immediately before
the operator indicates the end of an operand and the token immediately after
indicates the beginning of an operand.

Identifiers, literals, and brackets of any kind, facing away from the operator,
are defined as tokens that indicate the beginning or end of an operand. For
error recovery purposes, no expression context can be preceded by a token that
looks like the end of an operand, and no expression context can be followed by a
token that looks like the start of an operand, except in function definitions
where `{}` is the body of the function.

From the perspective of token formation, there are four variants of each
symbolic token: a binary variant with whitespace on both sides, a binary variant
with whitespace on neither side, a unary variant with whitespace on neither
side, and prefix and postfix variants with whitespace on the left and right
sides, respectively. In non-operator contexts, any variant of a symbolic token
is acceptable, but in operator contexts, only the appropriate variant can be
used.

The whitespace rule was designed to strike a balance between simplicity and
expressiveness for the programmer, and simplicity and good support for error
recovery in the implementation. The rule's allowance for omitting whitespace
around binary operators aids readability, but it can cause errors if not used
carefully, particularly in function definitions. Despite this, the rule provides
the necessary cues for human readers to understand the code, while still
allowing for unambiguous parsing of operators.
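
As an illustrative sketch only (not the Carbon toolchain's implementation), the
whitespace rule described in this section could be modeled in Python as
follows. The function name `classify_fixity`, its parameters, and the
simplified token classification are hypothetical. The handling of an operator
with whitespace on neither side and an operand on only one side is an
assumption; this section only spells out the case where both neighbors are
operands.

```python
OPENING_BRACKETS = {"(", "[", "{"}
CLOSING_BRACKETS = {")", "]", "}"}


def _is_name_or_literal(tok):
    # Simplification: anything starting with a letter, digit, or underscore
    # is treated as an identifier or a literal.
    return bool(tok) and (tok[0].isalnum() or tok[0] == "_")


def ends_operand(tok):
    # Identifiers, literals, and closing brackets can end an operand.
    return _is_name_or_literal(tok) or tok in CLOSING_BRACKETS


def starts_operand(tok):
    # Identifiers, literals, and opening brackets can start an operand.
    return _is_name_or_literal(tok) or tok in OPENING_BRACKETS


def classify_fixity(prev_tok, ws_before, ws_after, next_tok):
    """Classify an operator from its surrounding whitespace and tokens."""
    if ws_before and ws_after:
        return "infix"
    if ws_before and not ws_after:
        return "prefix"
    if not ws_before and ws_after:
        return "postfix"
    # Whitespace on neither side: binary only if both neighbors are operands.
    if ends_operand(prev_tok) and starts_operand(next_tok):
        return "infix"
    # Assumption: otherwise attach to whichever side has an operand.
    if starts_operand(next_tok):
        return "prefix"
    if ends_operand(prev_tok):
        return "postfix"
    return "invalid"
```

For the `*` in `2*x`, `classify_fixity("2", False, False, "x")` returns
`"infix"`, while for the `-` in `(-x)`, `classify_fixity("(", False, False, "x")`
returns `"prefix"`.
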
| Symbolic token | Explanation |
| -------------- | ----------- |
| `+` | Addition |
| `-` | Subtraction and negation |
| `*` | Indirection, multiplication, and forming pointers |
| `/` | Division |
| `%` | Modulus |
| `=` | Assignment |
| `^` | Complementing and Bitwise XOR |
| `&` | Address-of and Bitwise AND |
| `\|` | Bitwise OR |
| `<<` | Arithmetic and Logical Left-shift |
| `>>` | Arithmetic and Logical Right-shift |
| `==` | Equality or equal to |
| `!=` | Inequality or not equal to |
| `>` | Greater than |
| `>=` | Greater than or equal to |
| `<` | Less than |
| `<=` | Less than or equal to |
| `->` | Return type and indirect member access |
| `=>` | Match syntax |
| `[` | Subscript and deduced parameter lists |
| `]` | Subscript and deduced parameter lists |
| `(` | Function call, function declaration and tuple literals |
| `)` | Function call, function declaration and tuple literals |
| `{` | Struct literals, blocks of control flow statements and the bodies of definitions (classes, functions, etc.) |
| `}` | Struct literals, blocks of control flow statements and the bodies of definitions (classes, functions, etc.) |
| `,` | Separate tuple and struct elements |
| `.` | Member access |
| `:` | Name bindings |
| `;` | Statement separator |
| `:!` | Type-checking |

TODO: Arithmetic operators, Bitwise operators, Comparison operators & :!
[#2657](https://github.com/carbon-language/carbon-lang/pull/2657/files#r1137826711)

## Alternatives considered
