smaller lineinfos (#39)
* implements #35; WIP

* make tests green again
Araq authored Aug 29, 2024
1 parent 0565620 commit 0c72f6a
Showing 14 changed files with 9,058 additions and 9,041 deletions.
57 changes: 35 additions & 22 deletions doc/nif-spec.md
@@ -56,7 +56,7 @@ In order to get a feeling for how a NIF file can look, here is a complete exampl
```nif
(.nif24)
(stmts
(imp @2,5,sysio.nim(type :File (object ..)))
(imp 2,5,sysio.nim(type :File (object ..)))
(imp (proc :write.1.sys . (pragmas varargs) (params (param f File)).))
(call write.1.sys "Hello World!\0A")
)
@@ -70,7 +70,7 @@ A generator can produce shorter code by making use of `.k` and `.i` (substitutio
(.k P pragmas)
(.i write write.1.sys)
(stmts
(I @2,5,sysio.nim(type :File (object ..)))
(I 2,5,sysio.nim(type :File (object ..)))
(I (proc :write . (P varargs) (params (param f File)).))
(call write "Hello World!\0A")
)
@@ -98,14 +98,14 @@ Whitespace is the set `{' ', '\t', '\n', '\r'}`.
Control characters
------------------

NIF uses some characters like `(`, `)` and `@` to describe the AST. As such these characters
NIF uses some characters like `(`, `)` and `~` to describe the AST. As such these characters
**must not** occur in string literals, char literals and comments so that a NIF parser can
skip to the enclosing `)` without complex logic.

The control characters are:

```
( ) [ ] { } @ # ' " \ :
( ) [ ] { } ~ # ' " \ :
```

Escape sequences
@@ -196,13 +196,16 @@ Grammar:
Digit ::= [0-9]
NumberSuffix ::= [a-z]+ [0-9a-z]* # suffixes can only contain lowercase letters
FloatingPointPart ::= ('.' Digit+ ('E' '-'? Digit+)? ) | 'E' '-'? Digit+
Number ::= '-'? Digit+ FloatingPointPart? NumberSuffix?
Number ::= ('+' | '-') Digit+ FloatingPointPart? NumberSuffix?
```

Numbers must start with a digit (or a minus) and only their decimal notation is supported. Numbers can have
Numbers must start with a plus or a minus and only their decimal notation is supported. Numbers can have
a suffix that has to start with a lowercase letter. For example Nim's `0xff'i32` would become `255i32x`.
(The `x` encodes the fact that the number was originally written in hex.)

Note that numbers that start with neither a plus nor a minus are interpreted as "line information". See
the corresponding section for more details.
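
The rule above can be sketched as a first-character check. This is a minimal illustration, not the reader's actual code: `TokenClass` and `classify` are hypothetical names, and the sketch assumes the token has already been separated from its surroundings.

```nim
type
  TokenClass = enum
    tcNumber, tcLineInfo, tcOther

proc classify(tok: string): TokenClass =
  ## Hypothetical helper: looks only at the first character of `tok`.
  result = tcOther
  if tok.len > 0:
    case tok[0]
    of '+', '-':
      # A mandatory sign marks a numeric literal.
      result = tcNumber
    of '0'..'9', '~', ',':
      # A bare digit, a tilde or a comma starts line information.
      result = tcLineInfo
    else:
      discard
```

Under these assumptions `+8` is a number while `8` is line information, which is why literals such as `(c +8)` carry an explicit plus.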


### Char literals

@@ -348,25 +351,34 @@ Line information
Grammar:

```
LineDiff ::= Digit* | '-' Digit+
LineInfo ::= '@' LineDiff (',' LineDiff (',' EscapedData)?)?
LineDiff ::= Digit* | '~' Digit+
LineInfo ::= LineDiff (',' LineDiff (',' EscapedData)?)?
```

Every node can be prefixed with `@` to add source code information. ("This node originates from file.nim(line,col).")
Every node can be prefixed with a digit or `~` or `,` to add source code information.
("This node originates from file.nim(line,col).")
There are 3 forms:

1. `@<column-diff>`
2. `@<column-diff, line-diff>`
3. `@<column, line, filename>`
1. `<column-diff>`
2. `<column-diff, line-diff>`
3. `<column, line, filename>`

The `diff` means that the value is relative to the parent node. For example `@8` means that the node is at
the same position as the parent node except that its column is `+8` characters. Negative numbers are valid
too and usually required for "infix" nodes where the left hand operand precedes the parent (`x + y` becomes
`(infix add @-3 x @2 y)` because `x` is written before the `+` operator).
The `diff` means that the value is relative to the parent node. For example `8` means that the node is at
the same position as the parent node except that its column is `+8` characters. Negative numbers use the tilde
and not the minus. Negative numbers are usually required for "infix" nodes where the left hand operand
precedes the parent (`x + y` becomes
`(infix add ~3 x 2 y)` because `x` is written before the `+` operator).

The AST root node can only be annotated with the form `@<column, line, filename>` as it has no parent node
The AST root node can only be annotated with the form `<column, line, filename>` as it has no parent node
that column and line could refer to.

Note that numeric literals in NIF have to start with `+` or `-` and thus cannot cause ambiguity with line
information.

Since the information includes both lines and columns it can easily take up 10-20% of the file size.
Therefore a mere digit starts a line information and not a numeric literal. Numeric literals are not
nearly as frequent in practice.
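
As a rough sketch of the new grammar, the following parses a line information prefix into its three parts. The names `LineInfo`, `parseDiff` and `parseLineInfo` are hypothetical and not taken from `nifreader.nim`. The sketch assumes the prefix has already been cut out of the input, and it leaves escape sequences in the filename undecoded.

```nim
type
  LineInfo = object
    col, line: int   # diffs relative to the parent node (absolute at the root)
    file: string     # raw filename text; escapes are left undecoded here

proc parseDiff(s: string; i: var int): int =
  ## Parses `Digit*` or `'~' Digit+` starting at `s[i]`; an empty run yields 0.
  result = 0
  var negative = false
  if i < s.len and s[i] == '~':
    negative = true
    inc i
  while i < s.len and s[i] in {'0'..'9'}:
    result = result * 10 + (ord(s[i]) - ord('0'))
    inc i
  if negative: result = -result

proc parseLineInfo(s: string): LineInfo =
  ## Splits a prefix such as "2,5,sysio.nim" or "~3" into its parts.
  result = LineInfo(col: 0, line: 0, file: "")
  var i = 0
  result.col = parseDiff(s, i)
  if i < s.len and s[i] == ',':
    inc i
    result.line = parseDiff(s, i)
    if i < s.len and s[i] == ',':
      inc i
      result.file = s.substr(i)
```

With this sketch, `parseLineInfo("~3")` yields a column diff of `-3` and leaves the line diff at zero, matching the `(infix add ~3 x 2 y)` example. A prefix that starts with a comma, such as `,2`, keeps the column diff at zero.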


Comments
--------
@@ -465,7 +477,7 @@ For example:
(.i Hello "Hello world!\0A")
(stmts
(call ECHO 1 2 3)
(call ECHO +1 +2 +3)
(C ECHO Hello)
)
```
@@ -499,9 +511,10 @@ forms a valid identifier (for C code generation or otherwise). The following enc
accomplishes this task:

1. Line information and comments are ignored.
2. The substring of trailing `)` is removed as there is nothing interesting about `))))`.
3. Whitespace is canonicalized to a single space.
4. The space after `)` and before `(` is removed.
2. The unary `+` for numbers is removed.
3. The substring of trailing `)` is removed as there is nothing interesting about `))))`.
4. Whitespace is canonicalized to a single space.
5. The space after `)` and before `(` is removed.


`(` is turned into `A`.
@@ -541,7 +554,7 @@ In summary:

For example:

`(array (range 0 9) (array (range 0 4) (i 8))))`
`(array (range +0 +9) (array (range +0 +4) (i +8))))`

Becomes:

Binary file modified doc/nif-spec.pdf
Binary file not shown.
8 changes: 4 additions & 4 deletions doc/nim-gear2.md
@@ -16,7 +16,7 @@ pragmas which are then attached to the sym's declaration side. For example, if `
is set to some value `4` which is not trivially recomputable it would be stored as:

```
(var :theSymbol.2 . (pragmas (position 4)) int .)
(var :theSymbol.2 . (pragmas (position +4)) int .)
```


@@ -66,7 +66,7 @@ Types
Lisp trees and the symbol mangling to the scheme `<name>.<number>.<module-suffix>` are
sufficient to encode any Nim type into a short descriptive unique name. Many builtin types
like `system.int` are directly mapped to node kinds. For example, like in NIFC `system.int`
becomes `(i M)` and `system.char` becomes `(c 8)`.
becomes `(i M)` and `system.char` becomes `(c +8)`.

More examples:

@@ -75,7 +75,7 @@ More examples:
| `string` | `(str)` |
| `seq` | `(seq)` |
| `typeof(nil)` | `(nilt)` |
| `array[2..6, ref MyObj]` | `(array (ref MyObj.1.msfx) (range (i M) 2 6))` |
| `array[2..6, ref MyObj]` | `(array (ref MyObj.1.msfx) (range (i M) +2 +6))` |


### Type aliases
@@ -186,7 +186,7 @@ SymbolDef ::= <according to NIF's spec>
Number ::= <according to NIF's spec>
CharLiteral ::= <according to NIF's spec>
StringLiteral ::= <according to NIF's spec>
IntBits ::= [0-9]+ | 'M'
IntBits ::= '+' [0-9]+ | 'M'
ExportMarker ::= Empty | 'x'
23 changes: 16 additions & 7 deletions src/lib/nifbuilder.nim
@@ -73,7 +73,7 @@ proc undoWhitespace(b: var Builder) =


const
ControlChars* = {'(', ')', '[', ']', '{', '}', '@', '#', '\'', '"', '\\', ':'}
ControlChars* = {'(', ')', '[', ']', '{', '}', '~', '#', '\'', '"', '\\', ':'}

proc escape(b: var Builder; c: char) =
const HexChars = "0123456789ABCDEF"
@@ -147,18 +147,23 @@ proc addCharLit*(b: var Builder; c: char) =

proc addIntLit*(b: var Builder; i: BiggestInt; suffix = "") =
addSep b
if i >= 0:
b.buf.add '+'
b.put $i
b.put suffix

proc addUIntLit*(b: var Builder; u: BiggestUInt; suffix = "") =
addSep b
b.buf.add '+'
b.put $u
b.put suffix

proc addFloatLit*(b: var Builder; f: BiggestFloat; suffix = "") =
addSep b
let myLen = b.buf.len
drainPending b
if f >= 0.0:
b.buf.add '+'
b.buf.addFloat f
for i in myLen ..< b.buf.len:
if b.buf[i] == 'e': b.buf[i] = 'E'
@@ -167,26 +172,30 @@ proc addFloatLit*(b: var Builder; f: BiggestFloat; suffix = "") =
b.buf.setLen 0
b.put suffix

proc addLine(s: var string; x: int32) =
if x < 0:
s.add '~'
s.addInt(-x)
else:
s.addInt(x)

proc addLineInfo*(b: var Builder; col, line: int32; file = "") =
addSep b
var seps = 0
if col != 0'i32:
drainPending b
b.buf.add '@'
b.buf.addInt col
b.buf.addLine col
inc seps
if line != 0'i32:
if seps == 0:
drainPending b
b.buf.add '@'
b.buf.add ','
b.buf.addInt line
b.buf.addLine line
inc seps
if file.len > 0:
if seps == 0:
drainPending b
b.buf.add "@,,"
elif seps == 1: b.buf.add "@,"
b.buf.add ",,"
else: b.buf.add ','
for c in file:
if c.needsEscape:
17 changes: 6 additions & 11 deletions src/lib/nifreader.nim
@@ -10,7 +10,7 @@ import std / [memfiles, tables, parseutils]
import stringviews

const
ControlChars = {'(', ')', '[', ']', '{', '}', '@', '#', '\'', '"', ':'}
ControlChars = {'(', ')', '[', ']', '{', '}', '~', '#', '\'', '"', ':'}
ControlCharsOrWhite = ControlChars + {' ', '\n', '\t', '\r'}
HexChars* = {'0'..'9', 'A'..'F'} # lowercase letters are not in the NIF spec!
StringSuffixChars = {'A'..'Z', 'a'..'z', '_', '0'..'9'}
@@ -50,7 +50,7 @@ type
eof: pchar
f: MemFile
buf: string
line*: int32 # file position within the NIF file, not affected by '@' annotations
line*: int32 # file position within the NIF file, not affected by line annotations
err*: bool
trackDefs*: bool
isubs, ksubs: Table[StringView, (TokenKind, StringView)]
@@ -245,7 +245,7 @@ proc handleLineInfo(r: var Reader; result: var Token) =
useCpuRegisters:
var col = 0
var negative = false
if p < eof and ^p == '-':
if p < eof and ^p == '~':
inc p
negative = true
while p < eof and ^p in Digits:
@@ -260,7 +260,7 @@ proc handleLineInfo(r: var Reader; result: var Token) =

if p < eof and ^p == ',':
inc p
if p < eof and ^p == '-':
if p < eof and ^p == '~':
inc p
negative = true
while p < eof and ^p in Digits:
@@ -294,9 +294,8 @@ proc next*(r: var Reader): Token =
if r.p >= r.eof:
result.tk = EofToken
else:
if ^r.p == '@':
if ^r.p in {'0'..'9', ',', '~'}:
# we have node prefix
inc r.p
handleLineInfo r, result
skipWhitespace r

@@ -396,16 +395,12 @@ proc next*(r: var Reader): Token =
break
dec start

of '-':
of '-', '+':
result.s.p = r.p
inc r.p
inc result.s.len
handleNumber r, result

of '0'..'9':
result.s.p = r.p
handleNumber r, result

else:
useCpuRegisters:
result.s.p = p
2 changes: 1 addition & 1 deletion src/nifler/emitter.nim
@@ -13,7 +13,7 @@ type
nesting, lineLen: int

const
ControlChars* = {'(', ')', '[', ']', '{', '}', '@', '#', '\'', '"', '\\', ':'}
ControlChars* = {'(', ')', '[', ']', '{', '}', '~', '#', '\'', '"', '\\', ':'}

proc lineBreak*(r: var string; l: var int; nesting: int) =
r.add "\n"