feat: more of the evaluation, better names, working modifiers
jbee committed Dec 21, 2022
1 parent 4e491a8 commit e2a51b1
Showing 50 changed files with 1,293 additions and 595 deletions.
77 changes: 51 additions & 26 deletions PRESENTATION.md
@@ -1,3 +1,22 @@
## DHIS2 Expression Language

What is the DHIS2 expression language?

```js
// math
1 + 1

// functions
log(100)

// logic
true && 2 > 1

// dataResolver
#{abcdefghijk}
```


## Parsing - Grammars - Basics

Butchered (but useful) terminology:
@@ -43,7 +62,7 @@ More (usually):

## Parser Jargon

* _consume_ : moving current position forward in the input
* _consume_: moving current position forward in the input
* _gobble_: consume and discard input (ignore, like WS)

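The two jargon terms can be illustrated with a minimal sketch. The `Cursor` class and its method names are hypothetical and made up for this illustration; they are not the project's actual input type.

```java
// Hypothetical cursor over the input text (an assumption for illustration only).
final class Cursor {
    private final String text;
    private int pos = 0;

    Cursor(String text) {
        this.text = text;
    }

    boolean hasNext() {
        return pos < text.length();
    }

    // Look at the current character without moving the position.
    char peek() {
        return text.charAt(pos);
    }

    // consume: move the position forward and hand the character to the caller.
    char consume() {
        return text.charAt(pos++);
    }

    // gobble: move the position forward past whitespace and discard it.
    void gobbleWhitespace() {
        while (hasNext() && Character.isWhitespace(peek())) {
            pos++;
        }
    }

    int position() {
        return pos;
    }
}
```

The only difference between the two is what happens to the characters that are passed: `consume()` returns them for further processing, while `gobbleWhitespace()` throws them away.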

@@ -98,7 +117,7 @@ a + ((b * c) + d)

Parsing is a left to right process...
```
((a + b) * c)
((a + b) * c) + d
```
😩 that isn't right... let's go back and try again, this time trying something different
```
@@ -121,8 +140,9 @@ better 😌

https://en.wikipedia.org/wiki/Parsing_expression_grammar

* Unlike CFGs, PEGs cannot be ambiguous
* if a string parses, it has exactly one valid parse tree
* PEGs cannot be ambiguous (unlike CFGs)
* if a string parses, it has exactly one valid parse tree
* (presumably: OR not allowed in production rules)

The toy example again, defined slightly differently
```
@@ -176,7 +196,7 @@ What if operators are leaves? We get:
```
a, +, b, *, c, +, d
```
Everything is in a "flat" sequence.
Everything is in a "flat" sequence of typed nodes.

Now we walk the "tree" and merge only the operator with the highest precedence
into a structured operator with children:
@@ -192,36 +212,38 @@ The time is still linear as there is a fixed number of operators to do.
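The merge step over a flat sequence can be sketched as follows. This is a minimal illustration, not the project's implementation: the `FlatMerge` and `Node` names are hypothetical, only binary operators are handled, and left associativity is assumed.

```java
import java.util.ArrayList;
import java.util.List;

final class FlatMerge {
    // Hypothetical node: a leaf (operand or not-yet-merged operator)
    // or a merged operator with exactly two children.
    static final class Node {
        final String value;
        final List<Node> children;

        Node(String value, List<Node> children) {
            this.value = value;
            this.children = children;
        }

        boolean isUnmergedOperator(String op) {
            return children.isEmpty() && value.equals(op);
        }

        @Override
        public String toString() {
            return children.isEmpty()
                ? value
                : "(" + children.get(0) + " " + value + " " + children.get(1) + ")";
        }
    }

    // One left-to-right pass: fold every occurrence of `op` with its two neighbours.
    static List<Node> mergeOperator(List<Node> flat, String op) {
        List<Node> out = new ArrayList<>();
        int i = 0;
        while (i < flat.size()) {
            Node n = flat.get(i);
            if (n.isUnmergedOperator(op) && !out.isEmpty() && i + 1 < flat.size()) {
                Node left = out.remove(out.size() - 1);
                Node right = flat.get(i + 1);
                out.add(new Node(op, List.of(left, right)));
                i += 2; // skip the operator and its right operand
            } else {
                out.add(n);
                i++;
            }
        }
        return out;
    }

    // One pass per operator, highest precedence first.
    static Node structure(List<String> tokens, List<String> operatorsByPrecedence) {
        List<Node> nodes = new ArrayList<>();
        for (String token : tokens) {
            nodes.add(new Node(token, List.of()));
        }
        for (String op : operatorsByPrecedence) {
            nodes = mergeOperator(nodes, op);
        }
        return nodes.get(0);
    }
}
```

Calling `FlatMerge.structure(List.of("a", "+", "b", "*", "c", "+", "d"), List.of("*", "+"))` yields a node that prints as `((a + (b * c)) + d)`. Each pass is linear in the number of nodes, and the number of passes is fixed by the number of operators, so the total work stays linear in the input size.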

## CFG vs PEG

CFGs
CFGs (ANTLR)

* bad syntax choices ("collisions") are first recognised much later (solver hides them)
* solver means a framework is used, which means limitations
* multiple transformations because of the layers of abstraction
* whitespace is hard to control as it is implicitly assumed

* worst case complexity is factorial
* => bend your problem to suit the parser

PEGs

* decidability problem forces to recognise and solve collisions are right away
* "special" handling is not different
* decidability problem forces to recognise and solve collisions right away
* language methods are the "framework"
* direct translation (as complicated as needed but not more)
* just methods calling each other with convenience layer on top
* just methods calling each other (possibly with a convenience layer on top)
* whitespace is no different but needs explicit consideration and modelling


* generally: "special" handling is not different
* worst case complexity is linear*
* => write the parser to suit the problem

## How do PEGs work?


Key idea:
```java
void what(Input in, Context ctx);
```
* _what_ is the name of the non-terminal or terminal processed
* _what_ is the name of the token/block processed
* `Input`
* is whatever is processed and "consumed"
* is whatever is processed and "consumed" while parsing
* `Context`
* is whatever is build,
* usually emitting the base data for creating nodes in an AST
* is whatever is build, the "output"
* usually emitting the base dataResolver for creating nodes in an AST
* also might hold state like lookup by name
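To make the `void what(Input in, Context ctx)` shape concrete, here is a minimal sketch for a single terminal, `number`. The `Input` and `Context` classes below are hypothetical stand-ins written for this illustration; the real types in the project may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

final class PegSketch {
    // Hypothetical Input: the text being parsed plus the current position.
    static final class Input {
        private final String text;
        private int pos = 0;

        Input(String text) {
            this.text = text;
        }

        boolean hasNext() {
            return pos < text.length();
        }

        // Current character, or '\0' at the end of the input.
        char peek() {
            return hasNext() ? text.charAt(pos) : '\0';
        }

        char consume() {
            return text.charAt(pos++);
        }
    }

    // Hypothetical Context: collects what the parser emits, the "output".
    static final class Context {
        final List<String> emitted = new ArrayList<>();

        void emit(String type, String value) {
            emitted.add(type + ":" + value);
        }
    }

    // Here the "what" is `number`: consume all digits, then emit one node.
    static void number(Input in, Context ctx) {
        StringBuilder digits = new StringBuilder();
        while (Character.isDigit(in.peek())) {
            digits.append(in.consume());
        }
        if (digits.length() == 0) {
            throw new IllegalArgumentException("expected a digit");
        }
        ctx.emit("NUMBER", digits.toString());
    }
}
```

Every other terminal or non-terminal would follow the same signature, so composing a grammar amounts to these methods calling each other while sharing one `Input` and one `Context`.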

Example Grammar:
@@ -233,24 +255,27 @@ op = '+' | '*'
PEG parser:
```java
void expr(Input in, Context ctx) {
term(in, ctx);
char c = in.lookahead();
while (c == '+' || c == '*') { // isOperator(c)
op(in, ctx);
term(in, ctx);
char c = in.peek();
while (c == '+' || c == '*') {
operator(in, ctx);
term(in, ctx);
c = in.lookahead();
}
}
void term(Input in, Context ctx) {
in.skipWhitespace();
char c = in.peek();
in.consumeWhitespace();
char c = in.lookahead();
if (isDigit(c)) {
number(in, ctx);
} else {
} else if (isLetter(c)){
constant(in, ctx);
} else {
throw in.error();
}
in.skipWhitespace();
in.consumeWhitespace();
}
void operator(Input in, Context ctx) {
void op(Input in, Context ctx) {
char op = in.consume();
ctx.emitOperator(op);
}
12 changes: 6 additions & 6 deletions README.md
@@ -13,7 +13,7 @@ Each such method accepts two arguments:

The parsing is implemented in 4 levels (high to low):
1. `ExprGrammar`: high level composition of non-terminals (functions, methods, constants)
2. `Expr`: non-terminals `expr` (operators, brackets) and `data`
2. `Expr`: non-terminals `expr` (operators, brackets) and `dataResolver`
3. `Literals`: terminals of the language; string, number, date literals etc.
4. `Chars`: named character sets of the language as used by `Literals`

Expand All @@ -29,11 +29,11 @@ expr1 = UNARY_OPERATOR expr1
| NUMBER
| DATE
| function
| data-value
| dataResolver-value
| constant
function = NAME '(' expr (',' expr )* ')'
method = '.' NAME '(' expr (',' expr )* ')'
data-value = NAME '{' reference '}'
dataResolver-value = NAME '{' reference '}'
reference = uid ( '.' uid )? ( '.' uid )?
| REF
uid = tag? UID ('&' UID)*
@@ -57,7 +57,7 @@ STRING = '"' ... '"' // ... => its complicated, escaping, unicode

While in general functions and methods have `expr` arguments each named
function has a particular sequence of parameters which might be limited to a
case like expecting a `DATE` or a `data-value` item.
case like expecting a `DATE` or a `dataResolver-value` item.

### AST
The parser builds an AST with "flat" operators. Meaning the operands are not
@@ -76,9 +76,9 @@ way that is easy to maintain:
- `d2:relationshipCount` is the only function expecting a quoted `UID`
- a `programRuleStringVariableName` is any string and only identifiable having
a special meaning by its position as argument to certain functions
- `PS_EVENTDATE:` is a tag for a `UID` for a data value but does not use the
- `PS_EVENTDATE:` is a tag for a `UID` for a dataResolver value but does not use the
`#{...}` wrapper and can therefore easily be confused for a named function
- functions accepting data item values do not accept all data item value types
- functions accepting dataResolver item values do not accept all dataResolver item value types
that can occur on top level.
- the `de:*`-functions contain `:` which is hard to distinguish from a tag
- `orgUnit.*`-functions contain `.` which is hard to distinguish from a method
8 changes: 0 additions & 8 deletions src/main/java/org/hisp/dhis/expression/Data.java

This file was deleted.

40 changes: 0 additions & 40 deletions src/main/java/org/hisp/dhis/expression/DataValue.java

This file was deleted.

128 changes: 0 additions & 128 deletions src/main/java/org/hisp/dhis/expression/EvaluateNodeTransformer.java

This file was deleted.
