Implement right contexts (lookahead) #37

Merged
merged 32 commits into from
Jan 31, 2022
32 commits
4a7da01
Implement right context type and parsing
osa1 Dec 9, 2021
5f2f0ac
Implement NFA right context simulation -- not tested
osa1 Dec 10, 2021
c5ec566
Implement DFA right context simulation -- not tested
osa1 Dec 10, 2021
663fd7e
Start testing
osa1 Dec 10, 2021
d3d1440
Fix NFA right context simulation, update NFA debug output
osa1 Dec 10, 2021
5ba5a08
Merge remote-tracking branch 'origin/main' into right_context
osa1 Dec 10, 2021
babb2d7
Fix DFA simulation
osa1 Dec 11, 2021
ac1d4a2
Merge remote-tracking branch 'origin/main' into right_context
osa1 Dec 12, 2021
e178bdd
Start implementing separate DFAs for right contexts
osa1 Dec 16, 2021
38167dd
Enable right ctx tests
osa1 Dec 16, 2021
a905ba2
WIP
osa1 Dec 16, 2021
a4caf92
Start simplifying right ctx DFAs
osa1 Dec 16, 2021
785f1d7
WIP: Start implementing codegen for right contexts
osa1 Dec 17, 2021
5b0c0ae
Fix a few bugs, start adding tests
osa1 Dec 17, 2021
5550e35
Start handling right contexts in codegen, add more tests
osa1 Dec 19, 2021
bf81d48
Make iter field public
osa1 Dec 19, 2021
7a942f0
More right ctx tests
osa1 Dec 19, 2021
cbd1e1a
Add a failing test
osa1 Dec 19, 2021
9bb4f97
Merge remote-tracking branch 'origin/main' into right_context
osa1 Jan 31, 2022
0789752
Implement helper fn `make_if`
osa1 Jan 31, 2022
326c9e0
Run right contexts in rest of the cases
osa1 Jan 31, 2022
eb390a1
Remove duplicate code
osa1 Jan 31, 2022
b1ffac6
Remove duplicate code
osa1 Jan 31, 2022
5badb4e
Remove unused stuff
osa1 Jan 31, 2022
fb795e1
Merge remote-tracking branch 'origin/main' into right_context
osa1 Jan 31, 2022
2fa142b
Fix a lint, remove invalid test
osa1 Jan 31, 2022
efa8cfd
Add more tests, last one reveals a compilation error
osa1 Jan 31, 2022
5bdd0a8
Generate right ctxs before search tables
osa1 Jan 31, 2022
6072e3a
More tests
osa1 Jan 31, 2022
177955e
Add more tests
osa1 Jan 31, 2022
64a37f9
Update CHANGELOG, README
osa1 Jan 31, 2022
9439412
Typo fix
osa1 Jan 31, 2022
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -39,6 +39,10 @@
}
```

- New syntax added for right contexts. A right context is a limited form of
lookahead: it can only be used in rules and cannot be nested inside regexes.
See README for details. (#29)

# 2021/11/30: 0.8.1

New version published to fix broken README pages for lexgen and lexgen_util in
14 changes: 14 additions & 0 deletions README.md
@@ -175,6 +175,20 @@ You can use parenthesis for grouping, e.g. `('a' | 'b')*`.

Example: `'a' 'b' | 'c'+` is the same as `(('a' 'b') | ('c'+))`.

## Right context (lookahead)

A rule in a rule set can be followed by another regex using `> <regex>` syntax,
for right context. Right context is a limited form of lookahead: it can only
appear after the top-level regex of a rule, and cannot be nested inside a
regex.

For example, the rule left-hand side `'a' > (_ # 'b')` matches `'a'` as long as
it's not followed by `'b'`.

See also [right context tests] for more examples.

[right context tests]: https://github.com/osa1/lexgen/blob/main/tests/right_ctx.rs
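
To make the semantics above concrete, here is a minimal std-only sketch that does not use lexgen itself. The helper `match_with_right_ctx` and the two matcher functions are hypothetical names invented for this illustration. The point it shows is that the right context is checked against the text after the match but is not consumed. It assumes `_ # 'b'` means "one character other than `'b'`", so under that reading an `'a'` at the very end of the input does not match; the linked tests are the reference for lexgen's actual behavior.

```rust
/// Hypothetical helper, not part of lexgen: returns the length in bytes of the
/// match when `input` starts with `re` and the text right after that match is
/// accepted by `right_ctx`. The right context itself is never consumed.
fn match_with_right_ctx(
    input: &str,
    re: impl Fn(&str) -> Option<usize>,
    right_ctx: impl Fn(&str) -> bool,
) -> Option<usize> {
    let len = re(input)?;
    if right_ctx(&input[len..]) {
        Some(len)
    } else {
        None
    }
}

/// Models the regex `'a'`.
fn lit_a(s: &str) -> Option<usize> {
    if s.starts_with('a') {
        Some(1)
    } else {
        None
    }
}

/// Models the right context `_ # 'b'` (assumption: one character, not 'b').
fn any_char_except_b(rest: &str) -> bool {
    matches!(rest.chars().next(), Some(c) if c != 'b')
}

fn main() {
    // 'a' followed by 'c': matches, and only the 'a' is consumed.
    assert_eq!(match_with_right_ctx("ac", lit_a, any_char_except_b), Some(1));
    // 'a' followed by 'b': the right context rejects the match.
    assert_eq!(match_with_right_ctx("ab", lit_a, any_char_except_b), None);
    // In this model, end of input also fails because `_` needs a character.
    assert_eq!(match_with_right_ctx("a", lit_a, any_char_except_b), None);
}
```

Because the returned length covers only the main regex, the next token would start at the character that the right context inspected.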

## Built-in regular expressions

lexgen comes with a set of built-in regular expressions. Regular
42 changes: 33 additions & 9 deletions crates/lexgen/src/ast.rs
@@ -22,7 +22,7 @@ pub struct Lexer {

pub enum Rule {
/// `let <ident> = <regex>;`
- Binding { var: Var, re: Regex },
+ Binding { var: Var, re: RegexCtx },

/// `type Error = UserError;`
ErrorType {
@@ -41,10 +41,17 @@ pub enum Rule {
}

pub struct SingleRule {
- pub lhs: Regex,
+ pub lhs: RegexCtx,
pub rhs: SemanticActionIdx,
}

/// Regular expression with optional right context (lookahead)
#[derive(Debug, Clone)]
pub struct RegexCtx {
pub re: Regex,
pub right_ctx: Option<Regex>,
}

#[derive(Debug, Clone)]
pub enum RuleRhs {
None,
@@ -135,13 +142,30 @@ pub enum CharOrRange {
Range(char, char),
}

- /// Parses a regex terminated with: `=>` (used in rules with RHSs), `,` (used in rules without
- /// RHSs), or `;` (used in let bindings)
+ /// Parses a regex with optional right context: `re_ctx -> re [> re]`
fn parse_regex_ctx(input: ParseStream) -> syn::Result<RegexCtx> {
let re = parse_regex(input)?;
if input.peek(syn::token::Gt) {
input.parse::<syn::token::Gt>()?;
let right_ctx = parse_regex(input)?;
Ok(RegexCtx {
re,
right_ctx: Some(right_ctx),
})
} else {
Ok(RegexCtx {
re,
right_ctx: None,
})
}
}
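
The `re_ctx -> re [> re]` production can be tried out without `syn`. The sketch below is a std-only model, not lexgen's parser: the `Tok` type, the string-based "regex", and the error type are assumptions made for illustration. It shows the same shape as `parse_regex_ctx`: parse one regex, then parse a second one only if the next token is `>`.

```rust
/// Toy token type (hypothetical); the real parser works on a `syn` ParseStream.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Char(char),
    Gt,
    FatArrow,
}

/// Toy counterpart of `RegexCtx`, with regexes modelled as plain strings.
#[derive(Debug, PartialEq)]
struct RegexCtx {
    re: String,
    right_ctx: Option<String>,
}

/// Toy "regex": one or more consecutive character tokens.
fn parse_regex(toks: &[Tok], pos: &mut usize) -> Result<String, String> {
    let mut re = String::new();
    while let Some(Tok::Char(c)) = toks.get(*pos) {
        re.push(*c);
        *pos += 1;
    }
    if re.is_empty() {
        Err(format!("expected a regex at token {}", *pos))
    } else {
        Ok(re)
    }
}

/// re_ctx -> re [> re]
fn parse_regex_ctx(toks: &[Tok], pos: &mut usize) -> Result<RegexCtx, String> {
    let re = parse_regex(toks, pos)?;
    let right_ctx = if toks.get(*pos) == Some(&Tok::Gt) {
        *pos += 1; // consume `>`
        Some(parse_regex(toks, pos)?)
    } else {
        None
    };
    Ok(RegexCtx { re, right_ctx })
}

fn main() {
    // Models the rule head `a > b =>`.
    let toks = [Tok::Char('a'), Tok::Gt, Tok::Char('b'), Tok::FatArrow];
    let mut pos = 0;
    let parsed = parse_regex_ctx(&toks, &mut pos).unwrap();
    assert_eq!(parsed.re, "a");
    assert_eq!(parsed.right_ctx.as_deref(), Some("b"));
    assert_eq!(pos, 3); // stops at `=>`, which the rule parser handles
}
```

Since the `>` check lives only in `parse_regex_ctx` and nested regexes go through `parse_regex`, a right context can only appear at the top level of a rule or binding.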

/// Parses a regex
fn parse_regex(input: ParseStream) -> syn::Result<Regex> {
parse_regex_0(input)
}

- // re_0 -> re_1 | re_1 `|` re_1 (alternation)
+ // re_0 -> re_1 | re_0 `|` re_1 (alternation)
fn parse_regex_0(input: ParseStream) -> syn::Result<Regex> {
let mut re = parse_regex_1(input)?;

@@ -154,7 +178,7 @@ fn parse_regex_0(input: ParseStream) -> syn::Result<Regex> {
Ok(re)
}
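
The corrected grammar comments describe left-recursive productions, which a recursive-descent parser typically implements with a loop. The following std-only sketch is an illustration under assumptions, not lexgen's code: `Tok`, `Re`, and the two functions are hypothetical, and `re_2` is reduced to a single character. It shows that the loop yields left-nested trees and that concatenation binds tighter than alternation.

```rust
/// Toy token and AST types (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Char(char),
    Bar,
}

#[derive(Debug, PartialEq)]
enum Re {
    Char(char),
    Concat(Box<Re>, Box<Re>),
    Or(Box<Re>, Box<Re>),
}

// re_1 -> re_2 | re_1 re_2 (concatenation); here re_2 is a single character
fn parse_re_1(toks: &[Tok], pos: &mut usize) -> Option<Re> {
    let mut re = match toks.get(*pos)? {
        Tok::Char(c) => Re::Char(*c),
        Tok::Bar => return None,
    };
    *pos += 1;
    // The loop plays the role of the left recursion `re_1 -> re_1 re_2`.
    while let Some(Tok::Char(c)) = toks.get(*pos) {
        re = Re::Concat(Box::new(re), Box::new(Re::Char(*c)));
        *pos += 1;
    }
    Some(re)
}

// re_0 -> re_1 | re_0 `|` re_1 (alternation)
fn parse_re_0(toks: &[Tok], pos: &mut usize) -> Option<Re> {
    let mut re = parse_re_1(toks, pos)?;
    while toks.get(*pos) == Some(&Tok::Bar) {
        *pos += 1; // consume `|`
        let rhs = parse_re_1(toks, pos)?;
        re = Re::Or(Box::new(re), Box::new(rhs));
    }
    Some(re)
}

fn main() {
    // Models `a b | c`, which groups as `(a b) | c`.
    let toks = [Tok::Char('a'), Tok::Char('b'), Tok::Bar, Tok::Char('c')];
    let mut pos = 0;
    let re = parse_re_0(&toks, &mut pos).unwrap();
    let expected = Re::Or(
        Box::new(Re::Concat(Box::new(Re::Char('a')), Box::new(Re::Char('b')))),
        Box::new(Re::Char('c')),
    );
    assert_eq!(re, expected);
}
```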

- // re_1 -> re_2 | re_2 re_2
+ // re_1 -> re_2 | re_1 re_2 (concatenation)
fn parse_regex_1(input: ParseStream) -> syn::Result<Regex> {
let mut re = parse_regex_2(input)?;

@@ -213,7 +237,7 @@ fn parse_regex_4(input: ParseStream) -> syn::Result<Regex> {
if input.peek(syn::token::Paren) {
let parenthesized;
syn::parenthesized!(parenthesized in input);
- parse_regex(&parenthesized)
+ parse_regex(&parenthesized) // no right ctx
} else if input.peek(syn::token::Dollar) {
let _ = input.parse::<syn::token::Dollar>()?;
if input.parse::<syn::token::Dollar>().is_ok() {
@@ -269,7 +293,7 @@ fn parse_single_rule(
input: ParseStream,
semantic_action_table: &mut SemanticActionTable,
) -> syn::Result<SingleRule> {
- let lhs = parse_regex(input)?;
+ let lhs = parse_regex_ctx(input)?;

let rhs = if input.parse::<syn::token::Comma>().is_ok() {
RuleRhs::None
@@ -308,7 +332,7 @@ fn parse_rule(
input.parse::<syn::token::Let>()?;
let var = input.parse::<syn::Ident>()?;
input.parse::<syn::token::Eq>()?;
- let re = parse_regex(input)?;
+ let re = parse_regex_ctx(input)?;
input.parse::<syn::token::Semi>()?;
Ok(Rule::Binding {
var: Var(var.to_string()),
20 changes: 11 additions & 9 deletions crates/lexgen/src/dfa.rs
@@ -5,6 +5,7 @@ pub mod simplify;
pub mod simulate;

use crate::collections::{Map, Set};
use crate::nfa::AcceptingState;
use crate::range_map::{Range, RangeMap};

use std::convert::TryFrom;
@@ -38,7 +39,7 @@ pub struct State<T, A> {
range_transitions: RangeMap<T>,
any_transition: Option<T>,
end_of_input_transition: Option<T>,
- accepting: Option<A>,
+ accepting: Vec<AcceptingState<A>>,
// Predecessors of the state, used to inline code for a state with one predecessor in the
// predecessor's code
predecessors: Set<StateIdx>,
@@ -52,7 +53,7 @@ impl<T, A> State<T, A> {
range_transitions: Default::default(),
any_transition: None,
end_of_input_transition: None,
- accepting: None,
+ accepting: vec![],
predecessors: Default::default(),
}
}
@@ -81,12 +82,8 @@ impl<A> DFA<StateIdx, A> {
StateIdx(0)
}

- pub fn make_state_accepting(&mut self, state: StateIdx, value: A) {
- // Give first rule priority
- let accepting = &mut self.states[state.0].accepting;
- if accepting.is_none() {
- *accepting = Some(value);
- }
+ pub fn make_state_accepting(&mut self, state: StateIdx, accept: AcceptingState<A>) {
+ self.states[state.0].accepting.push(accept);
}
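
One plausible reading of this change is that an accepting rule can now be rejected at run time by its right context, so a state has to keep every candidate in rule order instead of only the first. The sketch below models that idea with std only. The fields of `AcceptingState` are not shown in this diff, so the `value` and `right_ctx` fields here, the `select_action` helper, and the use of a plain function pointer as the right context are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for `nfa::AcceptingState`; the real definition is
/// not part of this diff.
struct AcceptingState<A> {
    value: A,
    /// Assumed model of a right context: a predicate on the remaining input.
    right_ctx: Option<fn(&str) -> bool>,
}

/// Returns the value of the first accepting entry whose right context (if
/// any) accepts `rest`. Earlier entries keep priority over later ones.
fn select_action<'a, A>(accepting: &'a [AcceptingState<A>], rest: &str) -> Option<&'a A> {
    accepting
        .iter()
        .find(|acc| acc.right_ctx.map_or(true, |ctx| ctx(rest)))
        .map(|acc| &acc.value)
}

/// Models the right context `_ # 'b'`.
fn not_b(rest: &str) -> bool {
    matches!(rest.chars().next(), Some(c) if c != 'b')
}

fn main() {
    let accepting = vec![
        AcceptingState { value: "rule 1", right_ctx: Some(not_b as fn(&str) -> bool) },
        AcceptingState { value: "rule 2", right_ctx: None },
    ];
    // Right context of rule 1 holds, so the first rule wins.
    assert_eq!(select_action(&accepting, "c"), Some(&"rule 1"));
    // Right context of rule 1 fails, so the lexer falls back to rule 2.
    assert_eq!(select_action(&accepting, "b"), Some(&"rule 2"));
}
```

With the previous `Option<A>` field, the second rule would have been discarded when the state was built, leaving no fallback when the first rule's right context fails.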

pub fn new_state(&mut self) -> StateIdx {
@@ -95,6 +92,11 @@ impl<A> DFA<StateIdx, A> {
new_state_idx
}

#[cfg(test)]
pub fn is_accepting_state(&self, state: StateIdx) -> bool {
!self.states[state.0].accepting.is_empty()
}

pub fn add_char_transition(&mut self, state: StateIdx, char: char, next: StateIdx) {
let old = self.states[state.0].char_transitions.insert(char, next);
assert!(
@@ -238,7 +240,7 @@ impl<A> Display for DFA<StateIdx, A> {
predecessors: _,
} = state;

- if accepting.is_some() {
+ if !accepting.is_empty() {
if *initial {
write!(f, "{:>5}:", format!("i*{}", state_idx))?;
} else {