Implement right contexts (lookahead) #37

Merged
merged 32 commits into from
Jan 31, 2022


osa1
Owner

@osa1 osa1 commented Dec 10, 2021

Fixes #29

@osa1 osa1 mentioned this pull request Dec 10, 2021
@osa1
Owner Author

osa1 commented Dec 11, 2021

I think I will have to generate separate NFA/DFAs for right contexts. Currently in DFA simulation I use this to run right contexts:

// Similar to `simulate`, but does not keep track of the last match as we don't need "longest
// match" semantics and backtracking
fn simulate_right_ctx<A>(
    dfa: &DFA<StateIdx, A>,
    init: StateIdx,
    accept: StateIdx,
    mut char_indices: std::str::CharIndices,
) -> bool {
    if init == accept {
        return true;
    }

    let mut state = init;

    while let Some((_, char)) = char_indices.next() {
        match next(dfa, state, char) {
            None => {
                // Stuck
                return false;
            }
            Some(next_state) => {
                if next_state == accept {
                    return true;
                }

                state = next_state;
            }
        }
    }

    match next_end_of_input(dfa, state) {
        None => false,
        Some(next_state) => next_state == accept,
    }
}

In the generated code, `next` above will become a `match state { ... }` where each alternative contains a `match char { ... }`. For each right context, these matches will duplicate the code of the main DFA's `next` method. That's a lot of duplication.

If we maintain separate DFAs for right contexts, we can generate smaller code for `next` that doesn't include the states of the main DFA.
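To make the idea concrete, here is a minimal hand-written sketch of what a separate right-context DFA might look like. It is not the code this PR generates: the right context `'b' 'c'`, the `RightCtxState` enum, and the function names are all assumptions made up for illustration. The point is only that the nested `match` covers the right context's own states and nothing from the main DFA.

```rust
/// States of a hypothetical DFA for the right context `'b' 'c'`.
/// None of these states belong to the main DFA.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RightCtxState {
    Start,
    SeenB,
    Accept,
}

/// Transition function in the nested-match shape described above,
/// restricted to the right context's own states.
fn right_ctx_next(state: RightCtxState, c: char) -> Option<RightCtxState> {
    match state {
        RightCtxState::Start => match c {
            'b' => Some(RightCtxState::SeenB),
            _ => None,
        },
        RightCtxState::SeenB => match c {
            'c' => Some(RightCtxState::Accept),
            _ => None,
        },
        RightCtxState::Accept => None,
    }
}

/// Runs the right context on the input that follows the current match.
/// Like `simulate_right_ctx`, it stops at the first accepting state and
/// never backtracks. This right context has no `$`, so running out of
/// input before reaching `Accept` is a failure.
fn right_ctx_matches(rest: &str) -> bool {
    let mut state = RightCtxState::Start;
    for c in rest.chars() {
        match right_ctx_next(state, c) {
            // Stuck
            None => return false,
            Some(RightCtxState::Accept) => return true,
            Some(next_state) => state = next_state,
        }
    }
    false
}
```

With this shape, `right_ctx_matches("bcd")` succeeds as soon as `bc` has been seen, while `right_ctx_matches("bd")` gets stuck in `SeenB` and fails.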

@osa1
Owner Author

osa1 commented Jan 31, 2022

So one of the tests I'd added last time I worked on this is this:

lexer! {
    Lexer -> u32;

    // Per longest match, "a" should return 2, not 1
    'a' = 1,
    'a' > $ = 2,
}

let mut lexer = Lexer::new("a");
assert_eq!(next(&mut lexer), Some(Ok(2)));
assert_eq!(next(&mut lexer), None);

However, as I think about this again now, I realize that this is not a good idea. For these semantics we need to run all right contexts of a state, even after finding one that matches. I'm not sure if these semantics are useful, and they can certainly be slower than needed. Instead, I think it should be good enough to try the rules in order and run the semantic action of the first one that matches.

This means if there's a rule without a right context in the same state, then the ones with the right context will never be tried. We should probably start generating warnings in these cases, but maybe not in this PR.
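The "first rule that matches" semantics can be sketched as follows. This is an illustration only, under the assumption that the rules sharing an accepting state are tried in declaration order; `Rule`, `RightCtx`, `select_rule`, and `shadowed_rules` are hypothetical names, not part of lexgen. The `shadowed_rules` helper shows one way the warning mentioned above could be detected: any rule listed after a rule without a right context can never be selected.

```rust
/// A right-context check: receives the input that follows the match.
type RightCtx = fn(&str) -> bool;

/// Hypothetical representation of one rule sharing an accepting state.
struct Rule {
    right_ctx: Option<RightCtx>,
    value: u32,
}

/// Right context `$`: succeeds only at end of input.
fn at_end_of_input(rest: &str) -> bool {
    rest.is_empty()
}

/// Tries the rules in declaration order and returns the value of the first
/// rule whose right context holds. A rule without a right context always
/// matches, so no rule after it is ever tried.
fn select_rule(rules: &[Rule], rest: &str) -> Option<u32> {
    rules
        .iter()
        .find(|rule| rule.right_ctx.map_or(true, |check| check(rest)))
        .map(|rule| rule.value)
}

/// Indices of rules that can never be selected because an earlier rule
/// in the same state has no right context.
fn shadowed_rules(rules: &[Rule]) -> Vec<usize> {
    match rules.iter().position(|rule| rule.right_ctx.is_none()) {
        None => Vec::new(),
        Some(first_unconditional) => (first_unconditional + 1..rules.len()).collect(),
    }
}

/// The two rules from the test above, in their original order:
/// `'a' = 1` followed by `'a' > $ = 2`.
fn example_rules() -> Vec<Rule> {
    vec![
        Rule { right_ctx: None, value: 1 },
        Rule { right_ctx: Some(at_end_of_input as RightCtx), value: 2 },
    ]
}
```

With the rules in this order, `select_rule(&example_rules(), "")` picks the first rule and `shadowed_rules` reports index 1, the `'a' > $` rule. Listing the `$` rule first would let it win at end of input and fall back to the plain `'a'` rule otherwise.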

@osa1 osa1 marked this pull request as ready for review January 31, 2022 17:10
@osa1 osa1 merged commit ed05fec into main Jan 31, 2022
@osa1 osa1 deleted the right_context branch January 31, 2022 17:11

Successfully merging this pull request may close these issues.

Lookahead could be useful