I wanted to write a lexer which matches one of my defined tokens, and otherwise matches a catchall token, while still having a relatively clean codebase. I have currently solved this by doing:

```rust
fn lex_delimiter<'s>(input: &mut &'s str) -> PResult<Token<'s>> {
    let checkpoint = input.checkpoint();
    // Consume characters until the next non-delimiter token (or eof) matches;
    // `full_match` is everything consumed, including that next token's text.
    let (next_token, full_match) =
        repeat_till(1.., any, alt((lex_non_delimiter.map(|t| t.into()), eof)))
            .map(|((), next)| next)
            .with_recognized()
            .parse_next(&mut *input)?;
    // Strip the next token's text off the end to get just the delimiter text,
    let text = &full_match[0..(full_match.len() - next_token.len())];
    // then rewind and re-advance so `input` points at that next token again.
    input.reset(&checkpoint);
    *input = &input[text.len()..];
    Ok(Token::Delimiter(Delimiter { text }))
}
```

The obvious problem is that I'm throwing away the 'till' part of the `repeat_till`, so the next token ends up being parsed twice.

Other potentially relevant parts of the (very simple) code:

```rust
fn lex_non_delimiter<'s>(input: &mut &'s str) -> PResult<Token<'s>> {
    alt((lex_word, lex_tag, lex_num)).parse_next(input)
}

fn lex_text<'s>(input: &mut &'s str) -> PResult<Vec<Token<'s>>> {
    trace("test", repeat(1.., alt((lex_non_delimiter, lex_delimiter)))).parse_next(input)
}
```

(I'll probably move away from using the delimiter token as my catchall, but even then I'll still need an unknown token or something like that, because I must be able to successfully parse any input.)
Replies: 1 comment 2 replies
Looks like there are two points of complication:

- You `recognize`d the `terminator` but don't want to include it in the recognized text
- You want parsing to resume at the `terminator`, rather than after it

You could `peek` your `terminator`. It will match but not advance `input`. This means you will `recognize` only the delimiter parse and allow parsing to pick back up at the `terminator`.
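As a rough, untested sketch of that idea (assuming the question's `Token`, `Delimiter`, and `lex_non_delimiter`, and the same winnow version as the question; `recognize` may be spelled `take` in newer releases):

```rust
use winnow::combinator::{alt, eof, peek, repeat_till};
use winnow::token::any;
use winnow::{PResult, Parser};

fn lex_delimiter<'s>(input: &mut &'s str) -> PResult<Token<'s>> {
    repeat_till(
        1..,
        any,
        // The terminator is only peeked: it must match for the repetition to
        // stop, but it is not consumed, so it is not part of the recognized
        // slice and the caller will parse it again normally.
        peek(alt((lex_non_delimiter.void(), eof.void()))),
    )
    // Pin the repeat accumulator to `()`; only the recognized slice matters here.
    .map(|((), ())| ())
    .recognize()
    .map(|text| Token::Delimiter(Delimiter { text }))
    .parse_next(input)
}
```

With the terminator no longer consumed, the checkpoint/reset dance and the manual slicing of `full_match` both go away.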
This will clean up the code but I'm unsure if it will affect performance.

Depending on where you land between performance and clean code, `repeat_till` (especially a per-`char` `parse`) and all of those `alt`s will kill your performance (along with parsing `&str` instead of `&[u8]`). You could use `dispatch!` as a first pass between all of your token types.
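For illustration only, a `dispatch!` first pass over the leading character might look something like this; the match arms are invented here, since the real character classes depend on what `lex_word`, `lex_tag`, and `lex_num` accept:

```rust
use winnow::combinator::{dispatch, peek};
use winnow::token::any;
use winnow::{PResult, Parser};

fn lex_token<'s>(input: &mut &'s str) -> PResult<Token<'s>> {
    // Peek at the first character and jump straight to the right sub-lexer
    // instead of trying each alternative in turn.
    dispatch! {peek(any);
        '0'..='9' => lex_num,   // assumption: numbers start with a digit
        '#' => lex_tag,         // assumption: tags start with '#'
        'a'..='z' => lex_word,  // assumption: words start with a lowercase letter
        _ => lex_delimiter,     // everything else falls through to the catchall
    }
    .parse_next(input)
}
```

`lex_text` could then become `repeat(1.., lex_token)`, replacing the nested `alt`s with a single match on the leading character.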