-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to print better parse errors #3700
Conversation
bc52ee6
to
2c1546b
Compare
@Baltoli this is the discussion that started this work: https://runtimeverification.slack.com/archives/C7E701MFG/p1694474725891869 |
@Baltoli I don't have a minimal example, but I can provide real examples from KMIR parsing failures, k v6.0.133 error message
branch
|
I have improved the output from this diagnostic; now instead of using only the immediate predecessor parse state, we use the most recent spanning intermediate state. As an example, consider the following production being added to the
In the initial version Daniel tries above, the following parse error would be produced: $ kast <(echo -n 'AlignOf(3+)') --no-exc-wrap
[Error] Inner Parser: Parse error: unexpected token ')' following token '+'. Additional parsing diagnostic information:
No top-level production could apply to this input.
Source(/dev/fd/11)
Location(1,11,1,12)
1 |
. ^ because no production includes the terminal sequence $ kast <(echo -n 'AlignOf(3+)') --no-exc-wrap
[Error] Inner Parser: Parse error: unexpected token ')' following token '+'. Additional parsing diagnostic information:
Attempting to apply production:
syntax Exp ::= "AlignOf" "(" Exp ")"
produced partial term:
`AlignOf(_)_ARITHMETIC-SYNTAX_Exp_Exp`(#token("<error>","<invalid>"))
Source(/dev/fd/11)
Location(1,11,1,12)
1 |
. ^ In the interests of getting this change merged, and keeping the diff small, this is probably as involved of a heuristic as I'm prepared to implement for the time being. If in the future we identify a case where this heuristic is deficient, we can open a new PR to extend the existing diagnostic implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java part lgtm.
I would like it if Daniel also approves this since he's the one who reported the issue in the first place.
Quick question. What happens if the term is very large? Is it going to pretty print everything?
Daniel has tested the latest version of the output and confirmed to me that it's useful via DM; if the term is large then the AST printed will indeed also be large. |
This PR fixes #3672 by providing a best-effort rendering of the partial state of the parser when a parse error happens. To do this, we grab the Earley state corresponding to the span
[0, token_index_before_error]
and inspect the partial parse trees available, as well as the production being attempted at this point.Consider as a simple example the following standard K definition for arithmetic expressions:
On trying to
kast
an invalid term (for example,1 * 2 + a * 4
-a
lexes because we've importedID-SYNTAX
, but terms of sortId
can't appear in a valid derivation ofExp
), we see the following message from the new component:we make a best-effort attempt to construct what the top-level
_+_
term might have looked like, placing an<error>
token where we expected to see a valid term of sortExp
.The changes made in this PR are as follows:
EarleyParser
to conditionally perform the rendering as above.--debug-parse
through to the parser when appropriate.I have additionally verified with @dkcumming that the implementation as reviewed produces useful information that would have been useful when addressing the original problem; these cases come from the MIR semantics and are therefore too large to include here as tests.