Works well, should be the default in https://github.com/antlr/antlr4/runtime/TypeScript. #10

kaby76 · 2023-11-10T23:47:05Z

Excellent work! This runtime should be the default for the "official" Antlr. Updating the .d.ts files in https://github.com/antlr/antlr4/tree/dev/runtime/JavaScript/src/antlr4 with the correct type declarations has been a pain.

Please publish a new version soon. ConsoleErrorListener is not exported in the published version 2.0.1 package, but it is available in the current version in the repo.

I tested the runtime with a large subset of the grammars in grammars-v4, specifically those without actions. I've added templates to trgen for the target "Antlr4ng".

Of the 340 grammars in grammars-v4, 251 were tested with your runtime (via for i in `find . -name desc.xml | grep -v Generated`; do dirname $i; x=`dirname $i`; no=0; for j in $x/*.g4; do got=`trparse -t ANTLRv4 $j | trxgrep ' //actionBlock' | trtext -c`; if [ $got -gt 0 ]; then no=1; fi; done; if [ $no -eq 0 ]; then pushd $x; trgen -t Antlr4ng --force; cd Generated-Antlr4ng; make; make test; popd; fi; done).

Almost all of those passed. The 10 or so failures fall into four categories.

The first category of failures involves "symbol conflicts". One particularly annoying conflict involves the parser start symbol, e.g., start symbol module of grammar clu. The "symbol conflict avoidance" implementation renames "module" to "module_" in the generated parser code. Unfortunately, there is no way to predict a priori that the start name has to be "module_". The compilation fails because I have no way of knowing when the start symbol in the driver must be renamed to "module_". I noted the problem long ago, but no solution has been suggested. Other grammars with this problem are: esolang, haskell, kuka, and oberon (found via for i in `find . -name desc.xml | grep -v Generated`; do trparse `dirname $i`/*.g4 | trxgrep ' /grammarSpec[grammarDecl[not(grammarType/LEXER)]] //parserRuleSpec[ruleBlock//TOKEN_REF/text()="EOF"]/RULE_REF[text() = "module"]' | trtext; done).

The parser symbol "constructor" cannot be used with your runtime. It causes a compilation error, so this symbol needs to be entered in the table for the code generator to rename.
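To make the start-symbol problem concrete, here is a minimal sketch of conditional renaming. It assumes a hypothetical reserved-name table and hypothetical helper names (`reservedRuleNames`, `escapeIfReserved`, `startMethodName`); the real table lives in the tool's target code and is not visible to a driver generator, which is exactly the difficulty described above.

```typescript
// Hypothetical reserved-name table for one target. The entries are taken from
// the cases mentioned in this thread; the real list in the tool may differ.
const reservedRuleNames: ReadonlySet<string> = new Set(["module", "constructor"]);

// Conditional escaping: only names found in the table get a trailing underscore.
function escapeIfReserved(ruleName: string): string {
    return reservedRuleNames.has(ruleName) ? `${ruleName}_` : ruleName;
}

// A driver generator must apply the very same mapping to the start rule,
// otherwise it calls a parser method that does not exist.
function startMethodName(startRule: string): string {
    return escapeIfReserved(startRule);
}

console.log(startMethodName("module"));          // module_
console.log(startMethodName("compilationUnit")); // compilationUnit
```

The driver only produces the right method name if it has an exact copy of the table for each target, and every target has a different table.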

The second category involves differences in error-reporting output, in kotlin. I haven't yet figured out why there is a diff, but it looks like an issue with the driver I create in trgen rather than a runtime issue.

The third category is grammars that never really worked (pike). These grammars should be fixed.

The fourth category is a problem with compiling the generated code for rego. Somehow RegoLexer.ts does not contain a definition for channel COMMENTS_AND_FORMATTING. This seems to be a problem with codegen in the Antlr tool templates for the target.

mike-lischke · 2023-11-11T10:53:05Z

Maintainer

Thanks @kaby76. I just released a new version with the additional exports. As for inclusion in main ANTLR: I asked Ter to wait, but Eric rushed in his type definition stuff and I had no chance. That's not a big deal actually, since I have long planned to fork ANTLR4 and create a new one (ANTLRng) by porting it to TypeScript in a way that hopefully makes it easy for others to port it to their preferred language. I want to allow ANTLR itself to progress again, introduce new features and so on. Some of the things you do in your tools would be highly welcome!

I want to fix most of the open bugs of my ANTLR VS Code extension now and then start the next step by porting the runtime tests to TypeScript in a way that allows much quicker execution (currently they take between 2:30 and 10:00 mins for me, which is way too slow for safe refactoring). The slow execution was a major pain in the port from JS to TS. During this step I will lay the groundwork for a new testing framework, which moves responsibility from the tool to the target authors, who can then optimize test execution to fit their platform better (e.g. avoid frequent builds for the C++ target). Neither targets nor their testing (and release process) should be the responsibility of the ANTLR tool.

I love the work you have done to verify the new runtime with various grammars from the grammar directory. That gives additional confidence that it is good for production use. Not sure if we can fix the failing grammars from the tool side, but maybe it would help to define a test harness that all grammars have to pass before they are added to the grammar directory. Because of actions, though, that might be difficult.

I also plan to replace the STG-based code generation with a plugin-based approach (have no details yet), where target writers get code completion/inline help for each of the template rules that must be implemented. This can then also check for bad words, according to the target language. I have written now 2 targets and still don't know all of the mandatory target STG rules and which are optional or not being used at all, and in which context. That's somewhat scary. A template editor should provide a list of these names, their parameters with types, and help with the overall syntax.

2 replies

lppedd Nov 23, 2023

Thank you. Coming from the Flex world, I see ANTLR is stagnating, especially on the lexing side. I've noticed version 4 removed functionality that probably shouldn't have been removed, like syntactic predicates, or the ability to rewind the lexer state to the beginning of the rule.

This means every grammar implementation carries its own set of non-portable workarounds.

mike-lischke Nov 23, 2023
Maintainer

@lppedd There's a good reason why syntactic predicates have been removed: they were not needed anymore. With the new ALL(*) algorithm (adaptive LL(*), unlimited lookahead), which is used for both parsers and lexers, there's simply no need to guide the prediction process using these predicates.

It will certainly take several months until something useful comes out of my efforts, but you are invited to take part in the discussions about what can be improved then and to open PRs with your additions.

mike-lischke · 2023-11-23T13:06:25Z

Maintainer

@kaby76 I just got an idea for the conflicting-rule-names problem: what if we had an option for the antlr tool to specify a common prefix or suffix for all generated rules? This could be $ or _ or even a valid Unicode char (sequence). The generated lookup tables (for the vocabulary) still can have the original names. But that requires having control over the tool code, which is something for later.

3 replies

kaby76 Nov 23, 2023
Author

@mike-lischke I always thought that was what was going to be implemented in the tool for symbol conflicts: just rename all generated symbols to have a prefix or suffix. It's predictable. But we ended up with the current situation. Not sure what the argument was for this as opposed to just adding a prefix or suffix to all symbols. The current implementation does the rename, but conditionally on the symbol for the target. So, to use the grammar across targets, I have to rename the start rule; otherwise I won't know what it was renamed to.

mike-lischke Nov 23, 2023
Maintainer

Right, it's better to rename all symbols the same way, but make it configurable. For example leading and trailing underscores might conflict with linter settings etc.
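As a rough illustration of the configurable scheme discussed here, the sketch below renames every rule the same way. It is not existing tool behavior: `RenameOptions`, `renameRule`, and the option names are hypothetical, since the tool has no such setting today.

```typescript
// Hypothetical tool options: an affix applied to every generated rule name.
interface RenameOptions {
    prefix?: string;
    suffix?: string;
}

// Unconditional renaming: the result depends only on the options,
// never on a per-target reserved-word table.
function renameRule(ruleName: string, options: RenameOptions): string {
    return `${options.prefix ?? ""}${ruleName}${options.suffix ?? ""}`;
}

const renameOptions: RenameOptions = { suffix: "_" };

// The vocabulary keeps the original names; only generated method names change.
const vocabulary: string[] = ["module", "statement"];
const methodNames: string[] = vocabulary.map((rule) => renameRule(rule, renameOptions));

console.log(methodNames.join(", ")); // module_, statement_
```

Because the mapping is a pure function of the options, a driver generator can compute the start rule's method name for any target without knowing that target's reserved words, and a project whose linter rejects underscores can pick a different affix.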

backsapce Dec 20, 2023

"I have written now 2 targets and still don't know all of the mandatory target STG rules and which are optional or not being used at all"
Couldn't agree more
