Input file processing should continue if an individual game cannot be parsed #9

rpdelaney · 2018-01-04T17:55:55Z

Background
python-chess is generally rather lax about enforcing consistency in the objects it provides. Illegal inputs are typically accepted without raising an exception, even if the call corrupts the working board, GameNode, or Game. (My understanding is that this is for a combination of reasons, including performance and general flexibility.)

Its PGN parsing behavior follows this philosophy: if python-chess fails to parse a PGN game, it will still attempt to create and return a Game, even if that Game includes illegal headers, illegal moves, or other garbage. However, it will also attempt to store a list of parsing errors, which can be retrieved in Game.errors.

python-chess-annotator loops through the games in a PGN file. Because python-chess does not raise exceptions on PGN parsing errors, and to avoid burning up CPU time on corrupted games (which often leads to an engine crash or other nasty stuff), it performs a check for parsing errors using checkgame(). This function checks for Game.errors and raises a RuntimeError if any are found. Additionally, it checks that Game.end().parent is not None (a common corruption state even when Game.errors is None), and raises another RuntimeError if that check fails.

Note that I don't consider this an exhaustive list of methods for verifying that a Game is clean, as I don't fully understand all the ways that a Game can be dirtied up. These are just the two most common that I've found and they seem to cover the great majority of cases.

Problem
The intended behavior of python-chess-annotator is to analyze all the games in a given PGN file, returning annotated games one by one. However, if any one of the games in the file cannot be parsed, python-chess-annotator will halt with a RuntimeError when it is encountered. That may frustrate a user who sets up analysis of a large PGN file only to return later and find that the job was partially completed.

Preferably, python-chess-annotator should skip over the unparseable game and continue to the next one. Ideally, it would print some debugging info via logger.critical so that the user is notified that there is a problem with their PGN file.

Solution
I think the way to achieve this is to define a custom PgnParseError exception, and refactor checkgame() to rase that instead of a RuntimeError. Then the the try/except block in the main() function would be refactored to catch the PgnParseError, print the debugging info, and continue the loop.

The text was updated successfully, but these errors were encountered:

ddugovic · 2018-01-04T19:07:29Z

I am interested in improving parsing of invalid or ambiguous moves (I hope one day to use OCR to recognize moves on a scoresheet) but I lack examples.

rpdelaney · 2018-01-04T21:40:15Z

That would be a good thing to work on, but any such improvements should be made to python-chess.

rpdelaney · 2018-01-11T20:25:28Z

Also, the same idea should apply to variant games that aren't supported by the engine. We should print a log warning and skip over those games rather than halting with an exception.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Input file processing should continue if an individual game cannot be parsed #9

Input file processing should continue if an individual game cannot be parsed #9

rpdelaney commented Jan 4, 2018

ddugovic commented Jan 4, 2018

rpdelaney commented Jan 4, 2018

rpdelaney commented Jan 11, 2018

Input file processing should continue if an individual game cannot be parsed #9

Input file processing should continue if an individual game cannot be parsed #9

Comments

rpdelaney commented Jan 4, 2018

ddugovic commented Jan 4, 2018

rpdelaney commented Jan 4, 2018

rpdelaney commented Jan 11, 2018