Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input file processing should continue if an individual game cannot be parsed #9

Open
rpdelaney opened this issue Jan 4, 2018 · 3 comments

Comments

@rpdelaney
Copy link
Collaborator

Background
python-chess is generally rather lax about enforcing consistency in the objects it provides. Illegal inputs are typically accepted without raising an exception, even if the call corrupts the working board, GameNode, or Game. (My understanding is that this is for a combination of reasons, including performance and general flexibility.)

Its PGN parsing behavior follows this philosophy: if python-chess fails to parse a PGN game, it will still attempt to create and return a Game, even if that Game includes illegal headers, illegal moves, or other garbage. However, it will also attempt to store a list of parsing errors, which can be retrieved in Game.errors.

python-chess-annotator loops through the games in a PGN file. Because python-chess does not raise exceptions on PGN parsing errors, and to avoid burning up CPU time on corrupted games (which often leads to an engine crash or other nasty stuff), it performs a check for parsing errors using checkgame(). This function checks for Game.errors and raises a RuntimeError if any are found. Additionally, it checks that Game.end().parent is not None (a common corruption state even when Game.errors is None), and raises another RuntimeError if that check fails.

Note that I don't consider this an exhaustive list of methods for verifying that a Game is clean, as I don't fully understand all the ways that a Game can be dirtied up. These are just the two most common that I've found and they seem to cover the great majority of cases.

Problem
The intended behavior of python-chess-annotator is to analyze all the games in a given PGN file, returning annotated games one by one. However, if any one of the games in the file cannot be parsed, python-chess-annotator will halt with a RuntimeError when it is encountered. That may frustrate a user who sets up analysis of a large PGN file only to return later and find that the job was partially completed.

Preferably, python-chess-annotator should skip over the unparseable game and continue to the next one. Ideally, it would print some debugging info via logger.critical so that the user is notified that there is a problem with their PGN file.

Solution
I think the way to achieve this is to define a custom PgnParseError exception, and refactor checkgame() to rase that instead of a RuntimeError. Then the the try/except block in the main() function would be refactored to catch the PgnParseError, print the debugging info, and continue the loop.

@ddugovic
Copy link
Contributor

ddugovic commented Jan 4, 2018

I am interested in improving parsing of invalid or ambiguous moves (I hope one day to use OCR to recognize moves on a scoresheet) but I lack examples.

@rpdelaney
Copy link
Collaborator Author

That would be a good thing to work on, but any such improvements should be made to python-chess.

@rpdelaney
Copy link
Collaborator Author

Also, the same idea should apply to variant games that aren't supported by the engine. We should print a log warning and skip over those games rather than halting with an exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants