Some parsing errors trigger a cascade of failures #23
Comments
Could you provide the revision ID for one of the pages that is parsing poorly?
Hi @kjschiroo, thanks for the quick reply. Here's an example. In this case, maybe it has something to do with the fact that the final comment in the thread was added after the fact? That's the only edit to this page that wasn't made by a lowercase sigmabot.
@jtmorgan I think this has been fixed with the current version of the parser. I'm guessing it had something to do with the date formatting, but I don't remember exactly. Using the current version from this input: I get this output: Please let me know if this is not the case.
Revising my previous statement. The error does occur with the current version. It appears to be an issue with
Gives this as output:
I will raise an issue with them.
For reference, the issue with them is: earwig/mwparserfromhell#160
I haven't been able to 100% identify the cause here, but it looks like in some cases failure to parse a particular comment causes WikiChatter to fail when parsing the rest of the comments on the page, even those in different threads. It will split up text blocks in subsequent comments, but won't extract author or timestamp, or try to reconstruct the thread structure.
May be related to this issue (@yuvipanda and I are working on the same dataset). I can provide examples, but wanted to check whether the cause was known, before I do more spot checks. Anyone else experienced this issue?
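To make the expected behaviour concrete, here is a minimal sketch of per-comment error containment: each comment block is parsed on its own, and a failure is recorded on that block instead of affecting the ones after it. This is not WikiChatter's actual code. The `Comment` class, `parse_comment`, `parse_page`, and the simplified `SIGNATURE` regex are all hypothetical names and assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumption: a simplified signature of the form
# "[[User:Name|...]] ... 12:34, 5 January 2015 (UTC)". Real signatures vary more.
SIGNATURE = re.compile(
    r"\[\[User(?: talk)?:(?P<author>[^|\]]+)[^\]]*\]\].*?"
    r"(?P<timestamp>\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\))"
)


@dataclass
class Comment:
    """Hypothetical container for one parsed comment."""
    text: str
    author: Optional[str] = None
    timestamp: Optional[str] = None
    error: Optional[str] = None  # set when this block could not be parsed


def parse_comment(block: str) -> Comment:
    """Hypothetical per-comment parser; raises ValueError if no signature is found."""
    match = SIGNATURE.search(block)
    if match is None:
        raise ValueError("no signature found")
    return Comment(block, match.group("author").strip(), match.group("timestamp"))


def parse_page(
    blocks: List[str],
    parser: Callable[[str], Comment] = parse_comment,
) -> List[Comment]:
    """Parse each block independently so one bad comment cannot affect the rest."""
    comments = []
    for block in blocks:
        try:
            comments.append(parser(block))
        except Exception as exc:
            # Keep the raw text and note the failure, then move on to the next block.
            comments.append(Comment(text=block, error=str(exc)))
    return comments
```

With this structure, an unparseable comment in the middle of a page would still leave author and timestamp available for the comments that follow it, which is the behaviour that appears to be missing in the cases described above.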