No handler for status code != 200 #62
Comments
Lots of good stuff in here! Thanks.
Abnormal status will be returned with the status message string ContentMine/quickscrape#62
The fix was actually for thresher and not quickscrape. I pushed the changes, and they seem to have been merged into my last pull request for another bug. I'm a total noob, so I might have gotten the procedure wrong. Please let me know if I need to make any changes on my end.
Thanks for this @lanzer and sorry for the slow reply - I've been away at various events. I will be incorporating these fixes in new releases in the next few days.
I'm going to take over looking at this in the next few days; I also wrote a patch to fix this because I didn't realise there had been one in the pipeline for a while.
When a status code other than "200 OK" is received, the process halts. This can be caused by a "404 Not Found" or by a server-side problem such as exceeded bandwidth or a permission error. It's a problem for me because I am working with a big list of URLs, some of which are potentially outdated.
I noticed that the basic renderer doesn't listen for status codes other than 200 (there is a headless renderer, but it isn't called even with the -h parameter):
basic.js (14)
Also, scraper.js does not have a listener for abnormal status:
scraper.js (252)
I've added a few lines to make things work for me:
basic.js (14)
scraper.js (252)
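As a rough illustration of the idea (not the actual patch, whose code is not shown here), a status check could pass any non-200 response to an error handler instead of letting the run stop. The function `handleResponse` and the `onSuccess`/`onError` callbacks are hypothetical names for this sketch and are not part of thresher or quickscrape:

```javascript
// Hypothetical helper: decide what to do with an HTTP response.
// `onSuccess` and `onError` stand in for whatever events or callbacks
// the real renderer and scraper use; they are NOT thresher's API.
function handleResponse(url, statusCode, statusMessage, body, onSuccess, onError) {
  if (statusCode !== 200) {
    // Report the abnormal status along with its status message string.
    onError(new Error('HTTP ' + statusCode + ' ' + statusMessage + ' for ' + url));
    return;
  }
  onSuccess(body);
}

// Usage: collect failures so that a long URL list keeps processing.
const failures = [];
const pages = [];
const responses = [
  { url: 'http://example.org/a', code: 200, message: 'OK', body: '<html>a</html>' },
  { url: 'http://example.org/b', code: 404, message: 'Not Found', body: '' }
];
for (const r of responses) {
  handleResponse(r.url, r.code, r.message, r.body,
    (body) => pages.push(body),
    (err) => failures.push(err.message));
}
```

With this shape, an outdated URL only adds an entry to `failures`, and the remaining URLs are still handled.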
Quickscrape does not treat the result as an error and reports "0/0 elements captured (0 capture failed)" when it should read "0/2 elements", or whatever number is configured in the JSON. I haven't looked into how reporting is handled.
For the time being, I noticed something in thresher.js:
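One possible way to get the expected total, sketched under assumptions: take the denominator from the scraper definition rather than from the results, so a failed fetch still reports against the configured element count. The `summarize` function is hypothetical, the sketch assumes the definition JSON has an `elements` object keyed by element name, and counting every uncaptured element as failed is a choice made for this sketch rather than quickscrape's documented behaviour:

```javascript
// Hypothetical summary helper, not quickscrape's reporting code.
// Assumes `definition.elements` is an object keyed by element name and
// `results` maps element names to captured values (or is null on failure).
function summarize(definition, results) {
  const names = Object.keys(definition.elements);
  const captured = names.filter(
    (n) => results != null && results[n] != null && results[n].length > 0
  ).length;
  // Assumption for this sketch: anything not captured counts as failed.
  const failed = names.length - captured;
  return captured + '/' + names.length + ' elements captured (' + failed + ' capture failed)';
}

const definition = { elements: { title: {}, doi: {} } };
const afterFailedFetch = summarize(definition, null);
const afterPartialScrape = summarize(definition, { title: ['A title'] });
```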
thresher.js (75)
That should probably be a comparison operator.
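The referenced line is not reproduced here, so the following is only a generic illustration of this class of bug, assuming an assignment (`=`) was written where a comparison (`===`) was intended. The function names and the value 200 are made up for the example and are not taken from thresher.js:

```javascript
// Hypothetical example, not the code from thresher.js.
function isOkBuggy(status) {
  // Assignment: overwrites `status` with 200, and the condition is
  // truthy for every input because 200 is a truthy value.
  if (status = 200) {
    return true;
  }
  return false;
}

function isOk(status) {
  // Strict comparison: true only when the status really is 200.
  return status === 200;
}
```

Here `isOkBuggy(404)` returns `true`, while `isOk(404)` returns `false`.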
Hope this helps!