runtime error at AWS #16
Comments
Hi @mbnik, are you still experiencing this error?
Hi @chrismattmann, I still have the problem. Please let me know if you need further details.
I am also receiving the error (nutch.nutch.NutchException: Unexpected server response: 204). It's perhaps worth noting that the latest nutch-python release on pip doesn't even get as far as a successful Injector job.
Thanks @mbnik and @valencik, I'll investigate this. I am planning some student projects in this area over the next few months that I believe will significantly enhance this. @sujen1412 FYI, can you look at this?
I am looking into this. My initial finding is that the job may not get past inject because the Seed API was refactored in Nutch. See #17.
Hi,
I was able to run the following code on my own Linux machine without a problem:
However, when I run the same code on AWS (Ubuntu 14.04), it gives a runtime error. Here is the runtime log of the code:
nutch.py: Response status: 200
nutch.py: Response JSON: {u'crawlId': u'test', u'args': {u'url_dir': u'/tmp/1456875353316-0'}, u'state': u'IDLE', u'result': None, u'msg': u'idle', u'type': u'GENERATE', u'id': u'test-default-GENERATE-1140031758', u'confId': u'default'}
nutch.py: GET Endpoint: /job/test-default-GENERATE-1140031758
nutch.py: GET Request data: {}
nutch.py: GET Request headers: {'Accept': 'application/json'}
nutch.py: Response headers: {'Date': 'Tue, 01 Mar 2016 23:36:35 GMT', 'Content-Length': '0', 'Server': 'Jetty(8.1.15.v20140411)'}
nutch.py: Response status: 204
Traceback (most recent call last):
  File "main.py", line 22, in <module>
    job = cc.progress() # gets the current job if no progress, else iterates and makes progress
  File "/usr/local/lib/python2.7/dist-packages/nutch/nutch.py", line 531, in progress
    jobInfo = currentJob.info()
  File "/usr/local/lib/python2.7/dist-packages/nutch/nutch.py", line 201, in info
    return self.server.call('get', '/job/' + self.id)
  File "/usr/local/lib/python2.7/dist-packages/nutch/nutch.py", line 160, in call
    raise error
nutch.nutch.NutchException: Unexpected server response: 204
To run the Python code, I started Nutch with: /bin/nutch startserver. Here is the server log from that run:
Injector: starting at 2016-03-01 23:35:53
Injector: crawlDb: test/crawldb
Injector: urlDir: /tmp/1456875353316-0
Injector: Converting injected urls to crawl db entries.
Injector: overwrite: false
Injector: update: false
Injector: Total number of urls rejected by filters: 0
Injector: Total number of urls after normalization: 2
Injector: Total new urls injected: 2
Injector: finished at 2016-03-01 23:36:34, elapsed: 00:00:40
Generator: starting at 2016-03-01 23:36:35
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: running in local mode, generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: test/segments/20160301233638
Generator: finished at 2016-03-01 23:36:40, elapsed: 00:00:05
I would appreciate it if you could help.
Thanks
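One observation from the logs above: the 204 response is dated 23:36:35 GMT, the same second the Generator reports starting, and the Generator does not finish until 23:36:40. A possible client-side workaround, sketched below, is to treat a 204 on the job endpoint as "no job info yet" and retry a few times before raising. This is only a sketch under assumptions: `poll_job_info`, `UnexpectedResponse`, and the `fetch` callable are hypothetical names and not part of nutch-python, and whether retrying is the right fix for the Nutch server has not been confirmed.

```python
import time


class UnexpectedResponse(Exception):
    """Hypothetical error type, raised when no 200 response is obtained."""


def poll_job_info(fetch, job_id, retries=5, delay=1.0, sleep=time.sleep):
    """Return job info, retrying while the server answers 204 No Content.

    `fetch` is any callable that takes an endpoint path and returns a
    (status_code, parsed_json_or_None) tuple. It stands in for the real
    HTTP call and is an assumption of this sketch.
    """
    endpoint = '/job/' + job_id
    status = None
    for _ in range(retries):
        status, body = fetch(endpoint)
        if status == 200:
            return body
        if status != 204:
            break  # anything other than 200 or 204 is treated as a hard failure
        sleep(delay)  # 204: no job info available yet, wait and try again
    raise UnexpectedResponse('Unexpected server response: %s' % status)
```

In a real client, `fetch` would wrap the GET request shown in the log (with the `Accept: application/json` header) and return the status code together with the decoded JSON body.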