Spider error processing #59
Comments
I have the same error. Did you solve this problem?
In my case, running
(spiders-env) gvak@gvak-H61M-D2-B3:~/spiders-env/fbcrawl-master/fbcrawl/spiders$ scrapy crawl fb -a email="@gmail.com" -a password="*" -a page="DonaldTrump" -a lang="it" -o test.csv
gives the same "During handling of the above exception, another exception occurred: Traceback (most recent call last):" error. Any help please?
My machine is 32-bit (the terminal shows this in the versions line). Does fbcrawl run on 32-bit machines?
I have the same problem. I deleted the space between 'def' and 'parse_page' in 'def parse_page', then added a space back between them, and that worked for me :)
Please help me, I did the same as you but I still get the error. Can you email me please?
Open fbcrawl.py and reformat the code (Ctrl+Alt+L); this will solve the issue.
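For context, reformatting helps because the IndentationError is raised by Scrapy's generator check, not by the spider logic itself: as the traceback below shows, warn_on_generator_with_return_value() re-parses the callback source with ast.parse(dedent(inspect.getsource(callback))), and textwrap.dedent() cannot strip the method's indentation when parse_page mixes tabs and spaces. A minimal sketch of that failure, assuming mixed indentation is the culprit (which is exactly what an IDE reformat normalizes):

```python
import ast
from textwrap import dedent

# Method source as inspect.getsource() would return it for a class method:
# indented by 4 spaces, except one body line that was indented with a tab.
source = (
    "    def parse_page(self, response):\n"
    "\tfor post in response.xpath('//article'):\n"   # tab-indented line
    "            yield post\n"
)

# textwrap.dedent() treats '    ' and '\t' as different prefixes, so it finds
# no common margin and returns the text unchanged; the 'def' line is therefore
# still indented and ast.parse() fails the same way Scrapy's
# is_generator_with_return_value() does for the real spider.
try:
    ast.parse(dedent(source))
except IndentationError as err:
    print("IndentationError:", err)   # 'unexpected indent' at 'def parse_page'
```

Once the file is re-indented with a single whitespace style, dedent() can strip the margin and the check passes.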
It didn't work for me, I still have the same issue. Any piece of advice?
Try this: in the parse_page function in fbcrawl, change the row selector from //div to //article.
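(In a Scrapy spider that change would look roughly like the hedged sketch below; it is illustrative only, not fbcrawl's actual parse_page, and the yielded fields are assumptions.)

```python
import scrapy


class PageSketchSpider(scrapy.Spider):
    """Illustrative sketch of the suggested selector change, not fbcrawl's code."""
    name = 'fb_sketch'

    def parse_page(self, response):
        # before: rows = response.xpath('//div[...]')   # old post container
        rows = response.xpath('//article')               # suggested container
        for post in rows:
            yield {
                'text': ' '.join(post.xpath('.//text()').getall()),
                'url': response.url,
            }
```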
@talhalatiforakzai I tried it, but for non-public pages (that I am a member of) it doesn't work as it did before.
It worked for me, thank you.
I'm having the same issue as others in this thread, although the row change from //div to //article hasn't fixed it even when I plug in a public page. @talhalatiforakzai any help would be much appreciated. I have pasted my terminal output below.
2020-06-12 13:48:56 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: fbcrawl) During handling of the above exception, another exception occurred: Traceback (most recent call last):
Sorry for this issue; I've tried Googling it but still can't find a solution.
When I run the following command:
scrapy crawl fb -a email="@gmail.com" -a password="_" -a page="DonaldTrump" -a date="2018-01-01" -a lang="en" -o output.csv
I get this error:
2020-03-27 19:34:59 [scrapy.utils.log] INFO: Scrapy 2.0.1 started (bot: fbcrawl)
2020-03-27 19:34:59 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 22:45:29) [MSC v.1916 32 bit (Intel)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1d 10 Sep 2019), cryptography 2.8, Platform Windows-10-10.0.18362-SP0
2020-03-27 19:34:59 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'fbcrawl',
'DOWNLOAD_DELAY': 3,
'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
'FEED_EXPORT_ENCODING': 'utf-8',
'FEED_EXPORT_FIELDS': ['source',
'shared_from',
'date',
'text',
'reactions',
'likes',
'ahah',
'love',
'wow',
'sigh',
'grrr',
'comments',
'post_id',
'url'],
'FEED_FORMAT': 'csv',
'FEED_URI': 'DUMPFILE.csv',
'LOG_LEVEL': 'INFO',
'NEWSPIDER_MODULE': 'fbcrawl.spiders',
'SPIDER_MODULES': ['fbcrawl.spiders'],
'URLLENGTH_LIMIT': 99999,
'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
2020-03-27 19:34:59 [scrapy.extensions.telnet] INFO: Telnet Password: 41ca2711c3d9f1ce
2020-03-27 19:34:59 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2020-03-27 19:34:59 [fb] INFO: Email and password provided, will be used to log in
2020-03-27 19:34:59 [fb] INFO: Date attribute provided, fbcrawl will stop crawling at 2018-01-01
2020-03-27 19:34:59 [fb] INFO: Language attribute recognized, using "en" for the facebook interface
2020-03-27 19:35:00 [scrapy.core.engine] INFO: Spider opened
2020-03-27 19:35:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-27 19:35:00 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-03-27 19:35:07 [fb] INFO: Going through the "save-device" checkpoint
2020-03-27 19:35:15 [fb] INFO: Scraping facebook page https://mbasic.facebook.com/DonaldTrump
2020-03-27 19:35:18 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump> (referer: https://mbasic.facebook.com/?_rdr)
Traceback (most recent call last):
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\downloader\middleware.py", line 42, in process_request
defer.returnValue((yield download_func(request=request, spider=spider)))
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\twisted\internet\defer.py", line 1362, in returnValue
raise _DefGen_Return(val)
twisted.internet.defer._DefGen_Return: <200 https://mbasic.facebook.com/DonaldTrump>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
result = f(*args, **kw)
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\spidermw.py", line 60, in process_spider_input
return scrape_func(response, request, spider)
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\core\scraper.py", line 148, in call_spider
warn_on_generator_with_return_value(spider, callback)
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\misc.py", line 202, in warn_on_generator_with_return_value
if is_generator_with_return_value(callable):
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\site-packages\scrapy\utils\misc.py", line 187, in is_generator_with_return_value
tree = ast.parse(dedent(inspect.getsource(callable)))
File "c:\users\lam vien\appdata\local\programs\python\python38-32\lib\ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "", line 1
def parse_page(self, response):
^
IndentationError: unexpected indent
2020-03-27 19:35:18 [scrapy.core.engine] INFO: Closing spider (finished)
2020-03-27 19:35:18 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 3867,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 4,
'downloader/request_method_count/POST': 2,
'downloader/response_bytes': 57037,
'downloader/response_count': 6,
'downloader/response_status_count/200': 4,
'downloader/response_status_count/302': 2,
'elapsed_time_seconds': 18.180147,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 3, 27, 12, 35, 18, 403258),
'log_count/ERROR': 1,
'log_count/INFO': 12,
'request_depth_max': 3,
'response_received_count': 4,
'scheduler/dequeued': 6,
'scheduler/dequeued/memory': 6,
'scheduler/enqueued': 6,
'scheduler/enqueued/memory': 6,
'spider_exceptions/IndentationError': 1,
'start_time': datetime.datetime(2020, 3, 27, 12, 35, 0, 223111)}
2020-03-27 19:35:18 [scrapy.core.engine] INFO: Spider closed (finished)
As I said, I think the problem starts at the line: ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump> (referer: https://mbasic.facebook.com/?_rdr)
Could anyone please tell me how to fix it? Many thanks!
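One way to apply the "reformat the code" workaround from the comments above without an IDE is sketched below. The spider path is an assumption, and if the file was written with a tab width other than 4 the result should be double-checked in an editor:

```python
# Hedged sketch: rewrite the spider file with tabs expanded to 4 spaces so the
# indentation is consistent and Scrapy's ast.parse(dedent(...)) check can cope.
from pathlib import Path

spider_file = Path('fbcrawl/spiders/fbcrawl.py')   # assumed location of the spider
text = spider_file.read_text(encoding='utf-8')
spider_file.write_text(text.expandtabs(4), encoding='utf-8')
print('Rewrote', spider_file, 'with tabs expanded to 4 spaces')
```

Running python -m tabnanny fbcrawl/spiders/fbcrawl.py beforehand can also flag the lines with ambiguous indentation.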