Hello, when running scrapy with the default arguments I get an IndentationError: scrapy still reports the crawl as finished, but I'm left with a blank CSV file. Does anyone know how I can troubleshoot this issue?
`scrapy crawl fb -a email="@gmail.com" -a password="pwd" -a page="DonaldTrump" -a lang="en" -o Trump.csv`
`2020-06-12 10:49:57 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: fbcrawl)
2020-06-12 10:49:57 [scrapy.utils.log] INFO: Versions: lxml 4.5.1.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.7 (default, Mar 10 2020, 15:43:03) - [Clang 11.0.0 (clang-1100.0.33.17)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform Darwin-18.7.0-x86_64-i386-64bit
2020-06-12 10:49:57 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'fbcrawl',
'DOWNLOAD_DELAY': 3,
'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
'FEED_EXPORT_ENCODING': 'utf-8',
'FEED_EXPORT_FIELDS': ['source',
'shared_from',
'date',
'text',
'reactions',
'likes',
'ahah',
'love',
'wow',
'sigh',
'grrr',
'comments',
'post_id',
'url'],
'LOG_LEVEL': 'INFO',
'NEWSPIDER_MODULE': 'fbcrawl.spiders',
'SPIDER_MODULES': ['fbcrawl.spiders'],
'URLLENGTH_LIMIT': 99999,
'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
2020-06-12 10:49:57 [scrapy.extensions.telnet] INFO: Telnet Password: 9110026fee2dbf75
2020-06-12 10:49:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2020-06-12 10:49:57 [fb] INFO: Email and password provided, will be used to log in
2020-06-12 10:49:57 [fb] INFO: Date attribute not provided, scraping date set to 2004-02-04 (fb launch date)
2020-06-12 10:49:57 [fb] INFO: Language attribute recognized, using "en" for the facebook interface
2020-06-12 10:49:57 [scrapy.core.engine] INFO: Spider opened
2020-06-12 10:49:57 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-06-12 10:49:57 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-06-12 10:50:04 [fb] INFO: Going through the "save-device" checkpoint
2020-06-12 10:50:13 [fb] INFO: Scraping facebook page https://mbasic.facebook.com/DonaldTrump
2020-06-12 10:50:16 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump> (referer: https://mbasic.facebook.com/?_rdr)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
StopIteration: <200 https://mbasic.facebook.com/DonaldTrump>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/defer.py", line 55, in mustbe_deferred
result = f(*args, **kw)
File "/usr/local/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 60, in process_spider_input
return scrape_func(response, request, spider)
File "/usr/local/lib/python3.7/site-packages/scrapy/core/scraper.py", line 152, in call_spider
warn_on_generator_with_return_value(spider, callback)
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 202, in warn_on_generator_with_return_value
if is_generator_with_return_value(callable):
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 187, in is_generator_with_return_value
tree = ast.parse(dedent(inspect.getsource(callable)))
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "", line 1
def parse_page(self, response):
^
IndentationError: unexpected indent
2020-06-12 10:50:16 [scrapy.core.engine] INFO: Closing spider (finished)
2020-06-12 10:50:16 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 3855,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 4,
'downloader/request_method_count/POST': 2,
'downloader/response_bytes': 53212,
'downloader/response_count': 6,
'downloader/response_status_count/200': 4,
'downloader/response_status_count/302': 2,
'elapsed_time_seconds': 19.003456,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 6, 12, 14, 50, 16, 482042),
'log_count/ERROR': 1,
'log_count/INFO': 12,
'memusage/max': 52682752,
'memusage/startup': 52682752,
'request_depth_max': 3,
'response_received_count': 4,
'scheduler/dequeued': 6,
'scheduler/dequeued/memory': 6,
'scheduler/enqueued': 6,
'scheduler/enqueued/memory': 6,
'spider_exceptions/IndentationError': 1,
'start_time': datetime.datetime(2020, 6, 12, 14, 49, 57, 478586)}
2020-06-12 10:50:16 [scrapy.core.engine] INFO: Spider closed (finished)`
Try removing the commented lines around line 141 of fbcrawl.py, in the body of the parse_page() function.
If that does not work, try removing the comments from the body of the parse_page() function in each of these files:
fbcrawl.py
comments.py
events.py
profiles.py
It's a bug in the Scrapy library, mentioned in this issue here
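For context, the traceback shows the crash happens inside Scrapy's `warn_on_generator_with_return_value` helper, which runs `ast.parse(dedent(inspect.getsource(callback)))` on your spider callback. Below is a minimal stdlib-only sketch of why a commented-out line at column 0 inside a method body trips it up (the method name and comment text are illustrative, not taken from fbcrawl):

```python
import ast
import textwrap

# Simulated inspect.getsource() output for a spider method: the "def"
# line is indented because the method lives inside a class, and one
# commented-out line sits at column 0.
src = (
    "    def parse_page(self, response):\n"
    "#        old_code = response.xpath('//div')\n"
    "        return response\n"
)

# textwrap.dedent() strips only the whitespace common to ALL non-blank
# lines; the column-0 comment makes that common prefix empty, so the
# "def" line keeps its indentation.
dedented = textwrap.dedent(src)
print(dedented == src)  # True: nothing was stripped

# ast.parse() then fails exactly as in the log above.
try:
    ast.parse(dedented)
    error = None
except IndentationError as exc:
    error = exc
print(type(error).__name__)  # IndentationError

# Re-indenting (or deleting) the comment restores a common prefix,
# so dedent works and parsing succeeds.
fixed = src.replace("#        old_code", "    #    old_code")
ast.parse(textwrap.dedent(fixed))  # no exception
```

This is why deleting (or re-indenting) the offending comments in parse_page() makes the error go away. Later Scrapy releases appear to wrap this check in a try/except, so upgrading Scrapy may also resolve it without touching the spider code.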
Di-ref added a commit to Di-ref/fbcrawl that referenced this issue on Jul 26, 2020.