
"IndentationError: Unexpected Indent" when running scrapy, blank csv file #62

TheSeedMan opened this issue Jun 12, 2020 · 2 comments

Hello, when I run scrapy with the default arguments, it raises an IndentationError, reports the crawl as finished, and leaves me with a blank CSV file. Does anyone know how I can troubleshoot this?

`scrapy crawl fb -a email="@gmail.com" -a password="pwd" -a page="DonaldTrump" -a lang="en" -o Trump.csv`

```
2020-06-12 10:49:57 [scrapy.utils.log] INFO: Scrapy 2.1.0 started (bot: fbcrawl)
2020-06-12 10:49:57 [scrapy.utils.log] INFO: Versions: lxml 4.5.1.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.7 (default, Mar 10 2020, 15:43:03) - [Clang 11.0.0 (clang-1100.0.33.17)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 2.9.2, Platform Darwin-18.7.0-x86_64-i386-64bit
2020-06-12 10:49:57 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'fbcrawl',
'DOWNLOAD_DELAY': 3,
'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
'FEED_EXPORT_ENCODING': 'utf-8',
'FEED_EXPORT_FIELDS': ['source',
'shared_from',
'date',
'text',
'reactions',
'likes',
'ahah',
'love',
'wow',
'sigh',
'grrr',
'comments',
'post_id',
'url'],
'LOG_LEVEL': 'INFO',
'NEWSPIDER_MODULE': 'fbcrawl.spiders',
'SPIDER_MODULES': ['fbcrawl.spiders'],
'URLLENGTH_LIMIT': 99999,
'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
'(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
2020-06-12 10:49:57 [scrapy.extensions.telnet] INFO: Telnet Password: 9110026fee2dbf75
2020-06-12 10:49:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2020-06-12 10:49:57 [fb] INFO: Email and password provided, will be used to log in
2020-06-12 10:49:57 [fb] INFO: Date attribute not provided, scraping date set to 2004-02-04 (fb launch date)
2020-06-12 10:49:57 [fb] INFO: Language attribute recognized, using "en" for the facebook interface
2020-06-12 10:49:57 [scrapy.core.engine] INFO: Spider opened
2020-06-12 10:49:57 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-06-12 10:49:57 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-06-12 10:50:04 [fb] INFO: Going through the "save-device" checkpoint
2020-06-12 10:50:13 [fb] INFO: Scraping facebook page https://mbasic.facebook.com/DonaldTrump
2020-06-12 10:50:16 [scrapy.core.scraper] ERROR: Spider error processing <GET https://mbasic.facebook.com/DonaldTrump> (referer: https://mbasic.facebook.com/?_rdr)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
StopIteration: <200 https://mbasic.facebook.com/DonaldTrump>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/defer.py", line 55, in mustbe_deferred
result = f(*args, **kw)
File "/usr/local/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 60, in process_spider_input
return scrape_func(response, request, spider)
File "/usr/local/lib/python3.7/site-packages/scrapy/core/scraper.py", line 152, in call_spider
warn_on_generator_with_return_value(spider, callback)
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 202, in warn_on_generator_with_return_value
if is_generator_with_return_value(callable):
File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 187, in is_generator_with_return_value
tree = ast.parse(dedent(inspect.getsource(callable)))
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "", line 1
def parse_page(self, response):
^
IndentationError: unexpected indent
2020-06-12 10:50:16 [scrapy.core.engine] INFO: Closing spider (finished)
2020-06-12 10:50:16 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 3855,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 4,
'downloader/request_method_count/POST': 2,
'downloader/response_bytes': 53212,
'downloader/response_count': 6,
'downloader/response_status_count/200': 4,
'downloader/response_status_count/302': 2,
'elapsed_time_seconds': 19.003456,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 6, 12, 14, 50, 16, 482042),
'log_count/ERROR': 1,
'log_count/INFO': 12,
'memusage/max': 52682752,
'memusage/startup': 52682752,
'request_depth_max': 3,
'response_received_count': 4,
'scheduler/dequeued': 6,
'scheduler/dequeued/memory': 6,
'scheduler/enqueued': 6,
'scheduler/enqueued/memory': 6,
'spider_exceptions/IndentationError': 1,
'start_time': datetime.datetime(2020, 6, 12, 14, 49, 57, 478586)}
2020-06-12 10:50:16 [scrapy.core.engine] INFO: Spider closed (finished)
```
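
The failing call is in Scrapy itself: `scrapy/utils/misc.py` runs `ast.parse(dedent(inspect.getsource(callable)))` on the spider callback to warn about generators that contain `return` statements. If any line inside the method's source sits at column 0 (for example a flush-left comment), `textwrap.dedent` cannot strip the class-level indentation, so `ast.parse` chokes on indented code. A minimal standalone sketch of that failure mode (hypothetical method, not fbcrawl's actual code):

```python
import ast
import inspect
from textwrap import dedent

class Spider:
    def parse_page(self, response):
# a comment at column 0 inside the method body
        yield response

# getsource() returns the method still indented at class level; dedent()
# strips nothing because the column-0 comment makes the common leading
# whitespace empty, so ast.parse() sees indented code and raises
# "IndentationError: unexpected indent", matching the traceback above.
ast.parse(dedent(inspect.getsource(Spider.parse_page)))
```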


b-girma commented Jun 22, 2020

If you downgrade the Scrapy framework to version 1.5.0, it works fine.
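
For example, with pip:

```
pip install scrapy==1.5.0
```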


Di-ref commented Jul 26, 2020

Try removing the commented-out lines around line 141 of fbcrawl.py, in the body of the parse_page() function.
If that does not work, try removing the comments from the body of the parse_page() function in these files:

  • fbcrawl.py
  • comments.py
  • events.py
  • profiles.py

It's a bug in the Scrapy library, mentioned in this issue here.
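
If you'd rather keep the comments, re-indenting them to the method body's level should also work, since the parse only breaks on flush-left comment lines. A hypothetical illustration (not fbcrawl's actual code):

```python
import scrapy

class FacebookSpider(scrapy.Spider):
    name = "fb"

    def parse_page(self, response):
#new_page = response.urljoin(...)       # flush-left comment: breaks dedent()
        # new_page = response.urljoin(...)  # indented comment: parses fine
        yield {}
```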

Di-ref added a commit to Di-ref/fbcrawl that referenced this issue Jul 26, 2020