Amazon Review Scrapper

Introduction

This is a simple spider written with Scrapy, a Python framework for web scraping.

How to Run the program

As mentioned above, this runs on the Scrapy framework, so make sure you have Scrapy installed on your machine. Before installing it, you need Python 2.7 or later.

Note: These instructions are written for Mac. You will have to look up the equivalent commands for Windows and Linux.

Setup

$ brew install python

$ pip install Scrapy

If you run into an error while installing, resolve it by searching Google or Stack Overflow. (I had to do this when testing on an Early 2008 Mac.)

Execution

Now run the command below and check that the amazon_review spider is listed in its output.

$ scrapy list
amazon_review

Populate amazonreviewpages.txt with the URLs of all the review pages. A simple Python script does this for you. Run it with the product URL and the number of review pages, as in the example and the general form below.

$ python generateInputLinks.py https://www.amazon.com/Muse-Brain-Sensing-Headband-Black/dp/B00LOQR37C 33

$ python generateInputLinks.py https://www.amazon.com/<Product Name>/dp/<Product ID> <Number of Pages of Review on the site>

This command will populate the amazonreviewpages.txt with all the necessary values.
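The README does not show what generateInputLinks.py does internally, so the following is only a minimal sketch of how such a script might build the list of review pages. The function names, the `/product-reviews/` URL pattern, the `pageNumber` query parameter, and the one-URL-per-line file layout are all assumptions made for illustration, not the repository's actual code.

```python
def build_review_page_urls(product_url, page_count):
    """Return one URL per review page, numbered 1..page_count."""
    # Assumed pattern: swap "/dp/" for "/product-reviews/" and add a
    # pageNumber query parameter. The real script may do this differently.
    base = product_url.rstrip("/").replace("/dp/", "/product-reviews/", 1)
    return ["{}?pageNumber={}".format(base, page)
            for page in range(1, page_count + 1)]


def write_review_pages(urls, path="amazonreviewpages.txt"):
    """Write the URLs to the input file, one per line (assumed layout)."""
    with open(path, "w") as handle:
        handle.write("\n".join(urls) + "\n")
```

Calling `write_review_pages(build_review_page_urls(url, 33))` with the product URL from the example above would then produce 33 lines, one per review page.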

Now for the command that brings all of this together. Note: If you want the output in output.json, make sure that file is empty before running the command below. Otherwise, run the command with your own output filename.
```
$ scrapy crawl -o output.json amazon_review
```
```
$ scrapy crawl -o <your output filename> amazon_review
```
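The reason for emptying output.json first is most likely that `-o` adds to an existing file rather than replacing it, which leaves a JSON file that no longer parses. A small pre-flight check along these lines can catch that before a long crawl. It is a sketch, and the helper name `output_is_safe` is hypothetical.

```python
import os


def output_is_safe(path):
    """Return True when path is missing or empty, so a crawl can write to it cleanly."""
    return not os.path.exists(path) or os.path.getsize(path) == 0
```

For example, `output_is_safe("output.json")` returning False is a signal to empty the file or pick another output filename before running the crawl.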

Disclaimer: Amazon may blacklist your IP address. Jayanth Kanive is not responsible for any damage that may be caused by using this application.

Note: You can raise download_delay in ./spider/amazon_review.py to send requests more slowly.

download_delay = 10
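For orientation, this is roughly where such a setting sits in a Scrapy spider. The sketch below is not the repository's spider: the class name, the way the input file is read, and the placeholder `parse` method are assumptions. Only the spider name `amazon_review` and the `download_delay = 10` line come from this README.

```python
import scrapy


class AmazonReviewSpider(scrapy.Spider):  # hypothetical class name
    # Spider name as reported by `scrapy list`.
    name = "amazon_review"

    # Delay, in seconds, between consecutive requests to the same site.
    download_delay = 10

    def start_requests(self):
        # Assumption: amazonreviewpages.txt holds one review-page URL per line.
        with open("amazonreviewpages.txt") as handle:
            for line in handle:
                url = line.strip()
                if url:
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Placeholder only: the real extraction logic lives in the repository's spider.
        yield {"url": response.url}
```

A larger value spaces the requests further apart, which makes the crawl slower but gentler on the site.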
