project/
├── scraping/
│   ├── amazon_reviews_scraper.ipynb
│   └── selectors.yml
├── data/
│   ├── books_category_reviews.csv
│   └── beauty_category_reviews.csv
├── analysis/
│   └── analysis.ipynb
└── requirements.txt
- Web scraping of Amazon product reviews
- Text preprocessing and cleaning
- Fake review detection using machine learning
- Comparative analysis between product categories
- Statistical analysis and visualization
- Sentiment analysis
- Pattern detection in review behaviors
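As an illustration of the text preprocessing and cleaning step, here is a minimal sketch using only the standard library. The function name `clean_review` and the tiny `STOPWORDS` set are assumptions made for this example; the notebooks may clean text differently, for instance with NLTK's stopword list.

```python
import html
import re

# Hypothetical, deliberately small stopword set for illustration only.
# NLTK (listed in the requirements) provides a much fuller list.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "i", "to", "of"}

TAG_RE = re.compile(r"<[^>]+>")        # matches HTML tags such as <b> or </b>
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # matches anything except a-z and whitespace


def clean_review(text: str) -> list[str]:
    """Lowercase a review, strip HTML and punctuation, and drop stopwords."""
    text = html.unescape(text)         # turn entities like &#39; into characters
    text = TAG_RE.sub(" ", text)       # remove HTML tags
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)  # replace digits and punctuation with spaces
    return [token for token in text.split() if token not in STOPWORDS]


if __name__ == "__main__":
    print(clean_review("This is the <b>BEST</b> book I&#39;ve read!!!"))
```

The resulting token list can then be passed to a vectorizer or sentiment scorer.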
git clone https://github.com/yourusername/Amazon_fake_review_impact_analysis.git
cd Amazon_fake_review_impact_analysis
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
- Open amazon_reviews_scraper.ipynb
- Update product URLs in the notebook
- Run all cells to collect reviews
- Scraped data is saved in data/
- Open analysis.ipynb
- Run the notebook to:
  - Process the collected reviews
  - Detect potential fake reviews
  - Generate visualizations
  - View comparative statistics
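The comparative statistics step can be pictured with a small sketch like the one below. It assumes each review is a dict with hypothetical `category` and `is_fake` keys, where `is_fake` stands for the detector's output. The actual column names in the CSV files and the notebook's own approach may differ.

```python
from collections import defaultdict


def fake_share_by_category(reviews):
    """Return {category: fraction of that category's reviews flagged as fake}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for review in reviews:
        totals[review["category"]] += 1
        if review["is_fake"]:
            flagged[review["category"]] += 1
    # Every category in `totals` has at least one review, so no division by zero.
    return {category: flagged[category] / totals[category] for category in totals}


if __name__ == "__main__":
    # Made-up records, used only to show the call shape.
    sample = [
        {"category": "books", "is_fake": False},
        {"category": "books", "is_fake": True},
        {"category": "beauty", "is_fake": True},
    ]
    print(fake_share_by_category(sample))
```

Comparing shares rather than raw counts keeps categories with different numbers of scraped reviews comparable.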
The analysis provides insights into:
- Fake review patterns across categories
- Impact on product ratings
- Sentiment distribution
- Review length and content patterns
- Temporal patterns in review posting
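One way to quantify the impact on product ratings is to compare the average rating over all reviews with the average over reviews not flagged as fake. The sketch below assumes hypothetical `rating` and `is_fake` fields and is not necessarily how the analysis notebook computes it.

```python
from statistics import fmean


def rating_impact(reviews):
    """Return (observed mean, mean without flagged reviews, difference)."""
    all_ratings = [r["rating"] for r in reviews]
    genuine_ratings = [r["rating"] for r in reviews if not r["is_fake"]]
    if not genuine_ratings:
        # Covers both an empty input and the case where every review is flagged.
        raise ValueError("need at least one review not flagged as fake")
    observed = fmean(all_ratings)
    adjusted = fmean(genuine_ratings)
    # A positive difference means flagged reviews pull the average upward.
    return observed, adjusted, observed - adjusted


if __name__ == "__main__":
    # Made-up ratings for one product, used only to show the call shape.
    sample = [
        {"rating": 5, "is_fake": True},
        {"rating": 5, "is_fake": True},
        {"rating": 4, "is_fake": False},
        {"rating": 2, "is_fake": False},
    ]
    print(rating_impact(sample))
```

Because the flags are probabilistic, the adjusted mean is an estimate and inherits the detector's errors.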
- Python: 3.9+
- Jupyter Notebooks
- Key Libraries:
  - pandas
  - scikit-learn
  - NLTK
  - TextBlob
  - Matplotlib/Seaborn
  - Requests
  - Selectorlib
- Respect Amazon's robots.txt and rate limiting
- Results are probabilistic and should be interpreted accordingly
- The model's accuracy depends on training data quality
- Regular updates may be needed as review patterns evolve
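The robots.txt and rate-limiting note can be sketched with the standard library's `urllib.robotparser`. The class name `PoliteGate`, the user agent string, the inline rules, and the delay value are all assumptions for illustration. In real use the rules would come from the site's own robots.txt, for example via `RobotFileParser.set_url()` and `read()`.

```python
import time
from urllib import robotparser


class PoliteGate:
    """Checks robots.txt rules and enforces a minimum delay between requests."""

    def __init__(self, robots_lines, user_agent, min_delay_s):
        self._parser = robotparser.RobotFileParser()
        self._parser.parse(robots_lines)  # parse rules supplied as a list of lines
        self._user_agent = user_agent
        self._min_delay_s = min_delay_s
        self._last_request = None

    def allowed(self, url):
        """Return True if the parsed rules permit this user agent to fetch url."""
        return self._parser.can_fetch(self._user_agent, url)

    def wait_turn(self):
        """Sleep until at least min_delay_s has passed since the previous call."""
        if self._last_request is not None:
            remaining = self._min_delay_s - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()


if __name__ == "__main__":
    # Hypothetical rules; a real scraper would load the target site's robots.txt.
    rules = ["User-agent: *", "Disallow: /private/"]
    gate = PoliteGate(rules, user_agent="review-research-bot", min_delay_s=2.0)
    for url in ["https://example.com/reviews/1", "https://example.com/private/x"]:
        if gate.allowed(url):
            gate.wait_turn()
            print("would fetch", url)
        else:
            print("skipping", url)
```

Calling `wait_turn()` before each request keeps the scraper at or below one request per `min_delay_s` seconds.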
This project is licensed under the MIT License; see the license file for details.
This Amazon scraper is intended for educational purposes only. Please ensure that your use of this tool complies with Amazon's terms of service and any applicable laws and regulations.
- Fork the repository
- Create a feature branch
- Commit changes
- Push to the branch
- Open a pull request