This repository has been archived by the owner on Sep 11, 2022. It is now read-only.

Releases: danthespal/webScraper

webScraper v0.4 - alpha

20 Feb 10:11
  • added a requirements.txt file to make installing the same libraries easier, and updated the README file
  • corrected PEP 8 warnings
  • added dynamic user-agents to settings file
  • added additional info about a product
  • added loop until last page to extract all elements
  • adapted CSS selectors to XPath; see the comments for more info
  • store item information in an SQLite database
  • fix some problems in saving items
  • increase the string length of the product_name column
  • check for duplicate product_name
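
The SQLite storage and the duplicate product_name check listed above could look roughly like the sketch below. It is not the project's actual pipeline: the table name `products`, the `price` column, and the helpers `open_db` and `save_item` are assumptions made for illustration, and it uses only the standard-library `sqlite3` module so it runs without Scrapy.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists.

    The table layout is hypothetical; only product_name comes from the notes.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL UNIQUE,
            price TEXT
        )
        """
    )
    return conn


def save_item(conn, item):
    """Insert one scraped item; return True if stored, False if duplicate.

    The UNIQUE constraint plus INSERT OR IGNORE lets SQLite itself skip
    rows whose product_name is already present.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO products (product_name, price) VALUES (?, ?)",
        (item["product_name"], item.get("price")),
    )
    conn.commit()
    return cur.rowcount == 1
```

In a Scrapy project, a function like `save_item` would typically be called from an item pipeline's `process_item` method.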

webScraper v0.3 - alpha

19 Feb 18:01
01ecf99
  • download from multiple pages
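
Downloading from multiple pages usually means following a "next" link until none is left. The sketch below illustrates that loop with the standard library only, so it runs standalone; inside a Scrapy spider the same idea is normally expressed by yielding `response.follow(next_href, callback=self.parse)`. The names `find_next_url` and `crawl_pages`, the `next` CSS class, and the `fetch` callable are all assumptions, not taken from the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Remembers the href of the first <a> whose class list contains 'next'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "next" in classes and attr_map.get("href"):
            self.next_href = attr_map["href"]


def find_next_url(page_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    return urljoin(page_url, parser.next_href)


def crawl_pages(start_url, fetch, max_pages=100):
    """Yield (url, html) for each page until no next link is found.

    `fetch` is any callable that maps a URL to its HTML text. The `seen`
    set and `max_pages` cap guard against pagination loops.
    """
    url = start_url
    seen = set()
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_url(url, html)
```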

webScraper v0.2 - alpha

19 Feb 14:58
58d10bd
  • base for downloading items [crawler.py]
  • little customization for product_name [items.py]
  • add user agent (you need to install 'scrapy-user-agents 0.1.1') [settings.py]
  • enabled download_delay and auto_throttle for better scraping [settings.py]

webScraper v0.1 - alpha

19 Feb 13:03
50ee439

just the basic content generated by the Scrapy framework