Template files and instructions for facilitating the creation of custom scrapers.

- Supply the URL to be scraped and the selector of the website element to be scraped in the `check` dictionary in `Config`.
  - To obtain the item selector, right-click the element in your browser and select Inspect, then right-click the highlighted element in the inspector and copy its CSS selector or full XPath.
- Ensure that the `mail_info` attribute in the `Config` class contains the following attributes:
  - `addr`: a Gmail address with a working application password. See Google's help pages for the steps to set up application passwords.
  - `app_pw`: the application password.
- Change the pre-supplied runtime configuration in `Config` to suit your needs.
- Define how the browser obtains the text, and possibly images, to be scraped in `Browser._check_update_of_url()`. Code section here.
- Define how the main scraper class passes the scraped data from the browser to the mailer class in `TemplateScraper._check_update_of_url()`. Code section here.
- Define how the mailer creates the email body from the received data in `Mailer._make_msg()`. Code section here.
- Run the `/template/main.py` script.
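The customization steps above could be filled in roughly as follows. This is a minimal, hypothetical sketch, not the template's actual code: the keys of `check` (`"url"`, `"selector"`), the use of `SimpleNamespace` for `mail_info`, the method arguments, the `last_text` change detection and the `Mailer.send()` helper are all assumptions. The browser part assumes a Selenium-style driver exposing `get(url)` and `find_element(by, value)` whose result has a `.text` attribute; `"css selector"` and `"xpath"` are the string values of Selenium's `By.CSS_SELECTOR` and `By.XPATH`. Images are left out for brevity.

```python
import smtplib
from email.message import EmailMessage
from types import SimpleNamespace


class Config:
    # Hypothetical keys and values: point these at the page and element to watch.
    check = {
        "url": "https://example.com/news",
        "selector": "#latest > h2",  # CSS selector or full XPath copied from the inspector
    }
    # Gmail sender address and its application password (placeholders).
    mail_info = SimpleNamespace(addr="someone@gmail.com", app_pw="xxxx xxxx xxxx xxxx")


class Browser:
    def __init__(self, driver):
        # Assumption: a Selenium-style WebDriver, or any object with get() and find_element().
        self.driver = driver

    def _check_update_of_url(self, url, selector):
        """Load the page and return the text of the watched element (images omitted here)."""
        self.driver.get(url)
        # Full XPaths start with "/"; anything else is treated as a CSS selector.
        by = "xpath" if selector.startswith("/") else "css selector"
        return self.driver.find_element(by, selector).text


class Mailer:
    def __init__(self, mail_info):
        self.mail_info = mail_info

    def _make_msg(self, url, text):
        """Build the notification email from the scraped text."""
        msg = EmailMessage()
        msg["Subject"] = f"Update detected on {url}"
        msg["From"] = self.mail_info.addr
        msg["To"] = self.mail_info.addr  # assumption: the notification goes to yourself
        msg.set_content(f"New content at {url}:\n\n{text}")
        return msg

    def send(self, msg):
        """Hypothetical helper: send via Gmail's SMTP server using the app password."""
        with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
            server.login(self.mail_info.addr, self.mail_info.app_pw)
            server.send_message(msg)


class TemplateScraper:
    def __init__(self, browser, mailer, config=Config):
        self.browser, self.mailer, self.config = browser, mailer, config
        self.last_text = None  # hypothetical change-detection state

    def _check_update_of_url(self):
        """Pass scraped text from the browser to the mailer; return None if unchanged."""
        url = self.config.check["url"]
        selector = self.config.check["selector"]
        text = self.browser._check_update_of_url(url, selector)
        if text == self.last_text:
            return None  # nothing new, so no email
        self.last_text = text
        return self.mailer._make_msg(url, text)
```

With Selenium installed, `Browser(webdriver.Firefox())` would supply a real driver; a scheduler loop would then call `scraper._check_update_of_url()` periodically and pass any returned message to `Mailer.send()`. Returning the message instead of sending it directly keeps the change-detection logic testable without network access.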

- `browser.py` contains browser classes built on either the `puppeteer` (Chromium) or the `selenium` (Firefox) library.
- `main.py` contains the `Config` and `TemplateScraper` classes.
- `mailer.py` contains the `Mailer` class.
- `make_executable.py` contains the `PyInstaller` class. Running this script automatically creates an executable of the program that runs as a standalone package without Python.
  - If the custom program uses other custom data files, their paths must be supplied in the `PyInstaller._local_files` attribute so that those files are bundled with the created standalone program.
  - `--onefile` script argument: generates a single executable file. See `/applications/PyInstaller/main.py` for other PyInstaller options.
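A minimal sketch of what such a build step might look like, assuming the class simply assembles a `python -m PyInstaller` command line. `PyInstallerBuild`, its constructor arguments and `resource_path` are hypothetical names, not the template's actual API; `--onefile`, `--add-data` and `--noconfirm` are real PyInstaller options, and `sys._MEIPASS` is the attribute a bundled app uses to locate its unpacked data files.

```python
import os
import subprocess
import sys
from pathlib import Path


class PyInstallerBuild:
    """Hypothetical stand-in for the template's PyInstaller class."""

    def __init__(self, script, local_files=(), onefile=False):
        self.script = script
        # Custom data files the program reads at runtime, to be bundled with it.
        self._local_files = list(local_files)
        self.onefile = onefile

    def command(self):
        """Return the PyInstaller command line as a list of arguments."""
        cmd = [sys.executable, "-m", "PyInstaller", "--noconfirm"]
        if self.onefile:
            cmd.append("--onefile")  # a single executable instead of a folder
        for src in self._local_files:
            # --add-data SRC<sep>DEST; os.pathsep (";" on Windows, ":" elsewhere)
            # is accepted by older and current PyInstaller releases alike.
            dest = Path(src).parent.as_posix()
            cmd += ["--add-data", f"{src}{os.pathsep}{dest}"]
        cmd.append(self.script)
        return cmd

    def build(self):
        """Run PyInstaller; requires it to be installed in this interpreter."""
        subprocess.run(self.command(), check=True)


def resource_path(relative):
    """Locate a bundled data file both in the frozen app and when run from source."""
    # PyInstaller sets sys._MEIPASS to the directory holding the bundled files.
    base = getattr(sys, "_MEIPASS", None) or Path(__file__).resolve().parent
    return Path(base) / relative
```

For example, `PyInstallerBuild("main.py", local_files=["data/ids.json"], onefile=True).build()` would bundle `data/ids.json` under `data/` inside the executable, and the program could then open it with `resource_path("data/ids.json")` regardless of whether it runs frozen or from source.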