GitHub - csse-uoft/website-scraper

Install Dependencies

conda env create -f PyWebScraper.yml

Run

conda activate PyWebScraper
python main.py -u http://domain.com

Parameters

-u : Starting URL to parse (e.g. http://main.com).
-m : Selector for the main content (e.g. 'div.main', 'div[id="main"]'). The examples use CSS selector syntax rather than XPath.
-n : Selector for the site navigation links (e.g. 'div.nav a').
-js : Whether to run JavaScript on the page (0 = False, 1 = True; default: 1).
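
As a rough illustration of how the four flags above could be declared, here is a minimal argparse sketch. It is not taken from main.py: the function name build_parser and the attribute names url, main, nav and js are hypothetical, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented flags (hypothetical sketch, not main.py)."""
    parser = argparse.ArgumentParser(description="Website scraper options")
    # -u is the only flag shown in the Run example, so it is assumed required.
    parser.add_argument("-u", dest="url", required=True,
                        help="Starting URL to parse")
    # -m and -n are assumed optional, with no selector applied when omitted.
    parser.add_argument("-m", dest="main", default=None,
                        help="Selector for the main content")
    parser.add_argument("-n", dest="nav", default=None,
                        help="Selector for the site navigation links")
    # -js takes 0 or 1 and defaults to 1, as documented.
    parser.add_argument("-js", dest="js", type=int, choices=(0, 1), default=1,
                        help="Run JavaScript on the page (0=False, 1=True)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-u", "http://domain.com", "-n", "div.nav a", "-js", "0"]
)
print(args.url, args.nav, bool(args.js))
```

Restricting -js to the values 0 and 1 with choices makes a mistyped value fail at startup rather than being silently treated as true.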
