Proxy Scrapper CLI is a command-line application written in Python that extracts information about proxies from this website and saves it as a JSON or XML file in the local export folder.
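As a rough illustration of the export step, the sketch below writes a list of proxy records to either format using only the standard library. The function name `export_proxies`, the `proxies.<fmt>` file name, and the record fields (`ip`, `port`) are assumptions made for the example; they are not taken from the prox_crapper source.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def export_proxies(proxies, fmt="json", folder="export"):
    """Write proxy records (a list of dicts) to <folder>/proxies.<fmt>.

    Hypothetical helper: the real application may name and shape
    its output differently.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"proxies.{fmt}"

    if fmt == "json":
        path.write_text(json.dumps(proxies, indent=2), encoding="utf-8")
    elif fmt == "xml":
        # One <proxy> element per record, one child element per field.
        root = ET.Element("proxies")
        for proxy in proxies:
            item = ET.SubElement(root, "proxy")
            for key, value in proxy.items():
                ET.SubElement(item, key).text = str(value)
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return path
```

For example, `export_proxies([{"ip": "203.0.113.7", "port": 8080}], fmt="xml")` would create `export/proxies.xml` relative to the current directory.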
Because websites of this kind are sensitive to web scraping, the chosen approach simulates a human navigation pattern: requests are made through a real web browser client (selenium + geckodriver), and click events are fired on page links (<a> tags) to reach and extract information from the other pages of the website.
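The click-driven navigation described above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: `collect_pages`, `open_firefox`, the `"Next"` link text, and the random pause are all assumptions, and the row extraction is left to a caller-supplied function because the target page layout is not described here.

```python
import random
import time


def collect_pages(driver, extract, next_link_text="Next", max_pages=20, pause=(1.0, 3.0)):
    """Gather rows from every page by clicking the pagination link.

    driver  -- a selenium WebDriver (or any object with find_elements)
    extract -- callable that reads the rows of the current page
    The "Next" link text is a guess; adjust it to the real site.
    """
    rows = []
    for _ in range(max_pages):
        rows.extend(extract(driver))
        # "link text" is the string value of selenium's By.LINK_TEXT locator.
        links = driver.find_elements("link text", next_link_text)
        if not links:
            break  # no further page to visit
        time.sleep(random.uniform(*pause))  # wait a little, as a person would
        links[0].click()
    return rows


def open_firefox(url):
    """Start Firefox through geckodriver (must be on PATH) and load url."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Firefox()
    driver.get(url)
    return driver
```

Passing the extraction logic in as a function keeps the pagination loop independent of the page markup, and the `max_pages` cap guards against a link that never disappears.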
When set up locally, prox_crapper requires the following dependencies:
- Python 3.9+
- Firefox
- Python venv
- Firefox geckodriver (download the geckodriver build for your platform and save it under /usr/bin on a Linux machine or %USERPROFILE%\AppData\Local\Programs\Python\Python39\Scripts\ on a Windows machine. For further reading: geckodriver binary setup)
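Before running the application, it can help to confirm that the prerequisites listed above are actually reachable. The following is a small sketch using only the standard library; the helper name `missing_prerequisites` is hypothetical, and it only checks the Python version and whether an executable is on PATH, not whether Firefox itself is installed.

```python
import shutil
import sys


def missing_prerequisites(tools=("geckodriver",), min_python=(3, 9)):
    """Return a list of unmet prerequisites (empty when all are satisfied)."""
    problems = []
    if sys.version_info < min_python:
        problems.append("python>=" + ".".join(map(str, min_python)))
    # shutil.which returns None when the executable is not found on PATH.
    problems.extend(tool for tool in tools if shutil.which(tool) is None)
    return problems
```

Calling `missing_prerequisites()` from the activated environment returns an empty list when Python is recent enough and geckodriver can be found.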
Run the commands below to set up the application for your platform (Linux or Windows only).
The following setup was successfully run on a Windows 10 Pro 64-bit machine
- Clone this repository:
git clone https://github.com/krauss/prox_crapper.git
- Change directory:
cd prox_crapper
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
.\venv\Scripts\activate
- Install prox_crapper dependencies:
pip install -r requirements.txt
- Run prox_crapper application:
python src\main.py
- When you're done, to exit the virtual environment:
deactivate
The following setup was successfully run on a Linux Fedora 33 64-bit machine
- Clone this repository:
git clone https://github.com/krauss/prox_crapper.git
- Change directory:
cd prox_crapper
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install prox_crapper dependencies:
pip install -r requirements.txt
- Run prox_crapper application:
python src/main.py
- When you're done, to exit the virtual environment:
deactivate
To try this out quickly, follow the steps below to build the container and run it:
- Build the container image using the provided Dockerfile:
docker build -t prox_crapper .
- [ Linux ] Run the container specifying a volume for the resulting json file
docker run -it -v $PWD/export:/usr/src/app/export prox_crapper
- [ Windows ] Run the container specifying a volume for the resulting json file
docker run -it -v %USERPROFILE%\export:/usr/src/app/export prox_crapper
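Once the container has finished, the exported files should be visible in the mounted host folder. As a quick way to inspect them, the sketch below loads every `.json` file it finds there. The helper name `load_exported_json` is hypothetical, and since the export file names are not documented here, it matches any `.json` file rather than a specific one.

```python
import json
from pathlib import Path


def load_exported_json(folder="export"):
    """Return {file name: parsed content} for each .json file in folder."""
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.json"))
    }
```

On Linux, with the volume mounted as shown above, `load_exported_json("export")` run from the same directory would read the files the container wrote.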