S(uper)Scraper

S(uper)Scraper is a tool created to learn what sensitive information can be extracted from a website.

The software works simply: you enter the URL of a website, and the software automatically extracts the data and stores it in a log file. The structure of the log file is explained below.

Data

The data that this software will extract are:

Metadata
Technologies used
Contacts
Names of people (with their role, if specified)
Pages with forms

Responsibility

This software was created solely for learning purposes. I am not responsible for any malicious use of it.

How to use

Install Python on your device if you haven't already.
Install Nmap.
Download this repository and extract it into a folder.
Run pip install -r requirements.txt to install the required dependencies.
Run main.py with the command py main.py in the terminal (on systems without the py launcher, use python main.py or python3 main.py).
When prompted for a website URL, enter the URL of the home page.
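
The last step expects the URL of the home page. As an illustration only, the input could be reduced to a home page URL along these lines. The function name normalize_home_url and the HTTPS default are assumptions of this sketch, not taken from main.py.

```python
from urllib.parse import urlparse


def normalize_home_url(raw: str) -> str:
    """Return scheme://host/ for a user-supplied URL, or raise ValueError."""
    candidate = raw.strip()
    # Assumption of this sketch: default to HTTPS when no scheme is given.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid website URL: {raw!r}")
    # Drop any path, query, or fragment so only the home page remains.
    return f"{parsed.scheme}://{parsed.netloc}/"
```

With this sketch, an input such as example.com/about would be reduced to https://example.com/.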

Log files

The logs are stored in the folder /logs inside the project directory.

Logs are stored as Markdown (.md) files because this format makes it easy to create readable documents.
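
A minimal sketch of how such a Markdown log could be rendered and written. The helper names render_log and write_log and the heading layout are hypothetical, not the project's actual code. The sketch also creates the log folder if it is missing, which the TODO list mentions as not yet automatic.

```python
from pathlib import Path


def render_log(target: str, sections: dict[str, list[str]]) -> str:
    """Render a target name and its sections as a Markdown document."""
    lines = [f"# {target}", ""]
    for heading, items in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


def write_log(log_dir: Path, name: str, content: str) -> Path:
    """Write content to <log_dir>/<name>.md, creating the folder if needed."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{name}.md"
    path.write_text(content, encoding="utf-8")
    return path
```

For example, write_log(Path("logs"), "example", render_log("example.com", {"Files": ["robots.txt"]})) would produce logs/example.md.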

Structure

Files

This section includes all the sensitive files that can be found on a website, such as sitemap.xml and robots.txt.
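
As a rough illustration, the locations of these files can be derived from the home page URL, and a downloaded robots.txt can be scanned for Disallow entries, which often point at paths the site owner prefers to hide. The names below are hypothetical, and the sketch performs no network requests.

```python
from urllib.parse import urljoin

# Files named in this README; a real scan may check more (assumption).
WELL_KNOWN_FILES = ("robots.txt", "sitemap.xml")


def candidate_file_urls(home_url: str, names=WELL_KNOWN_FILES) -> list[str]:
    """Build root-level URLs for each well-known file name."""
    return [urljoin(home_url, "/" + name) for name in names]


def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect non-empty Disallow values from the text of a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths
```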

Target Information

Whois

The WHOIS record of the site

IP

The IP address of the target
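
One way to obtain the address, sketched with the standard library only. The function names are hypothetical, and the project may resolve the address differently, for example through Nmap.

```python
import socket
from urllib.parse import urlparse


def hostname_from_url(url: str) -> str:
    """Extract the lowercase host name from a full URL."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def resolve_ipv4(host: str) -> str:
    """Resolve a host name to an IPv4 address using the system resolver."""
    return socket.gethostbyname(host)
```

Calling resolve_ipv4(hostname_from_url(url)) needs working DNS. An IPv4 literal passed as the host is returned unchanged.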

Operating System

OS and other information about the target

Technologies Used

All the technologies that can be detected, for example GSAP or ReactJS

Pages

Each page found on the website is analyzed, and the information is reported in the log file as follows:

Title | Url

Metadata

All the metadata of the current page
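
A minimal sketch of collecting page metadata with the standard library's HTMLParser. The class name MetaCollector is hypothetical, and the project may rely on a different parser.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the <title> text and <meta name/property> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use "name"; Open Graph tags use "property".
            key = attributes.get("name") or attributes.get("property")
            if key and attributes.get("content") is not None:
                self.meta[key] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
```

Feed it the page source with collector.feed(html), then read collector.title and collector.meta.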

Links

List of all the links present on the page
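
Link extraction could look roughly like the sketch below, which resolves relative links against the page URL and skips duplicates. The class name LinkCollector is an assumption for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect unique absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the current page.
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:
                self.links.append(absolute)
```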

Forms and Inputs

Whether the page includes forms or inputs, along with information about them
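
As a sketch, forms and their fields can be gathered with HTMLParser. The class name FormCollector and the recorded keys are hypothetical. For brevity this version ignores inputs that sit outside a form element.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each form's action, method, and input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attributes.get("action") or "",
                # HTML forms default to GET when no method is given.
                "method": (attributes.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "textarea", "select") and self._current is not None:
            self._current["inputs"].append(
                {"tag": tag, "name": attributes.get("name"), "type": attributes.get("type")}
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None
```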

Cookies

The cookies that are set when you visit the page
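
To illustrate what can be reported about a cookie, the sketch below parses the value of a Set-Cookie header with the standard library. The function name parse_set_cookie and the selected attributes are assumptions, and the header string would come from the page's HTTP response.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value: str) -> dict:
    """Parse a Set-Cookie header value into a dict keyed by cookie name."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {
        name: {
            "value": morsel.value,
            "path": morsel["path"],
            # Flag attributes are empty strings when absent.
            "secure": bool(morsel["secure"]),
            "httponly": bool(morsel["httponly"]),
        }
        for name, morsel in cookie.items()
    }
```

A session cookie without the Secure or HttpOnly flag is the kind of detail worth noting in the log.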

Contacts

Contacts are emails, social links, and any other contact information found on the page
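
A simplified sketch of contact extraction. The regular expression is deliberately loose, and the list of social hosts is an assumption chosen for the example. Neither is taken from the project's contacts function.

```python
import re
from urllib.parse import urlparse

# Loose pattern: good enough for a sketch, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical selection of social networks to look for.
SOCIAL_HOSTS = ("facebook.com", "instagram.com", "linkedin.com", "twitter.com", "x.com")


def find_emails(text: str) -> list[str]:
    """Return the unique email addresses in text, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


def find_social_links(links: list[str]) -> list[str]:
    """Keep only the links whose host is a known social network."""
    found = []
    for link in links:
        host = urlparse(link).hostname or ""
        if any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS):
            found.append(link)
    return found
```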

General Information

Names of people (with their role, if specified)

TODOs

Create the logs folder automatically
Update requirements.txt and remove unused dependencies
Update README.md to make it more concise and clear
Add template to create pull requests and add documentation for contributors
Create proper error handling
Implement something like DirBuster
Implement other useful Nmap scans
Improve / add new features to contacts function

Samuele1818/super-scraper
