Commit

easy to use
lindsey98 committed Jan 26, 2024
1 parent d73d54c commit 0f4adf9
Showing 19 changed files with 5,156 additions and 90 deletions.
76 changes: 21 additions & 55 deletions README.md
@@ -29,7 +29,7 @@

## Framework

-<img src="phishpedia/big_pic/overview.png" style="width:2000px;height:350px"/>
+<img src="./datasets/overview.png" style="width:2000px;height:350px"/>

```Input```: A URL and its screenshot ```Output```: Phish/Benign, Phishing target
- Step 1: Enter <b>Deep Object Detection Model</b>, get predicted logos and inputs (inputs are not used for later prediction, just for explanation)
@@ -40,88 +40,54 @@

## Project structure
```
-- src
-- adv_attack: adversarial attacking scripts
-- detectron2_pedia: training script for object detector
-|_ output
-|_ rcnn_2
-|_ rcnn_bet365.pth
-- siamese_pedia: inference script for siamese
-|_ siamese_retrain: training script for siamese
-|_ expand_targetlist
-|_ 1&1 Ionos
-|_ ...
-|_ domain_map.pkl
-|_ resnetv2_rgb_new.pth.tar
-- siamese.py: main script for siamese
-- pipeline_eval.py: evaluation script for general experiment
-- tele: telegram scripts to vote for phishing
-- phishpedia_config.py: config script for phish-discovery experiment
-- phishpedia_main.py: main script for phish-discovery experiment
+- logo_recog.py: Deep Object Detection Model
+- logo_matching.py: Deep Siamese Model
+- configs.yaml: Configuration file
+- phishpedia.py: Main script
```

## Instructions
Requirements:
- CUDA 11
- Anaconda installed, please refer to the official installation guide: https://docs.anaconda.com/free/anaconda/install/index.html

1. Create a local clone of Phishpedia
-```
+```bash
git clone https://github.com/lindsey98/Phishpedia.git
```

2. Setup
-```
-cd Phishpedia/
+```bash
chmod +x ./setup.sh
./setup.sh
```
If you encounter any problem downloading the models, you can download them manually from https://huggingface.co/Kelsey98/Phishpedia and put them into the corresponding conda environment.

3.
```
-conda activate myenv
+conda activate phishpedia
```

-Run in Python to test a single website
-```python
-from phishpedia.phishpedia_main import test
-import matplotlib.pyplot as plt
-from phishpedia.phishpedia_config import load_config
-
-url = open("phishpedia/datasets/test_sites/accounts.g.cdcde.com/info.txt").read().strip()
-screenshot_path = "phishpedia/datasets/test_sites/accounts.g.cdcde.com/shot.png"
-ELE_MODEL, SIAMESE_THRE, SIAMESE_MODEL, LOGO_FEATS, LOGO_FILES, DOMAIN_MAP_PATH = load_config(None)
-
-phish_category, pred_target, plotvis, siamese_conf, pred_boxes = test(url=url, screenshot_path=screenshot_path,
-    ELE_MODEL=ELE_MODEL,
-    SIAMESE_THRE=SIAMESE_THRE,
-    SIAMESE_MODEL=SIAMESE_MODEL,
-    LOGO_FEATS=LOGO_FEATS,
-    LOGO_FILES=LOGO_FILES,
-    DOMAIN_MAP_PATH=DOMAIN_MAP_PATH
-)
-
-print('Phishing (1) or Benign (0) ?', phish_category)
-print('What is its targeted brand if it is a phishing ?', pred_target)
-print('What is the siamese matching confidence ?', siamese_conf)
-print('Where is the predicted logo (in [x_min, y_min, x_max, y_max])?', pred_boxes)
-plt.imshow(plotvis[:, :, ::-1])
-plt.title("Predicted screenshot with annotations")
-plt.show()
+4. Run in bash
+```bash
+python phishpedia.py --folder <folder you want to test e.g. ./datasets/test_sites>
```

-Or run in bash
+The testing folder should be in the structure of:

```
-python run.py --folder <folder you want to test e.g. phishpedia/datasets/test_sites> --results <where you want to save the results e.g. test.txt>
+test_site_1
+|__ info.txt (Write the URL)
+|__ shot.png (Save the screenshot)
+test_site_2
+|__ info.txt (Write the URL)
+|__ shot.png (Save the screenshot)
+......
```

## Miscellaneous
- In our paper, we also implement several phishing detection and identification baselines, see [here](https://github.com/lindsey98/PhishingBaseline)
- The logo targetlist described in our paper includes 181 brands; we have further expanded it to 277 brands in this code repository
- For the phish discovery experiment, we obtain our feed from [Certstream phish_catcher](https://github.com/x0rz/phishing_catcher) and lower its score threshold to 40 to process more suspicious websites; readers can refer to that repo for details
-- We use Scrapy for website crawling [Repo here](https://github.com/lindsey98/MyScrapy.git)
+- We use Scrapy for website crawling

## Citation
If you find our work useful in your research, please consider citing our paper by:
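The updated README expects one sub-folder per site, each holding an info.txt with the URL and a shot.png screenshot, and phishpedia.py receives that parent folder through its --folder argument. A minimal sketch of how such a folder could be walked to collect the (URL, screenshot) pairs; the helper `iter_test_sites` is hypothetical and is not part of the repository.

```python
import os
from typing import Iterator, Tuple


def iter_test_sites(folder: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (site_name, url, screenshot_path) for each complete site folder.

    Hypothetical helper assuming the README layout:
    <folder>/<site>/info.txt holds the URL, <folder>/<site>/shot.png the screenshot.
    """
    for site in sorted(os.listdir(folder)):
        site_dir = os.path.join(folder, site)
        info_path = os.path.join(site_dir, "info.txt")
        shot_path = os.path.join(site_dir, "shot.png")
        # Skip stray files such as .DS_Store and folders missing either input.
        if not (os.path.isfile(info_path) and os.path.isfile(shot_path)):
            continue
        with open(info_path, encoding="utf-8") as f:
            url = f.read().strip()
        yield site, url, shot_path
```

Called on ./datasets/test_sites, the sketch would include accounts.g.cdcde.com with the URL from its info.txt and skip the .DS_Store file added alongside it. Silently skipping incomplete folders is a choice made for the sketch; a stricter loader might log or raise instead.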
63 changes: 63 additions & 0 deletions configs.py
@@ -0,0 +1,63 @@
# Global configuration
import subprocess
from typing import Union
import yaml
from logo_matching import cache_reference_list, load_model_weights
from logo_recog import config_rcnn
import os
import numpy as np

def get_absolute_path(relative_path):
    base_path = os.path.dirname(__file__)
    return os.path.abspath(os.path.join(base_path, relative_path))

def load_config(reload_targetlist=False):

    with open(os.path.join(os.path.dirname(__file__), 'configs.yaml')) as file:
        configs = yaml.load(file, Loader=yaml.FullLoader)

    # Iterate through the configuration and update paths
    for section, settings in configs.items():
        for key, value in settings.items():
            if 'PATH' in key and isinstance(value, str):  # Check if the key indicates a path
                absolute_path = get_absolute_path(value)
                configs[section][key] = absolute_path

    ELE_CFG_PATH = configs['ELE_MODEL']['CFG_PATH']
    ELE_WEIGHTS_PATH = configs['ELE_MODEL']['WEIGHTS_PATH']
    ELE_CONFIG_THRE = configs['ELE_MODEL']['DETECT_THRE']
    ELE_MODEL = config_rcnn(ELE_CFG_PATH,
                            ELE_WEIGHTS_PATH,
                            conf_threshold=ELE_CONFIG_THRE)

    # siamese model
    SIAMESE_THRE = configs['SIAMESE_MODEL']['MATCH_THRE']

    print('Load protected logo list')
    targetlist_zip_path = configs['SIAMESE_MODEL']['TARGETLIST_PATH']
    targetlist_dir = os.path.dirname(targetlist_zip_path)
    zip_file_name = os.path.basename(targetlist_zip_path)
    targetlist_folder = zip_file_name.split('.zip')[0]
    full_targetlist_folder_dir = os.path.join(targetlist_dir, targetlist_folder)

    if reload_targetlist or (targetlist_zip_path.endswith('.zip') and not os.path.isdir(full_targetlist_folder_dir)):
        os.makedirs(full_targetlist_folder_dir, exist_ok=True)
        subprocess.run(f'unzip -o "{targetlist_zip_path}" -d "{full_targetlist_folder_dir}"', shell=True)

    SIAMESE_MODEL = load_model_weights(num_classes=configs['SIAMESE_MODEL']['NUM_CLASSES'],
                                       weights_path=configs['SIAMESE_MODEL']['WEIGHTS_PATH'])

    if reload_targetlist or (not os.path.exists(os.path.join(os.path.dirname(__file__), 'LOGO_FEATS.npy'))):
        LOGO_FEATS, LOGO_FILES = cache_reference_list(model=SIAMESE_MODEL,
                                                      targetlist_path=full_targetlist_folder_dir)
        print('Finish loading protected logo list')
        np.save(os.path.join(os.path.dirname(__file__), 'LOGO_FEATS.npy'), LOGO_FEATS)
        np.save(os.path.join(os.path.dirname(__file__), 'LOGO_FILES.npy'), LOGO_FILES)

    else:
        LOGO_FEATS, LOGO_FILES = np.load(os.path.join(os.path.dirname(__file__), 'LOGO_FEATS.npy')), \
                                 np.load(os.path.join(os.path.dirname(__file__), 'LOGO_FILES.npy'))

    DOMAIN_MAP_PATH = configs['SIAMESE_MODEL']['DOMAIN_MAP_PATH']

    return ELE_MODEL, SIAMESE_THRE, SIAMESE_MODEL, LOGO_FEATS, LOGO_FILES, DOMAIN_MAP_PATH
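load_config turns every setting whose key contains PATH into an absolute path anchored at the directory of configs.py, then derives the unzipped target-list folder from the zip file name (models/expand_targetlist.zip becomes models/expand_targetlist). Both steps can be pulled out into pure functions that are easy to check without loading any model. This is a sketch only: `resolve_paths` and `targetlist_folder_for` are hypothetical names, and `base_dir` stands in for `os.path.dirname(__file__)`.

```python
import os
from typing import Any, Dict


def resolve_paths(configs: Dict[str, Dict[str, Any]], base_dir: str) -> Dict[str, Dict[str, Any]]:
    """Return a copy of configs with every string under a '*PATH*' key made absolute.

    Mirrors the loop in load_config: thresholds and class counts are left untouched.
    """
    resolved: Dict[str, Dict[str, Any]] = {}
    for section, settings in configs.items():
        resolved[section] = {}
        for key, value in settings.items():
            if "PATH" in key and isinstance(value, str):
                value = os.path.abspath(os.path.join(base_dir, value))
            resolved[section][key] = value
    return resolved


def targetlist_folder_for(zip_path: str) -> str:
    """Map .../expand_targetlist.zip to .../expand_targetlist in the same directory."""
    folder_name = os.path.basename(zip_path).split(".zip")[0]
    return os.path.join(os.path.dirname(zip_path), folder_name)
```

Unlike the loop in load_config, which rewrites `configs` in place, `resolve_paths` returns a new dictionary, so the raw YAML values stay available (for example, for logging).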
11 changes: 11 additions & 0 deletions configs.yaml
@@ -0,0 +1,11 @@
ELE_MODEL: # element recognition model -- logo only
  CFG_PATH: models/faster_rcnn.yaml # os.path.join(os.path.dirname(__file__), xxx)
  WEIGHTS_PATH: models/rcnn_bet365.pth
  DETECT_THRE: 0.05

SIAMESE_MODEL:
  NUM_CLASSES: 277 # number of brands; users don't need to modify this even if the targetlist is expanded
  MATCH_THRE: 0.87 # FIXME: threshold is 0.87 in phish-discovery?
  WEIGHTS_PATH: models/resnetv2_rgb_new.pth.tar
  TARGETLIST_PATH: models/expand_targetlist.zip
  DOMAIN_MAP_PATH: models/domain_map.pkl
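MATCH_THRE (0.87 here) is the siamese matching threshold: a detected logo is only attributed to a protected brand when its similarity to a reference logo reaches this value. As a rough illustration, the sketch below applies such a threshold to cosine similarity over toy embedding vectors. Cosine similarity as the score and plain lists in place of the LOGO_FEATS array are assumptions; `match_logo` and the brand names are hypothetical, and the actual matching code lives in logo_matching.py, which this commit view does not show.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def match_logo(query: Sequence[float],
               ref_feats: Sequence[Sequence[float]],
               ref_brands: Sequence[str],
               match_thre: float = 0.87) -> Tuple[Optional[str], float]:
    """Return (brand, similarity) for the most similar reference logo.

    The brand is None when the best cosine similarity is below match_thre.
    """
    q = _unit(query)
    best_brand: Optional[str] = None
    best_sim = -1.0
    for feat, brand in zip(ref_feats, ref_brands):
        sim = sum(a * b for a, b in zip(q, _unit(feat)))  # cosine similarity
        if sim > best_sim:
            best_brand, best_sim = brand, sim
    if best_sim < match_thre:
        return None, best_sim
    return best_brand, best_sim
```

Normalising both vectors first makes the dot product equal to cosine similarity, so the threshold does not depend on the magnitude of the embeddings.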
Binary file added datasets/.DS_Store
Binary file not shown.
Binary file added datasets/overview.png
Binary file added datasets/test_sites/.DS_Store
Binary file not shown.
4,106 changes: 4,106 additions & 0 deletions datasets/test_sites/accounts.g.cdcde.com/html.txt

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions datasets/test_sites/accounts.g.cdcde.com/info.txt
@@ -0,0 +1 @@
https://accounts.g.cdcde.com/ServiceLogin?passive=1209600&osid=1&continue=https://plus.g.cdcde.com/&followup=https://plus.g.cdcde.com/
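info.txt stores only the page URL. In Phishpedia's reference-based approach, a page is reported as phishing when its logo matches a protected brand but its domain is not one of that brand's legitimate domains; DOMAIN_MAP_PATH points at the brand-to-domain mapping used for that comparison. A simplified sketch of the check, assuming the mapping is a dict from brand name to a list of domains. `registered_domain`, `is_phishing` and the example mapping are hypothetical, and keeping the last two hostname labels is a deliberate simplification; production code should consult the Public Suffix List (for instance through the tldextract package).

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def registered_domain(url: str) -> str:
    """Naively reduce a URL to its last two hostname labels, e.g. cdcde.com.

    Simplification: multi-label public suffixes such as co.uk need the
    Public Suffix List to be handled correctly.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def is_phishing(url: str,
                matched_brand: Optional[str],
                domain_map: Dict[str, List[str]]) -> bool:
    """Flag a page whose logo matched a brand but whose domain is not that brand's."""
    if matched_brand is None:
        return False  # no protected logo matched, so no identity is being claimed
    # Prefix "//" so urlsplit treats bare domains such as "google.com" as a netloc.
    legit = {registered_domain("//" + d) for d in domain_map.get(matched_brand, [])}
    return registered_domain(url) not in legit
```

With a hypothetical mapping such as {"Google": ["google.com"]}, the test URL above reduces to cdcde.com, so a Google logo match on it would be reported as phishing, while the same logo on accounts.google.com would not.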
Binary file added datasets/test_sites/accounts.g.cdcde.com/shot.png