This project uses machine learning (a CNN model) to break simple captchas. To keep things simple, we use Really Simple Captcha, one of the most popular WordPress captcha plugins, as the example.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
What you need in order to install the software, and how to install it:
- Local WordPress environment to run the plugin
- WP CLI
- Install the plugins by using WP CLI within your WP folder
wp plugin install contact-form-7 --activate
wp plugin install really-simple-captcha --activate
- Python 3.*
- Pip
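As a quick sanity check before installing, the prerequisites above can be verified from Python. This is only a minimal sketch: the helper names `missing_tools` and `python_ok` are made up for illustration, and it assumes WP CLI is available on your PATH as `wp` and Pip as `pip` (on some systems it is `pip3`).

```python
import shutil
import sys


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_ok(minimum=(3, 0)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


# Assumed executable names; adjust them to match your own setup.
missing = missing_tools(["wp", "pip"])
print("Python 3:", "ok" if python_ok() else "too old")
print("Missing tools:", ", ".join(missing) or "none")
```

This does not check the local WordPress environment itself, only the command-line tools.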
A step-by-step series of examples that shows how to get a development environment running:
- Clone the repo to your local server
git clone [email protected]:spiderPan/breaking-captcha.git
- Install the required packages within the repo
cd breaking-captcha
pip install -r requirments.txt
- Move wp_prepare.php into your WordPress theme
cp wp_prepare.php LOCAL_WP/wp-content/themes/YOUR_THEME/
- Add the following line to the beginning of your WordPress theme's functions.php
include 'wp_prepare.php';
- Start generating captcha images by calling CLI
When it's done, all 20000 captcha images will be generated in wp-content/plugins/really-simple-captcha/tmp
- Copy all captchas into the repo's captcha_imgs folder
cp -r LOCAL_WP/wp-content/plugins/really-simple-captcha/tmp/*.png breaking-captcha/captcha_imgs/
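On some systems a shell glob over 20000 files can exceed the argument-length limit of `cp`. If that happens, the copy step can be done from Python instead. The following is a minimal sketch, not part of the repo; `copy_captchas` is a hypothetical helper name, and the paths in the usage comment are the same placeholders used above.

```python
import shutil
from pathlib import Path


def copy_captchas(src_dir, dest_dir):
    """Copy every .png in src_dir into dest_dir and return how many were copied."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = 0
    for png in sorted(src.glob("*.png")):
        shutil.copy2(png, dest / png.name)
        copied += 1
    return copied


# Usage (placeholder paths, replace LOCAL_WP with your WordPress folder):
# copy_captchas("LOCAL_WP/wp-content/plugins/really-simple-captcha/tmp",
#               "breaking-captcha/captcha_imgs")
```

Only files ending in .png are copied, so any other files in the plugin's tmp folder are left out.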
Here is a breakdown of run.py:
- Split captcha images into letters
captcha = Captcha()
captcha.split_all_captchas()
- Train the CNN model
recognizer = Recognizer()
recognizer.load_captcha_folder('./letter_imgs')
recognizer.train_model()
- Make a prediction for an individual image
recognizer.predict_model(IMAGE_FILE)
- Or, instead of #3, repeat #5 of the installation steps (generating captchas) and copy the fresh captcha images into a new testing folder within this repo, for example captcha_test_imgs, and then run
recognizer.run_in_test_folder('./captcha_test_imgs')
If this is the first run, just comment out #4 (the test-folder run) and run python run.py
If the model has already been trained, comment out #2 (training) and use either #3 or #4 to predict
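The two notes above boil down to a small decision about which run.py steps stay active. The sketch below only illustrates that decision; `plan_steps` and the step names are hypothetical and do not exist in the repo, and the mapping to run.py is given in the comments.

```python
def plan_steps(first_run, use_test_folder=False):
    """Return the run.py steps left active, in order.

    Step names are illustrative labels:
      "split"       -> #1 captcha.split_all_captchas()
      "train"       -> #2 recognizer.train_model()
      "predict"     -> #3 recognizer.predict_model(IMAGE_FILE)
      "test_folder" -> #4 recognizer.run_in_test_folder(...)
    """
    steps = ["split"]  # the notes never say to comment out #1
    if first_run:
        steps += ["train", "predict"]  # #4 is commented out on the first run
    else:
        # #2 is commented out; choose one of #3 and #4
        steps.append("test_folder" if use_test_folder else "predict")
    return steps


print(plan_steps(first_run=True))
print(plan_steps(first_run=False, use_test_folder=True))
```

Whether #1 actually has to be repeated once the letters are already split is not stated above, so the sketch simply keeps it active.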
Currently the CNN model can reach 99% accuracy within 10 epochs. There are two things I plan to work on in the future:
- The project uses OpenCV to split letters out of the captcha images, and sometimes a captcha is not correctly split into four letters. This is the bottleneck for recognition.
- Automate the HTTP request that sends an email through a Contact Form 7 site, using the model to break the site's captcha.
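To illustrate the splitting problem in the first item, here is a minimal sketch of how a bad split could be detected before recognition. It does not reproduce the project's OpenCV code. Instead it assumes a simpler column-projection approach on an already binarized image, represented as a list of rows with 1 for ink and 0 for background; `split_columns` and `is_clean_split` are hypothetical names.

```python
def split_columns(grid):
    """Return (start, end) column ranges of consecutive columns containing ink.

    `grid` is a list of equal-length rows; `end` is exclusive.
    """
    if not grid:
        return []
    width = len(grid[0])
    has_ink = [any(row[x] for row in grid) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new letter begins
        elif not ink and start is not None:
            segments.append((start, x))    # the current letter ends
            start = None
    if start is not None:
        segments.append((start, width))    # letter touching the right edge
    return segments


def is_clean_split(grid, expected=4):
    """True when the image separates into exactly `expected` pieces."""
    return len(split_columns(grid)) == expected


separated = [[1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1]]
touching = [[1, 1, 1, 1, 0, 1, 0, 1]]  # two letters merged into one blob
print(split_columns(separated), is_clean_split(separated))
print(split_columns(touching), is_clean_split(touching))
```

A captcha that fails such a check could be skipped or sent to a fallback splitter, rather than passing wrongly cropped letters to the model.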