CNN to break captcha

This project uses machine learning (a CNN model) to break simple captchas. To keep things easy, we use the Really Simple Captcha plugin, one of the most popular WordPress captcha plugins, as the example.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install the software, and how to install it

Local WordPress environment to run the plugin
WP CLI

Install the plugins with WP CLI from within your WordPress folder

wp plugin install contact-form-7 --activate
wp plugin install really-simple-captcha --activate

Python 3.*
Pip

Installing

A step-by-step series of examples that tell you how to get a development environment running

Clone the repo to your local server

git clone git@github.com:spiderPan/breaking-captcha.git

Install the required packages from within the repo

cd breaking-captcha
pip install -r requirements.txt

Move wp_prepare.php into your WordPress Theme

cp wp_prepare.php LOCAL_WP/wp-content/themes/YOUR_THEME/

Add the following line to the beginning of your WordPress theme's functions.php
```
include 'wp_prepare.php';
```
Start generating captcha images by calling the CLI
```
wp
```
When it finishes, all 20,000 captcha images will have been generated in wp-content/plugins/really-simple-captcha/tmp

Copy all captchas into the repo's captcha_imgs folder

cp -r LOCAL_WP/wp-content/plugins/really-simple-captcha/tmp/*.png breaking-captcha/captcha_imgs/
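
After copying, it can be worth checking that the expected number of images actually arrived. The sketch below is not part of the repo; `count_captcha_images` is a hypothetical helper, and it assumes the images sit directly inside captcha_imgs with a .png extension.

```python
from pathlib import Path

EXPECTED_COUNT = 20000  # the figure quoted in the generation step above


def count_captcha_images(folder):
    """Return the number of .png files directly inside `folder`."""
    return sum(1 for p in Path(folder).glob("*.png") if p.is_file())


if __name__ == "__main__":
    found = count_captcha_images("captcha_imgs")
    status = "ok" if found == EXPECTED_COUNT else "unexpected"
    print(f"{found} images found, {EXPECTED_COUNT} expected ({status})")
```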

Running the tests

Here is a breakdown of run.py

Split captcha images into letters

captcha = Captcha()
captcha.split_all_captchas()

Train the CNN model

recognizer = Recognizer()
recognizer.load_captcha_folder('./letter_imgs')
recognizer.train_model()

Make a prediction for an individual image
```
recognizer.predict_model(IMAGE_FILE)
```
Or, instead of predicting a single image, repeat the captcha-generation step from Installing, copy the fresh captcha images into a new testing folder within this repo (for example, captcha_test_imgs), and then run
```
recognizer.run_in_test_folder('./captcha_test_imgs')
```
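
To illustrate the splitting step, here is a minimal, dependency-free sketch of one common idea: treat every run of columns that contains ink as one letter. It is not the code in captcha.py, which uses OpenCV. The function name `split_columns` and the 0/1 list-of-rows image format are assumptions made for this example only.

```python
def split_columns(binary_image, expected_letters=4):
    """Split a binarized captcha into letter column ranges.

    binary_image: list of rows, each a list of 0 (background) or 1 (ink).
    Returns a list of (start, end) column ranges with end exclusive, or
    None when the number of ranges differs from expected_letters.
    """
    if not binary_image:
        return None
    width = len(binary_image[0])
    # A column "has ink" if any row has a 1 in that column.
    has_ink = [any(row[x] for row in binary_image) for x in range(width)]

    ranges = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a letter begins
        elif not ink and start is not None:
            ranges.append((start, x))      # a letter ends at a blank column
            start = None
    if start is not None:
        ranges.append((start, width))      # letter touching the right edge

    return ranges if len(ranges) == expected_letters else None


# Four letters separated by blank columns are found as four ranges.
clean = [
    [1, 0, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 0, 1],
]
print(split_columns(clean))     # [(0, 1), (2, 4), (6, 7), (8, 9)]

# Touching letters merge into one range, so the split is rejected.
touching = [[1, 1, 1, 0, 1]]
print(split_columns(touching))  # None
```

Returning None when the count is wrong lets the caller skip a badly split captcha rather than feed mislabeled letters to the model.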

If this is the first run, comment out the run_in_test_folder call and run python run.py

If the model has already been trained, comment out the training step and use either predict_model or run_in_test_folder to predict
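
As an alternative to commenting lines in and out, the steps could be selected with command-line flags. This is only a sketch: the flag names and the `selected_steps` helper are hypothetical and not part of run.py. Each returned step name would still need to be mapped to the matching Captcha or Recognizer call shown above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Choose which run.py steps to execute")
    parser.add_argument("--split", action="store_true",
                        help="split captcha images into letters")
    parser.add_argument("--train", action="store_true",
                        help="train the CNN model")
    parser.add_argument("--predict", metavar="IMAGE_FILE",
                        help="predict a single image")
    parser.add_argument("--test-folder", metavar="DIR",
                        help="predict every captcha in a folder")
    return parser


def selected_steps(argv):
    """Return the requested steps, in pipeline order, as (name, value) pairs."""
    args = build_parser().parse_args(argv)
    steps = []
    if args.split:
        steps.append(("split", None))
    if args.train:
        steps.append(("train", None))
    if args.predict:
        steps.append(("predict", args.predict))
    if args.test_folder:
        steps.append(("test_folder", args.test_folder))
    return steps


# First run: split and train, no prediction yet.
print(selected_steps(["--split", "--train"]))
# Later runs: skip training and predict on a folder of fresh captchas.
print(selected_steps(["--test-folder", "./captcha_test_imgs"]))
```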

Future Plan

Currently, the CNN model reaches 99% accuracy within 10 epochs. There are two things I plan to work on in the future

Because the project uses OpenCV to split letters out of the captcha images, a captcha file is sometimes not split correctly into four letters, and this is the bottleneck for recognition.
Automate the HTTP requests that send email through a Contact Form 7 site, using the model to break the site's captcha.
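
To see why the splitting step matters, here is a rough estimate of whole-captcha accuracy. It assumes the 99% figure is per-letter accuracy and that letter errors are independent; neither is confirmed above. The 90% split success rate is a made-up number for illustration, and `captcha_accuracy` is a hypothetical helper.

```python
def captcha_accuracy(letter_accuracy, letters=4, split_success=1.0):
    """Estimated whole-captcha accuracy, assuming independent letter errors.

    A captcha counts as solved only if it was split correctly and every
    one of its letters was recognized.
    """
    return split_success * letter_accuracy ** letters


# Perfect splitting: 0.99 ** 4
print(round(captcha_accuracy(0.99), 4))                     # 0.9606
# Hypothetical 90% split success rate
print(round(captcha_accuracy(0.99, split_success=0.9), 4))  # 0.8645
```

Under these assumptions, a modest drop in split success costs more than the remaining per-letter error, which is consistent with splitting being the bottleneck.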
