Learning Data Mining with Python


Chapter 8 – Beating CAPTCHAs with Neural Networks


Better (worse?) CAPTCHAs

http://scikit-image.org/docs/dev/auto_examples/applications/plot_geometric.html

The CAPTCHAs we beat in this example were not as complex as those normally used today. You can create more complex variants using a number of techniques as follows:

  • Applying different transformations such as the ones in scikit-image (see the link above)

  • Using multiple colors, especially colors that don't translate well to grayscale

  • Adding lines or other shapes to the image: http://scikit-image.org/docs/dev/api/skimage.draw.html
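
The first and third ideas can be sketched with scikit-image. This is only a minimal illustration, not the generator used in the chapter: the helper name harden_captcha, its parameters, and the stand-in input image are assumptions made for the example. It applies an affine shear from skimage.transform and then draws a few random lines with skimage.draw.line.

```python
import numpy as np
from skimage import transform as tf
from skimage.draw import line


def harden_captcha(image, shear=0.3, n_lines=3, seed=None):
    """Shear a 2-D grayscale image (values in [0, 1]) and draw random lines on it.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    rng = np.random.default_rng(seed)
    # An affine shear, one of the geometric transformations in skimage.transform
    warped = tf.warp(image, tf.AffineTransform(shear=shear))
    height, width = warped.shape
    for _ in range(n_lines):
        # Random end points inside the image bounds
        r0, r1 = (int(v) for v in rng.integers(0, height, size=2))
        c0, c1 = (int(v) for v in rng.integers(0, width, size=2))
        rr, cc = line(r0, c0, r1, c1)
        warped[rr, cc] = 1.0  # draw the line at full intensity
    return warped


# Stand-in for a rendered CAPTCHA: a blank 20x40 image with one bright block
image = np.zeros((20, 40))
image[5:15, 10:30] = 1.0
harder = harden_captcha(image, seed=0)
```

Increasing n_lines or the shear should make the letters harder to segment, which is the step a simple pipeline is most likely to fail on.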

Deeper networks

These techniques will probably fool our current implementation, so the method will need to be improved. Try some of the deeper networks we used in Chapter 11, Classifying Objects in Images Using Deep Learning.

Larger networks need more data, though, so you will probably need to generate more than the few thousand samples we did in this chapter to get good performance. Generating these datasets is a good candidate for parallelization, since the work consists of many small tasks that can be performed independently.
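
One way to spread that work across CPU cores is the standard library's concurrent.futures module. The sketch below is an assumption-laden outline rather than a finished generator: make_sample_spec and generate_dataset are hypothetical names, the four-letter words and the shear range are arbitrary, and the image rendering step is left out so that the example stays self-contained. Each task gets its own seed, so samples are independent and reproducible.

```python
import random
import string
from concurrent.futures import ProcessPoolExecutor


def make_sample_spec(seed):
    """Pick the random parameters for one CAPTCHA sample.

    A real worker would also render the image here; that step is omitted
    in this sketch.
    """
    rng = random.Random(seed)  # per-task seed keeps each sample reproducible
    word = "".join(rng.choice(string.ascii_uppercase) for _ in range(4))
    shear = rng.uniform(0.0, 0.5)
    return word, shear


def generate_dataset(n_samples, workers=None):
    """Build n_samples specs across worker processes, one seed per task."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches the small tasks to reduce inter-process overhead
        return list(pool.map(make_sample_spec, range(n_samples), chunksize=256))


if __name__ == "__main__":
    dataset = generate_dataset(10_000)
    print(len(dataset), dataset[0])
```

The __main__ guard matters here because worker processes may re-import the script on some platforms.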

Reinforcement learning

http://pybrain.org/docs/tutorial/reinforcement-learning.html

Reinforcement learning is gaining traction as the next big thing in data mining, although it has been around for a long time! PyBrain has some reinforcement learning algorithms that are worth trying with this dataset (and others!).