Web Scraping with Python

By: Richard Penman

Chapter 7. Solving CAPTCHA

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. As the acronym suggests, it is a test to determine whether the user is human. A typical CAPTCHA consists of distorted text, which a computer program will find difficult to interpret but a human can (hopefully) still read. Many websites use CAPTCHAs to try to prevent bots from interacting with them. For example, my bank's website forces me to pass a CAPTCHA every time I log in, which is a pain. This chapter will cover how to solve a CAPTCHA automatically, first through Optical Character Recognition (OCR) and then with a CAPTCHA-solving API.