For the example search web page, we were able to easily reverse engineer how it works. However, some websites will be very complex and difficult to understand, even with a tool like Firebug. For example, if a website has been built with Google Web Toolkit (GWT), the resulting JavaScript code will be machine-generated and minified. This generated JavaScript code can be cleaned up with a tool such as JS beautifier, but the result will be verbose and the original variable names will be lost, so it is still difficult to work with. With enough effort, any website can be reverse engineered. However, this effort can be avoided by using a browser rendering engine instead. The rendering engine is the part of the web browser that parses HTML, applies CSS formatting, and executes JavaScript to display a web page as we expect. In this section, we will use the WebKit rendering engine, which has a convenient Python interface through the Qt framework.
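To see why a rendering engine helps, consider what a plain downloader sees. The following minimal sketch uses only the standard library. It parses the raw HTML of a small page in which a `div` is filled in by JavaScript, and because no JavaScript is executed, the `div` comes back empty. The page content and the `DivTextParser` and `static_text` helpers are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page: the result div is empty in the HTML source and is
# only populated by the script when a browser executes it.
PAGE = """
<html>
<body>
  <div id="result"></div>
  <script>
    document.getElementById('result').innerText = 'Hello World';
  </script>
</body>
</html>
"""


class DivTextParser(HTMLParser):
    """Collect the text inside the div with the given id.

    Assumes the target div contains no nested divs (kept simple for the sketch).
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'div' and dict(attrs).get('id') == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if self.inside and tag == 'div':
            self.inside = False

    def handle_data(self, data):
        # Script bodies are also delivered here, but they are ignored
        # because they are outside the target div.
        if self.inside:
            self.text.append(data)


def static_text(html, target_id):
    """Return the target div's text as seen without running any JavaScript."""
    parser = DivTextParser(target_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.text).strip()


print(repr(static_text(PAGE, 'result')))  # prints ''
```

A browser rendering engine would execute the script and report `Hello World` for the same element. That difference is what the WebKit approach provides.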

Web Scraping with Python