Web Scraping with Python

By: Richard Penman

Summary


This chapter covered two approaches to scraping data from dynamic web pages. It started with reverse engineering a dynamic web page with the help of Firebug Lite, and then moved on to using a browser renderer to trigger JavaScript events for us. We first used WebKit to build our own custom browser, and then reimplemented this scraper with the high-level Selenium framework.

A browser renderer can save the time needed to understand how the backend of a website works; however, there are disadvantages. Rendering a web page adds overhead and so is much slower than just downloading the HTML. Additionally, solutions using a browser renderer often require polling the web page to check whether the HTML resulting from an event has loaded yet, which is brittle and can easily fail when the network is slow. I typically use a browser renderer for short-term solutions where long-term performance and reliability are less important; for long-term solutions, I make the effort to reverse engineer...
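
The polling described above can be sketched in plain Python. This is only an illustration and not the code from the chapter: `wait_for`, `get_html`, `is_ready`, and the `FakePage` stand-in are hypothetical names, and with a real browser renderer `get_html` would return the current page source.

```python
import time


def wait_for(get_html, is_ready, timeout=10.0, interval=0.5):
    """Poll get_html() until is_ready(html) is true or the timeout expires.

    Both callables are hypothetical and supplied by the caller.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            # Give up rather than wait forever on a slow network.
            raise TimeoutError('page did not reach the expected state in time')
        time.sleep(interval)


class FakePage:
    """Stand-in for a rendered page whose results appear on the third check."""

    def __init__(self):
        self.checks = 0

    def html(self):
        self.checks += 1
        if self.checks >= 3:
            return '<div id="results">done</div>'
        return '<div id="results"></div>'


page = FakePage()
result = wait_for(page.html, lambda html: 'done' in html,
                  timeout=2.0, interval=0.01)
print(result)
```

The timeout is what makes this approach brittle: set it too low and the scraper fails on a slow network, set it too high and every failure wastes time. Selenium ships a similar helper, `WebDriverWait`, for the same purpose.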