Book Image

Web Scraping with Python

By : Richard Penman
Book Image

Web Scraping with Python

By: Richard Penman

Overview of this book

Table of Contents (16 chapters)

An example dynamic web page


Let's look at an example dynamic web page. The example website has a search form, which is available at http://example.webscraping.com/search, that is used to locate countries. Let's say we want to find all the countries that begin with the letter A:

If we right-click on these results to inspect them with Firebug (as covered in Chapter 2, Scraping the Data), we would find that the results are stored within a div element of ID "results":

Let's try to extract these results using the lxml module, which was also covered in Chapter 2, Scraping the Data, and the Downloader class from Chapter 3, Caching Downloads:

>>> import lxml.html
>>> from downloader import Downloader
>>> D = Downloader() 
>>> html = D('http://example.webscraping.com/search')
>>> tree = lxml.html.fromstring(html)
>>> tree.cssselect('div#results a')
[]

The example scraper here has failed to extract results. Examining the source code of this web page...