Matplotlib for Python Developers
Often, the information we need is not distributed in an easy-to-use format, such as XML or a database export, but is available only, for example, on websites.
More and more often, we find interesting data on a web page; in that case, we have to parse the page to extract that information. This is called web scraping.
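To make the idea concrete, here is a minimal sketch of scraping the cells of an HTML table using only the standard library's `html.parser.HTMLParser`. It is not the approach this chapter goes on to use. The class name `TableCellParser` and the HTML snippet are hypothetical, and the numbers in the snippet are placeholders, not real data. The snippet is hard-coded so the example runs without network access.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join all text inside the cell, including text from nested tags such as <b>.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical HTML snippet with placeholder values (not real census data).
html = """
<table>
  <tr><th>Year</th><th>Population</th></tr>
  <tr><td>1900</td><td>100</td></tr>
  <tr><td>1950</td><td>200</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(html)
print(parser.rows)
# [['Year', 'Population'], ['1900', '100'], ['1950', '200']]
```

A hand-written parser like this is fragile: it assumes well-formed table markup and knows nothing about which table on the page holds the data you want. That is one reason dedicated scraping libraries exist.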
In this example, we will parse a Wikipedia article to extract some data to plot. The article is at http://it.wikipedia.org/wiki/Demografia_d'Italia and contains a lot of information about Italian demography (it's in Italian because the English version lacks much of this data); in particular, we are interested in how the population has evolved over the years.
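Once the table text has been extracted, it still has to be turned into numbers before it can be plotted. The following is a minimal sketch under an assumption: that the population figures follow the Italian convention of using `.` as the thousands separator (for example, `1.234.567`). The function names `parse_it_int` and `to_series` are hypothetical, and the rows are placeholder values, not figures from the article.

```python
def parse_it_int(text):
    """Convert an Italian-style integer string, e.g. '1.234.567', to an int.

    Assumption: '.' is the thousands separator; ordinary and non-breaking
    spaces around or inside the number are removed as well.
    """
    cleaned = text.replace(".", "").replace("\u00a0", "").replace(" ", "")
    return int(cleaned)


def to_series(rows):
    """Split (year, population) text rows into two numeric lists for plotting."""
    years, population = [], []
    for year_text, pop_text in rows:
        years.append(int(year_text))
        population.append(parse_it_int(pop_text))
    return years, population


# Hypothetical rows with placeholder values, as a scraper might return them.
rows = [["1900", "1.000"], ["1950", "2.500"], ["2000", "12.345.678"]]
years, population = to_series(rows)
print(years)       # [1900, 1950, 2000]
print(population)  # [1000, 2500, 12345678]
```

Lists like these could then be passed directly to a plotting call such as `matplotlib.pyplot.plot(years, population)`.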
Probably the best-known Python module for web scraping is BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/). It's a really nice library that gets the job done quickly, but there are situations (in particular, web pages with embedded JavaScript, as is the case with Wikipedia) that prevent it from working.
As an alternative...