-
Book Overview & Buying
-
Table Of Contents
Python Web Scraping
By :
To test the performance of concurrent downloading, it would be preferable to have a larger target website. For this reason, we will use the Alexa list in this chapter, which tracks the top 1 million most popular websites according to users who have installed the Alexa Toolbar. Only a small percentage of people use this browser plugin, so the data is not authoritative, but is fine for our purposes.
These top 1 million web pages can be browsed on the Alexa website at http://www.alexa.com/topsites. Additionally, a compressed spreadsheet of this list is available at http://s3.amazonaws.com/alexa-static/top-1m.csv.zip, so scraping Alexa is not necessary.
The Alexa list is provided in a spreadsheet with columns for the rank and domain:

Extracting this data requires a number of steps, as follows:
.zip file..zip file.Here is an implementation...
Change the font size
Change margin width
Change background colour