Web Scraping with Python

By: Richard Penman

Sequential crawler


Here is the code to use AlexaCallback with the link crawler developed earlier to download sequentially:

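# AlexaCallback provides the seed URL and also serves as the scrape callback
scrape_callback = AlexaCallback()
# Crawl sequentially, passing a MongoCache instance as the cache
link_crawler(seed_url=scrape_callback.seed_url,
    cache_callback=MongoCache(),
    scrape_callback=scrape_callback)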

This code is available at https://bitbucket.org/wswp/code/src/tip/chapter04/sequential_test.py and can be run from the command line as follows:

$ time python sequential_test.py
...
26m41.141s

This time is in line with what sequential downloading should take, since each URL is fetched only after the previous one has completed; it corresponds to an average of ~1.6 seconds per URL.
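
The per-URL average can be checked with a little arithmetic. The following is a minimal sketch rather than part of the book's code: the helper `parse_elapsed` is a hypothetical name, and the URL count of 1,000 is an assumption inferred from the two figures quoted above (about 1,601 seconds in total at ~1.6 seconds per URL), not a value stated in this section.

```python
def parse_elapsed(text):
    """Convert a `time`-style string such as '26m41.141s' to seconds.

    Hypothetical helper for illustration; handles only the 'XmY.YYYs' form.
    """
    minutes, _, seconds = text.partition("m")
    return int(minutes) * 60 + float(seconds.rstrip("s"))


elapsed = parse_elapsed("26m41.141s")  # total run time reported above
url_count = 1000  # assumption: inferred from the timing, not stated in the text
average = elapsed / url_count

print("{:.3f} s total, {:.2f} s per URL".format(elapsed, average))
```

Under that assumption the average comes out at about 1.6 seconds per URL, matching the figure quoted above. With a different URL count the average scales accordingly.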