In the preceding chapter, we learned how to scrape data from crawled web pages and save the results to a spreadsheet. What if we now want to scrape an additional field, such as the flag URL? To scrape additional fields, we would need to download the entire website again. This is not a significant obstacle for our small example website. However, other websites can have millions of web pages that would take weeks to recrawl. The solution presented in this chapter is to cache all the crawled web pages so that they only need to be downloaded once.
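The caching idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the chapter: the names `fetch` and `CachedDownloader` are hypothetical, and the cache is a plain in-memory dictionary, whereas a real crawler would more likely persist pages to disk or a database.

```python
import urllib.request


def fetch(url):
    """Hypothetical helper: download a page and return its body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class CachedDownloader:
    """Hypothetical wrapper that downloads each URL at most once."""

    def __init__(self, download=fetch, cache=None):
        # download: any callable that takes a URL and returns the page content
        # cache: any dict-like object; an in-memory dict is assumed by default
        self.download = download
        self.cache = {} if cache is None else cache

    def __call__(self, url):
        # Serve the stored copy when this URL has been seen before
        if url in self.cache:
            return self.cache[url]
        # Otherwise download the page once and remember the result
        html = self.download(url)
        self.cache[url] = html
        return html
```

With a wrapper like this, scraping an extra field later means calling the downloader again with the same URLs: pages already in the cache are returned immediately, and only unseen URLs trigger a network request.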
Web Scraping with Python