To support caching, the download function developed in Chapter 1, Introduction to Web Scraping, needs to be modified to check the cache before downloading a URL. Throttling also needs to move inside this function, so that it applies only when a download is actually made and not when a result is loaded from the cache. To avoid passing the same parameters on every download, we will take this opportunity to refactor the download function into a class, so that parameters can be set once in the constructor and reused across calls. Here is the updated implementation to support this:
class Downloader:
    def __init__(self, delay=5, user_agent='wswp', proxies=None, num_retries=1, cache=None):
        self.throttle = Throttle(delay)
        self.user_agent = user_agent
        self.proxies = proxies
        self.num_retries = num_retries
        self.cache = cache

    def __call__(self, url):
        result = None
        if self.cache:
            try:
                ...
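Because the listing above is cut short, the following is a minimal, self-contained sketch of the control flow the text describes: look in the cache first, and throttle only when a real download happens. It is not the book's implementation. The names `SimpleThrottle`, `CachingDownloader`, and `fake_fetch` are hypothetical, the cache is assumed to be any dict-like object, and the throttle is a simplified single-delay stand-in for the chapter's `Throttle` class. The actual download is injected as a function so that the example runs without network access.

```python
import time


class SimpleThrottle:
    """Hypothetical stand-in for Throttle: enforces a delay between calls."""

    def __init__(self, delay):
        self.delay = delay
        self.last_call = None

    def wait(self):
        # Sleep only if the previous call was less than `delay` seconds ago.
        if self.last_call is not None:
            remaining = self.delay - (time.monotonic() - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()


class CachingDownloader:
    """Sketch of a callable downloader that consults a cache first."""

    def __init__(self, fetch, delay=0.0, cache=None):
        self.fetch = fetch          # assumed: function mapping url -> content
        self.throttle = SimpleThrottle(delay)
        self.cache = cache          # assumed: dict-like object, or None

    def __call__(self, url):
        result = None
        # `is not None` is used so that an empty dict still counts as a cache.
        if self.cache is not None:
            try:
                result = self.cache[url]
            except KeyError:
                pass                # URL not cached yet
        if result is None:
            # Cache miss: throttle, download, then store the result.
            self.throttle.wait()
            result = self.fetch(url)
            if self.cache is not None:
                self.cache[url] = result
        return result


calls = []


def fake_fetch(url):
    """Hypothetical fetcher that records each real download."""
    calls.append(url)
    return '<html>%s</html>' % url


downloader = CachingDownloader(fake_fetch, delay=0.0, cache={})
first = downloader('http://example.com')   # cache miss: calls fake_fetch
second = downloader('http://example.com')  # cache hit: no download, no throttle
```

In this sketch the second call never reaches `fake_fetch`, so `calls` holds a single entry. Defining `__call__` lets an instance be used like the original download function, while the settings live in the constructor.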