A scraper is a system that copies content from other websites using web scraping. First, let's state the things we want to accomplish:
Downloading a web page
Parsing HTML
Cherry-picking attributes from the HTML
Saving the results
For a modern way to fetch content from the web, we will avoid the standard urllib library and go directly with the nicer requests library from the Python community.
For parsing and drilling into web pages, we'll use what is close to the de facto library for this in the Python world: BeautifulSoup.
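As a rough sketch of how the four goals listed earlier might map onto these two libraries, consider the following. It is an illustration under stated assumptions, not the scraper this text goes on to build: it assumes Python 3 and the newer bs4 package (installed as beautifulsoup4) rather than BeautifulSoup 3, and the function names download, extract_links and save, along with the choice of CSV as the output format, are hypothetical.

```python
import csv

import requests
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not the 3.x series


def download(url):
    """Step 1: download a web page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Steps 2 and 3: parse the HTML, then cherry-pick attributes from it."""
    soup = BeautifulSoup(html, "html.parser")
    # Keep only anchors that actually carry an href attribute.
    return [
        {"text": a.get_text(strip=True), "href": a["href"]}
        for a in soup.find_all("a", href=True)
    ]


def save(rows, path):
    """Step 4: save the results as a CSV file (an illustrative choice)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "href"])
        writer.writeheader()
        writer.writerows(rows)
```

Chaining the three calls, for example save(extract_links(download(url)), "links.csv") with a URL of your choosing, would run the whole pipeline. Keeping the download step separate from parsing means the parsing logic can be exercised on a plain HTML string without touching the network.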
Let's fetch these via pip:
$ pip install requests beautifulsoup
Requirement already satisfied (use --upgrade to upgrade): requests in /Library/Python/2.7/site-packages/requests-2.2.1-py2.7.egg
Downloading/unpacking beautifulsoup
  Downloading BeautifulSoup-3.2.1.tar.gz
  Running setup.py (path:/private/var/folders/gw/xp4xsqt97957cc7hcgxd0w0c0000gn/T/pip_build_dotan/beautifulsoup/setup.py) egg_info for package beautifulsoup
Installing collected...