Web Scraping with Python

By: Richard Penman

Google search engine


According to the Alexa data used in Chapter 4, Concurrent Downloading, google.com is the world's most popular website, and conveniently, its structure is simple and straightforward to scrape.

Note

International Google

Google may redirect to a country-specific version, depending on your location. To use a consistent Google search wherever you are in the world, the international English version of Google can be loaded at http://www.google.com/ncr. Here, ncr stands for no country redirect.

Loading the Google search homepage with Firebug lets us inspect the search form.

The search query is stored in an input with the name q, and the form is submitted to the path /search, which is set by the form's action attribute. We can verify this by submitting a test search, which redirects to a URL like https://www.google.com/search?q=test&oq=test&es_sm=93&ie=UTF-8. The exact URL will depend on your browser and location. Also note that if you have...
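
The form details described above (an input named q submitted to the path /search) are enough to construct a search URL by hand. The following is a minimal sketch using only the Python 3 standard library. The helper name build_search_url is hypothetical and introduced here for illustration, and the sketch assumes that q is the only parameter needed; the extra parameters a browser appends, such as oq and es_sm, are left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(query, base="https://www.google.com/search"):
    """Return the URL that submitting the search form would request.

    Hypothetical helper for illustration; assumes the form's text
    input is named "q" and its action attribute is "/search".
    """
    # urlencode escapes spaces and special characters in the query
    return base + "?" + urlencode({"q": query})


url = build_search_url("web scraping")
print(url)

# Parse the URL again to confirm that the query survives encoding
print(parse_qs(urlsplit(url).query))
```

Building the query string with urlencode rather than string concatenation means that spaces and characters such as & in the search terms are escaped correctly.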