Book Image

Web Scraping with Python

By : Richard Penman
Book Image

Web Scraping with Python

By: Richard Penman

Overview of this book

Table of Contents (16 chapters)

Analyzing a web page


To understand how a web page is structured, we can try examining the source code. In most web browsers, the source code of a web page can be viewed by right-clicking on the page and selecting the View page source option:

The data we are interested in is found in this part of the HTML:

<table>
<tr id="places_national_flag__row"><td class="w2p_fl"><label for="places_national_flag" id="places_national_flag__label">National Flag: </label></td><td class="w2p_fw"><img src="/places/static/images/flags/gb.png" /></td><td class="w2p_fc"></td></tr>
...
<tr id="places_neighbours__row"><td class="w2p_fl"><label for="places_neighbours" id="places_neighbours__label">Neighbours: </label></td><td class="w2p_fw"><div><a href="/iso/IE">IE </a></div></td><td class="w2p_fc"></td></tr></table>

This lack of whitespace and formatting...