A web page is just a text file that contains special markup elements (sometimes called HTML tags) that tell a web browser how the page should look when displayed to the user. For example, if we want a particular word to be displayed in a way that indicates emphasis, we can surround it with <em> tags like this:
It is <em>very important</em> that you follow these instructions.
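To make the idea concrete, here is a minimal sketch in Python that holds the example sentence as an ordinary string and picks out the emphasized words. It uses the standard library's html.parser module. The class name EmphasisCollector and its attributes are hypothetical names chosen for illustration, and the sketch is not one of the tools discussed in the next section.

```python
from html.parser import HTMLParser

# The example sentence from above, stored as a plain Python string:
# to a program, a web page is just text that happens to contain tags.
page = "It is <em>very important</em> that you follow these instructions."


class EmphasisCollector(HTMLParser):
    """Hypothetical helper: collects the text found inside <em> tags."""

    def __init__(self):
        super().__init__()
        self.inside_em = False   # True while we are between <em> and </em>
        self.emphasized = []     # pieces of text seen inside <em> tags

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.inside_em = True

    def handle_endtag(self, tag):
        if tag == "em":
            self.inside_em = False

    def handle_data(self, data):
        # Called for runs of plain text between tags.
        if self.inside_em:
            self.emphasized.append(data)


collector = EmphasisCollector()
collector.feed(page)
collector.close()
print(collector.emphasized)  # ['very important']
```

The parser reports start tags, end tags, and the text between them as separate events, which is enough to separate the markup from the content it surrounds.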
All web pages share these features: they are made up of text, and that text may include tags. There are two main mental models we can employ to extract data from web pages, and each is useful in its own way. In this section, we describe the two structural models; in the next section, we use three different tools for extracting data.