You have seen an example of reading HTML source code as a text vector in the Extracting unstructured text data from a plain web page recipe in this chapter. In that recipe, further processing is not straightforward because the output object mixes plain text with HTML tags, and stripping the tags out of the text is a time-consuming task.
In this recipe, you will read the same web page from the following link:
However, this time, you will use a different strategy so that you can work with the HTML tags directly.