Getting Started with Beautiful Soup
As already explained, every HTML or XML document is written in a specific character encoding, for example, UTF-8 or Latin-1. In an HTML page, the encoding is declared using the meta tag, as shown in the following example:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
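To make the role of this declaration concrete, here is a minimal sketch of how a program might read the declared charset back out of such a tag, using only Python's standard html.parser module. The MetaCharsetFinder class and the declared_charset function are hypothetical names chosen for illustration; this is not Beautiful Soup's own detection code. The sketch handles both the http-equiv form shown above and the shorter HTML5 form, <meta charset="...">.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # HTML5 form: <meta charset="UTF-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Older form: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            content = attributes.get("content") or ""
            for part in content.split(";"):
                key, _, value = part.partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().lower()
                    return


def declared_charset(markup: str):
    """Return the lowercased declared charset, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset


print(declared_charset('<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'))
# utf-8
```

Because HTMLParser works on str rather than bytes, a real detector would first have to decode the raw bytes with a permissive, ASCII-compatible codec just to locate the tag, and only then decode the full document with the charset it finds.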
Beautiful Soup uses its UnicodeDammit class to detect the encoding of the document automatically, and it converts the content to Unicode while creating the soup object from the document.
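UnicodeDammit's real detection is more thorough than can be shown here. Besides any declared encoding, it checks for a byte-order mark and, if one is installed, can consult a character-detection library such as chardet. The core idea of trying candidate encodings until one decodes cleanly can still be sketched with the standard library alone. The to_unicode function and its fallback order (declared encoding, then UTF-8, then Windows-1252) are assumptions made for illustration, not Beautiful Soup's API.

```python
def to_unicode(data: bytes, declared=None):
    """Simplified sketch: decode bytes by trying candidate encodings in order.

    Returns (text, encoding_used); encoding_used is None if every candidate failed.
    """
    candidates = []
    if declared:
        candidates.append(declared)
    # Assumed fallback order for this sketch only
    candidates += ["utf-8", "windows-1252"]
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Bytes are invalid in this encoding, or the name is unknown
            continue
    # Last resort: decode anyway, replacing bad bytes with U+FFFD
    return data.decode("utf-8", errors="replace"), None


print(to_unicode("eñe".encode("utf-8")))    # ('eñe', 'utf-8')
print(to_unicode("eñe".encode("latin-1")))  # ('eñe', 'windows-1252')
```

Windows-1252 is tried last because it accepts almost any byte sequence; tried first, it would silently turn genuine UTF-8 text into garbled characters instead of failing.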
Unicode is a character set: a list of characters, each assigned a unique number called a code point. For example, the code point for the character B is U+0042, that is, 42 in hexadecimal (66 in decimal). UTF-8 is an encoding, an algorithm that converts these code points into sequences of bytes.
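The difference between a code point and its encoded bytes can be checked directly in Python with built-in functions only. The ñ character used here is the same one that appears in the Spanish-alphabet markup in this section.

```python
# Code point of "B": U+0042, i.e. hex 0x42 == decimal 66
print(hex(ord("B")), ord("B"))   # 0x42 66
print(chr(0x42))                 # B

# The same character "ñ" (U+00F1) encoded two different ways
print("ñ".encode("utf-8"))       # b'\xc3\xb1'  (two bytes)
print("ñ".encode("latin-1"))     # b'\xf1'      (one byte)

# Decoding UTF-8 bytes with the wrong encoding produces garbled text
print("ñ".encode("utf-8").decode("latin-1"))  # Ã±
```

This last line is exactly the kind of error that automatic encoding detection is meant to prevent.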
In the previous example, Beautiful Soup converts the document to Unicode.
html_markup = """<p> The Spanish language is written using the Spanish alphabet, which is the Latin alphabet with one additional letter, eñe ⟨ñ⟩...