A character filter receives the original input text as a stream of characters and preprocesses it before the result is passed to the tokenizer. Three built-in character filters are supported: html_strip, mapping, and pattern_replace. We'll practice each one using the same input text string as in the previous section.
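To make the order of operations concrete, here is a minimal sketch of a request body for the _analyze API that applies a character filter before the tokenizer. It only builds and prints the JSON; nothing is sent to a cluster. The helper name build_analyze_request and the sample text are assumptions made for illustration, not the input string from the previous section.

```python
import json


def build_analyze_request(text, char_filters, tokenizer="standard"):
    """Build the JSON body for a POST _analyze call (hypothetical helper).

    Character filters listed in "char_filter" run first, in order;
    their output is then handed to the tokenizer.
    """
    return {
        "tokenizer": tokenizer,
        "char_filter": char_filters,
        "text": text,
    }


# Hypothetical sample text, chosen only to contain a tag and an entity.
sample_text = "<p>Tom &amp; Jerry</p>"

body = build_analyze_request(sample_text, ["html_strip"])
print(json.dumps(body, indent=2))
```

Swapping "html_strip" for a mapping or pattern_replace definition would exercise the other two filters in the same way; the printed body can be pasted into whichever client you use to reach the _analyze endpoint.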
Character filters
The html_strip filter
This character filter removes HTML tags (for more information about HTML tags and entities, see https://www.w3schools.com/html/default.asp) and replaces HTML entities with the corresponding decoded UTF-8 characters. By default, the text enclosed by the tags is left unchanged, whereas HTML comments are removed in their entirety. Let's suppose...