-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating
Natural Language Processing with Flair
By :
While Flair ships with several tokenizers that support the most commonly spoken languages, it is entirely possible you will be working with a language that uses tokenization rules currently not covered by Flair. Luckily, Flair offers a simple interface that allows us to implement our tokenizers or use third-party libraries.
The TokenizerWrapper class provides an easy interface for building custom tokenizers. To build one, you simply need to instantiate the class by passing the tokenizer_func parameter. The parameter is a function that receives the entire sentence text as input and returns a list of token strings.
As an exercise, let's try to implement a custom tokenizer that splits the text into characters. This tokenizer will treat every character as a token:
from flair.data import Token from flair.tokenization import TokenizerWrapper def char_splitter(sentence): return list(sentence...